How did the Free IF Playoffs affect the IFDB Top 100?

I’m curious if the voters in the Free IF Playoffs tended to add new ratings to IFDB as they played games they hadn’t played before, and if the rankings in the IFDB Top 100 changed significantly from the beginning to the end of the playoffs.

(Somewhere in the many playoffs threads, I think there was a reference to the IFDB Top 100 changing, but I can’t find it now.)

4 Likes

Here’s the initial seed vs the current relative positions:

Game	Initial Seed	Current Position	Difference
Counterfeit Monkey	1	1	0
Anchorhead	2	2	0
Superluminal Vagrant Twin	3	3	0
Worldsmith	4	4	0
The Wizard Sniffer	5	5	0
Toby’s Nose	9	6	3
Eat Me	8	7	1
Will Not Let Me Go	7	8	-1
Cragne Manor	6	9	-3
Lost Pig	11	10	1
Alias ‘The Magpie’	16	11	5
Worlds Apart	14	12	2
Savoir-Faire	12	13	-1
Junior Arithmancer	15	14	1
A Beauty Cold and Austere	10	15	-5
Coloratura	17	16	1
The Impossible Bottle	13	17	-4
The Gostak	19	18	1
Stay?	18	19	-1
Repeat the Ending	28	20	8
Blue Lacuna	22	21	1
Treasures of a Slaver’s Kingdom	23	22	1
A Long Way to the Nearest Star	20	23	-3
Violet	24	24	0
Spider and Web	27	25	2
Known Unknowns	21	26	-5
Dr Ludwig and the Devil	30	27	3
Cannery Vale	32	28	4
Spy Intrigue	25	29	-4
Endless, Nameless	31	30	1
Birdland	29	31	-2
Make It Good	33	32	1
Absence of Law	26	33	-7
The Mulldoon Legacy	35	34	1
City of Secrets	36	35	1
Midnight. Swordfight.	34	36	-2
Bronze	37	37	0
According to Cain	39	38	1
Turandot	42	39	3
Foo Foo	46	40	6
Zozzled	38	41	-3
And Then You Come to a House Not Unlike the Previous One	40	42	-2
4x4 Archipelago	44	43	1
Sub Rosa	48	44	4
Photopia	49	45	4
Weird City Interloper	41	46	-5
Chlorophyll	50	47	3
Of Their Shadows Deep	55	48	7
The Weight of a Soul	59	49	10
Excalibur	54	50	4
Slouching Toward Bedlam	52	51	1
Harmonia	47	52	-5
The Impossible Stairs	53	53	0
The Axolotl Project	45	54	-9
The Spectators	60	55	5
Suveh Nux	62	56	6
The Lurking Horror II: The Lurkening	58	57	1
Cryptozookeeper	43	58	-15
Inside the Facility	51	59	-8
With Those We Love Alive	56	60	-4
Digital: A Love Story	64	61	3
The Shadow in the Cathedral	63	62	1
Beautiful Dreamer	61	63	-2
Magical Makeover	57	63	-6

The Weight of a Soul climbed the most, while Cryptozookeeper suffered the biggest relative drop. It’s worth noting that such a change can result from just a single rating.

Below are their absolute positions. If the playoffs were run again, four games would swap in and four would swap out:

Absolute Rankings
Game	Initial Seed	Current Position	Difference
Counterfeit Monkey	1	1	0
Anchorhead	2	2	0
Superluminal Vagrant Twin	3	3	0
Worldsmith	4	4	0
The Wizard Sniffer	5	5	0
Toby’s Nose	9	6	3
Eat Me	8	7	1
Will Not Let Me Go	7	8	-1
Cragne Manor	6	9	-3
Lost Pig	11	10	1
Alias ‘The Magpie’	16	11	5
Worlds Apart	14	12	2
Savoir-Faire	12	13	-1
Junior Arithmancer	15	14	1
A Beauty Cold and Austere	10	15	-5
Coloratura	17	16	1
The Impossible Bottle	13	17	-4
The Gostak	19	18	1
Stay?	18	19	-1
Repeat the Ending	28	20	8
Blue Lacuna	22	21	1
Treasures of a Slaver’s Kingdom	23	22	1
A Long Way to the Nearest Star	20	23	-3
Violet	24	24	0
Spider and Web	27	25	2
Known Unknowns	21	26	-5
Dr Ludwig and the Devil	30	27	3
Cannery Vale	32	28	4
Spy Intrigue	25	29	-4
Endless, Nameless	31	30	1
Birdland	29	31	-2
Make It Good	33	32	1
Absence of Law	26	33	-7
The Mulldoon Legacy	35	34	1
City of Secrets	36	35	1
Midnight. Swordfight.	34	36	-2
Bronze	37	37	0
According to Cain	39	38	1
Turandot	42	39	3
Foo Foo	46	40	6
Zozzled	38	41	-3
And Then You Come to a House Not Unlike the Previous One	40	42	-2
4x4 Archipelago	44	43	1
Sub Rosa	48	44	4
Photopia	49	45	4
Weird City Interloper	41	46	-5
Chlorophyll	50	47	3
Of Their Shadows Deep	55	48	7
The Weight of a Soul	59	49	10
Excalibur	54	50	4
Slouching Toward Bedlam	52	51	1
Harmonia	47	52	-5
The Impossible Stairs	53	53	0
The Axolotl Project	45	54	-9
The Spectators	60	55	5
Suveh Nux	62	56	6
The Lurking Horror II: The Lurkening	58	57	1
Cryptozookeeper	43	58	-15
What Heart Heard Of, Ghost Guessed	-	59	NEW
Fairest	-	60	NEW
All Things Devours	-	61	NEW
Inside the Facility	51	62	-11
Dr Horror’s House of Terror	-	63	NEW
With Those We Love Alive	56	64	-8
Digital: A Love Story	64	65	-1
Grimnoir	-	66	-
A Murder in Fairyland	-	67	-
Mentula Macanus: Apocolocyntosis	-	68	-
Bogeyman	-	69	-
Babel	-	70	-
The Shadow in the Cathedral	63	71	-8
Tangorea Deep	-	72	-
Oppositely Opal	-	73	-
Erstwhile	-	74	-
Beautiful Dreamer	61	75	-14
First Things First	-	76	-
Illuminismo Iniziato	-	77	-
The Missing Ring	-	78	-
Swan Hill	-	79	-
Magical Makeover	57	80	-23
15 Likes

Just a note: The seed rankings given for each game reflected their Top 100 ranking order within the set of qualifying games. Because non-free games on the Top 100 list were disqualified, many seed rankings are several places higher than their Top 100 ranking before the start of the tournament.
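
As a rough illustration of that seeding rule, here is a minimal Python sketch. The game names and the `is_free` flags are made up for the example, and `seed_rankings` is a hypothetical helper, not anything IFDB or the tournament actually used.

```python
def seed_rankings(top100, is_free):
    """Return {title: seed} for qualifying games, keeping Top 100 order.

    top100  -- titles in Top 100 order (hypothetical data here)
    is_free -- dict mapping title -> True if the game is free to play
    """
    qualifying = [title for title in top100 if is_free[title]]
    # Seeds are simply 1, 2, 3, ... within the qualifying set.
    return {title: seed for seed, title in enumerate(qualifying, start=1)}


# Made-up example: "Game B" is commercial, so everything below it moves up.
top100 = ["Game A", "Game B", "Game C", "Game D"]
is_free = {"Game A": True, "Game B": False, "Game C": True, "Game D": True}
seeds = seed_rankings(top100, is_free)
print(seeds)
```

In this toy list, "Game C" sits at #3 on the full list but is seeded #2, which is the kind of gap between seed and Top 100 position described above.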

There were relatively few new ratings added on IFDB. As Joey notes, the Top 100 ranking can be very sensitive to new ratings – most especially for games with relatively few ratings (i.e. under 30). This is for two reasons:

  1. The all-IFDB mean rating is very high (reported at 3.92 in this week’s ranking). This value is used as a “buffer” in all weighted average calculations used for determining rankings; each game is given 13 ratings at that value in addition to its actual ratings.

  2. The difference between the weighted averages of adjacent games is quite small, less than 0.01 after the top 12.

What this means in practice is that:

  • New games can’t appear at all until they get at least 13 votes.
  • At that point, their net average of actual ratings is averaged with the buffer value. A new game X with thirteen 5-star ratings would end up with a weighted average of 4.46, which would place it at #14 (just below Cragne Manor) in this week’s ranking.

Any subsequent rating below 4-star will cause a rapid drop. If game X got 3 stars as its 27th rating (counting the 13 buffer ratings, so its 14th actual rating), its new weighted average would be 4.40, which would drop it to around #26 (down about 12 places) on this week’s ranking. If the 27th rating was 2-star, then it would instead have a weighted average of ~4.37 and drop to somewhere around #34 (down 20 places).

For this sample game, even a 4-star will cause a significant drop, because it is comparable to adding to the buffer votes. If game X got 4 stars as its 27th rating, its new weighted average would be ~4.44, which would drop it to around #18 (down 4 places) on this week’s rankings.

Adding another 5-star rating won’t move it much in the overall rankings. If game X’s 27th rating was 5-star, the weighted average would only rise to 4.48, putting it at #12 (up two places).
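
To make the arithmetic above easier to check, here is a minimal Python sketch of the buffered average as described in this post: 13 phantom ratings at the all-IFDB mean are added to a game's actual ratings. The function name and its defaults are assumptions for illustration, and the buffer value of 3.92 is the rounded figure quoted above, so results can differ from the quoted numbers in the last decimal place.

```python
def weighted_average(ratings, buffer_value=3.92, buffer_count=13):
    """Mean of the actual ratings plus buffer_count phantom ratings at buffer_value.

    buffer_value and buffer_count are the figures quoted in the post
    (3.92 and 13); treat them as assumptions, not IFDB's exact internals.
    """
    total = buffer_value * buffer_count + sum(ratings)
    return total / (buffer_count + len(ratings))


# Game X from the example: thirteen 5-star ratings and nothing else.
game_x = [5] * 13
print(f"13 x 5-star: {weighted_average(game_x):.2f}")

# Effect of one more rating (the "27th", counting the 13 buffer ratings).
for stars in (5, 4, 3, 2):
    print(f"next rating {stars}-star: {weighted_average(game_x + [stars]):.2f}")
```

Running this shows the pattern described above: only a further 5-star rating raises the average, while a 4-star rating already pulls it down because it sits so close to the buffer value.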

For games that have a lot of ratings, the buffer votes are diluted out to the point where they don’t matter much, but that’s even more true for any individual new ratings. It would take a significant influx of new low ratings to move their weighted average.

Counterfeit Monkey and Anchorhead are pretty secure in their current positions. The shortest path to #2 would be for a new game to get 5 stars on each of its first 23 votes. It would take 46 unbroken 5-star ratings to reach #1.
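
The same formula can be turned around to ask how many consecutive 5-star ratings a brand-new game would need to pass a given weighted average. In this sketch the targets 4.60 and 4.76 are illustrative thresholds chosen only because they are consistent with the counts of 23 and 46 quoted above; they are not the actual averages of Anchorhead or Counterfeit Monkey, which this thread does not state. The helper name `five_stars_needed` is likewise hypothetical.

```python
def five_stars_needed(target, buffer_value=3.92, buffer_count=13, limit=10_000):
    """Smallest number of unbroken 5-star ratings whose buffered average exceeds target.

    Returns None if the target is not passed within `limit` ratings
    (for example, if target >= 5).
    """
    for n in range(1, limit + 1):
        average = (buffer_value * buffer_count + 5 * n) / (buffer_count + n)
        if average > target:
            return n
    return None


# Illustrative thresholds only (assumptions, not real IFDB averages).
for target in (4.60, 4.76):
    print(f"to pass {target:.2f}: {five_stars_needed(target)} five-star ratings")
```

Because the buffered average approaches 5 ever more slowly, each further step up the target costs many more perfect ratings, which is why the top two spots are so hard to dislodge.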

As mentioned above, as far as I can tell there were not many new ratings added to IFDB as a direct result of the tournament. Mid-game poll results suggest that most people didn’t play or try that many new games in the tournament. My hypothesis is that most people playing had already entered their ratings for most games on which they voted, or refrained from doing so if they had only tried the game and not completed it.

4 Likes

I was actively tracking movement on the Top 100 throughout the tournament. Following are games that shifted at least 3 places on the Top 100 (anything less being hard to separate from statistical noise) between the start of the tournament and now:

SEED	CONTESTANT						NET MOVE
#59		The Weight of a Soul			+11
#28		Repeat the Ending				+10
#55		Of Their Shadows Deep			+8
#16		Alias the Magpie				+6
#62		Suveh Nux						+6
#32		Cannery Vale					+5
#46		Foo Foo							+5
#60		The Spectators					+5
#30		Dr Ludwig and the Devil			+4
#48		Sub Rosa						+4
#49		Photopia						+4
#54		Excalibur						+4
#50		Chlorophyll						+3
#9		Toby’s Nose						+3

#6		Cragne Manor					-4
#13		The Impossible Bottle			-5
#20		A Long Way to the Nearest Star	-5
#40		And Then You Come to a House...	-5
#10		A Beauty Cold and Austere		-6
#25		Spy Intrigue					-6
#38		Zozzled							-6
#47		Harmonia						-6
#21		Known Unknowns					-7
#41		Weird City Interloper			-7
#56		With Those We Love Alive		-8
#63		The Shadow in the Cathedral		-8
#26		Absence of Law					-9
#45		The Axolotl Project				-11
#51		Inside the Facility				-12
#61		Beautiful Dreamer				-15
#43		Cryptozookeeper					-17
#57		Magical Makeover				-25

Interestingly, Excalibur did not even appear on the June 10 2024 edition of the Top 100, but it subsequently returned. My guess is that one or more poor ratings were deleted by IFDB admins, though @mathbrush indicated that he wasn’t aware of any recent deletions around that time.

5 Likes

Yeah, I personally am not aware of any removed ratings but maybe the other mods took action while I was on vacation.

It was humbling to have my game Absence of Law fall so far in the rankings. But I can’t control public opinion of my past games, so I will continue to try to make better games in the future! I have one I’m working on right now that I will put in parser comp or IFcomp next year.

9 Likes

I would not take it that seriously, @mathbrush. From my records there were three new ratings for the game over the course of the tournament: two were 4-star and one was 2-star. As I tried to make clear above, even the 4-star ratings hurt your ranking! From what I can see, your histogram of ratings for Absence of Law looks pretty darn good.

It really seems to me that the IFDB ratings system is steadily losing its value over time. The all-IFDB average used for buffering was 3.91 near the start of the tournament – it’s now 3.92, which means it has increased a non-negligible amount during just the last two months. (I guess everyone has been trained to think that anything less than five stars means you didn’t like the game?) In my view it’s an unfolding tragedy of the commons.

6 Likes

During the full course of the tournament, I played (and rated) 16 new-to-me games! I also went back and added ratings for a few I’d played already and hadn’t rated. But of course that still isn’t that many in the grand scheme of things.

5 Likes

I didn’t rate very many games on IFDB based on the FIFP because I didn’t finish very many games from FIFP. I wouldn’t rate games on IFDB if I didn’t finish them, but I did vote for games based on “initial impressions”, and I tried to be open about the fact that it might not have been the “same vote” if I’d finished both games in the matchup. But I do believe that to rate a game on IFDB one should do one’s very best to complete it first. I did rate a handful of the shorter games on there, such as Turandot, which I both rated and reviewed.

What I did get out of the FIFP was a pretty great overview of the “Best of IF” since I took a sabbatical from the hobby around 2003 or 2004 or so until this year. That was a pretty great way to get a taste of quality IF in the last two decades, so thank you @otistdog for all the work setting it up.

-v

3 Likes

Oh yeah, when I say “played” I mean fully played/completed. I didn’t make it to several that I would have liked to (and still intend to play), because I only wanted to start ones I knew I could finish in time.

4 Likes