General reminder to fill out the Post-Competition Survey with your suggestions!
It is through that survey that the IFComp committee discusses potential changes for future editions of the IFComp! Thank you!
@moderators can the non-survey discussions (the ones where someone said “I don’t want to put this in the survey” or “this is too long for the survey”) be moved to another thread (or threads) so that the survey discussion can stay here? It seems the only survey-related posts aside from the OP are #8, #41, and #62. I feel like the “bias” discussion could be its own thread, since it’s generated the most conversation.
FWIW dfranke did touch on it here:
I don’t agree with that analysis, but I’m worried we’re beating a dead horse at this point so I’ll leave it there. Plus the other thread in this discussion - helping games with few ratings get more of them - feels more clear cut and possibly easier to tackle!
FWIW, I’d think twice before having the game unlisted. I understand the disappointment with the ranking, but consider that there were players who genuinely enjoyed it and took the time to write positive reviews.
Certain features of the game (both in content and in the stripped-down presentation) are polarizing, as evidenced by the bimodal distribution. There’s always some entry of this kind. Of course, having taken part in the testing, I must say yours is the 2024 entry I was most invested in. It might have been somewhat underrated, but it’s a substantial work and it has a certain cult appeal to it. It would be a pity to have it removed from the comp.
You’re asking me to drag this down to the object level and make some claim as to how the outcome would have been different if only the judges had been more enlightened, and I’m not going to do that. I don’t know what would have changed, and it would be quite arrogant if I thought I did. My concern is with improving the process, not with altering the results to be more to my liking.
Yes, let’s focus more on that topic. My opinion stands that my proposed rules for blind reviewing would improve things, but I’m clearly in the minority there and I accept that. Other measures aimed at balancing the attention that games get from reviewers could improve things too, and seem to stand more chance at consensus.
I think people are just confused as to what the problem is here, and in particular if it’s severe enough to warrant the solutions you’ve suggested.
I agree that in a perfect world we’d see a more or less equal number of votes per entry but there’s so much more going into that besides author recognition. (Game length is one factor that I don’t think you’ll ever be able to control for, and download-only entries also take a hit here.)
Last place was a game submitted under a pseudonym. If your argument is that forcing known authors to use pseudonyms would even out the amount of attention games get, I don’t see how this evidence supports your point.
I don’t have an actual spreadsheet or anything to back this up, but my feeling is that having fewer votes actually confers a slight advantage. In the same way that (if I understand correctly) an extremely niche game might manage to get a single rating of 10 and soar straight to first place, a game that attracts 100 ratings but a couple of 1s or 2s among them might get bumped below one of similar quality that just never reached anyone who seriously disliked it. I suspect this effect will be most pronounced when there’s stiff competition right at the top of the scale, since a game that’s getting mostly 10s can’t get an 11 no matter how many more people judge it, but it may well get a 5 or a 6 that brings its average down.
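To make that concrete, here’s a toy example with made-up ratings (purely hypothetical, not real IFComp data), showing how a couple of low outliers can pull a heavily played game’s mean below that of a similar game that fewer people played:

```python
def mean(ratings):
    return sum(ratings) / len(ratings)

# Hypothetical ratings, purely for illustration.
# A widely played game: mostly 9s and 10s, plus a couple of detractors.
widely_played = [10] * 50 + [9] * 48 + [1, 2]
# A similar game that simply never reached anyone who seriously disliked it.
narrowly_played = [10] * 10 + [9] * 10

print(f"widely played:   {mean(widely_played):.2f} over {len(widely_played)} votes")      # 9.35 over 100 votes
print(f"narrowly played: {mean(narrowly_played):.2f} over {len(narrowly_played)} votes")  # 9.50 over 20 votes
```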
Scrolling through the results for recent years seems to back this up - with the largest numbers of votes almost never appearing among the top-placing games - though I realise that shorter games tend to get more votes AND tend to get lower ratings than more ambitious works, so it’s not exclusively down to more judges widening the range of scores given.
As has happened in the past (sorry to be that guy), I believe this thread about “recognized authors” (or whatever the case is this time) getting more attention/higher scores is unfair to the winners of this Comp.
While it’s arguable that getting more votes doesn’t exactly push you up the scale (in all honesty, an average of 80-90 votes is enough to balance any bias, imo), the subtext is that a game with an average of 8.51 shouldn’t have won.
I don’t like this kind of reasoning. My 1.5 cents.
To add: the reverse is true, imo, on the other side of the scale.
That’s a good point. What I was trying to get across was more or less the opposite: that an entry that attracts more votes is being more seriously tested. In practice I don’t think we’re likely to see a game with few votes rocket to the top either, since anything being rated highly enough to do that is likely to attract more attention through word of mouth.
Other things being equal, larger samples are better than smaller ones, but enlarging a sample with unrepresentative data points makes it worse. I think most of us agree that it’s bad if authors go around canvassing for votes from like-minded people, even though by doing so they’re enlarging the sample. Likewise, I think most of us agree that it’s bad if a controversial but good-faith entry gets brigaded by people appearing out of the woodwork just to dunk on it, even though, again, this enlarges the sample. My contention is that whenever an entry receives far above the median number of ratings, that should be cause for concern even when the reasons for it are more subtle and there’s no bad faith involved. It always means that something is distorted, and those extra ratings are probably adding more noise than signal.
This gets tricky, though. Imbalances in rating counts are a symptom; they aren’t the disease, and a lot of things that treat the symptom could end up making the disease worse. If the problem in some particular case is that 20 bad-faith judges showed up to downrate a game, then absolutely the last thing we would want to do is divert away 20 good-faith judges so that the rating count is balanced. Naive solutions that simply encourage judges to focus their attention on games that don’t have a lot of ratings yet are going to have exactly that problem. So let’s be more careful than that.
Note about Rule #4 for authors:
- Authors may not encourage competition judges to violate the rules that pertain to them (as listed above). This includes, but is not limited to, the rule requiring judges to cast all ratings in good faith.
In other words: while you are free to talk about your entries in public, please avoid suggesting to judges, directly or indirectly, how they ought to fill out their ballots. Competition voting rules and guidelines already instruct judges to rate entries according to their own tastes and principles, based on their individual experiences with the works. Please do not ask them to act in any other way.
Also, the Committee does check votes and will discard votes from Judges that were not cast in good faith.
On pseudonyms, I agree with Marco, and point to an important case in point:
The Meteor, the Stone and a long glass of Sherbet
written under a pseudonym by Graham “Lord Inform” Nelson.
IMVHO it deserved its first place regardless of the author’s name, but if it had been entered under his actual name, the bias in its favour would have been much higher, wouldn’t it?
Hence, the pseudonym rule is here for a reason, a very valid one.
Best regards from Italy,
dott. Piergiorgio.
The last-place was a joke/experiment that everyone was vacuously eligible to rate within seconds of playing. It’s not surprising that it got a lot of votes and that a lot of those were negative; it’s also clearly an outlier and not a good example to look towards when testing hypotheses or proposing policies.
Sure, I’m not saying otherwise. My point was simply that dfranke was using it as evidence for their argument, when it clearly is not.
I think any effort spent on making the competition bigger – in terms of players/judges – is going to be far more valuable than any effort spent on making the competition fairer. And that’s in part because if you want to make the competition fairer, by far the best way to do so is to make it bigger.
Let’s take this year’s #10 game, which is… Winter-Over, which has a mean of 7.21, a standard deviation of 1.47, and 61 votes. The standard error of the mean can be estimated at about 1.47/SQRT(61) = 0.19. So with about 95% probability the ‘real’ score of Winter-Over lies between 6.83 and 7.59. That’s the range from the #14 game to the #7 game.
Taking this year’s #20 game, which is… The Maze Gallery. This has a mean of 6.45, a standard deviation of 2.09, and 40 votes. Here the standard error of the mean is a fairly large 0.33. So there’s about 95% probability that the ‘real’ score of this game lies between 5.79 and 7.11. That’s the range between the #38 and the #11 game! (And I’m not even taking into account that all those other games also have a standard error.)
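In case anyone wants to check the arithmetic, here’s a minimal sketch of that calculation (my own script, nothing official; it only uses the means, standard deviations, and vote counts quoted above, with a mean ± 2 SE normal approximation for the ~95% interval):

```python
import math

def summarize(name, mean, sd, n):
    """Standard error of the mean and a ~95% interval (mean +/- 2 SE, normal approximation)."""
    se = sd / math.sqrt(n)
    low, high = mean - 2 * se, mean + 2 * se
    print(f"{name}: SE ~ {se:.2f}, ~95% interval ~ {low:.2f} to {high:.2f}")

# Figures quoted above from this year's results.
summarize("Winter-Over", 7.21, 1.47, 61)         # SE ~ 0.19, interval ~ 6.83 to 7.59
summarize("The Maze Gallery", 6.45, 2.09, 40)    # SE ~ 0.33, interval ~ 5.79 to 7.11
```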
So, I’d say that this competition is small enough that if you care about outcome fairness, then getting in more judges – which makes the standard error smaller – is the way to go.
I think IF Comp is way fairer than the Nobel Prize of Literature, though, which is awarded by a tiny cabal of members of the Swedish Academy.
Exactly. Also a lot more forgiving, imo.
Haha, true enough. IFComp: already much fairer than most major awards that actually have a bearing on people’s careers!
I’m not sure if this belongs in the post-comp feedback (it’s more to do with the archive), but I’d love to see the search filters (length, system, choice/parser/other) stay if that’s possible. Those things are a huge QOL help for me when looking for games to play both during and after the comp.
That definitely belongs in the post-comp feedback form!