IFComp 2024 Post-Competition [general feedback and survey discussion]

This is one issue I actually would like to see improved myself! One step in this direction was the shift from alphabetized game lists to randomized ones. It used to be that the first game in the list would get way more reviews and ratings than any other, so people would do absurd things to land theirs first, like starting titles with punctuation or numbers. Once randomization happened, that went away.

So finding new ways to get a more even number of reviews per game seems like a good goal to me. Some games are just hopeless (like ones that require you to build them from Python, or that only work on Windows as executables), but most, I think, have room for growth.

12 Likes

Just mathematically, it’s pretty hard for that to happen since the rating scale is bounded at both ends. For scores confined to [1, 10], the variance can’t exceed (mean − 1) × (10 − mean), so a game which averages an 8.5 can’t possibly have a stddev higher than about 3.35, while a game which averages a 5.5 can have a stddev up to 4.5. Of course if the 8.5 game really did hit that max it would probably still get the banana, but nothing is ever really going to come out that extreme.
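If anyone wants to sanity-check that bound, here’s a quick sketch (Python; `max_stddev` is just an illustrative helper, nothing the Comp actually uses):

```python
# For scores bounded on [1, 10], variance is maximized by putting every
# vote at 1 or 10; with mean mu, that gives variance (mu - 1) * (10 - mu).
def max_stddev(mu, lo=1.0, hi=10.0):
    return ((mu - lo) * (hi - mu)) ** 0.5

print(max_stddev(8.5))  # ~3.35
print(max_stddev(5.5))  # 4.5
```

The worst case puts every single vote at 1 or 10, which is exactly the shape a maximally divisive game would take.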

I think it’s pretty hard for that to happen because the overall vote count organically corrects for it. We don’t see a 5.5 game with a stddev of 4.5 win first place because the voting process as it exists is resilient to those kinds of aberrations.

I’m not following your point, or maybe you’ve misunderstood mine. What I’m saying is that it’s mathematically difficult for a game to achieve both the mean typical of a competition winner and the stddev typical of a banana winner. The closer a game’s mean is to 5.5, the easier it is for it to have a high stddev.

1 Like

Right. And I’m saying the more controversial a winner is, the lower its overall score and the higher its stddev will be. A first-place winner at 8.5 with a stddev of 1.1 seems pretty reliable. So I don’t think we need to make new rules for a problem that arguably doesn’t exist.

Just a thought, but maybe the random shuffle could also take number of current ratings into account? So you get a list of all the games starting with 0 ratings sorted randomly, then games with 1 rating sorted randomly, etc.
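Something like this, maybe (a rough sketch; `games` and `rating_counts` are placeholders for whatever the site actually stores):

```python
import random
from collections import defaultdict

def rating_aware_shuffle(games, rating_counts):
    # Bucket games by how many ratings they have so far, shuffle within
    # each bucket, and list the least-rated buckets first.
    buckets = defaultdict(list)
    for game in games:
        buckets[rating_counts.get(game, 0)].append(game)
    ordered = []
    for count in sorted(buckets):
        random.shuffle(buckets[count])
        ordered.extend(buckets[count])
    return ordered
```

Within each bucket the order stays random, so there’d still be nothing to gain from gaming your title.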

1 Like

It certainly isn’t the case that controversial works necessarily do badly; this year’s banana winner placed 11th, after all. It’s just hard for them to do well enough to win. The mathematical pressure isn’t up or down; it’s toward the median. The most remarkable example is probably The Gostak, which won the banana with a mean score of 5.35, a score distribution that looked damn-near uniform from 1 to 10, and 23 years later is one of the most highly-respected IF games ever.

To get back to your original point, I still don’t think that’s a problem that needs to be solved, and I still don’t think that mandating pseudonyms is a good solution.

3 Likes

And in fact, the only competition which is certain to be bias-free is one in which everyone ends up tied.

7 Likes

Let’s focus on more ideas for this, since I’m clearly in the rough on what would be my preferred approach. One thought which comes to mind would be to dynamically adjust shuffles (including personal shuffles) over the course of the competition, so that games that don’t have many reviews yet are weighted toward the top of the list. I would also try to balance attention according to what kind of reviewers have looked at a given game so far. If a game has so far gotten disproportionate attention from reviewers who tend to rate everything highly, try to put it in front of some grumpier critics (like me…), and vice versa. This could be a good antidote to any hype-induced effects.
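To make that concrete, here’s a hedged sketch; all the inputs (`review_counts`, `avg_generosity`, `my_generosity`) are invented placeholders for data the Comp may or may not track:

```python
import random

def dynamic_shuffle(games, review_counts, avg_generosity, my_generosity):
    # Weight each game by (a) how few reviews it has and (b) how far my
    # rating tendency sits from that of its reviewers so far, then draw
    # a weighted random permutation using exponential sort keys.
    def weight(game):
        scarcity = 1.0 / (1 + review_counts.get(game, 0))
        mismatch = 1.0 + abs(my_generosity - avg_generosity.get(game, my_generosity))
        return scarcity * mismatch
    return sorted(games, key=lambda g: random.expovariate(weight(g)))
```

The exponential-key trick keeps every shuffle random while letting under-reviewed games (and games whose reviewer pool skews opposite to mine) float toward the top.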

1 Like

This seems amusingly exploitable, btw. Some day somebody is going to create a competition entry which only runs on VMS/Alpha, the whole game will be an epic tragedy about the rise and fall of DEC, and both reviewers who manage to run the thing will think it’s the most amazing and nostalgic work of all time.

4 Likes

We already have a minor pressure in that regard - the Google doc list of reviews highlights games with few reviews, and also shows the average number of reviews per game. I don’t feel like the Comp needs to put its finger on the scale here, since the community already tries to help.

Also, I’d be annoyed if my personal shuffle changed during the voting period. At most maybe the Comp could put a little icon saying, “More ratings wanted”, but then there’s a whole bunch of balancing to be done there as well.

5 Likes

Daniel, it seems you forgot a critical detail :smiley: :

Although I encourage people to wait for the release of the isekai, I have not only allowed, but also encouraged, the release of derivative fanfic, in the spirit of the Japanese dojinshi scene.
Hence, if an IF :smiley: is set in Railei, it isn’t automatic that I’m the author :rofl:

Best regards from Italy,
dott. Piergiorgio.

1 Like

Yeah, but it is a lot more likely…

2 Likes

If this bothers you, then maybe make a separate “ratings wanted” (or perhaps call it “smart shuffle”) tab that sits next to the current sort options.

1 Like

The one year I was an IFComp judge, one author told me off for not considering other reviews when I wrote mine (I had found some bugs that others apparently hadn’t). I’m withholding that author’s name because, after a discussion, they agreed they’d overstepped a line and resolved not to repeat the error.

New authors are also often judged a little more gently on certain norms (particularly grammar and adherence to hinting conventions) than established ones, so requiring a change of pseudonym could cause problems here if an author’s new pseudonym went unnoticed by some of the judges (not all of them are on intfiction.org).

The problem of figuring out how to make new entrants feel welcome is a reasonable one, worth expending some effort on. Judging (and reacting to judging) is difficult enough without it being turned into a guessing game about how far from other reviews one is allowed to be without censure.

A tool that allowed sorting and filtering by a bunch of metrics (such as author-estimated length, platform or author-provided tags) could solve the randomisation issue, since the sorting/filtering could be on a different tab to the randomised one (combining them would probably be a lot of work for not a lot of additional gain).
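A minimal sketch of what the filtering side could look like (the metadata field names `est_minutes`, `platform`, and `tags` are invented for illustration):

```python
def filter_games(games, max_minutes=None, platform=None, tags=None):
    # Filter a list of game-metadata dicts on author-provided fields;
    # any criterion left as None is ignored.
    result = list(games)
    if max_minutes is not None:
        result = [g for g in result if g.get("est_minutes", 0) <= max_minutes]
    if platform is not None:
        result = [g for g in result if g.get("platform") == platform]
    if tags:
        result = [g for g in result if set(tags) <= set(g.get("tags", []))]
    return result
```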

4 Likes

I don’t have time to play most games during the judging period anymore. In fact, I don’t get too much farther than the five mandatory games the rules require me to play.

The list of games that I do play is heavily biased—I will indeed play games by authors I recognize and whose past work I respect, as well as any whose blurbs catch my eye, or that are getting a lot of positive buzz on the forums. I’m not convinced that this bias is all that harmful: I don’t automatically give a high score to the games that I play, nor to “big name” authors (in fact I wager I’m harsher on games by people who have already proven able to produce excellent work, compared to newcomers).

I could see it being a problem if a well-known author canvasses votes from their fans, who strategically give that author a 10 and all competitors a 1. Nothing in the data this year or in the past suggests to me that this is more than a hypothetical problem, though.

Finally, it is worth keeping in mind that there are well-known quirks in the kinds of games that tend to do well in IFComp, and that many games that have ended up on “best of all time” lists did not win the competition the year they were entered. I would not take the competition results as any kind of objective assessment of a game’s artistic merit.

9 Likes

If we’re going to go with pseudonyms, we might as well not list author names at all until the comp is finished. That may not work, though, as things leak out unless people have been very secretive with their development :stuck_out_tongue: It also has a downside: when people see an author they like in the comp, they often go looking for more of their work. If they have to come back later, remember which game it was, and match it up with an author… that’s probably less likely.

I get that famous authors have a bit of a perceived unfair advantage here, but it is what it is, and I’ve seen plenty of games from previously unknown authors placing highly :)

Yeah, as someone with very little free time, I struggled to get through even the entries I did manage to read, and sadly didn’t come remotely close to looking at half of them.

Pretty sure this is not allowed under the rules? (Or at least heavily frowned upon if not?) I’ve always been careful not to push people from the ChoiceScript forums to come over and vote high for the CSG entries, because that would be unfair. (Not that it would likely work out that way; this is very theoretical, given how little crossover there is with the CSG audience here. Still, I wouldn’t want to be seen as unfairly biasing results.)

Still, I can see your point that those with existing fan bases are likely to attract more readers that already love their work and rate it highly. Not convinced there is a good solution that benefits everyone.

Apart from the niche games, I’d argue this is probably going to be tricky to do. Although they don’t tend to rate as well, I’d suspect short games will get more reviewers, and really long games (especially the parsers?) will generally get fewer. I could be wrong on this, but it just feels like more people are likely to knock over a 15-minute game than a 1.5-hour-plus one.

2 Likes

I get that 6 weeks can feel like an AGE for the authors – waiting anxiously every day for a review that might come in. However, I think it’s about right to maximise the chances of getting more reviews, and it really isn’t a long time for those poor souls who make it their mission to play ALL the games.

Not sure pseudonyms will help much – although I actually got marked down by one reviewer for not having images in my game, just because I did last year!

I personally think IFComp needs to do everything in its power to widen the comp as far as possible - attract new players, new authors, new reviewers (and all those folk who love games and books but have never heard of IF). That means promotion, word of mouth, marketing, and just generally making the whole thing as fun as possible – if that also means a few of the rules have to be relaxed to encourage engagement, so be it.

As has been said above, ultimately this is an amateur event, with amateur authors, reviewers and organisers. And yet, compare this with the Academy Awards (a very close comparison I’m sure you’d agree) and I think there’s far more rigour here and far less group-think! And of course, the awards ceremony is way more entertaining.

14 Likes

This analysis makes a lot of sense. Has the evolution of the comp rules over these decades made it more fun? I think it has, and certainly it has grown in the number of entries/judges/participants. If the authors didn’t think it was fun (most of the time), I imagine they’d go somewhere else to market their games.

The competitors take great pride in their entries, and are bolstered by reviews which are either positive or at least gracious, and understandably upset when reviews come across as catty or unjustly biased.

In terms of enforcing ‘fairness’ for a competition which invites such a wide range of themes, styles, and design philosophies, I don’t think anyone could ever nail down a rubric that would be universally applicable. When I started reviewing, in 2003 or thereabouts, all of the games were parser games, with no graphics or text styling, coded in one of two systems. I had a fiddly rubric which didn’t even work very well for me at that time.

After a few years I threw away my fiddly rubric and assigned holistic scores: Do I respect it as a sample of writing? Do I admire elements of the coding and design? Is it interactive? And, above all, do I enjoy playing it? I can’t possibly play every game in the comp, so I try to balance between playing games that draw me in with their clever blurbs and those which are fed to me by the randomizer.

7 Likes

@dfranke, you’re the only one pushing the idea that the judges have a groupthink bias toward popular authors that should be curtailed. You haven’t addressed mathbrush’s point (which I agree with) that this discussion seems to be talking around the source of the problem you have.

So let me ask directly. What in this competition, IFComp 2024, has spurred your claim that people are overly biased toward certain authors? Which game placed high that you don’t think should have, if the judges weren’t so apparently biased? Who is the person you’re thinking of who is “positively recognized” in a way that causes groupthink? Be specific.

5 Likes