IFComp 2018 reviews

In fact, this was exactly how SpringThing started, to fill the perceived gap in longer games caused by IFComp’s popularity. Eventually the number of entries dwindled until Aaron Reed removed the entry fee and changed focus away from only long games, and now it seems to be thriving. And despite the change of focus, there’s still a good number of longish games in SpringThing each year.

If you want to try harder to encourage even more longer games than already appear in SpringThing and IFComp, I think you’d need quite a lot of money for incentives; I just don’t think there are enough people out there willing to make longer games otherwise. (I’m not even sure there are enough people out there willing to play longer games anymore! I know I’m not… I have enough of a backlog of unplayed games as it is.)

There’s another thing going on here too, I think. The entries in the comp have many different qualities, and people are allowed to vote by whatever criterion they wish. This is stated upfront in the comp rules, and it’s not controversial in any way. A person who votes based on, let’s say, how much effort they think went into making each game, is going to rank the entries in a completely different way than a person who votes based on humour, or grammar, or immersiveness, pacing, polish, or political relevance. Many people simply rank the entries based on how much they enjoyed them, and that probably correlates more with good storytelling than with diligent implementation. Which is perhaps unfair, but it’s a fact of life.

The implicit assumption here – that enjoying good storytelling is somehow more “unfair” than enjoying diligent implementation, or that telling stories is less valid as a goal or a skill – bothers me.

Surely lft simply meant that certain types of games are consistently ranked lower than other types, and that this feels unfair if that happens to be the type of game you prefer to make. If it were the case (this is purely hypothetical) that parser-based games are consistently ranked lower by a majority of judges than choice-based games, then there is a sense in which this would be unfair. In the same way, lft tells us that if polished puzzle games are consistently ranked lower than story-centred games, then there is a sense in which this is unfair. This in no way implies that telling stories is less valid as a goal or skill than implementing puzzle games. Indeed, the invocation of fairness strongly suggests that lft believes that all genres, and the skills needed to make them, are on a par and ought, ideally, to be judged on the same level.

Thank you Victor, that is spot on!

I did not intend to imply that telling stories is somehow, objectively, a less valid goal or skill than coding a robust game world. I apologize to all the excellent storytellers in this community if it came across that way.

To elaborate: I wrote that the IFComp encourages judges to rate the entries based on any criterion they wish. Certainly an individual judge is allowed to feel that to them, personally, storytelling is more important than coding a robust game world, or less important for that matter. Therefore, by the rules, an individual judge is allowed to rank the entries such that either of these aspects weighs more heavily.

Now, it might be the case that a majority of the judges rates one of these goals (e.g. storytelling) higher than the other. This will be reflected in the result of the comp.

So, under the assumption that both goals/skills are equally valid, but judging is based on what the majority considers to be important, there is a sense in which the process is unfair.

It sounds like you’re trying to say validity is objective while judging is subjective.

But if hard work is objectively better, shouldn’t Birmingham IV, which took 30 years to make, be the winner? Or Space Punk Moon Tour, which got 20 hours of work a week for a year?

I’m not sure there’s a disagreement here… We are all (in this thread) taking as read that different people like different things; and the comp rules encourage each judge to vote for what they like.

Is it just a matter of whether the word “unfair” is deployed to describe “unequal results” vs “an injustice”?

I think so. I took Linus’s comments to mean that the results may always be biased a certain way based on factors we can’t control, and that that is no big deal. In other words, “not perfectly fair” instead of “so unfair that it needs correcting.”

I see any bias as similar to standardized testing being unfair to certain people, even though there’s no evil laughing villain making it unfair, and no matter how many experts tweak it. It won’t show some people’s talents, so we need other avenues to correct that. And we do, with judge feedback, reviews, etc.

Are you allowed to donate prizes to the IFComp prize pool that are earmarked only for specific categories of game? e.g. “$100 for the highest placed parser game” kind of thing? This would incentivize the type of game you want to see more of.

But it is an odd system. Imagine Olympic gymnastics being judged this way. You, the judge, observe only a subset of the participants and then give a score based on your own criteria. Maybe you grade on a curve and maybe you don’t. The hope is that given enough judges any unfairness gets cancelled out, but maybe it doesn’t.

Obviously this year’s comp was judged by this year’s rules, as it should be, but maybe it would be helpful to investigate other voting systems. I’m not suggesting an electoral college system :slight_smile: , but…

some sort of pairwise ranking might be just as easy for judges and would take away some of the ambiguity in the current system. I’m thinking about the Condorcet method: https://en.wikipedia.org/wiki/Condorcet_method#Pairwise_counting_and_matrices

Judges would play 5 or more games and then rank them. They are doing this now, really; just sort by score. The rankings are collected from all the judges and a pairwise ranking algorithm is applied.
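To make that concrete, here’s a minimal sketch of the pairwise counting step in Python. The ballots and game names are invented, and a full Condorcet method would also need a completion rule (Schulze, ranked pairs, etc.) for the case where no game beats every other head-to-head:

```python
from collections import defaultdict

def pairwise_counts(ballots):
    # Each ballot is a list of games ordered best-to-worst; games a
    # judge didn't play simply don't appear on that ballot, so they
    # are never compared on it.
    wins = defaultdict(int)  # wins[(a, b)] = ballots ranking a above b
    for ballot in ballots:
        for i, a in enumerate(ballot):
            for b in ballot[i + 1:]:
                wins[(a, b)] += 1
    return wins

def condorcet_winner(ballots, games):
    # A Condorcet winner beats every other game head-to-head.
    # Return None if a cycle means no such game exists.
    wins = pairwise_counts(ballots)
    for a in games:
        if all(wins[(a, b)] > wins[(b, a)] for b in games if b != a):
            return a
    return None

# Three hypothetical judges, each ranking only the games they played:
ballots = [
    ["Alpha", "Bravo", "Carol"],
    ["Bravo", "Alpha"],
    ["Alpha", "Carol", "Bravo"],
]
print(condorcet_winner(ballots, ["Alpha", "Bravo", "Carol"]))  # Alpha
```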

This could be done with the judging data from this year’s contest. It would be interesting to know whether the current methodology yields more or less the same result as the Condorcet method.

If anonymized data is available, I would be willing to generate a spreadsheet. This is not meant in any way to dispute this year’s results; I’m just curious about this method of vote counting.

It’s a matter of the word “unfair” being deployed frequently without people specifying what exactly they find unfair, and of the subtext being that this entire discussion was prompted by yet another person angry that Twine games exist in the competition.

Let me see if I can express this from another angle, without using the words fair and unfair.

IFComp is a popularity contest. The biggest crowdpleaser wins, by definition, since the crowd votes on what they like. A popularity contest measures the zeitgeist, at least within the community, and the zeitgeist (again by definition) changes with time.

IFComp has always been a popularity contest, and judges have always been free to judge by any criterion (the 1-10 scale was introduced in 1996). Complaints along the lines of “the IFComp ain’t what it used to be” are on shaky ground. Sure, the rules have changed every now and then, but the central tenet has remained the same, namely that the result should be based on people’s personal preference.

Rather, it’s people’s personal preference that ain’t what it used to be. What was popular twenty years ago is different from what is popular now. A majority within the community enjoys creating choice-based games, and a majority likes to play them. And IFComp simply reflects that. If we change IFComp to something else, by restricting it to a mode of interaction that was popular twenty years ago, it will cease to reflect the popular opinion of the community. In my humble opinion, that would go against its original spirit.

Now. It seems that fewer parser games are released every year, and this is a cause for concern. It saddens me quite a lot, actually. But I don’t think the answer is to bury our heads in the sand by banning other forms of expression from the comp. I think the answer is to: Write more parser games. Encourage people who write parser games. Promote parser games to a wider audience. Advance the art of parser game programming (which I hope to do in a few days). Innovate, rejuvenate, win back the crowd, steal the spotlight, and leave a black and white feather in its place.

Yes, I’m pretty sure this exact thing has happened before. There was a prize or a sum for “highest placed parser game”.

I think I disagree somewhat with the “popularity contest” thing – fewer than ten people (probably) played every game. The scores are averaged over the number of votes for that particular game. I would assume a game that got one solitary vote of 10 would beat a more “popular” game that received 100 votes of 9.

This is why people complain about the “hate votes” of 1 that seem to crop up every year, even on the most polished games. Though it’s not compulsory, the consensus is a 1 generally implies a game that’s broken or unfinished and nigh unplayable, or an offensive troll entry. No vote is better than a 1 for the game’s average.

Sure, in the highly unlikely situation that there was only one voter on a game and they gave it a 10. (Or, given that the comp requires five ratings to be counted in the rankings, the even less likely situation that there were only five voters, all of whom gave it a 10.)

In more realistic scenarios, there are enough votes per game, and the range of scores is narrow enough, that any individual vote will not affect the ratings much. To illustrate: Someone gave Campfire Tales a 10 and it’s still second-to-last in the rankings. If that person had not given Campfire Tales a 10, it would still be second-to-last in the rankings. Add an extra 10 voter: still second-to-last in the rankings. Add two extra 10 voters: still second-to-last. As it turns out, the game would need a total of six extra 10’s (seven total) to move up in the rankings.

Even in the mid-ranges, where scores are closer, outliers don’t have that large an effect. Remove the one 10 vote from Eunice, and it would only drop three places. Remove the one 1 vote from The Forgotten Tavern, and it would only rise three. (Examples chosen because they had fairly clear outliers.)
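As a toy illustration of why a single outlier moves an average so little (the numbers here are invented purely to show the scale of the effect):

```python
# Forty judges give a game a 3; then one outlier 10 arrives.
scores = [3] * 40
base = sum(scores) / len(scores)                       # 3.0
with_outlier = sum(scores + [10]) / (len(scores) + 1)  # ~3.17
print(base, with_outlier)
```

With a few dozen votes per game, one extreme score shifts the mean by a couple of tenths at most, which is rarely enough to change a game’s position in the rankings.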

As far as vote count, eyeballing the results page, there doesn’t seem to be a substantial correlation between medium and number of votes, except that the Adrift games got substantially fewer votes (probably because of the difficulty in actually opening them). The only other clear data point is that the entry with the most votes by far was +=x: the only one with more than 100 voters. The reason for this should be pretty obvious.

There also don’t appear to be clear patterns this year like automatically giving 1s to specific types of games, so that is a plus.

And… pick ParserComp back up and organize it! If you (not necessarily addressing you in particular) really like diligent implementation, and it saddens you that the voters prefer good storytelling, make a comp where diligent implementation is prized! Though “parser” ≠ “diligent implementation.”

I’m repeating myself, but if you see downsides to the fact that IFComp has become a certain way, the easiest solution is to make a comp that’s the way you’d like it to be. Well, not easiest. But most likely to get what you want.

Mostly I want another ParserComp and I wouldn’t be able to get it together to organize it myself. (Though I could supply some webspace if needed–I did that with at least one of the ShuffleComps.)

Just to spell out the obvious, you mean that it was first in the alphabetical listing?

I heard this 4 years ago. There was a nasty flame war on these forums in 2014 over this. I walked away with the idea to make another parser game. I’ve been working on it for 2 years now, but I’m not really encouraged to release it in the comp with the way things are stacking up. A 7.06 would put me around 14th this year. That’s well below 4th, and if only 2 parser games are in the top 10 I don’t know if putting a parser game in the comp is a good way to highlight it anymore. Maybe I’m wrong.

I feel we’re comparing apples to oranges and arguing over which fruit is the best. It’s a matter of taste. But if you think splitting the comp would marginalize choice-based games, I think you’re wrong. It would marginalize parser. And maybe that’s a reason not to do it. The only way to stop it would be to split the money down the middle. But that’s not going to happen; I just thought I’d put it out there.

Exactly. (It being by a well-known and well-regarded author, with polished cover art, certainly doesn’t hurt, but there’s (probably) a reason he gave it that name.)

That’s the other thing about choice-narrative popularity – parser games require a ton of work and testing. There were multiple games in this year’s Comp that were in development for more than five years. I actually did a test in Inform 7 for the first room in Cannery Vale (which you can download and look at from the itch.io page if you care to, but it’s literally just one room and a description and a music/logo test).

It would have probably been a really neat game in a parser, but there are so many moving parts that it would have been a nightmare to build in the minimal time I left myself, and I wouldn’t have been able to bring it together so quickly. I only had perhaps two people sort of run through the game, and they didn’t try very hard. Other than my own minor typo issues, there was only one major bug, which was meta and mostly engine-based: the game didn’t restart cleanly after hitting an ending. That’s a rare thing to catch without thorough and devoted beta testers who will try that. That’s not to say that parser games lack in quality – most actually get polished as much as a Pixar movie over the dev cycle, unless the author is chasing a deadline.

Aside from that, I had much better control over the atmosphere, the look and feel, and the music in the more media-friendly choice system. Part of the appeal of text, I believe, is to readers who don’t want extra atmospheric fluff.

I have made three ASM games which were constructed start to finish in less than 2 months – although two were concepts that had been in my head for years, so it’s not really fair to say I made the entire games in that short a time. I wasn’t quite writing as I went along, but I’ve learned how to execute a concept quickly in a way that’s nearly impossible in parser without a schedule that rivals AAA games with fifty people working on them. With a choice game, the coding and testing is simplified. If I had enough complete ideas of reasonable length, I could probably crank out a game every two months.
