Thoughts about shapes of vote distributions

This started as my comp afterthoughts, but it just turned into a bunch of discussions about histograms.

Wizard Sniffer has a strictly increasing histogram; I think the only other games that have had that are Slouching Towards Bedlam and Violet. This suggests the game could become a fixture in the canon.

Eat Me is really surprising with its bimodal distribution. I’ve seen a lot of games with 2 peaks (Pogoman Go! had peaks at 6 and 8), but I’ve never seen one with peaks at 8 and 10. It’s like there were two groups: one that thought it was awesome, a 10, and one that thought it was awesome, an 8. Having a peak at 8 as the ‘bad’ group is pretty incredible!

Removing the two 1-votes from Harmonia would not have changed the rankings; it would have still taken 3rd with a score of 8.44.
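As a sanity check on that kind of claim, here’s a minimal sketch of recomputing a mean after dropping the 1-votes. The vote list is hypothetical (the actual ballots aren’t published), so the numbers are for illustration only:

```python
# Hypothetical vote list for illustration; real IFComp ballots aren't published.
votes = [1, 1, 7, 8, 8, 9, 9, 9, 10, 10]

mean_all = sum(votes) / len(votes)

# Drop the two 1-votes and recompute the mean.
trimmed = [v for v in votes if v != 1]
mean_trimmed = sum(trimmed) / len(trimmed)

print(round(mean_all, 2), round(mean_trimmed, 2))
```

With only 10 hypothetical votes, the two 1s move the mean a lot; with the dozens of votes a real top-3 entry receives, the shift shrinks proportionally, which is consistent with the ranking not changing.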

Other games with interesting distributions include Alice Aforethought and Charlie the Robot. Alice has peaks at 5 and 8 (with 8 being bigger), while Charlie has peaks at 2, 7.5, and 10. Alice was a hard puzzle game, so I wonder if the 5 peak consists of people who ‘bounced off’, with the 8 peak being people who finished it. Charlie the Robot’s peaks are (in my guess) people who were immediately turned off, people who didn’t really like it but were impressed by how long and developed it was, and people who adored it.
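Spotting these ‘peaks’ is just finding local maxima in the 1-to-10 histogram. Here’s a small sketch with made-up counts (bimodal at 4 and 8, loosely in the spirit of the Alice example); the counts are assumptions, not real comp data:

```python
# Hypothetical vote counts for scores 1..10 (index 0 = score 1); not real data.
counts = [0, 1, 2, 5, 3, 2, 4, 9, 5, 2]

# A score is a local peak if its count exceeds both neighbours
# (treating the ends of the scale as bounded by zero).
padded = [0] + counts + [0]
peaks = [i + 1 for i in range(len(counts))
         if padded[i + 1] > padded[i] and padded[i + 1] > padded[i + 2]]

print(peaks)
```

A stricter definition (requiring a minimum dip between peaks) would filter out noise in small samples, but for eyeballing comp histograms this simple rule matches what people mean by “two peaks.”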

Swigian had two peaks (I just noticed; that’s neat!), which makes sense since I wrote it to be good in some ways and bad in others; the peaks probably represent the two populations that valued, or didn’t value, the things I focused on.

10pm has an increasing series of votes with a steep dropoff after 8. What’s going on there?

Just Get the Treasure has a small peak at 3 and a higher peak later. This is a game with most of its content hidden away, so the 3 is probably from people who didn’t see that.

The Traveller has tons of peaks. It is a graphic novel, so it probably just really split the votes. The Dream Self, a game with similar dynamics, had a similar distribution.

A big chunk of games in the center were slightly buggy or underdeveloped but had good concepts, and they all have bimodal distributions (the most exaggerated being Day of the Djinn).

What Once Was has all of its votes in the range from 3 to 7, which is probably only possible due to the smaller number of voters (30). It also has a low standard deviation (1.39).
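For anyone who wants to reproduce a figure like that 1.39, the standard deviation of a vote set is easy to compute. The votes below are a hypothetical 30-vote spread over 3 to 7, so the resulting number won’t match What Once Was exactly:

```python
import statistics

# Hypothetical 30 votes concentrated between 3 and 7, for illustration only.
votes = [3] * 2 + [4] * 5 + [5] * 10 + [6] * 9 + [7] * 4

# Population standard deviation; use statistics.stdev for the sample version.
sd = statistics.pstdev(votes)
print(round(sd, 2))
```

(Whether the comp reports the population or the sample standard deviation is a detail worth double-checking before comparing numbers across games.)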

Another thing I noticed is the presence of very low (say, <= 3) ratings among even many of the top-10 entries. Harmonia isn’t my cup of tea, but really— 1/10? All of the games in the top 30 have some four- or five-point ratings, and presumably some of those are from people who just don’t have any interest in parser IF, non-parser IF, games without puzzles, games with puzzles, games that are too depressing or serious, games that aren’t serious explorations of human drama, games that raise the dread spectre of medium dry-goods puzzles, or whatever. Fine. For a game to merit 1/10, though, I would expect it to make spiders pour out of your monitor, or simply to not compile (which, if the alternative is cyberspiders, probably merits at least a 2/10).

Looking at the ratings, it seems that basically all games that aren’t parser-based (so twine games, plus games like Salt or 10pm) have 1-star ratings, while almost none of the higher-ranked games with parsers do. I would guess that there are a couple people out there who have a personal vendetta against non-traditional games being included.

Yes, I noticed that it seemed to be the non-parser games that were affected by this. It’s sad to see, and it’s something I thought might have been picked up by the comp software.

People have said for the past few years that someone is automatically giving all the web games 1-votes, but Black Marker, Nightbound, Hexteria, and Moon Base didn’t get any. I’d bet at least one person who favors parser did indeed cast 1-votes for various web games, but I’d hesitate before calling it a vendetta. What seems more likely to me is that you’ve got a handful of voters who eschew nuance and vote 1 to mean “I didn’t like it” and 10 to mean “I liked it,” since that voting mindset is pretty common outside IFComp. Maybe I’m being too charitable.

In 2015 and 2016, I had more 1-votes than anyone else who entered. This year I didn’t get any. It seems as though someone with a vendetta would’ve kept it up.

Harmonia had strong feminist themes. Maybe a voter strongly objected to this?

Here’s to hoping that the 1-star folks left anonymous feedback on the games… I’m very interested to see what rationale they provide for their ranking. I’m going to guess many are along the lines of CMG’s thinking – either a flat, “I didn’t like it,” or an even more unhelpful, “This isn’t IF.”

A common theme among the feedback I’ve received is that the game simply felt incomplete – even people who quite enjoyed it expressed a desire for it to be longer or its story more detailed. So perhaps that’s why the 9s and 10s are hacked off? A for Effort, F for length, as it were.

And a tiny side note – 10pm is in fact a Twine game, and I believe Salt might be as well, but I’d have to double-check on that. At least you can safely count 10pm among Twine games, Aziraphale. :stuck_out_tongue:

This is perhaps too cynical, and I’d like to believe someone wasn’t just voting down non-parser games, but the four games noted above (mine included) that didn’t get any 1-star votes were already receiving low to lukewarm public reviews. It could be that the voter(s) didn’t expect them to perform well anyway, and instead focused on better-reviewed games?

Could be a stretch, but as noted already, all the top rated nonparsers had 1-stars in there while the parsers did not. I’d think if it was a lack of nuanced perspective, it might carry across both.

I really don’t get the 1-votes for Will Not Let Me Go and Harmonia, either. Voters can vote how they want, I suppose, but to me Harmonia’s presentation alone ought to rule out any scores that low.

Will Not Let Me Go got the same two 1-votes, which suggests not. Although maybe the two 1-votes are from different people.

Future Threads also has an unusual distribution: almost steadily increasing until peaking with lots of 9s, but then no scores of 10. Maybe voters thought it was too short - like litrouke says about 10pm? This gets back to Mathbrush’s Swigian experiment about length being one of the things voters (perhaps unconsciously) consider.

I’m one of the Future Threads 9-voters, and I thought the length was perfect. I just reserve the 10 for my favorite game of the comp; in this case, Eat Me.

Right; I guess the only thing that’s clear is that the 1 votes were political and not based on the objective quality of the work (and I encourage the authors not to put too much stake in them). I can speculate that “Will Not Let Me Go” was too “touchy-feely” for certain “hardcore” IF fans, but again, that’s wild speculation.

It’s well known that IFComp rewards crowd-pleasers and penalizes games that take risks with their themes or mechanics. I hope nobody is discouraged even if their game didn’t make the top 3.

In an open competition with anonymous voting, anyone can get 1-star scores, and who am I to assume the votes were in bad faith? Nevertheless, I appreciate the community’s concern—it speaks well of us to discuss these things and to urge authors not to be discouraged by the occasional bad score.