Why Are There So Few Parser Games Now?

Maybe better, but we are still counting “popular votes” vs. “expert votes” or whatever you want to call it.

Now, this sounds elitist… Sorry.

Ratings aren’t only subjective, though, they’re context-sensitive. If a competition game gets the bulk of its ratings during the comp by reviewers who are playing other comp games, it’s likely to be scored in relation to the zeitgeist. An “objective” six in a year of strong games might only average a four.

You might also score my objective-six Lovecraft pastiche lower if you’ve just played Anchorhead, or if the love interest reminds you of your mean uncle, or if you’re hungry.

Sure, there’s probably a real difference between an eight average and a two average, but if we’re going to pick an arbitrary number & say “Games over this line are objectively good,” we’re… I guess I just don’t think that’s a real thing.

(ETA: This goes for expert votes as well as popular votes. Also trying to decide who is an expert and who is qualified to decide who’s an expert sounds like one big dramatic headache in the offing.)

Yeah, you are right.

The one thing I think might help is just some guidance from IFDB itself about what the ratings are supposed to mean. On Netflix or Letterboxd, for example, when you rate a movie, it’ll show you a description of what each star rating means:

1 star: I hated it
2 stars: I didn’t like it
3 stars: I liked it
4 stars: I liked it a lot
5 stars: I loved it

IFDB does list what the star ratings “should” mean, but you can only access the explanation (as far as I know) when you’re on the screen to write a review. Here’s what it says:

[1 Star] A terrible game; completely unrecommended
[2 Stars] A badly flawed game, but maybe worth a look
[3 Stars] An average game; or a mixed bag, with some real strengths but some serious weaknesses
[4 Stars] A very good game, highly recommended
[5 Stars] An excellent, exceptional game

I kinda try to follow this myself, although the three-star rating ends up bearing more weight than the others for me, since many things can go into a “mixed bag.”

It seems very skewed towards three-star ratings, and it’s clear that people don’t follow it, probably because it’s not very visible. At least one user has a user page saying they give games they “sort of liked” a 2.

Yeah, it is skewed towards the three-star rating. Sometimes I end up giving games I don’t care for three stars, just because I don’t think they’re “badly flawed” enough to deserve two, and sometimes I end up giving games I think are better than average three stars, because I don’t feel quite justified nudging them up to four. But the result is that I’m giving the same rating to one game I enjoyed and to another that I didn’t. It’s probably inevitable when you have only five stars to choose from, but I think it’s preferable to increasing the range to ten. In my experience, people don’t take advantage of the potential for more nuanced ratings with ten-star systems, and you get even more extreme disparities.

But doesn’t it also minimise the effect of a single extreme 1-voter or a delighted 10-voter? And, in fact, don’t those gut reactions deserve a place in the rating? If one person played the game through and thought it was worth a 6, but eight people had such a reaction to its opening scenes that they immediately voted it a 1… sure, a review is always best, but this “disparity” also reflects the game. A game with wildly varying ratings is very intriguing.

In a five-star system, this sort of thing is not very visible: everything ends up skewed towards three stars. In a ten-star system, by contrast, some games might skew towards 1-3, others towards 4-6 or 2-4… The distribution gravitates towards something a bit more realistic as a representation of players’ reactions.
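
To make that concrete, here’s a rough sketch with made-up ratings for a hypothetical divisive game (none of these numbers come from IFDB):

```python
from collections import Counter

# Hypothetical ratings for a divisive game: one cluster of players who
# bounced off the opening, and one cluster who loved the whole thing.
ten_point = [1, 1, 2, 2, 2, 8, 8, 8, 9, 9]

# One plausible way to collapse to five stars: (r + 1) // 2 maps
# 1-2 -> 1 star, 3-4 -> 2 stars, 5-6 -> 3, 7-8 -> 4, 9-10 -> 5.
five_point = [(r + 1) // 2 for r in ten_point]

print(sum(ten_point) / len(ten_point))  # 5.0 -- the mean alone reads as "middling"
print(Counter(ten_point))               # but the histogram shows the 1-2 / 8-9 split
print(Counter(five_point))              # Counter({1: 5, 4: 3, 5: 2}) -- same split, coarser buckets
```

The average alone makes a love-it-or-hate-it game look identical to a uniformly mediocre one; the finer scale at least locates the clusters more precisely.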

Anyway, you mention some experience with that system, and I have none, so I defer to expertise. I’m not convinced, but I won’t push it any further. :slight_smile:

When people have more choices, they tend to struggle more to select one, so with a ten-star system you would likely get more people voting extremes and shying away from the ambiguous middle. You’re right, of course, that this is just the opposite of what’s happening now, with people (such as myself) being funneled toward the three-star rating. Issues arise either way.

Then again, the community would factor into it too. It might be the case that IFDB has a small enough and conscientious enough user base to make good use of a ten-star system. I suspect it won’t change though, since it seems to work well enough currently.

I don’t know if the difference between a 5-star and a 10-star system is relevant. If a 5-star system says 3 and a 10-star system says 6.5, I’m liable to treat anything in the latter between 5.5 and 7.5 as a 3 in the former anyway.

There are also other effective ways to normalize ratings, like Bayesian estimates; IMDB is really, really good at that. There’s also the Netflix approach, where they give you a predicted personal rating based on the opinions you share with other users of similar taste, among other things. I don’t know if IFDB has enough traffic for those types of solutions to be feasible.
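
For anyone curious, here’s a minimal sketch of that kind of Bayesian (shrinkage) estimate. The site-wide mean and prior weight below are made-up numbers for illustration, not anything IFDB or IMDB actually uses:

```python
def bayesian_estimate(item_mean: float, item_votes: int,
                      site_mean: float, prior_weight: int) -> float:
    """Shrink a game's raw average toward the site-wide average.

    With few votes, the estimate stays near the site-wide mean;
    with many votes, the game's own average dominates.
    """
    return (item_votes * item_mean + prior_weight * site_mean) / (item_votes + prior_weight)

# Assumed numbers, purely illustrative:
SITE_MEAN = 3.4    # hypothetical average rating across all of IFDB
PRIOR_WEIGHT = 10  # treat every game as if it started with ten "average" votes

# Two five-star ratings no longer outrank a solid 4.2 from forty raters:
print(bayesian_estimate(5.0, 2, SITE_MEAN, PRIOR_WEIGHT))   # ~3.67
print(bayesian_estimate(4.2, 40, SITE_MEAN, PRIOR_WEIGHT))  # ~4.04
```

The nice property for a low-traffic site is exactly the two-ratings case: a single enthusiastic (or spiteful) voter can’t drag a game to the top or bottom of a list on their own.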

One other note: at the end of the day, the reviews that affect me the most are the ones written by people whose opinion I already respect. If a rando on IFDB gives a game 5 stars, I may or may not pay attention. If Emily Short calls it the game of the year, I download it immediately.

IFComp 2014 was the first time I voted in any such competition, and I was concerned that I should vote ‘well’. As such, I found the advice on the Guidelines for Judges page very useful; it goes like this (for a 10-point scale):

  • 10: This game epitomizes what interactive fiction can do, perhaps breaking new ground in the process. It dazzles and delights. People interested in the form will be talking about and studying this game for years to come.
  • 7, 8, 9: A good/great/excellent game you’re pleased to have played, and which you’d recommend to others (with three gradations of enthusiasm).
  • 5, 6: A respectably crafted work that didn’t necessarily move you one way or another, but which you might recommend with reservations. (A 6 offered more to hold your interest than a 5 did.)
  • 3, 4: A flawed project that doesn’t manage to live up to promise, and which you wouldn’t generally recommend playing. (A 4 has more going for it than a 3 does.)
  • 2: A work that technically qualifies as IF, but seriously misses the mark for one reason or another (or several).
  • 1: This work is inappropriate for the competition. Grossly buggy to the point of unplayability, perhaps, or maybe it’s not interactive fiction even by a generous definition of the term.

…but I also note that on that page are links to two other rubrics for judging games. Would something like the above rubric (or Sam’s, or Jacqueline’s) be appropriate for ‘standardisation’ of votes on IFDB? (Quite possibly a pipe dream…)

That is a good system for things like this. If only IFDB had better guidance on how to vote and a ten-point scale like the one mentioned above, ratings might be slightly more accurate and give players a better idea of whether or not a game is decent.

Gonna be pessimistic here. I wouldn’t mind seeing it, but I doubt it would help significantly.

Cheers to the people who do take the time to rate games and write reviews on the IFDB! You are heroes and very much appreciated. But…

I think the real problem with the IFDB ratings is that there just aren’t that many people rating, and (mostly a problem for nonparser games) some of the people rating are doing spite rating.

In a heavily used system, people find their own metrics and balance each other out. You can see this in action on Yelp and Amazon.com. Under those circumstances, three wildly varying reviewers don’t impact things very much. But on the IFDB, anything over 10 ratings is a heavily reviewed game, and it was probably released in an IF community competition, because there’s a very strong correlation between releasing in a competition and getting rated.

Before making this assertion, I pulled some numbers from my own IFDB stats.

[spoiler]I have never released a game outside a competition that received more than 4 ratings. By contrast, the only competition release I’ve had with as few as 2 ratings was my IntroComp release, and even the SpeedIF games have gotten at least 5 ratings apiece.

IFComp releases (3 games): max 29, min 16, median 21
ShuffleComp release (1 game): 9 ratings
IntroComp release (1 game): 2 ratings
SpeedIF releases (4 games): max 13, min 5, median 9
Non-community competitions (1 game): 4 ratings
Noncomp releases (3 games): max 4, min 1, median 2

I should say - there isn’t necessarily a correlation between releasing in a competition and getting played. But there is a correlation between releasing in a competition and getting played by the IF Community (or at least the IFDB-rating part of it).

I’m basing this on a sample size of one, admittedly, but… more people reach out to me about This Is A Real Thing That Happened, the non-community competition release, than about anything else I’ve ever written. Because TIARTH is innately connected to my server, I can see play stats, and I’m pretty confident more people have played it than any of my other games. (One Eye Open might give it a run for its money, but that’s the only one.)

Despite this, it only has 4 ratings on the IFDB, and I think that’s because it wasn’t released in an IF community competition.[/spoiler]

True. I took a look at the database stats and saw that there are 6k+ registered users, but a majority of them (a huge, huge number, really) are inactive. Even among the active members, half aren’t going to be playing the new releases, and of that small number, another half won’t be rating the new games they have played. So we end up with very few people rating new games, which makes a game’s ratings a little off or inaccurate (one five-star and one one-star?). But just like on YouTube, we can’t expect everyone to like or dislike a video.
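
Putting rough numbers on that funnel (the 6k figure is from the stats above; every fraction here is a guess, not IFDB data):

```python
registered = 6000         # registered users, per the database stats
active_share = 0.10       # guess: the vast majority of accounts are inactive
plays_new_releases = 0.5  # "half of them aren't going to be playing the new releases"
rates_what_played = 0.5   # "another half won't be rating those new games"

likely_raters = registered * active_share * plays_new_releases * rates_what_played
print(likely_raters)  # 150.0 potential raters, spread across every new release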

On average, I see only 2 ratings per new release on IFDB, which is extremely sparse. Perhaps it’s because players don’t bother rating games on the site, since a rating there doesn’t really affect much (unlike comps, where ratings decide whether a game finishes first or last). Or maybe the number of people playing IF actively nowadays is dwindling. Let’s hope it’s the former!

BTW, I loved One Eye Open. :smiley:

Back in the 90s and aughts I used to play around with IF a lot in TADS, coming up with overambitious projects that were abandoned when they started to run too slowly, spending more time inventing systems and subsystems than actually finishing anything. I didn’t really mind; it was fun, and gave me a sense of accomplishment to fix bugs for projects that nobody else would see.

I toyed around in Adrift and Inform a bit, but always went back to TADS because it fit my brain the best.

I’m getting back into the scene now as an adult, two decades later, and I’m probably going to do so through Twine. Why?

Accessibility. I would like to make things that people outside the hardcore IF community will actually play, and hypertext CYOA is a lot more accessible than parser-based games. Twine, I think, gives me enough complexity to keep me interested, while keeping the UI simple enough for a player-base made up of people new to the format.

Of course, there’s a good chance that I’ll just go back to TADS and make something neat. It’s something I go back and forth on daily.