Can we just ban AI content on IFComp?

The issue with that is that topics about AI tend to start out about different aspects of it. This thread, at least, has stayed firmly on the “AI in IFComp” subject it started with, instead of devolving into a general “merits of AI” argument.

But I agree. It’s often exhausting to keep up with such long and repetitive threads. Some people mute the ai tag entirely; if it becomes overwhelming, I’d recommend doing that.

9 Likes

I’m tired of all the AI discussion on the forum too, I suspect most of us are. But the issue is not going to go away if we ignore it now - quite the reverse, in fact. I also wouldn’t describe any of the participants in this discussion as a fanatic. There’s a lot of nuance in the views expressed here.

20 Likes

People are having a respectful discussion about a complicated topic; I really don’t see what the problem is.

I understand that there’s a justifiable concern around gatekeeping, but I feel like the situations aren’t really comparable for a few clear reasons.

  • LLMs and other generative AI products are not specific innovations around interactive fiction or games, they’re major products of major tech companies with billions in funding and billions in marketing behind them. There is a large propaganda campaign to promote the utility of these products, downplay their downsides, and generally make them permeate every aspect of public and private life.
    • IF is, historically, a community and tradition of artistic practice built around shared aesthetics, practices, and purpose-built tools, from Inform to Twine to Ink to Dialog. Replacing that with large opaque systems run by major tech corporations dilutes the identity of the comp.
  • LLMs cause demonstrable harm in many ways; this is a discussion that goes outside the bounds of ‘what is IF’ and into serious ethical concerns. LLM chatbots like chatGPT particularly seem extremely harmful in the way they are currently being deployed; see for example this recent New York Times article.
  • We have plenty of evidence of other community spaces analogous to IFComp being harmed by the presence of generative AI. I can point to things like Clarkesworld being flooded with low-effort LLM-created submissions, or the fracas around ParserComp earlier this year. The increase in the use of these tools this year, alongside the increase in the overall volume of submissions, points to a similar trend with IFComp.
  • If nothing else: numerous people are sick of this stuff being shoehorned into everything and every space becoming a stage for the technological pageantry of LLM chatbot usage.

If you’re not already familiar with the comp and don’t already have a high degree of trust in the comp itself or its organizers, it’s easy to assume when you see some AI art that “oh, this is just another thing that’s been taken over by AI slop” and dismiss it entirely, because frankly the internet is currently full of formerly-cool or interesting things that are now just stages for people to rub their models against one another. Like, people don’t have infinite time to give things the benefit of the doubt.

24 Likes

One possible scenario might be that they load up the “Entries” page, have the (default) random shuffle pop some AI cover art right at the top, go “Ugh”, and click away without even looking at the rest. In the case of GamingOnLinux specifically, anybody considering writing an IFComp article is probably taking time that would otherwise go to writing something else. It’s not necessarily a matter of “refusing to play” so much as just deciding not to bother.

Also, Bruno Dias laid out a problem with avoiding only the AI entries in the post that’s already been linked a couple of times:

This would seem to be ‘fine’ but it creates a dynamic where the only people willing to play and rate the AI entries are people who are not going to object to them on grounds that they are AI, and thus they’re getting judged by a different standard than everyone else’s work. This means that the final result of the competition is at risk of legitimizing AI use or worse, making someone who put out real work feel bad that they placed behind someone who put out slop.

11 Likes

I feel like the situations aren’t really comparable for a few clear reasons

I understand your concerns about LLMs, and agree with many of them. But I don’t agree that the current pushback against allowing gen AI in the competition is unprecedented. For instance:

IF is, historically, a community and tradition of artistic practice built around shared aesthetics, practices, and purpose-built tools, from Inform to Twine to Ink to Dialog.

Rewind a decade or two and there were community members arguing, quite stridently, that our tradition of artistic practice excluded Twine and Ink from your list. (History has shown this position short-sighted, of course; many works in these media are now considered some of our community’s best.) Shared aesthetics and practices evolve over time.

the internet is currently full of formerly-cool or interesting things that are now just stages for people to rub their models against one another

Sure; the Advent of Code global leaderboard was completely overrun by people scripting queries to LLMs, for instance.

I don’t think anybody here is arguing in good faith that we want a competition full of “games” written by ChatGPT based on a low-effort prompt.

I don’t think that’s the likely fate, in the long term. Perhaps AI boosters are currently chasing interactive fiction’s “Deep Blue moment”; but I expect that LLM use in interactive fiction will reach a steady state somewhat akin to the use of computer engines in chess. Everybody knows that they exist, and what their capabilities are, and makes use of them for training and analysis; but the sport is a showcase of human achievement and nobody finds a human vs. AI chess match very interesting anymore.

Concretely, I expect that in a few years after the novelty of LLMs wears off, the amount of AI slop submitted to the IFComp won’t greatly exceed the amount of non-AI slop it has always received; and that judges will firmly relegate such entries to the bottom of the standings. But I’ve certainly been wrong before!

4 Likes

I saw this post; but replace “AI entries” with any other unpopular subcategory of games (Windows executables or other games that only run on high-friction platforms; obvious joke entries; games with immature or objectionable themes; etc.) and the dynamic that you mentioned hasn’t historically borne out.

2 Likes

But all of these examples, whether the most puerile joke game, an ungrammatical “my first game”, or a bug-ridden homebrew parser, are things written by people, not things generated by an algorithm. I would happily write a review for the worst game in the world if someone had put their heart into it. Or a funny review for a game that only existed to pull my leg. But I’m not going to waste my time playing and reviewing something made by a computer. That’s the difference.

20 Likes

One question to ask is: why are people willing to play and rate ‘bad’ games? (By ‘bad’ I mean a game that the player doesn’t like for reasons the author could feasibly change in the future.)

There are four reasons I can think of. First, pure completionism. Second, for the player to learn from the mistakes of others to make a better game themselves. Third, to sort through games to rate/rank them so others can know what to play. Fourth, for the player/reviewer to be able to give advice for that author to help enable them to make a better game in the future, like a farmer planting seeds for a harvest.

The fourth is my main motivation now (in the past, it was to come up with theories about what makes games good or bad). As an example, Victor Ojuel entered IFComp for the first time with the game Pilgrimage, which was creative but buggy. He got a lot of feedback and later came back to enter Dancing with Fear, which was a much stronger game. This pattern has happened a lot, with, for instance, Laura Knauth entering three times in a row with better and better games until she won.

AI authors are severely limited in how much they can improve. They didn’t write it, the model did, so either they can try to figure out better prompts or wait for a better model.

So what’s the point of reviewing it when they can’t do anything about it?

And this isn’t restricted to AI authors. There are a small number of authors who produce the same type of game every year, or multiple times a year, get the same criticisms every time, and never change; there’s no desire to grow. And that’s fine! But what is the purpose of feedback if they’re going to ignore it? It’s not like I’m punishing them if I don’t review; they literally don’t want the feedback (except in the rare cases where people repeatedly make the same low-effort games and ask for praise). AI is in the same spot; feedback feels fruitless.

A lot of problems with AI are theoretically solvable. It’s already much better now than I ever expected it would be; five years ago, BJ Best used AI to write a game and the quality was laughably bad: You Will Thank Me as Fast as You Thank a Werewolf.

Similarly, AI art has improved by leaps and bounds. The criticism that ‘AI is boring/bad’ might go away one day. Environmental concerns might be resolved at some point. I don’t think we should judge future possibilities by present limitations.

But the feedback issue is one I don’t really see a way around. Competitions are here for people to share their work for others to enjoy and to improve. AI use limits the potential for improvement, and the better AI gets in general, the less worthy each AI-generated entry is, since people can just generate their own. So I just don’t see a place for AI in competitions, not because it’s ‘evil’ but because it doesn’t satisfy any of the purposes for competitions in the first place.

27 Likes

I think this goes even beyond reviewing to the question of like… is something like Penny Nichols meaningfully ‘a piece of IF’ that you can ‘play’? Because the prompt given as the entry is really quite short and light on details. This works because chatGPT will spackle over all the gaps with generated detail, but have two people who played it had meaningfully comparable experiences? Can you discuss it after as if you played the same thing?

Are you playing a piece of IF or are you just role-playing with chatGPT as a player of a piece of IF? Is Penny Nichols even real? Does an object exist that we can say ‘is Penny Nichols’ in the same sense that we can say the z5 file for Curses ‘is Curses’? Can Penny Nichols be meaningfully archived if you don’t have access to chatGPT for whatever reason?

Is there anything there to talk about or take in as a piece of art at all? Like, you can’t talk about the prose (something I care about a lot in my reviews of IFComp games) because the prose is just the same prose chatGPT will generate for anything else. You can’t talk about puzzle design, or surprising plot developments, or the characterization of the characters, because the author did none of those things, chatGPT might generate them slightly differently for different players, and it’s basically just another sampling of the slurry that makes up all chatGPT output anyway.

When it comes to games written by an LLM, as opposed to games that actually use the live chatbot as a component, I think it’s less clear-cut, but I still think there are a lot of practical reasons to ban it. I know that arguments that “oh, making a game with Twine is too easy” are something a lot of people have queasy memories about, but the reality is that even the lowest-effort Twine still required the writer to write something.

The problem with LLMs is not that they enable people to make low-effort things; you could make low-effort games with BASIC. The problem with LLMs is that they enable someone to make something low-effort that looks, at first glance, like something high-effort. It short-circuits a lot of the first-pass logic people use to evaluate things before going in depth on their actual merits.

Particularly, LLMs are very good at generating huge volumes of text that is passably readable very quickly and with minimal user input.

This is not really healthy for the community, especially because it discourages low-skill, high-effort authors – people we’re supposed to encourage and nurture – from engaging or improving.

28 Likes

For a competition like IFComp, the main issue for me is human creativity. And if a game isn’t grounded solidly on that basis, I don’t think it can be fairly judged alongside other games in the competition, nor do I think it’s fair to the other competitors.

As a reviewer again this time around, I’m making all sorts of decisions about which games to play or not to play: largely based on the time limits I have, as well as the games that appeal to me the most, and those that I can technically run (e.g. not the Windows executable that crashed my Mac when I tried to run it under CrossOver!). Responding to human creativity is a key factor for me, and it makes me extremely reluctant to play games that bypass it.

I will definitely make an effort to remember to address the question of AI fully in the post-competition survey this time. But I agree with others that this is too big an issue to leave to that alone. I’m also very aware, as a participant in IFComp for 30 years now, that post-comp surveys are frankly not very well promoted and are easy to miss.

And equally, while I grow weary of AI discussions here too, I don’t want them to be censored. Scroll by if you don’t want to read yet another AI post or thread. But it’s too big an issue for the community to ignore, and it’s likely to become even more so.

20 Likes

I think one crucial difference here is that people who don’t run a Windows executable or bother with a high-friction platform are neglecting to play those games rather than refusing to play those games. The smaller number of ratings might have an impact - possibly more extreme scores - but there won’t likely be a trend one way or the other.

The AI entries, however, create a situation where someone who objects on the grounds that the author didn’t even write the thing (or that playing it would waste electricity/water or create the impression of increased demand for ChatGPT etc.) can’t reasonably express that in a rating, but someone who’s really keen on AI will happily play and may well rate highly, possibly because they see it as a cutting-edge use of their new favourite technology. Rather than merely reducing the pool of potential judges, it may eliminate critical judges in particular.

11 Likes

I think this is something we’re all wary of as we try to talk about LLMs in IF. Indeed, you’ve pointed out that some of us are trying to show “this time, it’s different”. I’ll just say that from reading about IF history, the reactionary response to Twine and ChoiceScript games doesn’t seem to be “engine-related” or “game design-related” per se. There were already parser games exploring hypertext game design, and certainly hypertext and choice games already existed before Twine was a thing.

For me, the main reasons that Twine creators were targeted were their gender and the subjects they were exploring. Because they were not recreating the typical Scott Adams or Infocom adventure game and were instead exploring traumatic and grounded subjects, they were punished. The community has rightfully moved on, evolving its aesthetics and practices to be more inclusive. We still have tension that flares up once in a while, but I think we all generally agree they are “interactive fiction”.

I think LLM authors are not on the same level, at least not right now. I believe it’s accurate to say the skeptical and anti-AI camps are discriminatory, but it would be weird to lump them together with the anti-Twine camps of the past. The latter discriminated based on gender and experiences, went so far as to harass and threaten people, and created a large fuss about it. At the moment, the former just hate LLMs for a number of reasons.

Rather than saying “this time, it’s different”, I want to say that what happened to Twine and friends is very different from anything we’re talking about here and in other “AI in IF” threads. While people were concerned that Twine and ChoiceScript would take over interactive fiction and blot parser games out of existence, the subtext was more like “I don’t want the community to be more inclusive with race, gender, and class”. It’s about the question of including marginalized people in the discussions, not concerns about the technology and its implications.

17 Likes

It does occur to me to point out that, in fact, LLM-written games can’t really tackle difficult or highly personal topics, because the LLM can only ever write from an implicitly hegemonic perspective; you can’t exactly write personal work by having a machine do it for you; and in many cases LLM output is outright censored or programmed not to go to certain places. It’s literally been trained to be incapable of exploring certain things (although seemingly without successfully preventing habitual chatbot users from accidentally ‘jailbreaking’ the LLM out of those safeguards).

18 Likes

Shared aesthetics and practices evolve over time.

GenAI content isn’t an “aesthetic,” it’s the crushing absence of aesthetic, an algorithmic average of other people’s (stolen) work that has no point of view, says nothing and means less.

Using GenAI to do your brainstorming, writing, coding, and/or artwork isn’t a “practice,” it’s a steadfast refusal to develop a creative practice of your own. This may be a foreseeable consequence of a culture that values only the product (preferably monetizable, and as much of it as possible), but that doesn’t make it any more admirable or less upsetting, especially when those values are generally at odds with those of a niche creative community.

15 Likes

Stupid LLMs, making me jump in alarm at the “it’s not x, it’s y” construction (nothing against you at all, Lionstooth; my grievance is with the LLMs for ruining that prose construction).

3 Likes

I’ve noticed ChatGPT 5 is obsessed with “pragmatic” and “forward-looking” answers.

1 Like

That’s your opinion today, about today’s tools.

You’re definitely right that this was a facet of the backlash against Twine.

And look, I’m not going to die on the hill of equating howling dogs with Penny Nichols, Troubleshooter.

But there were also a lot of people driven by, as you say, a fear that the rising popularity of choice-based games meant the end of interest in parser-puzzlers from the one small community that still valued them. I know because I was one of them!

I mean if it counts for anything: I advocated for Twine authors, I wrote both choice-based and parser fiction, I was on the ‘right side of history’ such as it is all those many years ago, I’ve worked with (grammar-based, not ML-based) generative methods and promoted them, I’ve generally advocated for a broad and inclusive definition of ‘IF’.

Frankly, I find the comparison kind of absurd and almost insulting. Chris Klimas didn’t take billions of dollars from a16z. Microsoft didn’t put a dedicated Twine button in the Windows 7 task bar. This is comparing intra-community drama driven largely by parochial concerns to an actual broad societal concern around an emerging technology that a lot of people with a very considered understanding of the subject matter think is harmful in a variety of ways.

23 Likes

I do think it’s still different from the “how we feel about AI in the IF community” discourse.

Parser games are part of the IF tradition, so it makes sense for people to worry about their future. It could certainly lead people down some nasty rabbit holes, but the continuity of parser games is very important to preserve.

But I’m not sure the same case can be made for LLM games yet. It’s still too new a tradition, and I don’t think there is a “community” we can speak of yet in the IF space. That may change, but I believe there’s no community that “values them” now, to use your language. We are not talking about what the current community is thinking (the anxieties around parser games’ existence, etc.) but rather speculating about the place of LLM games in the IF community. Even though this year’s IFComp is unusual in its number of LLM-based entries, it’s still a small fraction. As far as I can tell, this space hasn’t developed any further than being a fad.

The “wait and see” approach was something I took because, in spite of the technological and societal concerns I have, I remained apathetic about it. But I’ve come to realize that this passivity is sending mixed messages to the public. The thread has, for the most part, a lot of nuance from many prolific members, but it can also be read as us tacitly approving of the technology without any hesitation.

If anything, I find that the lack of a coherent message is probably what is threatening the IF space. As we think about potential censorship and so on, we are still alienating people who could have been interested in joining the community. It’s all a bit absurd when you think about it: we’re worrying about the few people whose entries are causing reviewers some anxiety, and wondering whether removing them is a tad too far, while many others look at the IFComp website and go “I don’t think I can support that”.

I can’t predict the future of LLMs, nor do I care to. I think it’s more important to consider what people inside and outside IF are saying now. It definitely goes beyond the scope of what a “post-competition” survey can do. Maybe history will prove me wrong about the state of LLMs. But even then, I think it’s important to focus on the present, actually hear what people are saying, and solve the problems that are making this year’s IFComp so uncomfortable to talk about.

12 Likes