Can we just ban AI content on IFComp?

To be fair, I would never use Claude to write Inform. There isn’t enough sample training code to make Claude as good at it as something like python or typescript.

Now you’re moving the goalposts. You generated this code and provided it as an example; you challenged me to identify the bugs in it; I took the time to do so; now you’re dismissing that and saying you could generate more code in something like Python, and it would be better.

And that’s the problem—I, we, this entire forum, will never be able to bug-check Claude’s output as fast as you generate it and post it. It takes us far more effort to fix the problems than it takes you and Claude to create them. It’s the gish gallop of technology, if you prefer that over “DoS attack on the forum’s contributors”.

28 Likes

The decisions you make in writing the code are intrinsic to shaping the story told by playing the game. The exceptions to this (linear twine games with default styling maybe? - maybe!) are the edge case.

Although as has been said a few times at this point, a ban on LLM generated code would simply be unenforceable.

18 Likes

And Jack Toresal and The Secret Letter and The Shadow in the Cathedral.

Neither was designed in code (though they were designed by established IF authors).

The premise that you can’t write a great game and separately write the code is already proven false.

To suggest an IF story must be done as text and code together is gatekeeping. Nothing more.

Alright, this thread keeps popping up for me and I need to say something just for the peace of my mind.

I come from a VN dev background. This is a genre heavy on images, very visually oriented (I mean, it’s a visual novel). You often have entire teams working on one jam game because there’s a lot of work to do: coding, sprites, backgrounds, GUI, music, sometimes voice acting. It’s possible to solo all this with clever use of resources, though. Surely, VN jams would allow poor solo devs to use AI to assist themselves so everyone’s on the same level?

Let’s take a look at three big VN jams organized on itch: Otome Jam (for otome games which require a romance between a woman heroine and a man love interest), O2A2 (Only One of Any Asset, limited to 1k words, one sprite, one background, etc etc), and Spooktober (imagine Ectocomp and IFComp having a baby. Terrifying, I know, but that’s what it is to me).

Otome Jam 2025. 92 entries. Rule 5: Use of Free to use, Public Domain and Creative Commons Licence resources are allowed and encouraged if you can’t fill in a position in your team (…) AI generated content is not allowed.

O2A2 2025. 215 entries. Note under the rules: Additionally, AI-generated assets are not allowed. Utilizing systems in which the training data is not owned by the dev (ChatGPT, Midjourney, Stable Diffusion, and similar) is not allowed, and those entries found in violation will be removed from the jam.

Spooktober 2024 (as 2025 is ongoing). 268 entries. Rule 6: No generative AI usage in any aspect of your project. Humans must illustrate, write, program, etc every part of your submission! Be mindful if you are using third party assets as per the previous rule, as this extends to 3rd party assets.

I need to ask: why are people in IF, which is objectively less resource-intensive, so adamant about keeping AI in major competitions, hell, in the biggest competition? “I can’t code, I can’t make cover art”: learn, then. I had to learn this, it didn’t kill me. And if you can’t learn, team up with someone who can do this. Ask around.

For all the community that’s going on here (which I’m grateful for, don’t get me wrong), IF seems like an extremely individualistic thing where every author fends for themself and makes their own thing alone, not counting beta testing or an occasional collaboration. Perhaps this is a part of where this whole dilemma comes from: people want to make something but can’t go through all the steps on their own so instead of asking “hey, anyone wants to work with me on that? anyone wants to help me out?”, they resort to a machine that will simply give them what they want. And I find it incredibly sad that people would rather type in a prompt instead of connecting with someone who knows how to do things. I find it sad that we’re still having a conversation over whether or not the biggest comp out there should feature work that was done by a faceless LLM when the VN community has no problem deciding that it prefers to see things which are made by an actual person out there. I find it extremely sad when people think they can’t learn or afford to fail.

You’re better than this.

35 Likes

No, but they were coded by another person whose programming choices affect the end result of the game—in Pacian’s words, they shaped the story told by playing the game. The fact that they were coded by someone different than the person who wrote the words that appear on screen doesn’t change that, it simply means that the authors delegated part of the design to someone else.

And there’s a very, very big difference between delegating to a real life human being who presumably understands what game you’re trying to make and how their code can help achieve your goals and delegating to an LLM that by its very nature cannot.

23 Likes

What? No. For smaller games it won’t kill you but that’s true of any small-scale programming project, and larger games will absolutely make Inform, Twine, etc come apart at the seams. (Winter-Over needed a last minute optimization pass last year because the amount of data we were storing caused up to 30s of lag whenever the player tried to save.)

Besides, why shouldn’t we have pride in our work? Why shouldn’t we have standards and best practices? This is a concerning thing to see from a sitting board member of the IFTF.

20 Likes

Strongly disagree! “It compiles” is about the lowest bar you can set for a computer program. Why should IF have such low standards? When I think of all the headaches I’ve had trying to make my games run smoothly it gives me another headache, but on balance it was absolutely worth it.

16 Likes

So you completely ignored my whole post. Cool, cool.

Maybe you’ll listen to @Naarel and @svlin instead.

4 Likes

100%. Part of the reason we are a community is that we help each other out. I can’t count the number of times I’ve had help from someone here who is a cleverer programmer than me. Lots of people here like to help. It’d be a shame if everyone just asked Claude to solve their coding problems for them.

16 Likes

Also, since I talked about VNs already, I’ll add that the separation of coding and writing has been a thing here for a while, at least in the circles I’m in. It’s common for teams to have dedicated coders and dedicated writers. Writing is often done out of engine in a completely separate document, then after being checked by proofreaders/editors (which are a separate entity from playtesters!) it’s all ported into the engine. This is a habit that stayed with me even though I do IF now, so a vast majority of my works exists in a Google Doc first as pure writing, then gets ported to Twine.

The thing is: a dedicated writer focuses on making the writing the best it can be. A dedicated coder focuses on making the code the best it can be. Sometimes they interact and ask each other: what can we do to make this game the best it could be? What coding tricks can we use to emphasize the text here? Should this part be rewritten to better fit what can be achieved in-engine?

I think that this is the true spirit of any competition: to strive towards something which is good instead of good enough, something that functions well instead of just compiling. Isn’t IFComp meant to be a celebration of brilliance of the community? Are we meant to just take “good enough”?

17 Likes

I get that this is a toy example, but the output Claude generated is pretty badly structured and isn’t how a (competent) implementer would write this at all. One thing that sticks out to me is the ‘held by the player’ property the LLM generated, which is never actually used anywhere, but there are other issues.

Ultimately coding is never ‘just’ coding, it’s expressing narrative design and technical design decisions that are going to determine how players interact with the piece and also the process of writing the piece.

The LLM here made a bunch of those decisions, which it had to because the prompt is vague enough. It decided on verbs, it decided on the outcomes of actions and how they work. It decided on the data structure that these verbs use. This is offloading meaningful creative decisions to the machine, and I don’t think it should be permissible (refer to my above comments about social norms if you want to argue enforcement). You can go back and forth with Claude going “now make it do this, make it do that” but that just means that you’re, even more now, designing the game in conjunction with a chatbot.

Besides, the thing also inserted a bunch of player-facing game text, so it’s not really sidestepping the issue of using an LLM to generate writing, is it?

I think using the vibe coding box for something like Inform 7 is incredibly pointless… if you’re going to write a thorough enough prompt that expresses a clear narrative design of the feature you want, you’re basically just writing a spec. And the whole point of i7 is that it’s a predicate language so the code closely resembles a spec. That’s what makes it so useful outside of IF as a design prototyping language, even.

There’s plenty of evidence to suggest that LLM coding isn’t even useful in other fields, but for the specific domain of writing parser games with i7 it’s facially unconvincing.

“It compiles” is a low bar in general but it is a shockingly low bar in i7, a language designed deliberately to be extremely syntactically and semantically permissive, and with a syntax that also resembles the natural language English that LLMs are trained to spew large volumes of in the first place.

20 Likes

Before we forget that this is a thread about IFComp and not just us debating the merits and demerits of AI, I’d like to point to the best practices for IFComp authors:

If you apply no other guideline on this page to your IFComp entry, please consider applying this one.

Every work of interactive fiction, no matter the format, style, or experience level of the author, benefits enormously from playtesting prior to its first public release. We strongly recommend that authors always send their work through at least a round or two of testing – and subsequent improvement, based on feedback – before submitting it as entries to the IFComp.

What “testing” means varies depending upon the nature of the IF at hand. A complex, puzzle-laden work in a classic text-adventure mode might best benefit from weeks of rigorous, iterative testing, fixing, tuning and re-testing as a volunteer quality-assurance team scrambles all over the game, trying to break it as much as trying to solve it. On the other end of the spectrum, testing a story-focused work would likely focus more on the quality of the prose than on the underlying mechanics, scrutinizing the narrative branching and providing feedback about how meaningful the choices felt.

However you choose to test, the mere act of it will all but guarantee that the work you submit to the competition shall be immeasurably more polished and ready for public scrutiny than a wholly untested work.

This is very good advice and seems to be at odds with the claim that “code quality” isn’t required. Imagine someone sends in a sloppy Inform 7 work with too few synonyms and buggy rooms: I think a judge has every right to be pretty critical.

So the idea that “it compiles and works as I intended” does not seem to be on the level the guidelines ask us authors to work at. Perhaps you’re fine with a very different kind of standard. But for the IFComp judges, that won’t fly:

Playing your game yourself does not count as beta testing. As the author, you willingly forego the ability to see your work with the same perspective as a reader approaching it with no foreknowledge. Of course you’ll “play” your game quite a bit while developing it, and you will find and fix plenty of flaws on your own. But you need people who are not you to play the game precisely because they lack your knowledge and expectations about it. They will uncover subtle confusions, unintentional surprises, and other problems that you would have never spotted by dint of being so close to the creation.

Whether you might like it or not, IFComp games have a higher standard to adhere to. There’s an expectation that the games are playtested by other people and refined over and over again.

AI or not, the games should have been playtested a lot. Code quality does matter, especially in a competition where a good chunk of the judges are practitioners in their own right. We want to be sympathetic when we encounter bugs and errors, but if it happens a lot, we have to lower our score for the game.

I find it genuinely ridiculous someone would claim code quality does not matter for IF, especially in the IFComp context. A cursory look at past entries would show bugs do matter. This is a baffling statement that is counter to the very first guideline all authors should follow. I don’t understand how anyone could say that.

20 Likes

I imagine the intent is “code maintainability doesn’t matter, because IF works usually aren’t maintained for very long after release”. But I disagree there too, because the LLM-generated code is filled with bugs, which matter in any sort of programming—and maintainability is what lets you find and fix those bugs.

That aside…


I came into this thread as a skeptic; a single-digit percentage of LLM-generated entries didn’t seem like a crisis worth litigating in the middle of the competition. Surely it could just wait for the feedback survey, and if it became a problem, it could be changed next year?

But the more I’m hearing from the pro-LLM crowd, the more I agree that it should be banned.

The most compelling argument I’ve heard for banning LLM-generated works from community events—since, after all, the ethical and environmental issues could theoretically be improved—is that people enjoy generating them but don’t enjoy playing them. The slop takes barely seconds to shovel out, but much longer than that to wade through. And this thread is only reinforcing that argument. The pro-LLM side is posting code samples that they clearly haven’t even read, asking us to find any issues in them, then when we put in the work to respond in good faith, they shrug and say they can always generate more slop later.

Stack Exchange (the community, originally, before the company reversed the policy for the sake of shareholder value) banned LLM-generated posts because they amounted to a DoS attack on the system’s quality-control mechanisms. It takes five seconds to generate and post a short story’s worth of nonsense, but several minutes to determine that it’s nonsense and delete it. LLM-users enjoy churning out slop, but reviewers don’t enjoy reading it. Engaging in good faith is useless, because the LLM never learns from its mistakes. And letting those quality-control mechanisms be attacked causes clear harm to the community that relies on them. (This is where I first picked up the DoS analogy I’ve been using.)

After reading this thread, I’m convinced that we shouldn’t allow DoS attacks against this community either. It takes time and effort for reviewers to play and analyze the games. That time and effort should be devoted to things people care about, not fire-and-forget shovelware that took less time to create than it does to play through. And I’ve yet to see any compelling argument that LLM-generated IF is anything more than that.


EDIT: To be clear, this is my personal opinion as a member of the community, not an attempt to sway the IFComp judges, and not an official moderator ruling. Vote according to your heart and your own impression of the entries; please don’t go rate all the LLM entries a 1 without playing them. I haven’t tried any of the LLM entries in the current IFComp and have no idea if they’re any good or not; this post is based on my experiences in this thread and the examples posted in it.

37 Likes

Yeah, for better or for worse, this thread forced me from “skeptic who isn’t sure if a ban is right despite my skepticism” to “this is degrading how our community functions, and we need to do something about it fast”. The individual LLM user will definitely find it unfair, but it is also extremely unfair to put judges and reviewers into this situation.

Until there’s a way to balance this, I’m advocating for a ban.

27 Likes

215 posts in 7 days is just over 30 posts a day. Congratulations…?

4 Likes

How well put, I couldn’t agree more. Thank you.

Code is not simply a container for text that is presented to the player as content. The structure of the code already stems from a narrative intention. And since this intention is always, at least in part, emerging in the work of the author or coder (we code and write in order to also discover what we want to code and what we want to write), this explains why, when we ask an LLM to, say, brainstorm a narrative idea or find out how to approach a particular coding method, we are mainly informed about what we definitely don’t want.

The juxtaposition of elements does not form an intention. Having an answer for everything does not form an intention. Guessing what is probable or consistent, completing a pattern, imitating a style, balancing parameters, and anticipating needs by systematically programming common use cases does not form an intention.

Narrative intention is exactly the opposite: a perspective, a point of view, an insolence, a transgression, an abstraction, a reappropriation, a recomposition, a selection, a diversion, a wilful ignorance, a contempt, a kiss, a love, a disability, an imperfection, a confession.

Good luck to all the GPTs in the world in coding desire. Without a body of flesh and blood that will experience death, it’s going to be complicated.

12 Likes

I guess this thread is the real winner of the Comp…

7 Likes

I believe this is one path to writing IF and I also believe a storyteller will find a way to tell their stories one way or another. IF is beautiful because it’s a collaboration of stories and technology. I’m sympathetic to the line being drawn on a complete ban and if I complete future works with Sharpee and Claude founded on my writing and storytelling, I’ll happily release them outside of all competitions. None of us do this for glory. We do it because we have stories to tell and we love the medium. I just think the coding doesn’t matter if you tell a good story.

2 Likes

Here’s what it inspires in me: if the story is everything and the code doesn’t matter, then the gameplay and the interactivity management don’t matter, and the player doesn’t matter. Instead, we’re looking for a reader, and this isn’t a game, it’s a story. In a story, the interlocutor invests their imagination differently than when confronted with a game.

9 Likes