Can we just ban AI content on IFComp?

It’s worth pointing out that LLMs cannot play chess; if asked, they just start making up illegal or impossible moves as soon as the game leaves extremely well-trodden territory[1]. Dedicated chess engines are very good at chess. This is, again, the problem with the term ‘AI’: it puts a bunch of disparate stuff under one umbrella.


  1. It is of course possible to have an LLM call a chess engine as an external service, but of course I can also call a chess engine as an external service to beat you in chess. ↩︎

17 Likes

That’s why I used “AI” (with the quotes) and not LLM.

1 Like

The chess test is pretty funny. It plays “fine” for about ten moves and then shits the bed.

1 Like

I made that graph before I had seen context windows large enough to be productive on a complex problem. I was estimating the progress LLMs would make over time, but also in relation to complexity.

If I updated the graph, it would move the upswing to the middle and solidify the line, since I’ve seen measurable productivity targeting complex problems, especially with Claude Code, Opus 4.1, and a MAX $200/month subscription.

While I have strongly mixed feelings about LLMs, I can’t agree with this statement. My experience over years of hunting for information on Stack Overflow is that I have to scroll through dozens, if not hundreds, of posts that discuss problems similar to but not necessarily identical to the one I’m having, full of the same arguments, multiple solutions, and philosophical debates we’re having here (even if Stack Overflow does enforce topic rules more rigidly). I work in a weird development environment where virtually every project needs a different technology, and traditionally I spend at least half my development time researching problems. I would say that in the short term, it may be that LLMs’ single greatest strength is that they reduce the amount of time it takes to hunt for useful answers on Stack Overflow. So I give full credit there. However, that comes with multiple caveats.

First, it works well for a senior developer with enough experience to evaluate the code that is returned and correct mistakes. However, junior developers are unlikely to have the same understanding, and less likely to learn from the hunt - and vibe coders, well, they’re just stuck. So that’s a mixed bag.

Second, and representative of a much longer term problem, is that the LLMs are eating Stack Overflow’s lunch. As more developers use them for answers, fewer are asking questions on Stack Overflow, which means less food for LLMs, which is leading to informational stasis and LLMs feeding on their own output, leading to diminishing quality responses (to say nothing of bad actors poisoning the LLM well with malicious information). LLMs are quickly becoming ouroboroses (ouroboroi?) that eat and excrete their own poop. It seems entirely possible that in the long term they’ll become less useful rather than more.

An idea I’ve discussed with my boss a couple of times is that it would be nice if we could have LLMs trained exclusively on, say, the full set of O’Reilly books. That seems like an actual service worth a subscription cost.

7 Likes

If I were on the team guiding the training of an LLM for coding, I’d have cut off Stack Overflow already and pointed it at open source repositories with high code quality. I’d be buying source code. If I thought of it, I would hope OpenAI and Anthropic have also thought of it. :slight_smile:

I agree with all of this except where code is concerned. I believe augmenting your IF storytelling with GenAI is an acceptable line to cross. Why should writers care about “a trumpet is a kind of thing” when they can just prompt Claude to “write a trumpet musical instrument that the PC can play songs on from sheet music” and watch the magical OZ make that code happen?

This took two prompts:

I'm writing an Inform 10 game and need the following code for this:

a musical instrument (trumpet) that the PC can play from sheet music

implement all of these things so I can add them to my story file

then

Assume there is a single room called The Band Room and the trumpet is on a chair in the room.

produces this code (in Claude)

"Musical Instrument System" by Your Name

Chapter 1 - Musical Instruments

A musical instrument is a kind of thing.
A musical instrument can be held-by-player or not held-by-player. A musical instrument is usually not held-by-player.

Playing is an action applying to one thing.
Understand "play [something]" as playing.

Check playing:
	if the noun is not a musical instrument:
		say "That's not something you can play." instead.

Check playing a musical instrument:
	if the player is not carrying the noun:
		say "You need to be holding [the noun] to play it." instead.

Carry out playing a musical instrument:
	now the noun is held-by-player.

Report playing a musical instrument:
	say "You play a few notes on [the noun], producing a pleasant melody."

Chapter 2 - Sheet Music

Sheet music is a kind of thing. 
Sheet music has a text called the composition-name. The composition-name of sheet music is usually "Untitled Piece".
Sheet music has a text called the melody. The melody of sheet music is usually "a simple tune".
Sheet music has a text called difficulty. The difficulty of sheet music is usually "beginner".

Reading music is an action applying to one thing.
Understand "read [sheet music]" as reading music.
Understand "examine [sheet music]" as reading music.
Understand "look at [sheet music]" as reading music.
Understand "study [sheet music]" as reading music.

Check reading music:
	if the noun is not sheet music:
		say "That's not sheet music." instead.

Carry out reading music:
	say "This is sheet music for '[composition-name of the noun]', a [difficulty of the noun]-level piece. The notation indicates [melody of the noun]."

Playing it from is an action applying to two things.
Understand "play [something] from [something]" as playing it from.
Understand "play [something] on [something]" as playing it from (with nouns reversed).
Understand "perform [something] on [something]" as playing it from (with nouns reversed).
Understand "perform [something] from [something]" as playing it from.

Check playing it from:
	if the noun is not a musical instrument:
		say "[The noun] is not a musical instrument." instead;
	if the second noun is not sheet music:
		say "[The second noun] is not sheet music." instead;
	if the player is not carrying the noun:
		say "You need to be holding [the noun] to play it." instead;
	if the player is not carrying the second noun and the second noun is not visible:
		say "You can't see [the second noun] well enough to read it." instead.

Carry out playing it from:
	now the noun is held-by-player.

Report playing it from:
	say "You carefully follow the sheet music and play '[composition-name of the second noun]' on [the noun]. [run paragraph on]";
	if difficulty of the second noun is "beginner":
		say "The simple melody flows easily from the instrument.";
	else if difficulty of the second noun is "intermediate":
		say "After a few tricky passages, you manage to perform the piece reasonably well.";
	else if difficulty of the second noun is "advanced":
		say "The challenging composition requires your full concentration, but you pull it off.";
	else:
		say "You play [melody of the second noun].".

Chapter 3 - The Trumpet

The trumpet is a musical instrument. The description of the trumpet is "A gleaming brass trumpet with three valves. It looks well-maintained and ready to play."

Instead of playing the trumpet:
	if the player is not carrying the trumpet:
		say "You need to be holding the trumpet to play it.";
	else if a sheet music is visible:
		say "You play a bright fanfare on the trumpet. The brass instrument's clear tone rings out beautifully.[paragraph break]You notice there's some sheet music nearby that you could play from.";
	else:
		say "You play a bright fanfare on the trumpet. The brass instrument's clear tone rings out beautifully."

Chapter 4 - Example Sheet Music

The jazz standard is sheet music. The composition-name of the jazz standard is "Blue Moon Serenade". The melody of the jazz standard is "a smooth, swinging jazz melody with syncopated rhythms". The difficulty of the jazz standard is "intermediate".
The description of the jazz standard is "Sheet music for a popular jazz standard, with chord symbols written above the staff."

The beginner etude is sheet music. The composition-name of the beginner etude is "First Steps". The melody of the beginner etude is "a simple ascending and descending scale pattern". The difficulty of the beginner etude is "beginner".
The description of the beginner etude is "A basic practice piece for trumpet students, with large, easy-to-read notes."

The fanfare score is sheet music. The composition-name of the fanfare score is "Royal Fanfare". The melody of the fanfare score is "a majestic and triumphant series of ascending intervals". The difficulty of the fanfare score is "advanced".
The description of the fanfare score is "An ornate score for a ceremonial fanfare, with complex rhythms and high notes."

Chapter 5 - The Band Room

The Band Room is a room. "The band room is filled with music stands and instrument cases. Afternoon light filters through tall windows. A sturdy wooden chair sits in the corner, and a music stand has been set up nearby."

The chair is a supporter in the Band Room. It is fixed in place. The description is "A sturdy wooden chair, the kind typically used by musicians during practice."

The music stand is a supporter in the Band Room. It is fixed in place. The description is "An adjustable black metal music stand."

The trumpet is on the chair.
The jazz standard is on the music stand.
The beginner etude is on the music stand.
The fanfare score is on the music stand.

Test me with "examine chair / take trumpet / play trumpet / take jazz standard / read jazz standard / play trumpet from jazz standard / drop trumpet / play trumpet from jazz standard".

Resulting in this output:

Musical Instrument System
An Interactive Fiction by David Cornelson
Release 1 / Serial number 250925 / Inform 7 v10.1.2 / D

Band Room
The band room is filled with music stands and instrument cases. Afternoon light filters through tall windows. A sturdy wooden chair sits in the corner, and a music stand has been set up nearby.

You can see a chair (on which is a trumpet) and a music stand (on which are a jazz standard, a beginner etude and a fanfare score) here.

>test me
(Testing.)

>[1] examine chair
A sturdy wooden chair, the kind typically used by musicians during practice.

On the chair is a trumpet.

>[2] take trumpet
Taken.

>[3] play trumpet
You play a bright fanfare on the trumpet. The brass instrument's clear tone rings out beautifully.

You notice there's some sheet music nearby that you could play from.

>[4] take jazz standard
Taken.

>[5] read jazz standard
This is sheet music for "Blue Moon Serenade", a intermediate-level piece. The notation indicates a smooth, swinging jazz melody with syncopated rhythms.

>[6] play trumpet from jazz standard
You carefully follow the sheet music and play "Blue Moon Serenade" on the trumpet. After a few tricky passages, you manage to perform the piece reasonably well.

>[7] drop trumpet
Dropped.

>[8] play trumpet from jazz standard
You need to be holding the trumpet to play it.

If I were doing this, I’d articulate the kinds of sheet music and their descriptions and I’d update all emitted text to my liking, but the code is going to be the code whether I write it or Claude does.

This goes back to my old Textfyre days of designing a story in a document first. I probably could take the old Secret Letter doc and produce the foundation of the game. It would need help, but I could cycle through the design process and when the story is 90% complete, I could run through for text touch ups, testing, and completion. What took a year back in 2009 could take a couple of months (or less) today. Mike Gentry’s words would still be there. The logic would still come from our design.

1 Like

I’m sorry, but is this meant to be a positive example of textual content? I feel like you’re arguing against yourself.

10 Likes

Why then would I ever want to play this game that you did not write and which is not an expression of human creativity? If you didn’t invest the time, or feel passionate enough, to learn the craft that it takes to create it, or even if you know the craft but choose not to exercise it, I certainly wouldn’t have any interest in consuming it.

Writers should care about their craft, every aspect of it, every word and letter of it, because that is the essence of artistic creation. Good artists sweat over every sentence and word choice. Anything less is not art, or not good art, and believe me I know what a huge can of worms I’m opening by trying to define what is art, but I believe that is the fundamental question underlying the entire debate about whether to AI or not to AI. If you don’t care, why should I?

18 Likes

Because I like the process? I don’t know Inform, but I like programming (although I’m not very good at it) and I like problem solving. Figuring out how to code a trumpet is half the point of this thing I do as a hobby. And this is a fairly basic example of a task one might do in Inform which I would absolutely not want to skip over were I to try and learn, because there will be complex edge cases where the best AI assistant in the world will choke and to fix that I’d want to understand the absolute nuts and bolts of how the system works.

Again, it’s unlikely IFComp will ever be able to police the use of AI coding assistants without quadrupling the volunteer workload but this is antithetical to how the whole community operates. This is a small space devoted to a niche art form that was last commercially successful thirty-odd years ago - almost everyone here is doing it for love of the craft.

16 Likes

It feels good to care? Because it feels really good to make things and understand how they work, and it feels good to look at something that, sure, is imperfect, but that you (not a probability machine LLM) made yourself. Sorry, I truly can’t fathom asking this question, I’m honestly gobsmacked. “Why walk around the park when you could drive your car around it”, “why knit a blanket when you could buy one”, “why color a picture when you could google image ‘pictures of flowers’”, “why make a cake when I could buy one”, “why play an instrument when you can use Spotify” like… arguments about quality or Art aside, why would I outsource all of my creative effort when it’s one of the things that makes me want to keep living? Man, if I can’t find joy in making my silly little garbage from the ground up, why am I even here?!

20 Likes

Code, not words. I agree every letter of the story should come from the author. Every piece of logic should come from the author.

But we find bits of code all over, use other people’s libraries. Using Claude seems no different to me. It’s just code generation at its root.

2 Likes

High code quality, but every bug still in there is in there because nobody has been able to find it yet, and the LLM will be learning to replicate those hard-to-find bugs as well.

I think you vastly underestimate how much source code you’d be buying to train an LLM.

Oh, they certainly have, but they have more to gain from just scraping code for training left and right, because again: an LLM requires a lot of data, and training is a process of seemingly-endless hunger. It’s a nice idea, but I don’t think it would pan out in practice, and I think OpenAI and Anthropic are incentivized to throw everything they can at the model to claw their way to the top.

I’m noticing a pattern from you, more and more, as you post to this forum. You seem really driven to put your name on some creation, but you also seem fundamentally repelled by any creative process, so you put a lot of effort into trying to figure out ways to replicate the act of creating, without actually engaging in the act of creating. I have never before seen someone who wants to be a software engineer and writer, yet is so incredibly allergic to coding and writing.

Writers care because we are writers. That’s why we log in. That’s why we show up. It’s simultaneously baffling and fascinating that this seems to be an alien concept for you, but I’m also happy that there’s an entire forum of coders and writers who are willing to share insights and explain this drive. Hopefully you come away from it with a deeper understanding.

I feel like there is a huge difference here, because a library is code that was specifically written and packaged for other people to use, and is structured in a way to be ergonomic, organized, and optimal.

The output of an LLM is a statistical path between keywords, based on uncountable pages of code, from many different sources, magnitudes of stability, and magnitudes of production-readiness. Most of it was not intended to be statistically fused together like this (or at all), and the output is typically myopic – to describe it generously – and exhibits a kind of “anti-refactored” quality, which makes bugfixing and optimizing an absolute nightmare.

Trying to compare such output to libraries was a wild and confusing choice, and I’m trying to figure out why you linked the two in your reasoning.

15 Likes

I’ve been writing code for 40 years. One of the first programming jobs I had was at a service bureau where they hosted apps for clients on a PDP-11. They also invented what was then called a 4GL, which was essentially a code generation tool to speed up development.

My entire career has seen programming tools that speed up the development process. I’ve written my own object-relational-mapper platforms.

I have written plenty of IF from scratch, so I know where the creative bits are and where the cold code is. A for-loop is just a for-loop. An if-then is just an if-then.

Using GenAI for those elements of code that do not require storytelling creativity is no different than all of those 4GLs and ORMs.

One could argue that Inform 6 itself is a leap forward from ZIL and Inform 7/10 is a leap forward from Inform 6. Twine is a leap forward from Inform/TADS/Hugo.

We have always been making tools to build IF easier and faster.

And I know some will conflate writing the story with writing the code and I agree that some code is creative. Would it be possible to reproduce Counterfeit Monkey using GenAI? Probably not, so I get the argument that some level of IF can only be done with human direction, but CM is the edge case, not the common case.

I hold up Textfyre as an example. Both published games were written first and coded by a third party that had nothing to do with the design. Was that crossing a line? I did not code Secret Letter or Shadow and neither did the writers (though Jon did clean up the code for Shadow to get it out the door). It’s hard to hold Secret Letter up as anything exemplary, but Shadow in the Cathedral is an excellent IF story. It was designed and written in a Word document and coded by a third party. I proved that the programming is secondary. The design, storytelling, and writing are what matter.

2 Likes

To be crystal clear at this time: While I do wholeheartedly believe coding is a creative endeavor on par with painting and writing, that is not what compelled my previous reply, and it’s not the argument I’m making here. However, this might be a response to previous posts, in which case I’ll sit down.

My point was that an LLM writes functional nuggets, one at a time, but it also writes the bugs we’ve missed in the training data, and companies have had to throw as much training data as they possibly could at these models. Even in the best case, a group might curate the training data, but these bugs are – by nature – hard to find, and the single-task structure of LLM output makes these even harder to find than normal.

When a human team structures and plans a large project, they are structuring, implementing, and refactoring to minimize places where bugs can appear and also maximizing the effect of future bugfixes. A human team certainly is not perfect, but they prepare the code to smoothly shake off imperfection, once it’s discovered in the future.

No, this was collaboration with another group of people who were intelligently structuring and implementing code for a whole project. This team was not just stringing together a statistical chain of keywords from any scraped codebase within reach. The difference between generation and collaboration is pretty crucial.

10 Likes

Is this meant to be a good example? This code is a mess, and I’ve already identified at least a half dozen bugs in it.

Would you really prefer to spend your time going “wait, there’s a held-by-player property that has nothing to do with being held by the player, what does that property do, is it even checked anywhere” than just writing the code yourself? To me, that sort of debugging is the worst part of writing code, and I spend a lot of my writing time trying to minimize that sort of thing (by e.g. having meaningful variable names, and certainly not misleading ones).

16 Likes

Bugs? It compiles and works as I intended and was done in 15s. Besides, IF is one place where code quality isn’t required.

“It compiles” is really not strong evidence that there are no bugs. “Works as intended” is a strong claim to make without more extensive testing, especially when it seems to have inserted features (e.g. difficulty level of pieces) that were not requested in your original prompt.

13 Likes

Speaking from the perspective of a player/beta-tester and not a programmer, if you had compiled the generated code and asked me to play/beta-test it, what do you expect will happen?

6 Likes

I suppose you can just ask the AI to maintain the code as well and not worry about technical debt. But it’s repeating itself and generating tons of unnecessary code—it writes a full “playing” action, for example, which sets variables that are never checked anywhere…but that doesn’t matter, because these rules will never actually run, because it then reimplements the whole thing in an Instead rule. EXAMINE SHEET MUSIC and X SHEET MUSIC will give different descriptions, because it wrote a new action to replace examining when the noun is sheet music, but forgot some of the synonyms. You can also EXAMINE SHEET MUSIC in the dark, but can’t X SHEET MUSIC in the dark, because it forgot some of the conditions on the examining action. It re-checks conditions that the parser already enforces, like requiring objects to be in scope to act on them, so some of these check rules will never execute. Other conditions are checked multiple times, like attempting to “reading music” something that is not sheet music, with different failure messages each time. It uses magic constants instead of enums, and has extra variables that are only checked if other variables don’t match any of the magic constants, for no reason I can see.

If someone submitted a pull request with code like this, I would reject it as an unmaintainable mess. The fact that it took me many times longer to find the bugs in this code than it took your LLM to generate it is not a good thing. There’s a reason I compare LLM slop to a denial-of-service attack!
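For comparison, here is a minimal sketch of how two of the problems listed above (text constants standing in for an enumeration, and hand-written "holding" checks) might be avoided in Inform 7. It is an illustration only and has not been compiled or tested here; the title "Trumpet Sketch" and the shortened response texts are made up for the example, and it deliberately leaves out the redundant one-noun "playing" action and the replacement examining action rather than fixing them.

```inform7
"Trumpet Sketch"

The Band Room is a room.

[Assumption for this sketch: a kind of value replaces the "beginner" / "intermediate" / "advanced" text constants, so a misspelled level is a compile-time problem rather than a silent fall-through.]
Difficulty is a kind of value. The difficulties are beginner, intermediate and advanced.

A musical instrument is a kind of thing.

Sheet music is a kind of thing.
Sheet music has a difficulty.
Sheet music has a text called the composition-name.

[Asking for a carried thing in the action definition hands the "you need to be holding it" case to the standard carrying requirements, so no property like held-by-player is needed.]
Playing it from is an action applying to one carried thing and one visible thing.
Understand "play [something] from [something]" as playing it from.

Check playing it from:
	if the noun is not a musical instrument, say "[The noun] is not a musical instrument." instead;
	if the second noun is not sheet music, say "[The second noun] is not sheet music." instead.

Report playing it from:
	say "You play '[composition-name of the second noun]' on [the noun]. ";
	if the difficulty of the second noun is beginner:
		say "The simple melody flows easily.";
	otherwise if the difficulty of the second noun is intermediate:
		say "After a few tricky passages, you get through it.";
	otherwise:
		say "It takes your full concentration, but you pull it off."

The trumpet is a musical instrument. The trumpet is in the Band Room.

The jazz standard is sheet music. The jazz standard is in the Band Room.
The composition-name of the jazz standard is "Blue Moon Serenade".
The difficulty of the jazz standard is intermediate.
```

Because the sheet music keeps the ordinary examining action in this sketch, EXAMINE and X would behave the same way; anything piece-specific could go in each item's description instead of a separate "reading music" action.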

23 Likes