In this situation, it’d usually be preferable to post here asking how to do this. For your specific example, I (as well as others here) could easily tell you how to do this. We’re a very friendly and helpful bunch, honest!
From my limited understanding/experience, this is how I see AI and its capabilities in broad use. I’ve attached a robot overlord threat rating to each point. This is totally scientific, by the way.
Story → Seemingly creative on the surface, but zero depth. Anything resembling an actual plot is garbage.
Threat Score: 2/10
Programming → Great for specific, defined functions, but not good at all for interface or game mechanics.
Threat Score: 3/10
Audio → Good for ambient music that’s not trying to achieve anything specific.
Threat Score: 4/10
Graphics → Great for generic character portraits, cover pages and mashing things together in a weird, surprising way. In any other capacity, it often does something unnatural, undesirable, and off-putting.
Threat Score: 5/10
Two disciplines, writing and coding, are required to make interactive fiction games. (Some tools require less coding, but all require some level of “programming” skill.) Because AI doesn’t understand the big picture in either discipline, it’s more of a tool than a magic wand. A great story or cool game mechanic is a reflection of the author’s skill.
However, in the realms of graphics and audio, it’s rare to see these aspects handled with any real expertise in IF competitions. IF appeals mostly to writers, so when music and graphics are added to enhance an IF entry, it’s a big deal. And most participants are just hobbyists. Because graphics and audio are not dependent on the story prose, they typically don’t require much skill to integrate. And I think this is the crux of the concern about AI in IF competitions. It’s easy for an author to drop in a graphic they didn’t make, but the impact on our player brains is undeniable.
We like graphics and music. When all the sensory neurons are firing at once, it’s more engaging. A great story, shown in a pleasing font in plain text… doesn’t hold a candle to the “full meal deal” of engaging graphics and music to put you in the mood. The cover of a book can sway a reader’s interest, to be honest, so what chance does a great story have against a competent story with all that sensory candy… that the author didn’t even make themselves? Even with AI being allowed in competitions, can a reviewer be impartial to the AI-generated candy and judge the work itself?
This is the fear, I believe. Is it reality? I don’t think it has happened yet where AI generated content has swayed the judging process in any consequential way… but the fear is real.
My take is that AI can serve a purpose, but it can’t elevate a game beyond its foundational design (the human-engineered part). If someone wins an IF competition with AI content (with the capabilities that AI has today), I think we have to admit that they must have made a great game… and that making a game can also be akin to directing a movie, while relying on talent from other sources.
Also, the reason there are multiple categories of movie and music awards is that there is so much diversity in creative fields that acknowledging great artists requires more than a “best overall” award. The idea of having a separate category for AI assisted game authoring isn’t such a bad idea. It might put an end to the arguments and assumptions being made. “And the award for best image generation prompt input text goes to…”
Something has to be allowed to happen in order for people to recognize the benefit or detriment AI has in competition. Otherwise, it’s just speculation… from both sides. I’m glad AI is being allowed, even though it doesn’t interest me personally in game creation.
Anyway, just food for thought.
I actually find it incredibly aggravating that videogames generally include music and/or audio cues for combat. I will always mute both the OST and any environmental noises (footsteps, weapons, weather, etc.) if it’s an option, as I am typically listening to a documentary or video essay at 2x speed in the background.
I only begrudgingly turn on audio for IF in particular when the author explicitly tells us it’s a crucial part of the experience, and even then I typically mute it after deciding I don’t want to listen to squeaky doors or narration. I am also generally listening to my own, unrelated music, and it’s irritating to have to turn off my nightcore temporarily to disable a game’s tunes. It’s especially annoying in IF, where I expect a purely textual experience typically. For visual novels there’s a little more leeway, but I have to be in a particular mood to play those.
@anon66621404
I agree with you. I only mentioned music because it’s another possible form of expression. If the graphic art in a game is poor, I find it distracting. If the music is poor, I find it annoying. And I’d rather be distracted than annoyed.
I have no idea how you can listen to documentaries while playing a game. Seriously, women can multi-task circles around guys. My mother has you all beat hands down though. She would be in her chair, with the TV or stereo on (sometimes both), with a book in her lap… while sleeping. Turn off the TV, change the channel, or even lower the volume slightly and her eyes would snap open as she declared that she was watching/listening to that. Not a word of a lie.
I think music can be done well in IF, but it needs different rules. I’d test the “new rule” waters by playing ambient music when you enter a new area, then fading the music away to let the story prose breathe again. I’ve never played an IF game that approached music with such subtlety. It’s usually a looping track that I mute very quickly. Perhaps there are some IF recommendations out there in that regard.
My understanding of the guidelines—which, to be clear, I think could do with clarification—is that you have to credit it if it made a substantial creative contribution to the work. A spellchecker noting that I said “accomodate” instead of “accommodate” isn’t making a creative contribution, because its purpose is to do pretty much the same thing every time; if I type “accomodate” again, it’ll correct it to “accommodate” again. My input was the creative part, not the spellchecker’s processing of it, even if it does use machine learning under the hood.
Similarly I generally wouldn’t credit Google Translate for minor things—if I ask it for the French word for “book”, it should ideally say livre every time, that’s its job—but if I passed an entire work of IF through Google Translate and used its output, I would mention that.
I have the duty to inform you that, following a report from me, the handling of the rule has changed: the note now appears under the cover image, separating the uses of generative AI for the creation of the game from those for the “marketing material”.
I don’t multitask while playing games, but if a text game has visuals/audio, that’s often a strike against it in my book – unless the visuals/audio are integral to the gameplay and/or narrative. If those elements could be stripped out without damaging the game’s core meaning? If they’re window-dressing? I consider that a weakness. I want every element to be necessary, not just “nice to have.”
I admit that I’m more of a stickler about this stuff than the average player. But in a conversation about the dangers of AI, I wanted to chime in. Adding AI-produced multimedia to a game would not automatically make this player rate the game higher. I’d still use the acid-test: “Does this multimedia need to be here?” Usually, the answer to that question is “no.”
I’ve read messages from people who say they like to listen to their own music when playing IF, which is normally silent. While that makes me sad, since I love including a soundtrack and incidental sound clips for vibe, I understand: audio in IF games should not be required.
In fact Mathbrush said one of the reasons he liked CV so much is “he was listening to sad music” when he first played (that game is full of sad and menacing music anyway!)
Elizabeth Smyth, in the podcast review of RSPM, said several games were open in browser windows while she was judging, and she ended up playing another game to the groovy synthwave score of RSPM, which set a completely different vibe!
I don’t normally listen to music while I’m reading books or playing text games (or writing anything myself). But if a text game has a soundtrack, I’ll listen to it, since I try to meet games on their own terms. This tends to make me like the game less, however, since the soundtrack is typically irrelevant on a ludonarrative level (to pull out the snobby technical term). I value ludonarrative harmony more than almost anything else.
One reason I wrote the game What Fuwa Bansaku Found was to incorporate visual artwork into a text game in a way that felt necessary to me. sub-Q strongly encouraged authors to add multimedia to games, but their multimedia elements often felt like window-dressing. That’s how I feel about multimedia in most games. So I’m not too concerned about AI-generated “candy.” (Well, it wouldn’t be great if this so-called candy started clogging up a bunch of games, I suppose. But that’s another issue.)
I feel the same way about games that have lots of superfluous images. I don’t need to see pictures of everything, but the occasional “feelie” like a map or an image of a scribbled note popping up can be a fun extra.
Definitely. Often, there are bad implementations of music in IF games. This is my reaction in most cases too. We’re mostly hobbyists, after all. If a game looks like the author has an artistic eye, I give it the benefit of the doubt… with my hand on the volume control.
This raises the concern of graphics for graphics’ sake (and music for music’s sake). This is an amateur mistake, and we’ll see/hear quite a bit of it as AI makes it easier to create additional media for IF. As I said before, bad graphics are distracting, but bad music is annoying.
We might start seeing more! Or not. It’s a risk, but I’m still not too worried. At least, not right now. Maybe I’ll eat my own words after IFComp…
Whether multimedia is made by AI or humans, I also don’t want to sound like I’m totally against it. Cygnet Committee is a fascinating game with heavy multimedia-use. If an author like P.B. Parjeter has ideas, I’ll try 'em out. It’s just rare for text games to leverage multimedia effectively – for my tastes, anyway.
I am once again begging people to learn the difference between generative neural networks, procedural generation, and algorithmic generation. The AI-moderation/banning conversation always goes in circles, and at least one of the reasons is that someone’s gotta try and claim that the Microsoft Word spellchecker and Google Translate circa 2013 are in the same category as ChatGPT cuz a computer did a thing automatically. They are not.
I think if IFComp were to clarify they mean content you had purposely generated by neural networks (so none of the “but what if i read an article that was written by ai and I didn’t know probably!!” gotchas) that might help.
one might even add “and put into your game for players to directly experience”. (I personally don’t think neural networks should be used at all for these purposes, not for inspiration or a writing boost or an old “chum” to talk about your text with, because ethically they are horrifying. but nevertheless)
Could you maybe do a splaining thread about this? Like for dummies?
This is not true. Unfortunately, you’ve fallen for the religious faith that a certain subsection of Silicon Valley tech bros seems to have in the inevitable coming of the singularity, and their messianic fervor that this particular technology will bring them to that final end. But even a priori, attempting to linearly (let alone exponentially) extrapolate a trend is just folly, and the reality, even according to AI hype master in chief Sam Altman of OpenAI, is that the rewards for scaling up models and increasing the amount of data we train them on are already diminishing: OpenAI’s CEO Says the Age of Giant AI Models Is Already Over | WIRED. I recall another study (besides the one Sam Altman points to) showing that there are probably diminishing returns, which I’m not linking to directly because it is full of nonsensical AI hype, but I can’t seem to find it at the moment. What’s more,
- AI’s “emergent” capabilities fail to replicate: [2304.15004] Are Emergent Abilities of Large Language Models a Mirage?
- The appearance of planning and reasoning that large language models exhibit is mostly due to having such a huge corpus of training data that they are likely to have seen any problem you give them multiple times, and any actual reasoning skills they have are highly limited to specific tasks they’ve seen a lot before: https://arxiv.org/pdf/2403.04121 [2307.02477] Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- The claims of large language models being able to pass exams are vastly overblown to the point of absurdity: Notion – The all-in-one workspace for your notes, tasks, wikis, and databases.
That’s not even to mention the fact that these models will eventually face “model collapse”: as more and more of the internet becomes AI-generated garbage, more and more of their training data becomes AI-generated garbage, which leads to inbreeding that renders these models worse than useless. AI models collapse when trained on recursively generated data | Nature
The existence of model collapse as an inherent feature of AI, demonstrated mathematically by the paper above, points to a deeper problem, one you can see stated statistically in that study’s mathematical intuition.
The problem is that you simply can’t make these large language models better than humans at creativity by scaling the amount of compute and data you throw at them forever, or even by tweaking various aspects of their algorithm, because the way they fundamentally operate, as I’ve explained here before, is inimical to creativity at the most basic conceptual level. The way large language models work is that they choose the most likely word to follow the several thousand previous words you give them (including the words they have already outputted), based on the likelihood of various words following a statistically similar array of words in the text corpus they were trained on. What that means is that they will always give you the most average, most uncreative, most common thing imaginable. They give you the arithmetic mean of all human writing, and that’s it. Even any creativity and uniqueness you input into the system will be gradually erased as it regresses to the mean. This is a fundamental property of the algorithm these things use. You can’t hand-wave it away by just extrapolating out past progress. That’s not how this works.
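To make that regression-to-the-common effect concrete, here’s a toy sketch in Python of the greedy “pick the most likely next word” idea, using a made-up four-sentence corpus and bigram counts (real LLMs use enormous neural models over long contexts, not bigram tables, so this is only an illustration of the statistical tendency, not of any actual model):

```python
from collections import Counter, defaultdict

# Tiny made-up corpus: "the sky is grey" and "the sea is blue" are
# rarer phrasings; greedy most-likely continuation never produces them.
corpus = ("the sky is blue . the sky is blue . the sky is grey . "
          "the sea is blue .").split()

# Count how often each word follows each other word.
counts = defaultdict(Counter)
for a, b in zip(corpus, corpus[1:]):
    counts[a][b] += 1

def most_likely_continuation(word, steps):
    """Always emit the single most probable next word."""
    out = [word]
    for _ in range(steps):
        if word not in counts:
            break
        word = counts[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

print(most_likely_continuation("the", 4))  # → the sky is blue .
```

However many times you run it, the greedy strategy only ever surfaces the most frequent phrasing in the corpus; the rarer “grey sky” and “sea” sentences are unreachable.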
This is just like the problem with hallucination in AI. Everyone who is a fan of AI seems to blindly hope that if you throw more computing power at it, or bolt on control mechanisms after the fact, you can somehow solve this problem, when the problem is fundamental to the algorithm that makes these models function ([2409.05746] LLMs Will Always Hallucinate, and We Need to Live With This).
The question of whether a reader could discern the difference between AI-generated content and human content is actually immaterial. It serves largely as a red herring that is argumentatively effective only in that it redirects the burden of the debate back towards the person who was against AI because they now have to defend something instead of you having to defend something. In essence, it is a dodge.
I don’t think anyone on my side of the argument is arguing that the capabilities of large language models and humans are purely disjoint, such that one can always tell which is which. What we are arguing is that all humans have within them a large range of creative expression that they are capable of with a little application of practice and effort and thought, whereas large language models are capable of a very small band of that range of expressivity, the most boring and average section, by the inherent nature of the algorithm that they are built around.
I’m quite happy to say that large language models’ writing is indistinguishable from human writing… bad human writing.
The point is that by the nature of the algorithm used to generate text in large language models, what they will always produce is mediocre and (usually above the scale of the paragraph) incoherent and repetitive writing. That bad writing might be indistinguishable from the bad writing of a highly educated but very stupid human, but that doesn’t matter. What we are trying to defend against is the introduction of machines for generating copious amounts of brainless mediocre writing into a space dedicated to artistic creativity and intention.
Why don’t we also ban mediocre authors who turn out prose thoughtlessly, without an inkling of a unique or creative idea (or spin on a previous idea) and without any attention to or attempt to develop their own craft? Because humans write for a broad range of reasons, and those would be impossible to divine for any individual human. But here we have before us a machine that is ontologically designed to be that kind of writer, and we have the opportunity to preemptively ban it without having to worry about the thoughts and feelings and possibility of artistic development of a real human writer.
Or alternatively, imagine that there was a single terrible, mediocre, thoughtless, uncaring writer who offered, for free, to write copious amounts of prose for anyone who asked, more than any human could really write in a similar amount of time. A person so mind-bogglingly productive that they have probably produced 30% of the prose on the modern internet at this point. I think if there were a person like that, and it seemed likely that a portion of people who wanted to participate in interactive fiction competitions planned to use that person for all of their prose writing instead of doing it themselves, then we would definitely ban that person.
I’m not gonna be the best at explaining this but:
procedural generation
Hanon explains this great.
Everything on the below site is procgen, but the descriptions are the best example. https://www.fantasynamegenerators.com/#descriptions
With procgen, you know the input cuz you created it, and you know the exact rules and process for getting it to its result.
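For instance, a hypothetical miniature of one of those name/description generators might look like this in Python (every word list and rule here is made up for illustration; the point is that a human authored all of them, and the randomness only picks among options the human wrote):

```python
import random

# All "content" is human-authored; the generator just recombines it.
PREFIXES = ["Thal", "Mor", "Eld", "Gal"]
SUFFIXES = ["dor", "wyn", "rak", "ia"]
ADJECTIVES = ["crumbling", "moonlit", "forgotten"]
PLACES = ["tower", "harbor", "grove"]

def generate_name(rng):
    # Rule: one prefix glued to one suffix.
    return rng.choice(PREFIXES) + rng.choice(SUFFIXES)

def generate_description(rng):
    # Rule: "a <adjective> <place>".
    return f"a {rng.choice(ADJECTIVES)} {rng.choice(PLACES)}"

rng = random.Random(7)  # seeding makes every run fully reproducible
print(generate_name(rng), "rules over", generate_description(rng))
```

Because the rules and inputs are known, you can enumerate every possible output in advance, which is exactly what makes this different from a neural network.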
algorithmic generation
Probably not exactly the right word for this, but: algorithms are essentially rules to systematically do [a thing] on data, even when the data is unpredictable. If I ask you to find the number between 1-100 that I’m thinking of, asking “is it 1? Is it 2? Is it…” and so on until I say yes is a very shit algorithm. A better algorithm is “is it above or below 50?” and, if I say above, “is it above or below 75?” etc., since you can accomplish the same thing (guessing the number) with less effort. But they’re both algorithms.
Even when the input is unpredictable (whatever number is in my head), you know the exact rules and the process for getting to a result (the correct guess).
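Those two guessing strategies can be sketched in Python like so (a toy illustration of the comparison above, nothing more):

```python
def linear_guess(secret, lo=1, hi=100):
    """Ask 'is it 1? is it 2? ...' -- correct but slow."""
    guesses = 0
    for n in range(lo, hi + 1):
        guesses += 1
        if n == secret:
            return n, guesses
    raise ValueError("secret out of range")

def binary_guess(secret, lo=1, hi=100):
    """Halve the range each question -- same answer, far fewer guesses."""
    guesses = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        guesses += 1
        if mid == secret:
            return mid, guesses
        elif mid < secret:
            lo = mid + 1
        else:
            hi = mid - 1

print(linear_guess(87))   # → (87, 87): 87 questions
print(binary_guess(87))   # → (87, 7): 7 questions
```

Both reach the same result by exact, fully known rules; only the efficiency differs.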
Markov chains can produce very ChatGPT-like output, depending on their complexity, but they are not neural networks.
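As a rough illustration, here’s a minimal word-level Markov chain in Python (tiny made-up corpus; the point is that the whole “model” is just a table of observed word pairs you can read by eye, unlike a neural network):

```python
import random
from collections import defaultdict

# Record which words follow which in a tiny corpus. That table IS the model.
corpus = "the cave is dark the cave is deep the path is dark".split()

transitions = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    transitions[a].append(b)

def babble(start, length, rng):
    """Walk the chain: pick a random observed successor at each step."""
    word, out = start, [start]
    for _ in range(length):
        if word not in transitions:
            break
        word = rng.choice(transitions[word])
        out.append(word)
    return " ".join(out)

print(babble("the", 5, random.Random(1)))
```

You can print `transitions` and inspect exactly why any given output was possible, which is precisely what you cannot do with a trained neural network.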
Natural language processing and generation are what the MS Word spellchecker (minus Copilot) and Google Translate circa 2013 do. You may notice that neural-network large language models (like ChatGPT) are a specific subset of NLP.
Neural network generation
These are incredibly hard to explain, but basically neural networks are a bunch of computer programs in several connected “layers” doing complicated math on a very large set of data.
The first layer’s calculations are designed more or less by humans (hence why we know how neural networks function, as seen in alexis’s papers), but as the data progresses through the layers, the computer starts calculating (“training”) on its own results. After these calculations are done, the computer programs have essentially a custom-made algorithm of their own to use on input, and no one knows the rules of this algorithm. It’s a black box.
then when you give some variable input (like sending a message to chatgpt) it takes it and spins it around in its enormous mystery algorithm and spits out output in the way that alexis described.
so the input isn’t predictable and the result is reached by exact rules and processes that are by design unknowable.
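For the curious, here’s a deliberately tiny Python sketch of “layers doing math on data”: two dense layers with random weights. The arithmetic at each step is fully known simple math; the black-box quality of a real trained network comes from the fact that billions of such weights get set by training on the network’s own intermediate results, not by a person:

```python
import math
import random

def dense(inputs, weights, biases):
    """One layer: weighted sums of the inputs, then a nonlinearity (tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

rng = random.Random(0)
# Layer 1 maps 3 inputs to 4 values; layer 2 maps those 4 to 2 outputs.
w1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
b1 = [0.0] * 4
w2 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
b2 = [0.0] * 2

hidden = dense([0.5, -0.2, 0.9], w1, b1)  # first layer's output...
output = dense(hidden, w2, b2)            # ...feeds the next layer
print(output)
```

With four weights you can eyeball what’s happening; with billions, set by training rather than by hand, you can’t, and that opacity is the point being made above.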
this is just the technological differences, and only to my understanding – I am open to corrections from other knowledgeable folk. neural networks are different in many other respects regarding ethical usage (which there is very little of). as I’ve said elsewhere, the science of neural networks is fascinating, but the datasets they train on, how they train, and what they’re used for, are all rotten to the core.
I think this is painting with a bit too broad a brush, for what it’s worth. Neural networks are used for all sorts of things, and the large language models and image generators now being branded as “AI” are a tiny fraction of it.
Pretty much any optical character recognition, for example, relies on neural networks—one of the inspirations for modern neural networks (specifically the “convolutional” ones, CNNs) is the structure of the human retina, and it’s almost impossible to do any sort of “recognize things in images” work without involving neural networks in some way now. Same with speech recognition. Same with most large-scale search engines like Google.
The problem with neural networks is that, well…as you said, they tend to be utterly opaque as to how they work, so as soon as they’re brought into a subject where accountability is needed, there’s a problem (e.g. translating in high-stakes court cases where someone’s asylum application gets denied because of a grammatical error). And since the best way to improve them is to provide more computing power and more training data, the biggest ones (like ChatGPT) use huge amounts of unethically-harvested material. But an optical character recognition system probably had its training data generated in perfectly mundane ways: it’s not hard to generate reams and reams of gibberish in different fonts and levels of distortion.