This may be important for other people, so I’ll pass it on:
I clarified with IFComp support today that AI-assisted translation is explicitly not what the questionnaire is asking about. If the texts are your own and you have only translated them into English via DeepL or something like that, then you don’t need to check this option in the questionnaire: “Was used to generate text used in this game (use of spell check/autocomplete excluded)”
(Wow, I’m an edge case again. How do I always do that?)
Fair critique, and I agree with your given example. It’s hard to talk about LLMs and neural-net image generators like Midjourney as one category without reaching for the term “AI”.
Yeah, that is the hard part. The best I have is that both ChatGPT and Midjourney are intended to be used creatively—the whole point is that you can give it the same input a dozen times and get a dozen totally different answers. If I could scan the same document a dozen times and get completely different text out, I’d say that’s a horrible OCR system.
And that’s where the harvesting of training data becomes such a problem, because if a font I created is used to train OCR, that doesn’t really have any impact on my ability to make fonts and get credit and royalties for them; if anything, it means my fonts are more useful because scanners can deal with them better. But if someone uses my books to train a system that’s meant to creatively generate new books in my style, well…
I agree with the general distinction you’re trying to make here between generative AI and stuff like OCR, but one drawback of ChatGPT is that it will spit out responses with the exact same phrasing and imagery across a whole range of broadly similar prompts. I.e., it’s even more hackneyed than you’d think (or it engages in a form of self-plagiarism), so that not only is it a poor coauthor, some large part of any response is usually reproducible.
I think that for the purposes of consistency, such as for an OCR’d text, the two sentences “the beautiful lilacs softly bloomed in the purple twilight” and “the violet dusk blossomed gently with pretty lilacs”, while carrying the same semantic content and details in a mildly altered order, are still completely different sentences.
I think that’s over-generalizing. There are some usages that don’t seem to have any ethical concerns. I’m reminded of this very simple, but very cool example done in Minecraft: https://www.youtube.com/watch?v=DQ0lCm0J3PM
It’s essentially the worst of both worlds. The quality, accuracy, and even coherency of a large language model’s response can vary wildly based on tiny perturbations in the phrasing of the prompt that bear no obvious correlation to the nature of the output, in much the same way that the output of a pseudorandom number generator varies all over the map depending on the seed you give it. But at the same time, no matter what you input, whether wildly different prompts or similar but meaningfully distinct ones, whatever it produces will be tediously repetitive of everything else it has written. So you can neither really predict the quality of any given output, nor count on the model to be genuinely creative across the whole corpus of outputs you get from it.
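To make the PRNG side of that analogy concrete, here’s a toy Python snippet (nothing to do with any actual model, just the analogy): seeds that differ by one produce completely unrelated streams, the way near-identical prompts can produce wildly divergent responses.

```python
import random

# Two adjacent "inputs" (seeds) yield totally uncorrelated "outputs",
# just as two nearly identical prompts can yield unrelated responses.
for seed in (42, 43):
    random.seed(seed)
    print(seed, [random.randint(0, 9) for _ in range(8)])
```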
It’s essentially like engaging in some kind of thaumaturgy where the means by which you summon beings and the beings so summoned are both soundly beyond human comprehension… and also mind-numbingly boring.
This is why prompt engineering resembles alchemy so strongly: ostensibly intelligent people get sucked into a sort of highly rationalized folk superstition, piling ever more epicycles onto an increasingly baroque theoretical superstructure to convince themselves that a process that fundamentally cannot be rationally controlled is in fact under their control, or could be with just a little more effort. They are intelligent in a narrow sense, but refuse to apply that intelligence to actual critical thinking, on a meta level, about the tasks they set it to.
We talk about AI-generated graphics and music, but we typically don’t recognize animation as a distinct form of expression. Here’s a fun video outlining AI from that perspective.
A spellchecker noting that I said “accomodate” instead of “accommodate” isn’t having a creative contribution because its purpose is to do pretty much the same thing every time…
Yes, for a simple spelling checker, that’s true. But it could get fuzzier with more advanced proofreading software. Suppose it read the whole paragraph and said, “The correct spelling is ‘accommodate’ but a better word here might be ‘tolerate’.”
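Just to pin down the simple end of that spectrum, here’s what “does pretty much the same thing every time” looks like in code. This is a toy corrector using difflib against a made-up word list (not any real spellchecker), and it is fully deterministic:

```python
import difflib

# A made-up mini-dictionary, purely for illustration.
DICTIONARY = ["accommodate", "accomplish", "tolerate", "twilight"]

def correct(word: str) -> str:
    # difflib is deterministic: the same misspelling always yields
    # the same suggestion, with no creative contribution involved.
    matches = difflib.get_close_matches(word, DICTIONARY, n=1)
    return matches[0] if matches else word

print(correct("accomodate"))  # -> "accommodate", every single run
```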
My goal is not to propose where the lines should be; I honestly don’t know, and I’m not sure I even have an opinion. All I’m trying to do is to highlight that it might be more difficult to draw hard lines than many might assume.
Maybe that’s okay. Maybe we don’t need hard lines because most competitors are aligned on the spirit of the rule. But it still might be worth acknowledging that the lines are blurry.
This is by design. LLMs have a setting called “temperature”: if you set the temperature to 0, you will always get the same answer. However, that isn’t desirable if you want a chatbot that sounds human-like and generates prose. A higher temperature makes the LLM more “creative” by increasing the randomness of its responses.
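To sketch what temperature actually does during sampling (a minimal toy, not any particular model’s implementation):

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Pick a token index from the model's raw scores ("logits")."""
    if temperature == 0:
        # Greedy: always the single highest-scoring token, fully deterministic.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Softmax over temperature-scaled logits; a higher temperature flattens
    # the distribution, so low-probability tokens get picked more often.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]

# Toy next-word scores: "the" is the model's clear favourite.
vocab = ["the", "a", "lilac", "twilight"]
logits = [4.0, 2.5, 1.0, 0.5]

print(vocab[sample_with_temperature(logits, 0)])    # always "the"
print(vocab[sample_with_temperature(logits, 1.5)])  # varies run to run
```

Run it a few times: the temperature-0 line never changes, while the temperature-1.5 line wanders.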
When people refer to the “parser”, they are usually referring to a combination of three things:
1. converting text into a syntax tree
2. semantically resolving the syntax tree against the world
3. executing the semantics
AI could do (1). If it were to attempt (2) or (3), it would also have to model the game state, a feat LLMs are not well suited to, because they carry quite limited state.
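To make those three stages concrete, here’s a toy sketch (invented names and world model, not any real IF engine):

```python
# Hypothetical two-word-command world, purely for illustration.
WORLD = {"lamp": {"lit": False}, "door": {"open": False}}

def parse(text):
    # (1) Text -> syntax tree (here, a trivial verb/noun pair).
    verb, _, noun = text.partition(" ")
    return {"verb": verb, "noun": noun}

def resolve(tree):
    # (2) Bind the noun phrase to an object in the world model.
    obj = WORLD.get(tree["noun"])
    if obj is None:
        raise ValueError(f"You can't see any {tree['noun']} here.")
    return tree["verb"], obj

def execute(verb, obj):
    # (3) Apply the action's semantics to the game state.
    if verb == "light":
        obj["lit"] = True
        return "The lamp glows softly."
    return "Nothing happens."

print(execute(*resolve(parse("light lamp"))))  # -> The lamp glows softly.
```

Note that (2) and (3) both need persistent access to WORLD between turns, which is exactly the kind of state an LLM doesn’t carry across calls.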
Nevertheless, I’ve seen attempts at making an AI generate JSON for state, which is then stored in a “database”. Something along those lines might be possible, except that (2) would somehow need to query that database/world model.
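Something like this, roughly. `llm_complete` below is a hypothetical stand-in for whatever chat API you’d actually call, and it returns a canned reply so the sketch runs on its own:

```python
import json

def llm_complete(prompt: str) -> str:
    # Hypothetical stub for a real chat-completion call.
    return '{"lamp": {"lit": true}}'

state = {"lamp": {"lit": False}}  # the "database"/world model

def apply_command(command: str) -> None:
    prompt = (
        "You are the world model for a text adventure. Reply ONLY with a "
        "JSON object of state changes for this command: " + command
    )
    raw = llm_complete(prompt)
    try:
        changes = json.loads(raw)   # the model may well emit invalid JSON
    except json.JSONDecodeError:
        return                      # in practice you'd re-prompt here
    for obj, fields in changes.items():
        state.setdefault(obj, {}).update(fields)  # shallow-merge updates

apply_command("light the lamp")
print(state)  # -> {'lamp': {'lit': True}}
```

The fragile part is exactly what the try/except glosses over: getting reliable, schema-conforming JSON out of the model every single turn.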
As long as someone is clear up-front about what they’re doing, then go right ahead, provided the competition doesn’t explicitly ban LLM-generated content. If it does, I’d suggest talking to the organizers about it, as the bans are usually focused more on text and images.
That said, several AI-based parsers have already been developed, and to my knowledge none of them have been more than an exercise in frustration. (One member of the IF community has been working on the task with none of the results rated higher than 2.5 stars on IFDB; meanwhile, the AI-powered 2023 rerelease of The Portopia Serial Murder Case is widely agreed to be worse than the original 1980s game.)
The real trick is finding a way to make an AI parser actually improve the experience, which as @jkj_yuio pointed out is not a trivial task.
I’m very much in favor of asking authors to disclose use of AI in competitions, because I would like to be able to ignore those entries. I have no interest in playing those works – there are enough human-created works to satisfy me! Of course people can lie, but I’m willing to believe them unless it’s been shown that I should not. If more and more people start silently using AI, then I might have to switch to only playing games that state up front the author did not use AI, which would be a little annoying, but again, I don’t feel in any danger of running out of games…
I think the one example where at least some folks thought the LLM-parser approach kinda worked was this Zork modification, so it might be worth checking out for folks interested in these areas.
I’m not sure that project is more than theoretical (or at least, it doesn’t seem to be publicly available). Later in that same thread I mentioned a seemingly different one I found at https://newzork.ai/
That one at least didn’t make the experience worse, which is the usual outcome when AI is involved.