Proposal: An IF LLM Chatbot

I’ve mentioned this in recent threads and wanted to take the temperature of the room on what everyone thinks is possible.

My idea is to make use of all of the public IF resources: train a large language model on source code, hints, walkthroughs, designs, theory, discussions, and anything else publicly available.

The model would be exposed through a chatbot hosted on AWS and freely available. The model and its training code would be uploaded to the IF Archive with an explicit non-commercial-use-only license.

Training data would be culled from sources explicitly not restricted for such use. This is where I get fuzzy on what can and can’t be used for this. If it’s in the archive or the source is on IFDB, that would seem to invite experimentation, as long as no one is selling anything. The one area I would be concerned about is the source for Inform, TADS, and other platforms. Are those off limits?

The resulting chatbot could be extremely useful to the IF community. With retrieval-augmented generation (RAG) over external web pages, it could be a starting point for learning about IF, getting help with code, and answering other non-standard questions.
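For concreteness, here is a minimal sketch of the retrieval step behind such a RAG setup. The corpus, query, and keyword-overlap scoring are all invented for illustration; a real system would use embeddings and a vector store rather than word overlap.

```python
# Minimal sketch of the retrieval half of a RAG pipeline (illustrative only;
# a real deployment would use embeddings + a vector store, not keyword overlap).

def tokenize(text):
    return set(text.lower().split())

def retrieve(query, documents, k=2):
    """Return the k documents sharing the most words with the query."""
    q = tokenize(query)
    return sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)[:k]

def build_prompt(query, documents):
    """Prepend retrieved context so the model answers from sources, not memory."""
    context = "\n---\n".join(retrieve(query, documents))
    return f"Use only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical corpus standing in for scraped IF documentation.
corpus = [
    "Inform 7 rules fire in response to actions such as taking or dropping.",
    "TADS 3 is an object-oriented language for writing interactive fiction.",
    "A walkthrough lists the commands needed to finish a game.",
]
print(build_prompt("How do Inform 7 rules respond to actions?", corpus))
```

The point of prepending retrieved context is to steer the model toward answering from archived sources rather than from whatever it half-remembers.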

Thoughts?

I have not presented this to the IFTF board because I think this requires wide transparency.

3 Likes

My feeling is that this is magical thinking.

In that you think you can get away with the same strategy used by big companies with lawyers and goodwill to burn.

In that you think training such a thing will provide anywhere near the results that you’re hoping for.

In that you think hosting such a resource will be a trivial cost.

I’d rather the effort go to good old-fashioned “making a static website with tutorials,” which is more likely to help someone and endear you to the tiny community that is interactive fiction. It’s not as hype-worthy, but it’s cheaper, less effort, and more likely to work.

19 Likes

I would be EXTREMELY leery of training this on people’s individual work without explicit permission. Not only is this a legal grey area but the point of the IF Archive (and IFDB, to a lesser extent) is documentation and preservation. The IF Archive’s TOS does not currently say anything about GenAI scraping as far as I can tell, but I suspect if they were to change it to specifically allow it you would see authors pull their work down or decline to submit future work. People (including me!) are in general very protective of the stuff they’ve put hard work into, so I think training it on material that the author(s) have explicitly opted into is the best way to do this without stepping on anyone’s toes. That includes forum posts and source code (since the motivation behind releasing that is generally to educate other users directly and people will likely take offense if it’s ingested without permission by an AI).

I am generally an AI pessimist, so I don’t really see what the advantage of an LLM-powered chatbot is over simply starting a conversation with the users of this forum; that’s something you’ll also have to articulate very clearly if you want to proceed with this. The people here are wonderful, and the personal element is the best part of it.

19 Likes

At the risk of being a Luddite curmudgeon, this doesn’t seem like a great idea to me - beyond running right into this community’s sore spot around LLMs, it also seems like we’d be worse off long-term if folks interested in IF consult chatbots rather than engage with each other and the broader community.

18 Likes

This was my original thought. Start some kind of file dump that’s 100% opt-in, and add tags to IFDB and ifwiki.org where people can opt in.

As for the AI biases, I get it… the models can absolutely present false information as if it were true, and can offer other people’s IP as examples. It does get fuzzy. But it can also provide a significantly better experience than current platform docs and async conversations where speed-to-answer is concerned. I’d have thought ChatGPT had pulled ifarchive in somehow, but I think that’s unlikely:

Have you heard of the game Cattus Atrox?

Yes, I have! Cattus Atrox is an interactive fiction game by Michael Gentry. It's a horror/suspense game where you play as someone investigating strange occurrences on a farm, particularly involving cats.

That response is a complete fabrication. 🙂

I do believe LLMs are useful, but I also know from extensive experimentation that they are very cranky.

As for cost, that’s unlikely to be a problem. This model wouldn’t come close to actually being “large”. A low-end GPU would likely be sufficient to host it.
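To put rough numbers on that claim, here is a back-of-the-envelope sketch of weight memory for self-hosting. The formula (parameter count × bytes per parameter, plus a fudge factor for runtime overhead) and the 20% overhead figure are rough assumptions, not measurements of any particular model.

```python
# Back-of-the-envelope VRAM estimate for hosting a small model.
# Rule of thumb: memory ≈ parameter count × bytes per parameter, plus overhead.

def vram_gb(params_billions, bytes_per_param=2, overhead=1.2):
    """Rough weight-memory footprint in GB (fp16 = 2 bytes, int8 = 1, int4 = 0.5).
    The 20% overhead factor is an assumption covering activations and runtime."""
    return params_billions * 1e9 * bytes_per_param * overhead / 1e9

# A 1B-parameter model in fp16, or a 7B model quantized to 4 bits,
# both land well under the 8 GB of a typical low-end consumer GPU.
print(f"1B fp16: ~{vram_gb(1, 2):.1f} GB")
print(f"7B int4: ~{vram_gb(7, 0.5):.1f} GB")
```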

Simple reply: most in this community don’t expect their content to be scraped. End of discussion. Period.

Follow-up discussion: I’d like to do this; can we form a pool of volunteers…

Are you aware of this thread that popped up a few days ago? I’m not sure how much of your goals overlap.

Further, there is nothing technically stopping anyone from doing this. Licensing/authorization aside, I don’t think intfiction has an up-to-date ToS on this. @Admin1

To put it bluntly, I release my source, hints, walkthroughs, and so on to be used by humans, not to train LLMs. I’d be open to an experiment like, say, training an RNN on a whole lot of Inform 7 source code and seeing if it could eventually produce something compilable. But I have no interest in my work benefitting chatbots like ChatGPT, especially if the intent is to replace the discussion forum I dedicate so much time and effort to.

14 Likes

Glad to hear it would be strictly opt in! I think that’s the only way to handle it while being respectful of people’s work.

That said, I’m still not seeing what the added value would be here? The IF community is small and most of the relevant info a newbie would need can be found on three websites (this forum, IFDB, and the Archive) plus the Wikipedia page for Infocom. What benefit would this chatbot provide versus searching through the existing and relatively centralized resources?

4 Likes

An outsider who didn’t have standing in the community could get away with it, but I’m definitely not that guy. I just view this as an additional feature of IFDB or IFWiki and “not that big of a deal”.

But I tend to like to push technology. The copyright abuses for images and text are pretty bad where OpenAI is concerned and that’s been the major obstacle in getting a lot of people onboard. Vector databases with NLP frontends are an interesting tech. It’s sad it started out as pure abuse (likely illegal too).

Had there been an opt-in start with more response validation, we might be seeing a different perspective.

If it’s not obvious, I was just asking for “feelings”. I think this is an important discussion, just because these are the times we live in. I could 100% see a game company privately scrape every bit of IF they could come up with to help design new games using vector database tech. We’d probably never know.

3 Likes

That would be part of the experiment. Claude already has a pretty deep knowledge of IF. I think one neat thing would be having all of the hints and walkthroughs in a chatbot. You’re stuck in a game and can go to one place to get a hint using natural language, even specifying limits so it doesn’t say too much.
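The “specifying limits” idea can be made concrete. Below is a minimal sketch of spoiler-graded hints; the puzzle name and hint text are entirely invented, and a real chatbot would enforce the same gating in its prompt or retrieval layer rather than in a hard-coded table.

```python
# Sketch of spoiler-graded hinting: the same walkthrough data, released
# gradually by an explicit spoiler level instead of all at once.
# All game and hint text here is invented for illustration.

HINTS = {
    ("lamp puzzle", 1): "Think about what you found in the trophy case.",
    ("lamp puzzle", 2): "The lamp needs a power source you already carry.",
    ("lamp puzzle", 3): "PUT BATTERY IN LAMP, then TURN ON LAMP.",
}

def hint(topic, level):
    """Return the strongest hint at or below the requested spoiler level."""
    candidates = sorted(
        (l, text) for (t, l), text in HINTS.items() if t == topic and l <= level
    )
    return candidates[-1][1] if candidates else "No hint available at that level."
```

A player stuck early could ask for level 1 and get only a nudge, while level 3 spells out the commands.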

This is an interesting question on a few different levels. I’m going to give a really long and rambling answer but feel free to completely ignore it.

I’ve noticed that my own personal reaction to the ethics of scraping and AI use depends on the environment I’m in and the field.

For instance, I teach International Baccalaureate classes, and they’re AI-positive, as long as students cite their AI use and expand on it. One of our highest-scoring students in recent years on the Theory of Knowledge essay had some troubles with English, so they used AI to rewrite their essay from their native language/way of writing into more fluent English. I’m moderately leery of this but prefer open use to hidden use. I’ve been putting traps on high school tests where I ask graduate-level math questions (like “How many elements does the alternating group on 6 elements have?”), and if anyone gets it right they fail the quiz. [I could ban computers, but it’s an online quiz and people sometimes ‘use the bathroom’ to check AI on someone else’s computer.]

My most positive AI interactions have been with small-scale programming, like explaining what a line of code does or asking it to create a small chunk of code for a website. I recently asked it what this line I got from an online tutorial does:

fnm env --use-on-cd | Out-String | Invoke-Expression

because I had to redo it every time I started PowerShell, and it helped me write a script into my profile that automatically does this for me. [If some fancy tech person wants to explain how that’s idiotic and a clear example of how AI gives cheap, bad answers that don’t promote understanding, I’d be happy to learn from you.] I haven’t used it for creating website code myself, but I’ve had my students try it, and walked them through the problems it creates (like linking to ‘profile.html’ when that doesn’t exist yet). A ton of coding questions online are repetitive and easily scraped, so using AI for them is pretty helpful.

In writing I’ve had very negative feelings towards AI. I love stories as a way to connect to human hearts and see what they love. AI writing is disappointing because there is no connection. Moreover, so many AI writers are convinced that no one can tell their writing is AI. Especially the stupid quatrain poems that all look the same. Or the games with sweeping staircases that connect to ravines or ‘doors full of mystery’ that have beans on the other side. And they scrape a lot of people’s stuff in a way that resurfaces very similarly to the original, and it feels gross and unpleasant. The data it’s trained on isn’t repetitive and easy to ‘verify’ like programming is, and has a lot more consent issues. And unfortunately the whole dataset for the ‘base’ AI was originally trained on that sort of thing. I’m strongly opposed to this sort of AI use, to the level that I ban AI game promotion or even discussion on the r/interactivefiction subreddit.

Inform programming is more like writing than programming to me. I have never asked AI for Inform help and don’t want to; each line of Inform adds character and personality to a game. Making a container transparent affects the player’s experience as much as making a character red-haired or describing a voice as being flute-like. I want complete control over those decisions.

My son likes Character.ai, which I feel is a better use of AI than static writing in a game, and I’ve had fun helping him override the game’s programming. It’s still gross that it’s trained on people’s writing, but it’s also trained on active participants in Character.ai itself.

So my overall thoughts for an IF chatbot are:
- If it could actually help with coding, that would be cool, but I’m skeptical that there is enough data out there to help, even if everything was consumed. Especially since new versions keep breaking things from older versions.
- For generating game text and puzzles, I don’t think it’s helpful. It’s not good at pacing and plot arcs, has no concept of what an appropriate ‘hook’ is, and has resulted in awful, awful games.
- For generating game ideas, it seems fine. I just tried asking Copilot for five Inform 7 game ideas, and they were normal, good ideas like ‘time travel’ and ‘enchanted forest’. Not exciting but not awful.
- For hints and walkthroughs, I think it would be amusing: using hints makes you feel bad you got spoiled, but AI also makes mistakes, so you could use AI hints and not know if they were right or not, like the fleas in Wizard Sniffer. But the better it got, the less I’d like it.
- As a source of amusement itself, it would probably end up like AI Dungeon, which was popular for a while but seems to have settled down to a low buzz of activity (judging by its subreddit). It controversially scraped a lot of people’s stories.

I do agree that a large part of the reason to do IF at all is to interact with the community in some capacity, and asking questions here serves a dual purpose of finding out the interests of your future audience.

As a last thing I don’t like the extreme electricity use and pollution/water use AI generates. I do think it’s neat they’re starting up nuclear power again for AI farms but right now I personally feel a little guilty (but not too much) whenever I try AI (which is about 20 minutes a week).

I am fine with you using my own materials for training purposes, but I incorporate a lot of code (like Emily Short’s examples) that the original author may not want scraped, and I quote a lot of people in my essays who also may not want to be scraped; I’m not authorized in any way to give permission on their behalf.

Overall, I don’t think the resulting AI bot from the project you outlined would provide significant benefit to the community given current levels of AI technology, size of training data, sentiment, and environmental impact. I feel like it’s a ‘solution waiting for a problem’, and the right problem hasn’t (in my opinion) shown up yet. I can envision a future in which those circumstances could change, especially for languages like Dialog, which are extensions of pre-existing languages like Prolog that have larger training sets.

15 Likes

I am very wary of the scenario where some new Inform writer goes to the LLM first, generates a pile of code that does not work, and then brings it to the forum saying “Help! My code doesn’t work!”

We put in a lot of effort to help people with their code, but doing that for machine generated code would be a huge waste. LLMs allow people to generate infinite numbers of mistakes at the push of a button, and the machine will not get better at writing Inform. (Industry discussion is pretty solid that we’ve already reached the point where LLM training will cost exponentially more and more for less and less improvement.)

Honestly I would be pretty upset at being asked to look at LLM-generated code, and I don’t think I would be the only one.

26 Likes

In every scenario where LLMs are not being used directly to generate spam I feel that they are still harvesting and consuming goodwill in a corrosive fashion. As literate humans, we have spent most of our lives charitably interpreting the written word we are exposed to as the product of human intent, seeing past typos, navigating meandering sentences, and patching over superficial grammatical or logical errors. LLMs produce something which wears the skin of language, but there’s nothing underneath. They can produce output which structurally resembles code, or advice, or prose, or ideas, but the more closely and less charitably a reader inspects this output the more flaws and banality they will discover.

Seeking out new places to insert this technology poses no benefit to communities of humans, and even tacit endorsement of their profoundly dubious utility enriches unethical actors. I’d recommend spending some time reflecting on your actual motivations for inflicting LLMs upon more people and spaces.

11 Likes

I’d prefer not to subject this forum to the broader moral discussion of using LLMs.

Keeping it specifically in relation to IF is sufficient.

That said, my experience with LLMs has been enlightening. I’ve learned a lot of new things building Sharpee.

This became such a big issue in the Twine Discord server that they had to add the following rule:

3. NO POSTING OF AI CODE (e.g. ChatGPT) :
AI-generated code for Twine is notoriously buggy and takes forever to get to work, to the point it is an active hindrance in both questions and answers. Do not ask for help with AI-generated code in the help channels or post AI-generated code as answers to others. Should AI become more reliable for Twine, the mod team will revisit this issue.

Just a lil EDIT: Twine code actually comes in three different formats, and AIs tend to mix them up in a way that doesn’t work at all (that, and the deprecated code sprinkled throughout)

13 Likes

Just thinking about my own code generation with Claude: it honestly never even occurred to me to bring stuck instances to people. I either figured out how to move forward myself or got Claude to move forward.

I think this is why GenAI is better for experienced developers. We know how to identify core problems and resolve them.

GenAI used by inexperienced developers is definitely a wider problem.

1 Like

Like every online company, Discourse (the people who make this forum’s software) did a push for opting into their new AI features. We did not adopt those, partially because it required some type of license or subscription fee for some 3rd-party LLM thing (?), and also because I personally had no idea what value AI would add.

Our forum’s purpose is to get real people to interact, to the point that we’ve made a rule that posting AI content as a “user” participating in conversation (basically trying to masquerade as a human or trying to fool people) is not allowed.

The only thing my tiny brain can imagine is that AI could scrape conversation topics and summarize them, similar to how Amazon has a “review summary”, which it’s actually good for. But summaries of discussion topics would seem to have the effect of more people not reading through them for real.

I suppose AI could pitch in with moderation - if you tell it “any post with these words or discussing these topics should be hidden” but we don’t get enough traffic for that to be necessary. Or I guess it could answer questions users have based on the scraped knowledge of postings here or about forum rules and regulations - “How do I reply as a linked topic?” I suppose.

Apparently the Discourse AI thing didn’t go over well, as it sounds like they’re now moving away from it due to lack of interest. They asked their user base why people weren’t using it, and many responses were along the same lines: “AI/Content Abuse”, “It’s a solution to a problem we don’t have”, “What does this even do?”

Below is a link to that thread and my specific response.

13 Likes

I’ve considered (and been encouraged), a number of times over recent years, to build a Twine-specific LLM as a personal project, to better understand the workings of such systems. I intended to use my own answers/solutions to other people’s questions/issues as the training source (as there are quite a few of them), to side-step/reduce the “using other people’s works” and “not gaining the correct express permissions” minefields that rightfully plague most of the existing generative LLMs and the like.

And each time, my research & consideration has led me to the same issues:

  • not enough “good quality” source material to do the training, even if I got permission & access to the “code”-related sections of the more “professionally” designed & implemented of the released Story HTML files (1).
  • because technical output is being generated, the end user would need to be able to tell when the LLM is hallucinating, which means such a tool is less useful for those with less knowledge of the subject matter.
  • when supplying technical answers/solutions, there is often a need to ask clarifying questions to determine what “issue” really needs solving. This is especially true when the questions are being asked by those with less knowledge of the subject matter. Current LLMs are not good at asking such questions, and thus often supply an answer to the “wrong” question.

This keeps leading me to the question of exactly what use such a tool would be to those who need the most help…

(1) The amount of source code needed to train a “programming”-related LLM is huge. There are reasons why the best of the existing “code”-generating LLMs only really output “good quality” Python, HTML, CSS, and JavaScript. And when it comes to those LLMs actually testing the code they are making up, as far as I know they can currently only do that for Python.

6 Likes