This is an interesting question on a few different levels. I’m going to give a really long and rambling answer but feel free to completely ignore it.
I’ve noticed that my own reaction to the ethics of scraping and AI use depends on the environment and the field I’m in.
For instance, I teach International Baccalaureate classes, and they’re AI-positive, as long as students cite their AI use and expand on it. One of our highest-scoring students in recent years on the Theory of Knowledge essay had some trouble with English, so they used AI to rewrite their essay from their native language/way of writing into more fluent English. I’m moderately leery of this but prefer open use to hidden use. I’ve been putting traps on high school tests where I ask graduate-level math questions (like “How many elements does the alternating group on 6 elements have?”), and if anyone gets it right they fail the quiz. [I could ban computers, but it’s an online quiz and people sometimes ‘use the bathroom’ to check AI on someone else’s computer.]
My most positive AI interactions have been with small-scale programming, like explaining what a line of code does or asking it to create a small chunk of code for a website. I recently asked it what this line I got from an online tutorial does:
```powershell
fnm env --use-on-cd | Out-String | Invoke-Expression
```
because I had to redo it every time I started PowerShell, and it helped me write a script into my profile that does this for me automatically. [If some fancy tech person wants to explain how that’s idiotic and a clear example of how AI gives cheap, bad answers that don’t promote understanding, I’d be happy to learn from you.] I haven’t used it for creating website code myself, but I’ve had my students try it, and walked them through the problems it creates (like linking to ‘profile.html’ when that file doesn’t exist yet). A ton of coding questions online are repetitive and easily scraped, so using AI for them is pretty helpful.
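In case it helps anyone else, the script it helped me write boils down to something like this (a rough sketch of the one-time setup; the exact profile path varies by machine and PowerShell version):

```powershell
# Make sure a PowerShell profile file exists, then append the fnm init
# line so it runs automatically at the start of every new session.
if (-not (Test-Path $PROFILE)) {
    New-Item -ItemType File -Path $PROFILE -Force | Out-Null
}
Add-Content -Path $PROFILE -Value 'fnm env --use-on-cd | Out-String | Invoke-Expression'
```

After that, every new PowerShell window sets up fnm on its own, and (as I understand it) the --use-on-cd flag switches Node versions whenever you cd into a project folder that declares one.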
In writing I’ve had very negative feelings toward AI. I love stories as a way to connect to human hearts and see what they love. AI writing is disappointing because there is no connection. Moreover, so many AI writers are convinced that no one can tell their writing is AI. Especially the stupid quatrain poems that all look the same. Or the games with sweeping staircases that connect to ravines, or ‘doors full of mystery’ that have beans on the other side. And these models scrape a lot of people’s work in a way that resurfaces very similarly to the original, and it feels gross and unpleasant. The data they’re trained on isn’t repetitive and easy to ‘verify’ the way programming is, and it has a lot more consent issues. And unfortunately the ‘base’ AI was originally trained on exactly that sort of data. I’m strongly opposed to this sort of AI use, to the level that I ban AI game promotion, or even discussion, on the r/interactivefiction subreddit.
To me, Inform programming is more like writing than coding. I have never asked AI for Inform help and don’t want to; each line of Inform adds character and personality to a game. Making a container transparent affects the player’s experience as much as making a character red-haired or describing a voice as flute-like. I want complete control over those decisions.
My son likes Character.ai, which I feel is a better use of AI than static writing in a game, and I’ve had fun helping him overwrite the game’s programming. It’s still gross that it’s trained on people’s writing, but it’s also trained on active participants in Character.ai itself.
So my overall thoughts for an IF chatbot are:
-If it could actually help with coding, that would be cool, but I’m skeptical that there is enough data out there to help, even if everything were consumed. Especially since new versions keep breaking things from older versions.
-For generating game text and puzzles, I don’t think it’s helpful. It’s not good at pacing or plot arcs, has no concept of what makes an appropriate ‘hook’, and has resulted in awful, awful games.
-For generating game ideas, it seems fine. I just tried asking Copilot for five Inform 7 game ideas, and they were normal, good ideas like ‘time travel’ and ‘enchanted forest’. Not exciting, but not awful.
-For hints and walkthroughs, I think it would be amusing. Using hints makes you feel bad about getting spoiled, but AI also makes mistakes, so you could use AI hints and not know whether they were right or not, like the fleas in Wizard Sniffer. But the better it got, the less I’d like it.
-As a source of amusement itself, it would probably end up like AI Dungeon, which was popular for a while but seems to have settled down to a low buzz of activity (judging by its subreddit). It controversially scraped a lot of people’s stories.
I do agree that a large part of the reason to do IF at all is to interact with the community in some capacity, and asking questions here serves a dual purpose: getting answers and finding out the interests of your future audience.
As a last thing, I don’t like the extreme electricity use and pollution/water consumption AI generates. I do think it’s neat that they’re starting up nuclear power again for AI farms, but right now I personally feel a little guilty (though not too much) whenever I try AI (which is about 20 minutes a week).
I am fine with you using my own materials for training purposes, but I incorporate a lot of code (like Emily Short’s examples) that the original authors may not want scraped, and I quote a lot of people in my essays who also may not want to be scraped. I’m not authorized in any way to give permission on their behalf.
Overall, I don’t think the resulting AI bot from the project you outlined would provide significant benefit to the community, given current levels of AI technology, the size of the training data, community sentiment, and the environmental impact. It feels like a ‘solution waiting for a problem’, and the right problem hasn’t (in my opinion) shown up yet. I can envision a future in which those circumstances change, especially for languages like Dialog that are extensions of pre-existing languages like Prolog with larger training sets.