ChatGPT is a mostly bad tool for Inform 7 development

…I’m actually inclined to give it some points for that. Mind you, I’m taking for granted that we can’t hope for something actually good when we ask it for something creative, so something that could serve as a creative prompt for a human is the best possible outcome. In other words, I’m not setting a very high bar.

Sure, if the clues were actually just “3 numbers, at least one of which is missing a label, have fun guessing”, that’s a game I would quit in disgust; and sure, there’s a UI issue to overcome in how we enter the things with the missing labels.

But most locked-door puzzle mechanics have been done to death; a keypad with scrambled numbers and some missing labels hasn’t been. I don’t think it’d be easy to come up with a way to make it workable and fun in a real game. But I admire the response for offering some novelty a lot more than I would admire a much more workable solution that amounted to “to get the brass key, you have to search the flower pot”.

2 Likes

I just tested ChatGPT with GPT-4 on designing a puzzle. It seems to be a bit better; see below:

Give me a puzzle for opening a locked door in an adventure game.

Puzzle: The Weighted Pressure Plate

Description: In this puzzle, the player encounters a locked door with no visible keyhole or handle. On the floor in front of the door, there is a pressure plate. To unlock the door, the player must correctly distribute a set of differently weighted objects on the plate. The correct combination of objects will apply the precise amount of pressure needed to unlock the door.

Setup:

  1. A locked door with no apparent way to open it.
  2. A pressure plate on the floor in front of the door.
  3. A set of objects with various weights scattered throughout the game world or nearby room. For example, these objects could be stones, metal bars, or differently sized books.

Clues:

  1. An inscription or engraving near the door, hinting at the required pressure or the combination of objects needed. The inscription could be a riddle, a cryptic message, or a numerical hint.
  2. The weights of the objects are not explicitly stated. However, the player can find a scale or balance in the environment to compare the weights of the objects and determine their relative heaviness.
  3. The player may find a journal or a note from a previous adventurer, describing a failed or partially successful attempt to unlock the door, which could provide a clue about the correct combination of objects.

Solution: The player must experiment with the various objects, using the scale or balance to determine their relative weights. By analyzing the given clues and the objects’ weights, the player must deduce the correct combination and place those objects on the pressure plate to unlock the door.

Example:

  1. The inscription near the door reads: “Three parts heavy, two parts light, and one part just right.”
  2. The player finds six objects: A, B, C, D, E, and F.
  3. After using the scale, the player determines the following weight relationships: A > B > C = D > E > F.
  4. The player deduces that they should place objects A, B, and C on the pressure plate to apply the correct amount of pressure.
  5. Placing the objects on the plate unlocks the door, allowing the player to progress in the adventure.
2 Likes

Hmmm, interesting. But why A, B and C? C and D are the same weight, so they should be interchangeable, and even so we only seem to have the “three parts heavy” and not the other parts mentioned in the riddle.

3 Likes

It’s getting better, but its output is still quite verbose and repetitive. I think there was a similar puzzle in ‘The Job’ by @fredrik. The concept is good, but, as @mathbrush said, the solution in the example doesn’t seem to make sense.

3 Likes

Training the instance with more examples of good puzzles would probably help, though I note that the puzzle given is far from specific to text-based IF. It would not be a stretch to see the same puzzle implemented in a game inspired by Tomb Raider, for example.

2 Likes

I think the ideal way to have generative AI create decent interactive fiction is to build an IF dataset and then fine-tune a non-chat model to produce content based on it. This could be a challenge: you’d need to collect works, create transcripts of them, and feed those transcripts into the model as training data. While the output model itself may or may not be a derived work of the transcripts, the transcripts themselves will be copyright-protected, so permissions would have to be sought if you shared them - and you absolutely should share them, because a public dataset of IF transcripts would be a boon to world-building and to using AI models in the future. Combined with other technology, it might ultimately help pave the path to ordinary people creating Star Trek holodeck-style content at home.
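
Roughly, that fine-tuning step could be sketched like this, assuming the collected transcripts have already been flattened into a plain-text file with one transcript per line; the file name, the gpt2 base model and the hyperparameters are all placeholders:

# Sketch only: fine-tune a small non-chat (causal) model on IF transcripts.
# "if_transcripts.txt" and "gpt2" are placeholders for a real corpus and base model.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

dataset = load_dataset("text", data_files={"train": "if_transcripts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="if-model", num_train_epochs=3,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # plain next-token LM
)
trainer.train()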

One way to create a dataset would be to use an AI tool to convert pages of public-domain books into the IF style, then upload the result to huggingface. It wouldn’t be ideal, but it would be a step in the right direction. Another idea would be to build a dataset using the open-assistant code, where people take turns being the computer and the user, and use that as training data - but you’d need thousands of conversation trees, plus people to rate them.
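
For the uploading part, a minimal sketch with the datasets library could look like this; the JSONL file and repository name are made up, and you’d need to be logged in with huggingface-cli first:

# Sketch: package the converted transcripts and push them to the Hugging Face Hub.
# "converted_transcripts.jsonl" and the repo name below are hypothetical.
from datasets import load_dataset

ds = load_dataset("json", data_files="converted_transcripts.jsonl", split="train")
print(ds[0])                                      # sanity-check one record
ds.push_to_hub("your-username/if-transcripts")    # run `huggingface-cli login` beforehand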

Anyway, regardless of whether it’s been sharpened up with fine-tuning, don’t use a chat interface to make your content. Use text completion instead, so you get the correct narrative style. You want to give it as much context as you can and have it create continuations.

For example, ChatGPT originally did something like this:

This is a transcript of a conversation between a user and a very helpful, ethical, friendly AI chatbot:
User: Hi. Can you help me?
AI: Yes of course, I’d be glad to help; just ask whatever you like!
User: What’s the capital of France?
AI:

Then you tell it to keep generating text until it produces "User: ", insert the next prompt from the user, and continue. But it turned out it’d tell you how to do unethical things, so they fine-tuned it on conversations written and rated by cheap contract workers. The result was tuned towards “this is what we African workers think that our bosses think that woke US corporate types want to hear” - and that’s how ChatGPT responds! Using the OpenAI playground rather than ChatGPT sidesteps a bit of this, but it’s not free, and it’s still trite and generates content that lacks the rawness of a real author. It can only explore a very small subset of art.
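
That loop is easy to sketch. Here’s roughly what it looks like against a plain text-completion endpoint, with the stop string doing the “generate until it writes User:” part; the openai>=1.0 SDK and the gpt-3.5-turbo-instruct model are just the choices I’m assuming here:

# Sketch of the completion loop described above: keep a running transcript, ask a
# text-completion model to continue it, stop at "User: ", splice in the next user
# line, and repeat.  SDK and model choice are assumptions, not requirements.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = ("This is a transcript of a conversation between a user and a "
              "very helpful, ethical, friendly AI chatbot:\n"
              "User: Hi. Can you help me?\n"
              "AI: Yes of course, I'd be glad to help; just ask whatever you like!\n")

while True:
    transcript += "User: " + input("User: ") + "\nAI:"
    completion = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt=transcript,
        max_tokens=200,
        stop=["User: "],          # stop as soon as it starts the next user turn
    )
    reply = completion.choices[0].text.strip()
    print("AI:", reply)
    transcript += " " + reply + "\n"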

There are other models, though, and recently LLaMA2 was decensored somewhat. You can likely find instructions for running it on Google Colab or on Huggingface for free. RWKV is another good model, and there are more out there. Ideally you want to have a long chain of text that already exists and ask the model to autocomplete the end, then cut the start off and carry on - that’ll give you the best consistency.
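
The “cut the start off and carry on” idea is just a rolling window over the prompt. A sketch with a local Hugging Face model - here gpt2 stands in for whatever decensored LLaMA2 or RWKV checkpoint you actually run, and the story file is made up:

# Sketch: rolling-window continuation with a local model.  Only the last
# MAX_CONTEXT tokens are fed back in, but the full story text is kept.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
MAX_CONTEXT = 768                                # leave room for the continuation

story = open("story_so_far.txt").read()          # hypothetical file with the existing text

for _ in range(5):                               # extend the story five times
    ids = tokenizer(story, return_tensors="pt").input_ids[:, -MAX_CONTEXT:]
    out = model.generate(ids, max_new_tokens=120, do_sample=True, top_p=0.9,
                         pad_token_id=tokenizer.eos_token_id)
    continuation = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
    story += continuation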

If you’re anything like me, what you really want is to give the AI examples of your writing style and of works that you love, then have it build the details of your world for you on the fly, remember what’s going on in it, and introduce characters that you’ve written yourself from example dialogues. That requires a bunch of different technologies that I’m not sure exist yet, and datasets that need to be gathered so they can be fed into models people can use freely in order to innovate. The most promising stuff at the moment is built on top of langchain using decensored models. Character dot AI have some cool things going on in this area; if it’s your thing you should check it out, but it’s not there yet - or wasn’t the last time I checked.
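
For what it’s worth, the langchain end of that is the easy bit today: a conversation chain with a memory object will carry the running story state for you. A rough sketch using the 2023-era langchain API (module paths have been shuffled around in newer releases, and the world notes, model and temperature here are all placeholders):

# Sketch: a narrator chain that keeps the conversation-so-far in memory and bakes
# some world notes into the prompt.  Everything named here is illustrative only.
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

LORE = "The city of Ash Harbour floats on chained icebergs."   # placeholder world notes

template = ("You are the narrator of an interactive story set in this world: " + LORE + "\n"
            "{history}\n"
            "Player: {input}\n"
            "Narrator:")
prompt = PromptTemplate(input_variables=["history", "input"], template=template)

chain = ConversationChain(
    llm=OpenAI(temperature=0.8),
    memory=ConversationBufferMemory(human_prefix="Player", ai_prefix="Narrator"),
    prompt=prompt,
)
print(chain.predict(input="I step off the ferry and look around."))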

I’ve been ramping up on learning about AI since I first started this thread.

To clone one’s own writing style, one would need to (additionally) train an LLM on a corpus of one’s own writing. A zero-shot approach is impossible, because there is no writing-style equivalent of zero-shot voice cloning. A low-shot approach is only likely to work if one’s writing style is commonplace enough that it’s already subsumed within the writing samples the LLM was trained on.
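
If you do have such a corpus, the usual low-cost route is a parameter-efficient fine-tune rather than a full one; a rough sketch with the peft library, where the base model and the LoRA settings are placeholders:

# Sketch: wrap a base model in a small LoRA adapter, so only the adapter weights
# are trained on the personal writing corpus.  Model name and ranks are placeholders.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
lora = get_peft_model(base, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))
lora.print_trainable_parameters()   # typically well under 1% of the full model
# ...then train `lora` on the corpus the same way as the fine-tuning sketch earlier
# in the thread (a Trainer with a causal-LM data collator).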

The attention window is the biggest obstacle to keeping user-added worldbuilding in memory. The worldbuilding could be added to a training corpus instead, but keep in mind that LLM hallucination will still occasionally rear its head. Just ask ChatGPT to talk about or roleplay with Star Trek technology for an extended period to see how things can go awry.
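
You can get a feel for how tight that window is just by counting tokens; a toy check with tiktoken, where the notes file and the 8k limit are only illustrative:

# Sketch: measure how much of a context window a set of worldbuilding notes eats.
import tiktoken

notes = open("worldbuilding_notes.txt").read()        # hypothetical notes file
enc = tiktoken.get_encoding("cl100k_base")
used = len(enc.encode(notes))
print(f"{used} tokens of notes; {max(0, 8192 - used)} of an 8192-token window left for play")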

In principle an LLM could be trained on a large body of Inform 7 examples to serve as a coding assistant. Concerns about guardrails and “censorship” (the real underlying complaint is about political or ideological bias) are misplaced when it comes to using LLMs as coding tools. OpenAI has also been visibly dumbing ChatGPT down for the sake of performance, trading accuracy for speed.
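
Even without that training step, few-shot prompting gives a cheap approximation of the idea: stuff a couple of worked request-to-code examples into a completion prompt and let the model continue. A toy sketch - the examples, the model and the final request are all illustrative:

# Sketch: few-shot prompting as a stand-in for a properly fine-tuned Inform 7
# assistant.  The worked examples and the completion model are illustrative only.
from openai import OpenAI

FEW_SHOT = """\
Request: a kitchen containing a brass key.
Inform 7: The Kitchen is a room. The brass key is in the Kitchen.

Request: the kitchen is dark and the player carries a lit lantern.
Inform 7: The Kitchen is dark. The player carries the lantern. The lantern is lit.

Request: a cellar below the kitchen with a wine bottle on a shelf.
Inform 7:"""

client = OpenAI()
resp = client.completions.create(model="gpt-3.5-turbo-instruct",
                                 prompt=FEW_SHOT, max_tokens=80, stop=["Request:"])
print(resp.choices[0].text.strip())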

My impression is that langchain is fairly crude under the hood and will be superseded by better tools, probably within a year. The thing is, Inform 7 is already natural language, so a layer like langchain (which has Python and TypeScript implementations) adds unnecessary complexity, and that would ultimately limit the performance of any particular model compared to one tuned to work with Inform 7 code directly.