Story Development with Claude: A Methodology for Authored Interactive Fiction

Funny story. Recently I signed up for a free-tier Claude Code account, at the prompting of someone I know who is more enthusiastic about AI-assisted coding than I am.

I didn’t want to give it a problem I was actually working on, on the theory that it would be more difficult to evaluate what it produced if I didn’t already understand the problem well myself.

So I presented it with a TADS3 coding problem I’d previously wrestled with to see what it would come up with.

And I can’t claim this with absolute certainty, but its coding style looked like…mine. Specifically in coming up with illustrative examples, unprompted, what it provided included a number of stylistic idiosyncrasies those who have read my numerous posts here would probably recognize.

I will say that I’m probably in an unusual situation, in that a disproportionate amount of the recent public TADS3 corpus is stuff that I’ve posted here and in public github repos I’ve linked to here. That’s owing to the relative paucity of current TADS3 stuff and my tendency to be…prolix.

7 Likes

Oh I misunderstood the question.

Two things here:

  1. Current state was to move as fast as possible with a port of an existing game. Diff accuracy isn’t really necessary since in the initial design phases we extracted all of the messaging.

  2. Future state I plan to have a full diff tool as a side panel in an “Author’s Version” of Zifmia where the right side will have tools to play-test and affirm text similar to Inform 7’s skein, but I think not exactly the same.

My case is “worse”. I believe my success with Claude has led directly to changes in Claude Code’s capabilities. It’s way too close to my early “ways of working” to be a coincidence.

I have no trouble believing Anthropic monitors github for Claude-generated software and actively reviews the patterns engineers have used to maintain productivity and not get sucked into the traps of diminishing returns (which can happen if you’re not hyper-diligent).

When I say “My case”, I’m saying a small number (comparatively) of engineers found optimal paths in using GenAI and I was potentially one of them. Just from watching Reddit, it’s obvious most engineers are still trying to get multi-agent chaos to work, and I’ve been steadfast in my belief that no amount of “Agent X does code review and Agent Y does scope creep review” will ever work. Building software is an iterative process and the specifications change as the code matures. The agentic mindset is a waste of time.

Someone has to be watching the flow and pound the ESC key and ask critical design questions before the GenAI goes in an unwanted direction. I’ve caught very serious GenAI rat holes this way and sometimes it’s the AI and sometimes it’s the spec and sometimes it’s just me realizing things need to change. There’s no way an Agent can see what I see.

3 Likes

I’m sure every interaction with every corporate genAI widget is getting datamined. Not just for performance metrics to improve the genAI itself, but for ad targeting data and so on as well.

I’d be kinda surprised if there aren’t a bunch of people also working on the “put a straitjacket on the madman” approach to genAI interactions, as genAI’s weird failure to “automatically” apply all the specialized knowledge it has at its disposal is one of the first things I noticed years ago when these things first started showing up. And I’m not particularly clever. But yeah, if you asked me to lay money on the question I’d be willing to take very long odds that your work is represented in the current training data. Because I’d be astonished if anybody who has used Claude isn’t represented in the training data.

As for the “spy vs spy” approach (of stacking agents and having them work on different things) that’s probably as common as it is because a lot of problems in programming are NP-hard and unless your genAI proves P=NP then there’s no better solution, in general (asterisk, see note), than making guesses and checking them. The note here being that a substantial part of computer science consists of heuristics for rapidly obtaining partial solutions to these kinds of problems, so there’s usually a literature search component as well, whether that involves searching stackexchange or using an LLM (which is probably just a fancy permutation of searching stackexchange).

But after posting I decided to check a couple of other places and as near as I can tell my personal coding style has polluted some meaningful fraction of all AI answers to questions about TADS3. I just asked google’s AI search “How do you implement an infinite object dispenser in TADS3/adv3?” and its response included:

A TADS3 code fragment
#include <adv3.h>
#include <en_us.h>

// 1. Define the class for the item to be dispensed
class Pebble: Resource
    '(small) (round) pebble*pebbles' 'pebble'
    "A small, round pebble. "
;

// 2. Define the dispenser
+ bucket: Fixture, ResourceFactory 
    '(infinite) (pebbles) bucket' 'bucket'
    "It contains an infinite number of pebbles. "
    
    // The class of objects this factory creates
    resourceClass = Pebble 
    
    // Allows the factory to accept items back (optional)
    resourceReturnable = true 
    
    // This method ensures there's always one item "visible" to the parser
    initializeFactory() {
        if(contains({ x: x.ofKind(resourceClass) })) return;
        createResource().moveInto(self);
    }
;

// 3. Custom creation logic (if needed)
modify ResourceFactory
    createResource() {
        return resourceClass.createInstance();
    }
;

This is clearly a slightly munged-up version of code I posted in this recent thread. It relies on classes and methods that are not part of stock adv3 but are presented in the thread (but not in google’s response). There is the mitigating factor that because it’s nominally a search result it also links the thread.

The stuff I noticed in Claude was a slightly less dramatic form of this. When it needed to come up with an object to use in an example it provided, verbatim, the pebble declaration I’ve used dozens of times in forum posts here and dozens more as code samples in public github repos.

Not, to be clear, that I’m trying to assert some sort of intellectual property rights over those code examples. I’m pretty sure the pebble example I’ve been using is itself a slightly modified form of an example used by Eric Eve in Learning TADS 3. And it’s just because I’ve posted a lot of TADS3 code and I keep using one specific bit of code as a generic example, and because there’s comparatively little other TADS3 code currently being posted, that specific example seems to be over-represented in the training data. And so when Claude needed to come up with something it didn’t use something like that example, it used precisely that example.

There are other things…like I think my original intent (to ask Claude a question I had previously worked out an answer to so I could check its work) backfired because I’m inclined to believe it wasn’t working out a solution similar to the one I worked out, I think what I was watching was Claude copying my homework in realtime. But that specific example seems to be a pretty clear example of verbatim re-use (in genAI-created code).

1 Like

The one thing I have come to also believe is that Claude can inspire new things. The data storage in Sharpee was an iterative dialogue over time and Claude convinced me it was better than my ideas (and it was 100% right). When I used my DDD lingo for the standard library, it said, so you want behaviors and data. How about “Traits”?

So there is an interesting dialogue happening. I’m not blind. I know Claude is just surfacing its vast programming knowledge. But that’s really the magic. No one can know everything and those statistical connections are highly beneficial to someone building something.

1 Like

Sure. Archimedes was inspired by his bathtub.

But I look at your example and think that if I was in the same situation I wouldn’t be thinking “wow, this AI thing is great, I need more of this AI thing”. I’d be thinking “if the AI is summarizing existing knowledge on the subject and doing that granted me insight I otherwise lacked, how do I learn more of the existing knowledge on the subject?”

1 Like

Well I did do that, of course. When Claude insisted on an alternate data storage pattern, I didn’t just click “sure” and move on. I went off on a web hunt for everything I could read and realized Claude was accurate. I read parts of the Data Structures book you get in college and evolved a better understanding of these things.

Then I clicked, “Sure.”

Working with 4.6 is a bit like couples therapy. The partner says something and you say they are wrong. Then you realise that maybe they were 50% right. At the end of the iterative process you both agree and both can’t remember who originally said what.

2 Likes

This sounds like a lot of extra work (and expense) for not a ton of benefit. Writing the text and getting the plot tight is what takes up the lion’s share of development time for me, not writing the code, especially with a good set of unit tests and an expressive language that’s well-suited for IF.

4 Likes

I have had the opposite experience trying to construct my stories in existing platforms. I have reams of text for WIPs and I start coding, get frustrated with some I7 construct, and walk away. Heck I even fell back to Inform 6 for one project and although it was “better” I still felt a square-peg→round-hole issue. (yes, this could be my neurodivergence)

But more than anything, this is about GenAI as a code generator and to some degree a technical partner. Sharpee was just for fun, but a few months ago it became more. I asked Claude if Sharpee was a good way to partner on new IF and after a detailed conversation, we identified a golden path for collaboration (I write, it codes, I maintain guardrails, it follows them, I set structure, it adheres to it). It even pointed out that generating Sharpee+TypeScript would be vastly more productive than Inform or TADS (which it knows, but not as well).

So, LLMs are notorious for being very sycophantic. Do you have anything other than Claude’s “word” backing this up? Does it ever tell you that you’re wrong about something? Will it ever contradict you? Or are you extensively reality testing everything it says?

5 Likes

Speaking from experience, Claude is able to give you a hard ‘no’ and an explanation, but you do have to format your prompt or question in such a way as to make sure it does. Even then, I don’t always trust the explanations, but they tend to offer enough hooks to work from.

If Google Gemini is to be believed, the Å-machine is a proprietary compiler only Linus Åkesson can compile his Dialog projects to. Probably because it really, really, really wanted a balanced pros and cons list.

[!note]
Asked about music in Dialog as I’m currently learning the language. I have found LLMs are really bad at it, which only makes me really enjoy the process of learning something new. It’s making me really giddy.

I just ported mainframe Zork in 54 days using Claude and Sharpee. If you think I’m in any way bedazzled by GenAI’s Floyd-like positivity, you would be mistaken. I’m speaking from experience with building business software for 40 years and two years of wrestling GenAI into a productive set of guardrails.

I’m not vibe-coding. I’m a pretty serious person.

1 Like

I think it’s a stigma many developers are now running into. Even I am starting to doubt my ability to program now that I’m letting LLMs handle much of the boilerplate and I mostly make the architectural decisions.

I’m actually really happy I started learning Dialog, since the documentation is accessible enough for me to quickly go through it, and LLMs seem not very well equipped to write it due to the niche nature of it.

1 Like

After 40 years, I have zero interest in code blocks. (If-then-else, do-until, while{}). I’ve written the same patterns thousands of times. I am a human LLM but bored of doing the small bits. Claude is awesome at doing the small bits and never gets bored.

I never let it do the big picture.

Yes. We have to accept this feedback as it’s intended. There are definitely undesirable aspects of GenAI.

But I think it will normalize quickly. The costs will diminish, the resource intensity will abate, and the guardrails will become a part of their structure.

I actually think GenAI will make better software engineers. Not coders. Engineers.

I remain somewhat skeptical, because like Susan said, the “turn the spec into code” part of IF writing isn’t the part that takes the most time and effort for me. So I might just be exactly the wrong sort of person for a system like this. But I do appreciate the efforts to keep LLMs from making any creative decisions or writing user-facing text.

This is a tangent, but it’s supported in the 1.0.0-dev version of the Å-machine. I added sound support specifically for background music in The Wise-Woman’s Dog, so it’ll be in the next official release of those tools.

All the pieces are there to make it work on the Z-machine as well (the Z-machine has well-supported sound-playing opcodes and the compiler supports resource definitions); the hard part is generating the blorb file with all the sounds in it, which the Dialog compiler isn’t really suited for right now. In general, the Å-machine is the better choice for multimedia things.

2 Likes

I think this is woefully underestimated in general. I have tried (and failed) to build multiple games in the past. Usually I tried to reduce scope at multiple points, but I found that the biggest effort usually wasn’t in programming or mechanics, but art, writing and music. I.e., content.

I think one of the only reasons I was able to finish my mobile game “The Delve” was because I intentionally left out much of the story, and instead played to my strengths in doing art, music and programming.

It taught me that people really underestimate the amount of effort that goes into building a visual novel, and they are not at all as easy as people generally claim. Unlike Roguelikes, you can’t get away with programming tricks and procedural generation to create content and play duration.

3 Likes

I would like to second (third?) this sentiment.

I can’t think of a single game project that I’ve abandoned because I realized it was beyond my technical ability. However, I have abandoned dozens of projects because I either ran into intractable (for me) narrative or game design problems, or because I realized that the amount of writing/art/modeling required to achieve my vision was beyond my capacity as an individual.

8 Likes

Which isn’t to say the coding is necessarily easy—but for me, the hardest part of that coding is getting to the point where I could write a comprehensive spec. Going from “I want to include liquids that can be put in containers and poured out” to “here are the types I’ll need, here are the actions I’ll need, here are the checks and behaviors I’ll need” is where I get stuck; going from “here are the types, actions, checks, behaviors” to “here is the I7/Dialog/Ink code” is usually not just easy but fairly fun.

So again, I might just be the opposite of the target audience for this product. But for me, the parts of the process that Claude could optimize are exactly the parts that I don’t need optimized.
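
To make the “idea to spec” step above concrete, here is a minimal sketch of what the liquids spec might boil down to once the types, the one action, and its checks are written out. It is in TypeScript only because that is what Sharpee generates; every name here (`Container`, `pour`, the refusal codes) is hypothetical and does not come from Inform 7, Dialog, Ink, or Sharpee. The action returns refusal codes rather than prose, on the assumption that the author writes all player-facing text.

```typescript
// Hypothetical sketch: these names are illustrative, not any IF system's API.

// Types: the liquids that exist, and a container holding at most one of them.
type Liquid = "water" | "oil";

interface Container {
  name: string;
  capacity: number; // maximum volume, in arbitrary units
  contents: { liquid: Liquid; volume: number } | null;
}

// Checks: each way the action can be refused gets a code; the author
// maps codes to player-facing text elsewhere.
type Refusal = "source-empty" | "same-container" | "would-mix" | "target-full";

type Outcome =
  | { ok: true; moved: number }
  | { ok: false; reason: Refusal };

// Helper so callers always get a value typed as Container.
function makeContainer(
  name: string,
  capacity: number,
  contents: Container["contents"] = null
): Container {
  return { name, capacity, contents };
}

// Action: pour as much as fits from one container into another.
function pour(from: Container, to: Container): Outcome {
  const source = from.contents;
  if (source === null) return { ok: false, reason: "source-empty" };
  if (from === to) return { ok: false, reason: "same-container" };
  if (to.contents !== null && to.contents.liquid !== source.liquid) {
    return { ok: false, reason: "would-mix" };
  }

  const held = to.contents === null ? 0 : to.contents.volume;
  const room = to.capacity - held;
  if (room <= 0) return { ok: false, reason: "target-full" };

  // Behavior: move what fits, and leave the remainder in the source.
  const moved = Math.min(room, source.volume);
  to.contents = { liquid: source.liquid, volume: held + moved };
  const left = source.volume - moved;
  from.contents = left > 0 ? { liquid: source.liquid, volume: left } : null;
  return { ok: true, moved };
}
```

Writing the code is the short part; deciding that mixing is refused, that a partial pour is allowed, and that an overfull target is an error rather than a spill is the spec work the post describes.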

5 Likes