Story Development with Claude: A Methodology for Authored Interactive Fiction

AWS’s projected capex spending just for AI this year is greater than what the top three cloud hosting providers spent, combined, in all of 2023. That’s a single example, but unless there’s a secret way to bypass spending literally hundreds of billions of dollars in the hope of reaching profitability, nobody seems to have found it. That is manifestly not the case with cloud hosting. And if the situations are not fundamentally different, then the largest players in the market are on track to collectively blow around a trillion dollars on the mistaken impression that they are.

Again I’m not sure if this is a semantic argument or a fundamentally different understanding of the situation.

At least in general people seem willing to make narrow distinctions between models for various tasks; when I was talking about my experiences with Claude Code (specifically Sonnet 4.6) you contrasted them with GPT 5.2 in Copilot. I assume that’s because you were implying there might be some meaningful difference between the results from the different models.

But we don’t have to rely on subjective expectation, there’s data on this. For example:

This is a blog post about context-lens:

I present it mostly as empirical evidence of a functional difference between the different LLM tools.

1 Like

Oh these are fascinating! I had been wanting to build an open-source programming aide that can run on much weaker and cheaper models. This is the type of data I really need.

1 Like

We seem to be talking past each other. I’m saying it’s not feasible in practice for some new entrant to spin up a competitor to AWS or GCP, and your response is that AI is expensive and currently unprofitable. These can both be true.

Again, we seem to be talking past each other. I’m saying the difference between cloud hosting providers is analogous to the difference between AI models; your response is that there is a difference between AI models.

There’s a difference between cloud hosting providers too. They don’t offer exactly the same services. One is better for some people in some situations than another. If someone tells me they’re having a problem with AWS, and uses it to make a larger point about cloud hosting in general, I might say “Huh, that actually hasn’t been a problem for me with GCP.”

That’s all I was trying to say about the different AI models: if Claude isn’t working for you, try GPT, it works for me. People in general seem pretty happy with Claude, and I hear a lot more chatter about Claude Code than Codex, so I imagine there are others having the opposite experience from me who would tell me to try Claude if GPT let me down.

1 Like

It’s getting even more dystopian. Very interesting video from a lady called Sabine Hossenfelder, who holds a PhD in physics. The section with David Kipping from Columbia University is particularly interesting and bleak. Events are unfolding at a tremendous pace.

1 Like

I respect Hossenfelder, but this video is shilling a paid AI seminar. Sheesh, I wonder if there might be a grain of salt needed here…

6 Likes

At least she had the decency to do it at the end and not “and now a word from our sponsor” in the middle :smiley:

1 Like

I’m not just talking about costs in the abstract, I’m using them as a sort of napkin-math bound on the plausibility of the two scenarios we’re comparing. That is, to get a seat at the table of the AI game apparently costs approximately as much as the three top seats at the cloud hosting table combined. In other words, the lower bound on entry for AI (and it’s a lower bound, because nobody’s managed to turn it into a profitable proposition yet) is strictly above the upper bound for cloud hosting. And the top three cloud hosting providers control something around 65% of the usage (and they’re all comfortably profitable), so that’s also an argument that the cost of entry for AI is, to a first-order approximation, something like the cost of running the entire cloud hosting industry (circa two years ago, so before AI hosting starts confounding the numbers).

I’m not trying to present a formal argument that you just spend the bucks and then beep boop you’re a hosting provider. And I’m not just arguing that AI is expensive in the abstract. I’m illustrating my strong confidence that the market reacting to rising prices by adding new providers is more plausible for cloud hosting than for AI.

Sure. But the vast majority of customers’ requirements are cloud-provider agnostic. Like, AWS is huge, and a lot of people are hosting things like mostly-static web content on AWS S3. But if Amazon were hemorrhaging tens of billions a quarter on hosting and needed to jack up their prices to cover their losses, all of that could very comfortably migrate to smaller hosting providers. Which exist. That’s really not the situation in AI. There are smaller models and bespoke models and so on, but outside of corner cases, once you move away from the major players there’s a huge quality cliff. If AI money stopped being free, and the major players started having to price things based on their costs, and that spiked the cost per token or whatever, there isn’t anywhere for the overwhelming majority of current use to go. Much less in the imagined future where literally all of coding is going through these services.

But at any rate I’m not sure how much anyone is getting out of this conversation. I’m also not particularly invested in convincing you of the model I’m presenting (that is, that dependence on a small number of AI service providers is a structural risk disproportionate to existing structural dependencies which, as you observe, arise naturally due to e.g. market consolidation) and, I suspect, vice versa. So I think I’ll leave it there.

2 Likes

Regarding Hossenfelder, grains of salt are the wise course of action. She’s a physicist, and I am not knowledgeable enough to dispute any of her videos on that. But, any of her videos regarding social sciences, she has been very blatantly and obviously wrong on many documented occasions. I used to respect her more, but I stopped giving her videos any credence when it became obvious her claims of scientific rigour clashed with her practice of it.

3 Likes

They’re both implausible enough that no business, in practice, is making its decisions based on the remote possibility that a new competitor to AWS/GCP might spring up overnight. I don’t see the point in bickering over whether one is more implausible than the other.

I agree with that, I just don’t think it’s relevant. Discovering a single chest of pirate treasure buried in your backyard is more plausible than discovering ten of them, but it’d be a mistake to include either possibility in your household budget.

I strongly disagree. The product offering and reliability of a smaller hosting provider are nothing like the product offering and reliability of a big one. If you don’t need the benefits that the big one provides, then sure, you can migrate (with some effort)… just like if you don’t need a model of the same quality as GPT or Claude, you can switch to a lower quality model from some other company, or even a model that runs on your own hardware.

Indeed.

1 Like

On a separate interesting note, here’s a released bit of Python code about how the transformers powering GPT technology are assembled: microgpt · GitHub

It’s not powerful enough by any stretch and would require massive effort to train, but it’s the underpinnings to the technology.
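For anyone curious what “assembled” means concretely: the central building block is scaled dot-product self-attention with a causal mask. A minimal, framework-free sketch of that one computation (illustrative only, not microgpt’s actual code) looks something like this:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over a sequence x of shape (T, d).

    Wq, Wk, Wv are (d, d) projection matrices; a real transformer adds
    multiple heads, an output projection, residuals, and layer norm.
    """
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)          # (T, T) attention logits
    # causal mask: each position may only attend to itself and the past
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf
    weights = softmax(scores)              # each row sums to 1
    return weights @ v                     # (T, d) context vectors
```

One consequence of the mask is that the first token can only attend to itself, so its output is exactly its own value projection, which makes this easy to sanity-check.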

I am personally very curious where local models will go in the coming years. The big providers obviously offer very powerful models that are near impossible to compete with, but I know there are local, small parameter count models being trained and experimented with every day on expert tasks, meaning they don’t need quite as much computing power for similar results within just a subset of tasks. They may not be very chatty, but they might just be good enough to produce really good C# code even on smaller hardware.

This is the development I’m personally curious about. Just to use the metaphor again, it’s the equivalent of comparing large enterprise grade cloud computing vs. a local server for streaming videos. The latter is not necessarily worse than the former.

(By the way, this is just my fascination speaking with the technology and not meant to take sides in a discussion)

1 Like

I’m not convinced that the “big providers’” models couldn’t already be run on your own PC. It’s just that they’re not currently publishing their models.

For sure, they’re going to have a lot of iron. But divide that by the number of users, and is it really more than a PC with a half-decent graphics card?

So I haven’t tried language models, but I’ve tried image gen. And I get way superior results from my own PC than from any “big provider” out there. Although Gemini Pro with a paid key is close.

I think the AI-as-a-service mongers actually water down their offerings (and more so as time goes by), especially for freeloaders.

I don’t see any reason why you can’t run the language models locally too. I plan to try it when I have time.

1 Like

My company is already talking about building our own LLM and provisioning the hardware for on-prem.

I’m targeting fall for my own AI hardware.

This is probably happening in a lot of personal and corporate settings.

I looked into this today, actually. To get alright performance at a reasonable number of tokens per second, you’ll want to run a 70B to 100B+ parameter model. These are functional for agentic programming workflows. DeepSeek keeps publishing versions of these.

To run that, you’ll want a setup with either of the following:

  • AMD Ryzen AI Max+ 395
    Or
  • 2x RTX 3090 GPUs

Note that this would be sufficient to run 100B+ parameter models, but you’d need them heavily quantised the closer you get to 200B; otherwise the performance hit is too great to be functional.
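As a sanity check on those hardware picks, a rough rule of thumb is that the weights alone take parameters × bits-per-weight ÷ 8 bytes, plus some allowance for KV cache and activations. The 20% overhead factor below is my own assumption; real usage varies with context length:

```python
def vram_estimate_gb(params_billions, bits_per_weight, overhead=1.2):
    """Very rough VRAM estimate for running a quantised model.

    overhead=1.2 is an assumed ~20% allowance for KV cache and
    activations on top of the raw weight storage.
    """
    return params_billions * bits_per_weight / 8 * overhead

# e.g. a 70B model at 4-bit quantisation:
# vram_estimate_gb(70, 4) -> ~42 GB
```

A 70B model at 4-bit lands around 42 GB, which is consistent with why 2x RTX 3090 (48 GB total) or a large unified-memory machine like the Ryzen AI Max+ shows up in recommendations.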

The big commercial programming models are even larger if I recall correctly, but I didn’t confirm that in my brief research efforts today.

Note that this is mostly just me having asked around the KoboldCPP community and in some other places, so I may be off. I lack the means to actually try, but I did speak to someone today who may ask for my assistance in setting up such a thing.

Either way, I found that KoboldCPP very much outperforms Ollama due to better memory handling, so that is a neat feature. To get good programming capabilities, you’re basically looking at a few things:

  • A strong enough model that can properly understand its code changes, or a slightly weaker model that is fine-tuned for programming.
  • Enough VRAM to hold the model and provide a large enough context window to help with throughput.
  • A decent agentic pipeline that prompts the LLM with the right tools for the job.

I tried with my old GTX 1070 Ti today, but while I spun up a blazingly fast chatbot, programming inside of an IDE wasn’t working for it.
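On the agentic-pipeline piece: KoboldCPP exposes an OpenAI-compatible endpoint, so a tool can talk to the local model the same way it would talk to a hosted one. A minimal sketch using only the standard library (the URL, port, and prompt text are assumptions for illustration, so adjust for your setup):

```python
import json
import urllib.request

# Hypothetical local endpoint; KoboldCPP serves an OpenAI-compatible
# API (default port 5001). Change host/port to match your setup.
API_URL = "http://localhost:5001/v1/chat/completions"

def build_request(system_prompt, user_prompt, max_tokens=512):
    """Build an OpenAI-style chat completion payload for a local model."""
    return {
        "model": "local",  # local servers generally accept any model name
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": 0.2,  # low temperature suits code generation
    }

def complete(payload):
    """POST the payload to the local server and return the reply text.

    Requires a running KoboldCPP (or similar) instance at API_URL.
    """
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (with a server running):
#   print(complete(build_request("You are a coding assistant.",
#                                "Reverse a string in Python.")))
```

Point an OpenAI-compatible coding client at the same endpoint and you get the agentic side essentially for free; model quality then becomes the limiting factor, as the GTX 1070 Ti experiment suggests.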

Yes, it looks like 100B+ parameter models are too big for most systems. I was going to try the suggestion in this vid for local models:

He runs a 24B model locally, and you can run larger ones by spilling into system RAM, but it gets slow. If you watch from 16:00, he rents space for $3.60/h for a 190B model. More if needed.

Note, there are services that allow you to rent GPU power for much cheaper, and only have you pay for uptime. If that is a route you want to go, I was thrown some recommendations today.

2 Likes

I am running models like Devstral Small 2 2512 Q4 on an M2 Max Mac Studio with 38 GPU cores and 64GB of RAM with OpenCode CLI, and anecdotally the experience is 1-2 orders of magnitude worse than Claude. If you have access to Claude, it’s unusable by comparison.

1 Like

I think the spec-driven format works well, and points 3 and 4 are spot on. I’ve spent the last 15 months developing an IF text engine to do just the things you mention. There is a visual designer, an object editor, and a rules system that imbues the world logic on top of it all. It can have dialog, menus, object actions, and world state. I am using it to submit IF that I am working on for the upcoming contest. Engineering and tech are my strong sides, and actual storytelling is not, so what I lean into are public domain stories that have already been written, which I then try to gamify, so to speak. I know I will never be a best-selling author, and the system I made is ultimately for others to use for creation. The people I hope use it are the ones who want to tell stories with rich, immersive logic.
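For what it’s worth, the kind of rules system described (world state plus rules that fire in response to actions) can be sketched in a few lines. All names here are hypothetical illustrations, not the actual engine:

```python
# Minimal sketch of a rules-driven IF world model: rules are
# (verb, handler) pairs matched against a player action, and handlers
# read and mutate shared world state.

class World:
    def __init__(self):
        self.state = {"lamp_lit": False, "location": "cellar"}
        self.rules = []

    def rule(self, verb):
        """Decorator that registers a handler for a verb."""
        def register(fn):
            self.rules.append((verb, fn))
            return fn
        return register

    def act(self, verb, noun):
        """Run the first matching rule that produces a response."""
        for v, fn in self.rules:
            if v == verb:
                msg = fn(self, noun)
                if msg:
                    return msg
        return "Nothing happens."

world = World()

@world.rule("light")
def light_lamp(w, noun):
    # rule fires only when its preconditions on world state hold
    if noun == "lamp" and not w.state["lamp_lit"]:
        w.state["lamp_lit"] = True
        return "The lamp flickers to life."

print(world.act("light", "lamp"))  # -> The lamp flickers to life.
```

A second `light lamp` falls through to "Nothing happens.", since the precondition no longer holds; layering preconditions like this is what lets world logic stay declarative.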

2 Likes

Claude Code really is quite something. I’m afraid I go through my tokens weekly, so I’m genuinely considering whether it might be cheaper to just run my own model on either rented or bought custom hardware.

Because I admit, it’s baffling how powerful it is (though, just like any LLM, it still occasionally makes strange leaps of logic, even when given documentation).
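The napkin math on "rent vs. subscribe" is straightforward: at an hourly rental rate like the $3.60/h figure quoted earlier in the thread, a flat monthly subscription has a break-even point in hours of GPU time per month (the specific figures below are illustrative):

```python
def break_even_hours(subscription_per_month, rental_per_hour):
    """Hours of rented GPU time per month that cost the same as
    a flat subscription."""
    return subscription_per_month / rental_per_hour

# e.g. a $230/month plan vs. a $3.60/h rental:
# break_even_hours(230, 3.60) -> ~64 hours/month
```

Note this only compares raw cost: the rented or local model is typically much weaker than the hosted frontier model, so cost parity is not quality parity.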

I agree. I’m sure there are cheaper alternatives, as I suspect RunPod pays a lot for promotion. I’d be interested to learn of cheaper alternatives, if you know of any. FWIW, I don’t really plan to run LLMs like this, because I don’t really have a use case for running locally. It’s just out of interest that I was thinking of experimenting to see how good it might be comparatively.

On the other hand, I do run local image gen. And this is primarily because I can generate 2k+ images, when most services out there limit you to 1k pictures. I played with Gemini Pro, which claims to generate 4k, but this is currently inaccurate, as the pictures are in fact just upscaled 2k ones. So, to date, I have not managed to generate 4k images directly.

1 Like

I pay for the Pro Max $230/month subscription. I’m never blocked. I’ve had four separate Claude Code sessions running my workflow full tilt for hours and I never get the “out of tokens” message.

It seems to be so effective that I’m trying to sell it: https://devarch.ai/.