Sharpee (https://sharpee.net) is proof that one person can engineer an IF platform with GenAI (Claude Code).
I do not think you could do this using spec-driven agents. It’s taken me a year of non-stop iterative architecture and implementation to get Sharpee close to v1. The number of times I’ve had to lasso Claude and redirect its thinking is well in the hundreds. GenAI simply does not understand the big picture. You have to manage that yourself, and there’s no way any group of coding agents will land where you want them to.
You 100% did that. In fact most of the posts here from all of you have been invaluable, and I’d like to thank you all for that.
I’d be interested to know by what factor you think that iteration time through the architecture and implementation would be reduced nowadays, starting from the same basis.
I am sure that we have all experienced the pitfalls of LLMs, but it’s also hard to fathom (at least in my mind) the staggering rate of change and improvement. The arrival of Opus 4.6 was for me a defining moment, where I thought, let’s yolo this all the way.
Is it unreasonable to believe that in 12 months’ time a simple prompt like: “look at this mess of a project, fix everything wrong with it starting from first principles” will yield results that will boggle the mind?
Eventually there will be genuine AI which will be able to do almost anything.
For me the question is: do you want to say “I made this”, or just “I got my friend/parent/computer to make this for me” ?
Personally I want the former, to say it was me that made something. Personally I am not much interested in the latter.
I think “is Ink Turing complete?” is actually a bit tricky. The lack of arrays means that if you really want to simulate an unbounded tape, you would need to do it with strings, but the string handling in Ink is so limited that actually proving Turing completeness is slightly nontrivial. I’m sure I could do it, but it’s not as trivial as proving Turing completeness of, say, Python would be.
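To make the string trick concrete, here is a minimal sketch (in Python, purely for illustration; nothing here is Ink code) of how an unbounded tape can be simulated with nothing but string concatenation and slicing, which is roughly the workaround a proof for Ink would have to encode:

```python
# Hypothetical sketch: an unbounded Turing-machine tape represented as two
# strings (left of the head, right of the head) plus a single head character.
# Only concatenation and slicing are used, mirroring the constraint of a
# language without arrays. '_' stands for the blank symbol.
def step(left, head, right, write, move):
    """Write a symbol at the head, then move one cell left or right."""
    if move == "R":
        left = left + write
        head, right = (right[0], right[1:]) if right else ("_", "")
    else:  # move == "L"
        right = write + right
        left, head = (left[:-1], left[-1]) if left else ("", "_")
    return left, head, right

# Example: write '1' and move right twice over an initially blank tape.
tape = ("", "_", "")
tape = step(*tape, write="1", move="R")
tape = step(*tape, write="1", move="R")
assert tape == ("11", "_", "")  # tape now reads ...11_..., head on the blank
```

Whether Ink’s actual string operations are rich enough to express these splits and joins is exactly the nontrivial part of the proof.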
This is, of course, nitpicking on the order of “C technically isn’t Turing complete if there’s a fixed pointer size.”
I think from Opus 4.0 to 4.6 Claude’s ability to diagnose problems has dramatically improved. But the ways it goes off script are very much the same from a year ago.
The problem isn’t the model or the reasoning engine. It’s the context window. Claude (and other LLMs) can only focus on small sets of information and code. If you ask or expect GenAI to do more than that, you will 100% see your project derailed almost instantly past one agentic iteration.
Thanks again, everyone, for your feedback. The speed at which you responded was particularly amazing!
I have made considerable changes, taking all the feedback on board (I think) and updated the site.
I even added an article entitled “A human entered the room” to record how this conversation is steering the process. As you will have guessed by now, the development journal itself is the ‘product’ at this stage.
Direct Link: A Human Entered the Room · Urd
Although there’s a degree of irony in the fact that you clearly did this by pasting the contents of this thread into an LLM and asking it to summarise it.
The Urd website is very slick, but I don’t see a way to link to specific parts…
Anyway, after reading some of the design, a few things stand out. First, from a page with the heading “Welcome to Urd”, here’s monty-hall.urd.md, which is presented as a concrete example:
```
types:
  Door [interactable]:
    ~prize: enum(goat, car)
    state: enum(closed, open) = closed
    chosen: bool = false

entities:
  @door_1: Door { prize: car }
  @door_2: Door { prize: goat }
  @door_3: Door { prize: goat }
  @monty: Host { name: "Monty Hall" }

rule monty_reveals:
  @monty selects target from [@door_1, @door_2, @door_3]
  where target.prize != car
  where target.chosen == false
  where target.state == closed
  > target.state = open
```
- This is called Schema Markdown, but it feels a lot more like YAML than Markdown.
- `Host` has no definition.
- There’s no definition of how the player is supposed to interact with this scene. Do they interact by arbitrarily changing the properties of objects? If so, what stops them from setting `chosen` on all three doors, or changing `state` independently from `chosen`?
- There’s no definition of sequence or temporal dependency. How do they learn which prize they’ve won? Does Monty’s rule run in response to some event, or does he choose a door immediately when the scene starts? Does the scene end at some point, or do they stand there messing with doors and watching Monty open other doors forever?
Moving down the page, there’s this:
How it works
Writers author worlds using Schema Markdown, a syntax designed to feel like writing prose. The entire vocabulary is seven symbols.

```
@  Entity
?  Condition
>  Effect
*  Choice
+  Sticky
→  Jump
// Comment
```

That is it. A character is `@halvard`. A condition checks state. An effect changes it. Choices branch. Jumps navigate. If the syntax forces a writer to touch type definitions or JSON, the tooling has failed.
- The syntax does not feel like writing prose. The example features three different kinds of brackets, words with periods in the middle, and no complete sentences.
- The example has other symbols besides those seven.
- The example starts with a type definition, and those parts in curly brackets look suspiciously like JSON. One must, therefore, be skeptical of the tooling’s success.
This is why I’d start with the runtime and get to writing meaningful code that you can test as quickly as possible: you’d immediately notice that this isn’t the prose-like, “clear, typed description of what exists, where things are, what the rules are, and what can happen” that you were aiming for.
One of the most important roles of the human working with a coding AI is to recognize when the AI is heading down the wrong path, interrupt it, and clear away remnants that will tempt it down that path again. Maybe other AIs in the “workforce” are able to do the recognizing part—or will be able to next month—but I think the current model of orchestrating them puts a limit on how well they can do the interrupting part, simply because they have to wait for each other to finish tasks, whereas a human collaborator can watch their thoughts scroll by in real time.
Not all of us. I think those people trapped in the AI maelstrom (or thrust there by their employers, though my impression is that this is largely US employers) forget that most of us have little or no interaction with them, nor particularly wish to.
Having said all that, I do appreciate the motivation of the project, though it does boil down to “can AIs replace me entirely?”
This bit of my post feels especially relevant now.
You’re absolutely right, that is such a profound insight!
The grovel and beg attitude of many Large Language Models is a testament to their intended widespread adoption. It’s not just thinly-veiled sycophancy—It’s a calculated market strategy
I feel dirty typing like that.
OP ran into the 24-hour limit of replies for new users, but wrote to let me know that I was looking at a truncated version of the code. The website has been updated, and there are now two files. Again, I’d link to them, but there doesn’t seem to be a way to do that; instead, you can get there by clicking “A Quiet Introduction” on the front page and scrolling down.
`world.urd.md` is still more YAML than Markdown, but now it defines `Host` as a non-interactable type, and defines some entry points.
world.urd.md
```
world: monty-hall
start: stage
entry: game

types:
  Door [interactable]:
    ~prize: enum(goat, car)
    state: enum(closed, open) = closed
    chosen: bool = false

  Host:
    name: string

entities:
  @door_1: Door { prize: car }
  @door_2: Door { prize: goat }
  @door_3: Door { prize: goat }
  @monty: Host { name: "Monty Hall" }

rule monty_reveals:
  @monty selects target from [@door_1, @door_2, @door_3]
  where target.prize != car
  where target.chosen == false
  where target.state == closed
  > target.state = open
```
More importantly, monty-hall.urd.md defines the game sequence, in a language that does resemble Markdown.
monty-hall.urd.md
```
import: ./world.urd.md

# Stage

A game show stage with three closed doors.

[@door_1, @door_2, @door_3, @monty]

## Game

### Choose

Pick a door.

* Pick a door -> any Door
  ? target.state == closed
  ? target.chosen == false
  > target.chosen = true

### Reveal (auto)

@monty opens a door that hides a goat.

### Switch or Stay

Monty opened a door with a goat. Switch or stay?

* Switch to the other closed door -> any Door
  ? target.state == closed
  ? target.chosen == false
  > target.chosen = true
* Stay with your current choice

### Resolve (auto)

> reveal @door_1.prize
> reveal @door_2.prize
> reveal @door_3.prize
```
The reference from world to the stage and game sections in monty-hall is odd, but we can see how the player’s actions are defined: as the action progresses through the Markdown sections, the game pauses at each one that isn’t marked (auto) to let the player pick one of the available choices, and then the action continues with the next section.
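The control flow described above can be sketched in a few lines of Python. This is my reading of the semantics, not Urd’s actual runtime: each section either runs automatically or pauses to hand control to the player.

```python
# Hypothetical sketch of a section-driven game loop: walk the sections in
# order, running (auto) sections immediately and pausing at the others to
# ask the player for a choice. Section names and behaviors mirror the
# Monty Hall example; the `choose` callback stands in for real player input.
def run(sections, choose):
    """sections: list of (name, is_auto, action); returns a transcript."""
    log = []
    for name, is_auto, action in sections:
        if is_auto:
            log.append(action(None))       # runs with no player input
        else:
            picked = choose(name)          # pause: hand control to the player
            log.append(action(picked))
    return log

sections = [
    ("Choose",         False, lambda c: f"player picked {c}"),
    ("Reveal",         True,  lambda _: "monty opens a goat door"),
    ("Switch or Stay", False, lambda c: f"player chose to {c}"),
    ("Resolve",        True,  lambda _: "prizes revealed"),
]
transcript = run(sections, choose=lambda name: {"Choose": "door_1",
                                                "Switch or Stay": "stay"}[name])
```

The interesting design question is what happens when a non-auto section’s conditions rule out every choice, which the spec doesn’t appear to address.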
I still think this is overselling the “prose-like” nature of the language. In a world where Inform exists, the bar for something to be described as more like writing than coding ought to be pretty high.
The division between the two files seems arbitrary, and the division of labor between the “engineer” and “designer” seems like an illusion: the business logic of the game clearly lives in both files, so someone designing this game is going to be editing both. It doesn’t make sense to delegate the implementation of which doors Monty picks and which doors the player picks to different people.
I think these are issues that human oversight could probably catch sooner, but AI on its own might barrel ahead without ever stopping to consider whether it’s going in the right direction.
So far, this hasn’t exceeded my expectations of what AI can do in terms of running a project. But, this seems like it might be expressive enough to implement Cloak of Darkness, the de facto “hello world” of interactive fiction, so I’ll look forward to seeing what it takes to produce something playable with this system.
Hi Tara,
Thank you for this. I was waiting all day for the reset to occur, and I hope the reply button will now work.
I also noticed that links are not allowed. I’m not sure why or how.
The splitting of files is optional. The md file can either be a single file with YAML’ish frontmatter or split, as in this example. In fact, the PEG parser at the moment only accepts the single-file format. The pre-processor work will start tonight.
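For what such a pre-processor might look like, here is a minimal sketch. The `import:` syntax and file names come from the Monty Hall example above; the merge logic itself is my assumption about how the split format could be flattened back into the single-file format the parser accepts, not Urd’s actual code.

```python
# Hypothetical pre-processor sketch: inline any `import: ./file` lines so
# the downstream PEG parser only ever sees the single-file format.
# `read_file` is injected so the sketch stays testable without disk I/O.
def preprocess(source, read_file):
    out = []
    for line in source.splitlines():
        if line.startswith("import: "):
            path = line[len("import: "):].strip()
            # Recursively inline the imported file's (already preprocessed) text.
            out.append(preprocess(read_file(path), read_file))
        else:
            out.append(line)
    return "\n".join(out)

# Usage with an in-memory stand-in for the world file:
files = {"./world.urd.md": "types:\n  Door [interactable]:"}
merged = preprocess("import: ./world.urd.md\n# Stage", files.__getitem__)
```

A real version would also need cycle detection and deduplication of repeated imports, which is where most of the pre-processor complexity tends to live.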
You are right to point out that the split seems unnecessary, but there actually is a very good reason. As you might have picked up already, the first thing that is mentioned is MUDs.
The idea is that at the lowest end Urd can support the famous “choose your own adventure” type of book. Following from that, the idea is to support all existing forms of dialogue systems, world simulation, and then multi-user server simulations. It’s one of the reasons, for instance, why everything in Urd is a spatial container (which causes some issues, but the idea is that it’s an actual collection of spatial objects that can be ‘simulated’: think Genie3 from Google, a reasoning engine, or maybe even a physics engine and the passage of time).
The one thing Urd doesn’t tackle head-on is parsing of user input. In that respect it’s different from some other IF systems, because it assumes that in the future an LLM will be able to parse and reason about the world better than anything a human could express. That doesn’t mean the output is LLM-generated; this will be left to the implementation. Urd will not have an opinion on that.
So yes, for the lower level of use cases a single file is enough. The IDE will be very simple too. However, there will be extensive issues to handle in a world where different pieces of state can live in different places and are mutable. For that scenario the pre-processor will support importing external definition files. If it’s a MUD, it will be beyond the capabilities of a non-technical person to master (or maybe not, with the help of AI).
In terms of expressiveness, of course nothing beats Inform 7. However, the use case, I think, is different. That of course depends on whether I manage to get the different elements to dock with each other without exploding the space station.
As I mentioned in a previous post, if I simply get the compiler to work, that would already be a marvel.
This forum has a system of “trust levels”, which increase over time as you read other threads, make posts, like, reply, and so on. Links and images are restricted at the most basic trust level in case of spambots, but if you stick around and interact you’ll be able to post them soon.
I honestly don’t think it’s possible to simultaneously support all existing forms of world simulation. There are just too many different types of games out there.
My thesis is that the output of AI simulation will be indistinguishable from a ‘real’ simulation. See, for instance, physics simulations against SideFX Houdini. Even Hollywood has realised that you don’t need to boil the ocean with CGI if all you want is a special effect that looks physically accurate.
I might be over-optimistic, but I think that in the future the runtime will be able to simulate any world given enough constraints and inputs. What is still needed is a way for the creator to actually make it fun. That is where I think that a new type/class of schemas will emerge.
I am just having a shot at it, there is no way Urd will be it.
At least it gives me something to play with that isn’t “let’s build another AI-driven affiliate marketing site or SaaS business using AI”. Funnily enough, conversing with some colleagues, another one of my devs also came up with the idea of boiling the ocean with Opus 4.6 by building a meta intermediate language as a secondary output from the compiler. His thesis: why input code in the first place, when code was designed for humans and is not particularly helpful for observing and verifying the output?
If you’re giving it “enough constraints and inputs”, though, can’t x86 assembly also simulate any world?
See Elite and RollerCoaster Tycoon.
Right, I’m just not sure what you’re proposing to gain from this. The reason to use Ink instead of C is that it gives you a bunch of help in making one specific type of game; by necessity, it’s worse at making RollerCoaster Tycoon as a result. But there’s a reason that people choose to write games in Ink rather than x86/C/C++/etc.
If your system can support every possible type of game equally, then it’s not very good at supporting any of them. Different games have different needs, and generally the biggest selling point to designers is that this system will meet their needs specifically.
To put it differently, if I want elaborate branching choices with plain text in between, I’ll grab Ink rather than C; the fact that C can also do Brownian motion simulations and Ink can’t just isn’t relevant to my decision, and since Ink doesn’t try to support that, it can be much better at supporting elaborate branching choices.
Maybe the value proposition or exploration can be boiled down to a simple question: what does a machine need to reliably execute an interactive world without hallucinating the rules? The specs are nowhere near there, but the idea is, instead of linking together LLM chains held together by LangChain and MCPs (which I think is a dead end), to go down a more formal, descriptive route.
I believe that the schema supporting all the features is necessary to ensure that there can be zero ambiguity wherever the creator wants the specificity. Will Wyrd (the runtime) support it all? Hell no. Will a future frontier model be able to support it all, and in real time? Most probably.
Edit: Meanwhile, I am rejoicing that I will never need to look at a CSS file in my life again.
I’m just not quite sure what you’re saying here. If whatever the creator produces (YAML, Markdown, etc) describes all possible interactions and behaviors, isn’t that just…a programming language? And if those behaviors aren’t included in what the creator produced, doesn’t that either require “hallucinating the rules”, or building them into the runtime and thus losing portability?
The conventional answer to “what does a machine need to reliably execute an interactive world without hallucinating the rules?” is “a programmer specifying those rules”. It’s clear you’re passionate about this project, but I’m having trouble figuring out what it’s actually meant to do beyond a traditional programming language.