A hobby project of absurd scale: building an IF schema language with AI as the workforce

Hi all, new here. Wanted to introduce myself and a project I’ve been working on, partly because this seems like the community most likely to tell me where I’m wrong.

I’m building Urd, a declarative schema language for interactive worlds. The short version: you author .urd.md files (Markdown with YAML frontmatter and a set of sigils for dialogue, choices, conditions, effects), and a compiler produces JSON that any runtime can consume. The scope is deliberately broad: spatial models, typed entities, containment, state mutation, and narrative flow in a single format. Think of it as trying to unify what Inform does for world modeling with what ink does for portable narrative, in a format that’s authored in a text editor and consumed by anything.
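For flavour, here is a purely hypothetical sketch of what a .urd.md file could look like. The sigils below are invented for this post, not the actual Urd syntax, which lives in the specs:

```markdown
---
id: cellar
type: room
exits:
  up: hall
contains: [lamp]
---

# The Cellar

A damp stone room. Water drips somewhere in the dark.

<!-- hypothetical sigils: ? = condition, > = choice, ! = effect -->
? lamp.lit
> Climb the stairs -> hall
! lamp.lit = false
```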

The spec covers a parser, linker, compiler pipeline, a formal grammar, a JSON output schema, and a reference runtime (Wyrd). For a single person, I genuinely can’t think of a more complex thing to attempt short of writing an operating system.

Here’s the part that might raise eyebrows: I’m building this almost entirely with AI assistance. Not “I asked ChatGPT to write me a compiler,” but more like a structured pipeline where AI agents draft specs, other agents review them, findings get consolidated, contradictions get surfaced, and I make the architectural decisions. It’s an experiment in how far you can push spec-driven development when AI is doing the heavy lifting on volume and you’re doing the heavy lifting on judgment. A giant iterative sausage making machine of AIs reviewing each other’s work, basically.

The process itself is almost more interesting to me than the product. This project lets me explore a question I find genuinely fascinating: can a single person with AI leverage actually architect and deliver something at this complexity level, if they’re disciplined about specifications and validation?

So here’s what I’m looking for from this community:

  1. The thesis. Urd’s core claim is that no existing format unifies spatial simulation, typed world state, and portable narrative in a single engine-agnostic schema. Inform 7 comes closest but is coupled to its runtime. ink is portable but dialogue-only. Twine/Twee handles branching but has no world model. Is this gap real, or am I missing something?

  2. The approach. Spec-first, then formal grammar, then compiler, then runtime. The test corpus is split into positive cases (must parse) and negative cases (must fail with correct errors). Each bug becomes a new test. Does this track with how compiler projects in this space have been bootstrapped, or are there lessons I should learn before I learn them the hard way?

  3. The AI elephant in the room. I know “AI-generated” triggers a justified skepticism reflex. I’d ask that you evaluate the specs on their merits rather than their origin. That said, if the specs read like slop, I genuinely want to know. That’s exactly the kind of AI-induced psychosis check I need. The whole point of posting here is to get feedback from people who’ve spent decades thinking about interactive worlds, not to market a product.
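To make point 2 concrete, the corpus discipline looks roughly like this. Everything here is a stand-in: `parse` and `ParseError` are placeholders for the real toolchain, and this toy parser only accepts `key: value` lines so that the harness shape is visible:

```python
# Sketch of the positive/negative test corpus split.
# `parse` and `ParseError` are stand-ins, not the real Urd toolchain.

class ParseError(Exception):
    pass

def parse(source: str) -> dict:
    # Toy parser: accepts only `key: value` lines.
    world = {}
    for line in source.strip().splitlines():
        if ":" not in line:
            raise ParseError(f"expected 'key: value', got {line!r}")
        key, _, value = line.partition(":")
        world[key.strip()] = value.strip()
    return world

# Positive cases must parse; negative cases must fail with the
# right error. Each bug found later adds a new entry here.
POSITIVE = ["door: locked", "guard: alert\nkey: brass"]
NEGATIVE = ["just some prose with no structure"]

def run_corpus() -> bool:
    for src in POSITIVE:
        parse(src)                      # must not raise
    for src in NEGATIVE:
        try:
            parse(src)
        except ParseError:
            continue                    # failed as required
        raise AssertionError(f"negative case parsed: {src!r}")
    return True
```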

The project is at https://urd.dev if you want to look at the actual specifications before forming an opinion. Everything is public.

Happy to answer questions about the architecture, the AI workflow, or the design decisions. And if the consensus is “you’re reinventing something with extra steps and no benefits”, I’d rather hear it now.

2 Likes

Hello, and welcome to the site!

The first thing I see when opening your site is a page full of glowing AI-generated testimonials. This is immediately a red flag. Current commercial LLMs will gush and rave over anything you give them, because users like being praised. In my opinion, testimonials from sycophants are worse than no testimonials at all. (In fact it looks like every single file has these sorts of panegyrics attached…)

I scrolled down to the bottom to check out the “Specification” links…only to find that they don’t work. Clicking these does nothing.

So I scrolled back up to look at “a quiet introduction”.

> Inform, born in 1993, gave us the first serious world model for interactive fiction: rooms, objects, containment, rules, paired with a natural-language parser.

This is false. Inform was not the first parser IF language with a built-in world model. TADS, for example, goes back to 1988. ZIL goes back further (though it wasn’t commercially available for a long time).

Scrolling down a bit further:

> Declarative, not imperative. You describe what the world is, not what it does. A door is locked. A guard reveals information under certain conditions. The runtime figures out when and how. Outcomes emerge from structure, not scripted sequences.

It sounds like you’re trying to delegate all the actual behaviors to the runtime. This is something that’s been tried before, all the way back to the 80s; we sometimes call it “database-driven” IF.

The problem is that the parts of an IF work that use only standard, built-in behaviors tend to be rather boring. The interesting parts are the ones that aren’t built-in. If you think of a parser IF game that you especially liked, I would bet money that the parts you liked most are not built into that system’s standard library.

You said earlier:

> Inform came closest. It unified space, objects, rules, and narrative in a single system, and it did it thirty years ago. But the world it describes is inseparable from the Inform runtime. You cannot hand an Inform world to Unity, to Godot, to a browser, to an AI. The world model and the execution engine are the same thing.

But if all the behavior is defined by your runtime, that’s just as true for your new system as it is for Inform. If the difference is that you can embed your runtime in Unity/Godot/etc, well…you can also do that with Inform’s runtime, as of version 10.

It sounds harsh, but when I look at your website, I mostly see a lot of very confident bullshit, using big words and lavish praise to say very little…and what it says is often outright incorrect. That does not fill me with a desire to read more.

14 Likes

Right, I was going to say, what makes this any more engine-agnostic, exactly? You can put a z-machine or glulx implementation anywhere. There is no Inform runtime, not in that sense at least.

1 Like

Do you have a playable demo? Preferably with source included?

I’m afraid I’m not particularly interested in spending my time poring over lengthy LLM-authored specifications (your website seems to imply the required reading time is somewhere around the four-hour mark?). But there is one key decision about your process which I think is ill-advised and has the potential to save you days or months of wasted effort.

I think this is backwards, and here’s why. We see a lot of “new system” announcements in this community, and the prevailing wisdom is that no one cares about your system until it produces an “admirable game”, that is, one which inspires people to say “I want to make a game like that, so I’m going to check out how it was created.”

Your description of the project has a lot of detail about the tooling, but I’m far from clear what actually happens to turn your huge blob of JSON into a play experience. As far as I can tell, either the runtime is LLM-powered as well or it’s a database-driven model of the sort @Draconis mentioned. The problem is, neither of these approaches has so far produced many games that people want to play. Getting to the point of having a single game which can actually be played is the last step in your process, despite the fact that this is the most experimental part of your project and the part most badly in need of a proof-of-concept.

My advice would be to start at the other end. Get a prototype of your runtime working, then hand-author or otherwise hack together a single example of a short but engaging game that showcases the benefits of your system and get people to play it. If that’s a success, you can get back to work on the rest of the pipeline. If it’s not a success, you need to rethink what the games produced by your system are actually going to look like before you go to the effort of making a huge toolchain to produce them.

Yeah, if you want anyone apart from people already in the grip of AI psychosis to take you seriously, get this off your front page ASAP.

7 Likes

Grok telling you to ship it before you actually have a specification for the grammar is a joke that writes itself.

6 Likes

Thanks for taking the time to put the feedback together.

I think the core issue is that I framed this badly. This isn’t “I built a new IF engine”. Instead, it’s an experiment in AI-driven spec development. The entire pipeline, including the reviews you saw, is generated through AI conversations that challenge and iterate on each other. The site is essentially a dev log of that process. The irony is in a certain sense structural.

The idea is: write a declarative spec, hand it to current models, see if they can build something like the Barcelona cathedral from the ground up, and watch where it crumbles. Then come back later with better models and see if the failure points move. It’s a test of what spec-driven development with AI can actually produce, not a claim that the resulting system is ready to compete with Inform or TADS.

That said, none of that context comes through on the site right now. It reads like confident product marketing rather than an honest experiment, and when the claims don’t hold up to scrutiny, it just looks like bullshit, as you said. That’s on me.

However, the site itself, as well as all the supporting documents, is being used as the context for the AI experiment. The goal is that the site generates and updates itself automatically as part of the process.

I’m going to rework the framing to make the experimental nature clear upfront, fix the historical inaccuracies, and address the “what happens when authors need non-standard behaviors” question directly. Your feedback is genuinely useful, appreciate it.

Regarding the architecture section links not working, that is a known issue. Mermaid diagrams tend to look like spaghetti charts, and the architecture document is a special Svelte island that does not feed back into the AI context, so the best option I could think of was to include it at the bottom of the page. I think I will just convert it to PDF and leave it as a dangling artefact that is directly linked.

Regarding the feedback from others on shipping a playable game, you are 100% correct. However, this is the typical case of developer procrastination where you end up writing a game engine instead of the game. Except that this is now weaponised with modern AI allowing you to ask the question: “can I automate that creation process too?”

This thesis is wrong in every part, fundamentally wrong and confused.

Let’s start with your confused idea that Ink is more “portable” than Inform, that Inform is “coupled to its runtime.”

Inform generates Z-code/Glulx game files, bytecode intended to run in any interpreter. Z-code/Glulx interpreters are extremely portable, especially Z-code. Z-code interpreters are available for every platform you’ve ever heard of, including platforms too resource-constrained to run a web browser: the Commodore 64, embedded CPUs, even PostScript printers.

There are also HTML+JS interpreters for Z-code and Glulx. (And Twine and Ink.) HTML+JS runs on many platforms that don’t even let you compile C.

Ink isn’t designed to be more portable than Inform; instead, it’s designed to be embeddable in a video game. You can design any kind of video game with brief choice-based text-based interludes, and Ink is a great fit for that. And when I say “any kind of video game,” I mean any kind of video game. You can have a jumping platformer with interludes of text-based IF, or an arena brawler like Super Smash Bros., or a civilization building game. Any kind of game can pause and have an IF interlude.

Next, let’s look at “Twine/Twee handles branching but has no world model.”

“World model” and “spatial simulation” are terms with no clear definition; a world model (or a spatial simulation) can be as rich or as simple as you need it to be. You can model a game world with a handful of boolean variables, or with dozens of quantities, or with a nested hierarchy of objects in locations. In Hunt the Wumpus, `int wumpus_location = 7` is a very simple spatial simulation, strongly typed. Any Turing-complete language allows you to create a world model as rich (or as simple) as you like.

Twine and Ink are both Turing complete; they expect you to model the world at the level of complexity you need.
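To make that concrete, here is the same point as code. This is an illustrative sketch in Python, not any particular system’s API:

```python
# A "spatial simulation" can be one typed variable...
wumpus_location: int = 7   # Hunt the Wumpus: one int is the whole map state

# ...or a containment hierarchy, still just plain data in a
# general-purpose language. Rooms, exits, and contents are enough
# for a recognizable world model.
world = {
    "cellar": {"exits": {"up": "hall"}, "contents": ["lamp"]},
    "hall":   {"exits": {"down": "cellar"}, "contents": []},
}

def move_item(item: str, src: str, dst: str) -> None:
    # Containment mutation: relocate an item between rooms.
    world[src]["contents"].remove(item)
    world[dst]["contents"].append(item)

move_item("lamp", "cellar", "hall")
```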

Neither Ink nor Twine is “dialogue-only.” Many Twine/Ink games contain no dialogue at all. (Dialogue is just an especially obvious use case for embedding choice-based text-based IF in a video game.)

I think you may have wrongly assumed that Twine games are like paper Choose Your Own Adventure books, where the only game state is the current page number. They’re more like Fighting Fantasy books, single-player gamebook RPGs that you’d play with a pencil, a paper character sheet and dice.

One key difference between Inform and Twine/Ink is that Inform provides a sophisticated world model. You don’t start an Inform game by defining what a room is, what an object is. Those are all part of Inform’s standard library.

But let’s say I understood what you said as meaning “a rich, spatial world model, provided out of the box.”

The problem then is: how on earth would you integrate someone else’s rich, spatial world model into a video game? How would you integrate that into a platformer? Into a shmup? Into Super Smash Bros.? Would you pause the brawling to let the player move from room to room, collecting inventory, and unlocking doors…? And what would that have to do with the rest of the video game?

People have written Z-code interpreters in C# that you could embed into a C# Unity project. I have no idea why anyone would want to do that, but you can do it today, without defining a new schema language.

In conclusion, Inform 7 is more portable than Ink, but Ink is designed to be embedded in a video game. All of them support world models, but Inform 7 provides its own, out of the box, which is exactly what would make it silly to integrate in a video game.

> Urd starts from a simple premise: what if you could describe an entire interactive world as structured data?
>
> Not code. Not a script tied to an engine. Just a clear, typed description of what exists, where things are, what the rules are, and what can happen. A description that any runtime — whether a browser, a game engine, a text terminal, or an AI — could pick up and execute.

What would be the point of that? AI can write code now. You can generate an Inform game today, and it will run anywhere. It wouldn’t even be easier for an AI to write Urd than it would be to write Inform.

It seems to me that you were so preoccupied with whether you could implement this project with AI that you never stopped to think about whether there would be any benefit to doing so, even if you swallow the assumption that using AI to generate games is a good idea in the first place.

10 Likes

I already made a lot of changes based on the collective feedback (I still need to regenerate the audio). Also, thank you for mentioning Inform 10. I will make sure to add something specifically about that.

I believe that Inform 10 has not solved the problem of automated reachability analysis, dead-end detection, exhaustion checking, or path coverage. Correct? I am not saying that Urd will manage to, but that is clearly a target that the little robots are told to aim towards.

That’s an interesting question. Perhaps I might redirect your attention to the topic of automatically testing Inform games, instead of this thing you’re imagining building here.

Automated reachability analysis is a subset of the provably undecidable halting problem. (“Will this program reach line 999 and halt?”) But there are a bunch of partial solutions, and property-based testing can be brought to bear on Inform games. Some people have done so in the past.

Most people don’t normally bother with that for free text adventures distributed online, but if you could provide something easy to use and learn (and not too expensive to run; property-based testing with genetic algorithms can be very CPU intensive), that would be useful.
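For what it’s worth, once the narrative structure exists as explicit data (as a compiled choice graph would be), the easy half of these checks is plain graph traversal; it’s the state-dependent part that runs into the halting problem. A minimal sketch, with a made-up graph:

```python
from collections import deque

# Illustrative choice graph: node -> list of directly reachable nodes.
GRAPH = {
    "start":     ["cellar", "hall"],
    "cellar":    ["hall"],
    "hall":      ["ending"],
    "ending":    [],            # a declared ending, not a dead end
    "cut_scene": ["ending"],    # authored but never linked from anywhere
}

def reachable(graph: dict, root: str = "start") -> set:
    # Breadth-first traversal from the start node.
    seen, queue = {root}, deque([root])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def dead_ends(graph: dict, endings=frozenset({"ending"})) -> set:
    # Nodes with no way out that aren't declared endings.
    return {n for n, out in graph.items() if not out and n not in endings}

unreached = set(GRAPH) - reachable(GRAPH)
```

This only works because the graph is finite and explicit; as soon as transitions depend on mutable state, you are back to property-based testing and partial results.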

4 Likes

Your feedback is gold. That’s exactly the reason why I came here before pulling the trigger on generating the PEG and Pest references for the Markdown version (backlog folder).

You guys have definitely given me enough food for thought to have another big round of discussion with the AI and question the basic fundamentals of the why and how.

Once you understand that having further discussions with the AI is not the way to go about this - and only then - you might be able to understand the why and how.

12 Likes

That is quite a sweeping statement given the trajectory we are on and the reality at the workplace.

But I totally understand the sentiment.

1 Like

AI only works when you already know what you’re trying to accomplish. When you try to use it to figure out what to accomplish, you’re talking in an echo chamber. Maybe you’ll help yourself figure out what you want to accomplish that way, but you’re as likely to lead yourself in weird, pointless directions, as the AI hallucinates silly answers to the question that sound plausible. That happened to you here. It sent you on a wild goose chase.

If you do the same thing again, you’re going to go on another wild goose chase.

9 Likes

What work did you do yourself before engaging with AI? How did you make yourself confident that you could assess what was coming out?

2 Likes

> That is quite a sweeping statement given the trajectory we are on and the reality at the workplace.
>
> But I totally understand the sentiment.

To be totally honest: if I manage to keep this monorepo with mixed technologies in a manageable state and churn out a CLI that can compile a new schema into a simulation run of the Monty Hall problem at runtime, I will be happy.

What I would then really love to see is, as newer and more capable models and systems come out, how much or how little assistance they need to go through all the docs and spit out working code that passes all the tests.

I am under no illusion here, I am not building a SaaS business with AI. This is really an experiment.

The feedback you have given me has honestly been gold. It has already helped me reshape and sharpen the narrative in the introduction to the site.

I am pretty certain that I can assess whether the sausage making machine produces outputs that can pass its own validation and tests :smiley:. Also, I always wanted to build an LSP server, so that gives me an excuse to give it a try.

Try to embrace for a moment the spirit of the ’80s/’90s demoscene, where you simply tried something out, not because there was a use but because it could be done.

My company already uses AI agents 24/7, and each developer has their own little kitchen, basically using the machines to do all the typing. More recently, I have observed that our client support teams, integration engineers, and product people are using AI all day long to produce very high-quality outputs.

So the hobbyist motivation for me is obvious: what would a repo look like that is able to tackle something absurdly complex? Simply telling it to write a C compiler has already been done.

If you have the specs to something that has already been implemented, that is also too easy. So here we are: me asking in the forum where I suspect a lot of knowledge and wisdom exists on the subject of IF, to help me out in pointing the little robots in at least somewhat of the correct direction.

Parsing a language is the easy part. I’d start with the runtime so you can do end-to-end testing. Tests for “does the compiler turn this code into the expected JSON” are bigger and more brittle than tests for “does this code produce the expected output when run”, and more likely to cost a lot of time/effort/tokens to fix broken tests as the compiler’s output changes.

Starting with the spec and formal grammar runs the risk of creating a language that’s cumbersome to write in. Going in the opposite direction gets you to the point of writing meaningful code in the language sooner, and it’s easier to judge whether this is the language you want to be making when you have some real code to look at.
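The brittleness difference is easy to see with stand-in functions; `compile_src` and `run_program` here are toys invented for this post, not any real toolchain:

```python
def compile_src(src: str) -> dict:
    # Toy "compiler": turn `say X` lines into an instruction list.
    return {"version": 3,
            "ops": [("print", line[4:])
                    for line in src.splitlines()
                    if line.startswith("say ")]}

def run_program(program: dict) -> str:
    # Toy "runtime": execute the instruction list.
    return "\n".join(arg for op, arg in program["ops"] if op == "print")

SRC = "say hello\nsay world"

def test_compiler_output():
    # Brittle: pins the entire intermediate JSON, so it breaks on
    # any schema change, even ones that don't affect behaviour.
    assert compile_src(SRC) == {
        "version": 3,
        "ops": [("print", "hello"), ("print", "world")],
    }

def test_end_to_end():
    # Robust: pins only observable behaviour, survives IR refactors.
    assert run_program(compile_src(SRC)) == "hello\nworld"
```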

5 Likes

Yes, I am certainly at the traditional chicken-and-egg inflection point. I was actually toying with the idea of starting with an importer for the ink-proof tests: translating them into Urd, then taking the compiler output and translating it back into Ink again for runtime testing. I am not sure if that is a totally crap idea that will unlock a whole new set of issues.

At least I would know that the tests should be able to pass.

Edit: Btw, I love your ZILF site. I stumbled across it a few days ago already. So it’s an honour :smiley:

1 Like

If the LLM could answer these questions about the basic fundamentals of the why and how, wouldn’t it have given you better answers in the first place?

LLMs are programmed to grovel and beg for forgiveness when you say that they’re wrong, because users like it when they do that. They’re not programmed to fundamentally change how they work as a result. Their apologies are nothing but smoke and mirrors to keep you from blaming the technology when it screws up.

I said earlier that sycophantic testimonials are worse than no testimonials at all. I believe the same is true for advising on design decisions. (Most decisions, really.) If people praising every thought and gesture led to great art, then Nero would have been the greatest composer, musician, poet, playwright, and actor in the history of the Western world—and by all accounts he was not.

Which is harsh, but—

I’m trying to provide what you asked for in a more useful way than an LLM can.

I think this community in particular is very familiar with this ethos! IF is an incredibly unprofitable niche. Most of the people here are in it for the love of the game. If you want to design your own system exactly as you like, more power to you. But you asked for feedback, so we’re trying to steer you away from the dead ends people have run into over the past fifty years.

5 Likes