Hi all,
Like a lot of folks, I’ve been thinking about how the parser is both the charm of IF and sometimes the wall in front of it. The way I see it now, the game doesn’t really care about the exact words you type; it wants a normalized command. So I tried putting an LLM at the Glk I/O layer and left the VM and game files alone.
What I built is a small experiment: a modified version of Andrew Plotkin’s CheapGlk, with Glulxe linked against it. The Glk side intercepts your input, reads a few recent lines of game output for context, asks an OpenAI-compatible API what the intent likely is, then hands the game a plain command like “take key” or “north”. Glulxe itself is basically untouched; I only added OpenSSL to the build and changed the link to point at the modified CheapGlk.
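To make the flow concrete, here’s a rough sketch of the normalization step in Python (the actual fork does this in C inside CheapGlk; the function and prompt below are illustrative, not the fork’s real code):

```python
import json

# Hypothetical system prompt -- the real one lives in the CheapGlk fork.
SYSTEM_PROMPT = (
    "You translate a player's natural-language input into a single "
    "standard IF parser command (e.g. 'take key' or 'north'). "
    "Reply with the command only."
)

def build_request(recent_output, player_input, model="google/gemini-2.5-flash"):
    """Assemble the JSON body for an OpenAI-compatible /v1/chat/completions call,
    passing a few recent lines of game output as context."""
    context = "\n".join(recent_output[-5:])
    return json.dumps({
        "model": model,
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": f"Game output:\n{context}\n\nPlayer typed: {player_input}"},
        ],
    })
```

The body that comes back is a normal chat completion, and whatever single command the model returns gets fed to the game as if the player had typed it.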
Here’s a quick gameplay demo gif (playing Try Again by Tom Devereaux): https://log.beshr.com/playing-if-with-natural-language/demo.gif.
In practice, you can type naturally, use pronouns, even type in another language, and, depending on the model you use, the interpreter passes a standard command to the game.
It tries to keep context straight. If a game asks for raw text, you can bypass the LLM by wrapping your input in brackets. So for a name prompt: [Beshr]. The idea is to keep games fully compatible, no changes to game files, just a more forgiving input path.
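The bracket bypass is simple routing logic; a minimal Python sketch of the idea (the real check is in the C fork, and this function name is mine):

```python
def route_input(line):
    """Decide whether a line of player input should go through the LLM.
    Bracketed input, e.g. [Beshr] at a name prompt, is passed to the
    game verbatim with the brackets stripped."""
    stripped = line.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        return stripped[1:-1], False  # raw text, skip the LLM
    return stripped, True  # normalize via the LLM
```

So `[Beshr]` reaches the game as `Beshr`, while ordinary input goes through interpretation first.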
A few caveats. Interpretation isn’t perfect or always consistent. There is latency, often around 0.5s. Hosted models cost money over long sessions, though you can point it at local models through an OpenAI-compatible endpoint if you prefer. Model quality matters a lot; I’ve had decent results with google/gemini-2.5-flash. Smaller models can work but probably need tuning for this specific task.
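Since local servers like llama.cpp’s `llama-server` speak the same /v1/chat/completions protocol, only the base URL changes; the response parsing is identical either way. A small sketch (the URL and function name are assumptions for illustration, not the fork’s actual configuration):

```python
import json

# Assumption: a local OpenAI-compatible server, e.g. llama.cpp's
# llama-server, listening on its default port.
BASE_URL = "http://localhost:8080/v1"

def extract_command(response_body):
    """Pull the normalized parser command out of an OpenAI-style
    chat-completion response body (hosted or local, same shape)."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"].strip()
```
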
If you want to try it:
- Glulxe fork: https://github.com/beshrkayali/glulxe
- CheapGlk fork: https://github.com/beshrkayali/cheapglk
What I’m trying to understand is whether this lowers friction without flattening the interesting parts of IF. Does it keep puzzle-solving intact? Is the extra uncertainty from interpretation amusing or just annoying? I’m also curious which types of games break this quickly: room ambiguity, heavy conversation systems, custom verbs, that sort of thing.
If you have thoughts on evaluation, I’d love to hear them. Reports about specific games that worked well or failed would be very helpful too!
More details and a longer write-up here: Playing IF Games with Natural Language.