zorkie - zil/zilf to .z compiler v1-8

I created a new Python-based ZIL/ZILF-to-.z compiler. It performs the traditional six or so build steps in one program. For the Infocom games with released source, it produced smaller .z files than the Infocom toolchain did. It's available for install via pip. See awohl dot com (I can't post links).

1 Like

Was AI used to generate most of the code?

Seems like it.

Claude is listed as a co-author.

I appreciate the eagerness and interest, and I myself am a big fan of announcing things before they’re ready for public consumption (hence the name). I’m even a believer in AI coding assistants.

However, in my experience, AI coding assistants are especially prone to hallucinating when it comes to ZIL and MDL, because there are so few examples of valid code in these languages. For example, every difference this document identifies between the syntax of “MDL ZIL” and ZILF is incorrect, and several of the operations listed in this file do not exist.

That said, this project would benefit greatly from some concrete examples of what it can compile and what sort of output it produces. I see several places in the repo where one might look for working examples: examples, games, sample_z, tests/test-games, tests/test-games/examples, etc. Which of those can actually be compiled and run to completion?

9 Likes

One thing I found helpful while developing ZILF was to set up automated end-to-end tests that compile a project, run it in an interpreter with a predictable RNG, feed in a command file based on a walkthrough, and compare the expected output to a known-good “golden” transcript. This is useful not only for confirming that the compiler works, but also for making sure that the changes you make to support new games don’t break the games that were already working.
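A minimal sketch of that kind of harness, assuming `dfrotz` as the interpreter (its `-m` and `-s` flags suppress MORE prompts and fix the RNG seed); the function and file names are mine, not from any particular project:

```python
import subprocess
from pathlib import Path

def run_walkthrough(story: Path, commands: Path) -> str:
    """Feed a walkthrough command file to the interpreter and capture output."""
    with commands.open("rb") as cmds:
        result = subprocess.run(
            ["dfrotz", "-m", "-s", "12345", str(story)],
            stdin=cmds, capture_output=True, check=True)
    return result.stdout.decode("latin-1")

def transcripts_match(actual: str, golden: str) -> bool:
    """Compare transcripts line by line, ignoring trailing whitespace."""
    a = [line.rstrip() for line in actual.splitlines()]
    g = [line.rstrip() for line in golden.splitlines()]
    return a == g
```

After compiling a game, run the walkthrough and compare against a saved known-good transcript; any diff means either a regression or an intentional change you need to review and re-bless.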

AI coding assistants can be very effective when they have objective tests to work with, so you may want to do this yourself too. If you want to test against Infocom’s games from historicalsource, I’d recommend starting with these:

Zork I
Zork II
Zork III
Sherlock
Beyond Zork
Zork Zero
Bureaucracy
Zork (German)

…as well as the ZILF samples Cloak of Darkness and Advent.

As you move down that list, you may find that many of the things Claude dismissed as “unnecessary”, “rare”, or “complex” are more important than they first seemed.

6 Likes

I also find it a bit alarming that your LLM reports:

Original Infocom source code works perfectly

  • Zork (if we had source): ✓ Would compile

It seems to be asserting that based on no evidence whatsoever! A compiler needs to be tested by actually compiling things and examining the output, not just claiming that they “would compile”. And the three Zorks are in fact the only Infocom games whose source is released under an open license (MIT), letting them be incorporated into test suites.

6 Likes

Excuse my ignorance, but how do you know it would compile if you don’t have the source? And how do you know that the source code works perfectly if you don’t have the source code and you can’t compile it?

2 Likes

Yes, AIs will try to weasel out of any to-do list. However, I have been making compiler test suites since the 1970s, and with an AI I know to ask how many of the, say, 117 inputs produce proper output (and to verify it myself). Claude told me at least ten times that we had complete coverage of V1-V8 (except V7, for which it found no examples). Yet when I went to play a higher-version example, it didn't run at all.

I did try running a game or two from each .z version, but I didn't make a systematic list of what worked. I don't know of anything that doesn't work. I'll check the list you mentioned and more.

Here is the list of what worked out of the box from your list:

Compilation Results Summary

Successful Compilations

| Game | Version | Size | Notes |
|------|---------|------|-------|
| Zork I | z3 | 35,884 bytes | Full compilation |
| Enchanter | z4 | 37,928 bytes | Full compilation |
| ZILF Hello | z3 | 638 bytes | Standalone sample |
| ZILF Mandelbrot | z5 | 944 bytes | Standalone sample |
| ZILF Name | z3 | 1,006 bytes | Standalone sample |

Problems being fixed now:

Failed Compilations - Issues to Fix

| Game | Issue | Root Cause |
|------|-------|------------|
| Zork II | Unexpected character `:` at line 3580 | Lexer angle_depth goes negative (-1), causing comment parsing to fail. The `:` appears in a string inside a commented form `;<COND …>` |
| Zork III | Unexpected character `:` at line 3496 | Same issue as Zork II |
| Beyond Zork | Unexpected character `:` at line 3511 | Same issue - lexer state corruption |
| Zork Zero | Unexpected character `:` at line 128 | Same issue |
| Bureaucracy | Unexpected character `:` at line 187 | Same issue |
| Zork German | Unexpected character `%` at line 2068 | `%` used in atom names for special German characters (e.g., SKARAB%AUS) - not handled by lexer |
| Sherlock | Missing file debug.zil | The repository is incomplete |
| ZILF Cloak | Missing parser.zil | Needs zillib - uses `<INSERT-FILE "parser">` |
| ZILF Advent | Missing parser.zil | Needs zillib - uses `<INSERT-FILE "parser">` |

Bugs to Fix

  1. Lexer angle_depth tracking (zilc/lexer/lexer.py)
    • The angle_depth counter goes negative when there are unbalanced > characters
    • This corrupts the lexer state and causes form comments (;<…>) to be parsed incorrectly
    • The : inside strings in commented forms is then treated as a token
  2. % character in atoms (zilc/lexer/lexer.py:472)
    • The Zork German source uses % in atom names for umlaut encoding (e.g., SKARAB%AUS for
      Skarabäus)
    • Need to add % to valid atom characters in is_atom_char() or handle it specially
  3. Missing library support
    • ZILF samples using <INSERT-FILE “parser”> need the zillib path to be included
    • The compiler needs to search include paths similar to ZILF’s -i flag
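For bug 1, a sketch of what the comment-skipping fix might look like: when the lexer sees a form comment `;<…>`, it should skip balanced angle brackets while treating string literals as opaque, so a `:` inside a string never reaches the tokenizer and the depth counter can never go negative. The function name and structure here are illustrative, not the actual zorkie internals:

```python
def skip_form_comment(src: str, i: int) -> int:
    """Return the index just past a ;<...> form comment starting at i."""
    assert src[i] == ";" and src[i + 1] == "<"
    i += 1                      # advance to the opening '<'
    depth = 0
    while i < len(src):
        ch = src[i]
        if ch == '"':           # skip string literals wholesale
            i += 1
            while i < len(src) and src[i] != '"':
                i += 2 if src[i] == "\\" else 1   # honor escapes
            i += 1
            continue
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
            if depth == 0:      # matched the outermost form: done
                return i + 1
        i += 1
    raise SyntaxError("unterminated form comment")
```

The key invariant is that `depth` only decrements on a `>` that was preceded by a matching `<` within the comment, so stray `>` characters elsewhere in the file can no longer corrupt the lexer state.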

“Running” as in running the compiler, or did you play the resulting games? I wouldn’t check them off the list until you know they can be played through to a winning outcome, at the very least.

“Worked” in what sense?

If your Zork I compilation comes out to only 35 KB, I guarantee it’s not a complete and functional game.

4 Likes

I am more used to making test data for normal programming languages: test programs with expected output. Is there any automated test for .z files?

Your numbers may be just the z-code part of Zork I and Enchanter. The complete playable games should be around 80-90 kB. I very much doubt the objects get compiled correctly.
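One quick sanity check you can run on a compiled story file: the z-machine header stores a packed file length at offset 0x1A, scaled by 2, 4, or 8 depending on version, so a build whose declared length disagrees with its actual size is easy to spot. A minimal sketch:

```python
def declared_length(story: bytes) -> int:
    """Compute the file length the z-machine header claims, in bytes."""
    version = story[0]                               # byte 0 is the version
    packed = int.from_bytes(story[0x1A:0x1C], "big") # packed length field
    scale = 2 if version <= 3 else 4 if version <= 5 else 8
    return packed * scale

def size_plausible(story: bytes) -> bool:
    """True if the actual file size is at least the declared length."""
    return len(story) >= declared_length(story)
```

This doesn't prove the objects or code are correct, but it catches truncated output immediately.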

1 Like

Well, there’s the approach I took with ZILF’s integration tests, which you can see as Zilf.Tests.Integration in the ZILF repo.

But before automated tests, I’d suggest trying some manual tests. Compile the game, play it, and see how far you can get. Walkthroughs are easy to find on Google.

That’s what I did to test Zork I for Glulx - I don’t have automated tests for Glulx yet, so I found a walkthrough and iterated on the compiler until I could make it to the end of the game without any crashes or bugs.

4 Likes

I had Claude download the top 50 games. It made a tool named zwalker that runs around trying all the words in each room. It has an AI add-on that uses OpenAI or Anthropic models for intelligent exploration. The goal is to get a JSON walkthrough of commands that finishes each of the top 50.

Then see if the same commands work with a file compiled by zorkie. Also, to play the commands through my .z to the JavaScript compiler to test it as well.

Also I got the z machine validation programs and fixed z2js to pass them.

This sounds a lot more complicated, and a lot less likely to work, than just downloading a walkthrough and typing in commands by hand. Is there some reason you’re unwilling to do that?

6 Likes

Vibe coding…

Please don’t paste big chunks of LLM output as responses here! The forum rules are that your posts should be in your own words.

I’m curious, if you don’t mind… what’s your motive for the project? It sounds like you’re not that interested in writing the code, since Claude is doing everything; but it also sounds like you’re not interested in running it, since you haven’t eg. played any of the ostensibly-compiled games. What are you hoping that you and the community will get out of this project?

7 Likes

For regression testing, the idea is to rerun solves for all the games after messing with the interpreter, to see that the fix worked and didn't break anything. Well, that's how it works in compiler testing, which is what I'm used to. I guess it remains to be seen how well that works with IF.

That’s also how it tends to work with IF—we do the same thing with Dialog, for example—but generally with human-written test cases that humans analyze the output of. LLMs will happily claim that all their tests would pass:

  • Zork (if we had source): ✓ Would compile

But you need to actually hold them accountable for that! Don’t trust them every time they say “yeah everything works great, don’t bother checking for yourself”. Like Tara said, there’s no way a full, working version of Zork 1 compiles down to only 35 KiB.

So: have you, personally, not an LLM, tested the output of this compiler to make sure it actually works? That’s something you absolutely need to do before you publish a new compiler and release it to the world.

6 Likes

Also, Tara’s already linked the ZILF test cases (which will probably be the most useful to you, since this is a ZIL compiler), but for Dialog, we test:

  • 31 examples from the first part of the manual, covering various aspects of the language’s syntax and built-in features
  • 5 additional examples to test new features added since the community took over the project
  • 23 examples from the second part of the manual, covering various aspects of the standard library
  • 3 additional examples to test new features added since the community took over the project
  • 1 example designed to trigger compiler warnings (we’re working on adding more to this category now)
  • 1 example (Cloak of Darkness) run through RemGlk to test output going to different windows, which dumbfrotz can’t handle
  • 1 full game (The Impossible Stairs) that uses the normal standard library, tested from start to finish
  • 1 full game (Miss Gosling’s Last Case) that uses a modified standard library, tested from start to finish

No version of the compiler can be officially released (in fact, you can’t even pull request to the main branch) unless every single one of these passes, and every one of them has been checked over from head to tail by multiple humans to make sure it’s testing what we want it to be testing and the output is what we want it to be.

If you want an LLM to write your compiler for you, you have to do at least this level of testing to make sure that what it’s producing is useful. And it has to be set up by a human, because otherwise you have no way of knowing that it’s actually testing what you want it to be testing. I would say “LLMs lie”, but “lying” suggests they have any sort of awareness of reality rather than being a glorified pattern-matcher; it’s more accurate to say they will produce dozens of lines saying “all tests pass!” simply because it’s what the good projects in their training data said.

7 Likes