Working on a new interpreter, having difficulty

Hello!

I’m working on a new interpreter to learn Crystal, and I’m running into a problem. I figured a good first step would be to disassemble story files and match the output of the venerable txd.

However, I can’t seem to find the conditions to detect where code ends and text starts. My conditions for the end of a routine are:

  • a decode error
  • a return opcode, no further jumps, and a valid routine header byte after it (a byte with a value in the 0-15 range)
  • if on an “unreachable” instruction, a return opcode with a zero byte after
  • if on an “unreachable” instruction, a return opcode with a valid ZSCII string after
  • when static strings begin (easy, only known on V6 and up)
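For reference, the "valid routine byte" test in the list above could be sketched like this. It is only an illustration in Python, not txd's actual logic: the function names are made up, and it relies on the routine header layout from the Z-Machine Standard (a local-variable count of 0-15, followed in V1-4 by one 2-byte initial value per local).

```python
def plausible_routine_header(memory: bytes, addr: int, version: int) -> bool:
    """Return True if the byte at addr could start a routine (hypothetical helper)."""
    if addr >= len(memory):
        return False
    local_count = memory[addr]
    if local_count > 15:
        # A routine can have at most 15 local variables.
        return False
    if version <= 4:
        # V1-4 store a 2-byte initial value for each local after the count.
        return addr + 1 + 2 * local_count <= len(memory)
    return True


def first_instruction_address(memory: bytes, addr: int, version: int) -> int:
    """Address of the first opcode of a routine whose header starts at addr."""
    local_count = memory[addr]
    return addr + 1 + (2 * local_count if version <= 4 else 0)
```

A check like this is necessary but far from sufficient: roughly one byte value in sixteen passes it, so it only helps in combination with the other conditions.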

However, several Infocom files do not match any of these, and even checking for a ZSCII string is flimsy because everything is a valid ZSCII string :confused: (right now, I check whether it starts with an uppercase letter, which does not cover all cases). I’m at my wit’s end here: how can txd reliably know where code ends in V5 and below?
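To illustrate why the string check is so weak: encoded text is just a run of 2-byte words, each holding three 5-bit Z-characters, and the only structural marker is the top bit of the last word. A rough Python sketch (the function names are hypothetical, and this only measures and unpacks the string, it does not translate Z-characters to ZSCII):

```python
def zstring_extent(memory: bytes, addr: int):
    """Return the byte length of the Z-string at addr, or None if no end word is found."""
    pos = addr
    while pos + 1 < len(memory):
        word = (memory[pos] << 8) | memory[pos + 1]
        pos += 2
        if word & 0x8000:
            # The top bit is set only on the final word of a string.
            return pos - addr
    return None


def zchars(memory: bytes, addr: int, length: int):
    """Unpack the 5-bit Z-characters from length bytes starting at addr."""
    out = []
    for pos in range(addr, addr + length, 2):
        word = (memory[pos] << 8) | memory[pos + 1]
        out.extend([(word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F])
    return out
```

Any byte sequence that eventually reaches a word with the top bit set passes this, so code bytes will usually "decode" as some string. Extra checks on the unpacked Z-characters can only make the guess more likely, not certain.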

Thanks for any help!

There is no guaranteed way to know where a function ends. (I don’t know what rule txd uses.) To be completely general, strings and functions could be mixed in high memory.

The interpreter only needs to be able to execute code (or print text) starting at a given address.
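As a small illustration of what "a given address" means here: calls and print_paddr use packed addresses, and the interpreter just expands them with the per-version rule from the Z-Machine Standard. A minimal Python sketch (the function name and the offset parameter are my own naming; offset stands for the routines or static strings offset from the header, which only matters in V6 and V7):

```python
def unpack_address(packed: int, version: int, offset: int = 0) -> int:
    """Convert a packed address to a byte address for the given story version."""
    if version <= 3:
        return 2 * packed
    if version <= 5:
        return 4 * packed
    if version <= 7:
        # V6/V7 add the header's routines or strings offset (stored divided by 8).
        return 4 * packed + 8 * offset
    return 8 * packed
```

Nothing in this tells you what lies between two such addresses, which is why a disassembler has to guess.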

The spec notes

I realise there’s no “easy” way of doing it, but txd manages to do it somehow. Just thought someone knew of the heuristic used there.

The goals of a disassembler are different from those of an interpreter. An interpreter always knows how to interpret a block of memory, because the bytecode it has just executed tells it what to do with it. I don’t know how txd works, but it may only detect text that is printed from routines.

txd finds strings both in code and in the string section, but it’s possible that it misses some of them in some game files. I’m certain that it uses heuristics tuned to Infocom and Inform-generated files. You’d have to look at its source code to see what they are, though.