UnZ - Unpack Z-machine file format information

Oh, I can provide that! At least in broad strokes, but I can look into the details of any part you like.

When Dialog compiles a Z-machine file, it goes:

  • Header
  • IFID
  • Main heap
  • Aux heap
  • Long-term heap
  • Object tree
  • Object property tables (usually empty)
  • “Select table”: one byte for each (select) statement that needs it, used to remember its current state.
  • “Extended attribute table”: byte arrays used to give objects additional attributes beyond the ones the Z-machine provides. These are always static; even though they’re in RAM, Dialog never emits code to change them at runtime.
  • “Globals table”: the usual Z-machine global variables, followed by a word array holding any global variables that don’t fit in the Z-machine’s registers (I think? I’ve never actually hit the Z-machine register limit, I should test this).
  • “Object variables”: each per-object variable is stored as a word array indexed by object number.
  • Abbreviation strings
  • Abbreviation table
  • Scratch space
  • Header extension table
  • Unicode translation table
  • End of RAM
  • “Wordmaps”: for each predicate queried within a (determine object), there’s a table mapping possible words to objects that can emit those words. Common words like “the” don’t get entries, and some predicates can’t be statically analyzed, but this lets the runtime cut down the number of objects that need to be checked by multiple orders of magnitude.
  • “Wordmap data”: each entry in a wordmap is either a special value meaning “too common”, an object number, or a pointer into one of these tables that lists the possible objects.
  • Dictionary
  • End of addressable memory
  • Routines
  • Strings
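
Several of the boundaries in that list can be cross-checked against the standard Z-machine header, which stores byte addresses for the dictionary, object table, globals, abbreviations table, and the start of static and high memory. Below is a minimal sketch in Python that reads those fields. The function name `read_header` and the dictionary keys are made up for illustration, and treating "End of RAM" as the static memory base is an assumption about how Dialog's layout maps onto the header.

```python
import struct


def read_header(story: bytes) -> dict:
    """Read the layout-related fields from a Z-machine header.

    `read_header` and the key names are hypothetical; the offsets are the
    standard header offsets (all words are big-endian).
    """
    if len(story) < 64:
        raise ValueError("a Z-machine header is 64 bytes long")

    def word(offset: int) -> int:
        # Unsigned 16-bit big-endian word at the given byte offset.
        return struct.unpack_from(">H", story, offset)[0]

    return {
        "version": story[0],
        "high_memory_base": word(0x04),
        "initial_pc": word(0x06),          # byte address, except in version 6
        "dictionary": word(0x08),
        "object_table": word(0x0A),
        "globals_table": word(0x0C),
        "static_memory_base": word(0x0E),  # assumed to match "End of RAM" above
        "abbreviations_table": word(0x18),
    }
```

Usage would be along the lines of `read_header(open("test.z8", "rb").read())`, then comparing the reported addresses with the order of the sections listed above.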

Excellent!

Some future version of unz may decode some of that. I tried looking in the code, but Linus isn’t big on commenting it.

Unfortunately not. The ifcomp2025 branch has more comments than the rest, because I’ve annotated what I’m working on, but it’s still pretty sparse. (That one, notably, includes comments explaining the format of wordmaps and wordmap data tables.)

Before I dig into UnZ’s source code (lol), quick question: what could cause UnZ to print this?

0094F E0 3F 01 2B 01           CALL_VS [Invalid routine: 0x0958] -> L00
  *--------------------------------------------------
  |Opcode:
  | Byte  1    VAR:224 (EXT:224)
  |  11100000  0xE0 224
  |  11        variable-operand (extended)
  |    1       VAR
  |     00000  @call_vs / CALL
  |
  |Operand types:
  | Byte  2    Operand types 1-4
  |  00111111  0x3F 63
  |  00        large constant (long immediate)
  |    11      no more operands
  |      11    no more operands
  |        11  no more operands
  |
  |Operands:
  | Byte  3- 4 0x012B 299
  |
  |Store:
  | Byte  5    0x01 1 (L0)
  |
  |Pseudo-code:
  |  local0 = [Invalid routine: 0x0958]();
  *--------------------------------------------------

I’m asking specifically about the “Invalid routine” part… I’m puzzled, heh. Thanks in advance!

Ah, got it. The call is to a routine UnZ doesn’t properly recognize. (Hmm… the routine is there though).
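
For reference, the address in the "Invalid routine" message is just the unpacked form of the call operand: in a version 8 file the packed address 0x012B is multiplied by 8, giving 0x0958. Here is a minimal sketch of the unpacking rule for routine addresses; the function name `unpack_routine_address` is made up for illustration, and the `routines_offset` parameter only matters for versions 6 and 7.

```python
def unpack_routine_address(packed: int, version: int, routines_offset: int = 0) -> int:
    """Convert a packed routine address to a byte address.

    Hypothetical helper; the multipliers follow the Z-machine rules for
    packed addresses (2P, 4P, 4P + 8 * routines offset, 8P).
    """
    if version in (1, 2, 3):
        return 2 * packed
    if version in (4, 5):
        return 4 * packed
    if version in (6, 7):
        return 4 * packed + 8 * routines_offset
    if version == 8:
        return 8 * packed
    raise ValueError(f"unknown Z-machine version: {version}")


# The operand from the dump above, in a .z8 file:
print(hex(unpack_routine_address(0x012B, 8)))  # 0x958
```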

unz collects all routine addresses so it probably fails at finding the end of the previous routine, and therefore misses this routine start. Can you share the file?

Sorry, forgot to post this earlier. Happy to help.

Attached is test.z8; I also threw in dumps from UnZ and txd, plus the program source as regurgitated by my duct-taped custom assembler when I had it emit the assembled hex bytes. Not included in the program source is the small trampoline that the assembler prepends at the start: it calls the real main routine (which begins with the locals header) and then quits. The routine start is aligned to an 8‑byte boundary, since v8 routines must be at addresses divisible by 8 for packed addressing. Learned the hard way: execution starts in an environment with no local variables (no stack frame), so you can’t return from it, heh.

test.z8 (2.8 KB)
UnZ_dump_test.z8.txt (68.1 KB)
txd_dump_test.z8.txt (5.9 KB)
asm_dump.txt (3.3 KB)
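
The 8-byte alignment mentioned above comes down to a small piece of arithmetic. This is only a sketch of that calculation, not the assembler's actual code, and the name `padding_to_boundary` is made up for illustration. It gives the number of filler bytes needed before the next routine so that its address is divisible by 8.

```python
def padding_to_boundary(address: int, alignment: int = 8) -> int:
    """Number of filler bytes needed to move `address` up to the next
    multiple of `alignment` (0 if it is already aligned).

    Hypothetical helper for illustration only.
    """
    return -address % alignment


# Code that ends at 0x8EB needs 5 filler bytes to reach 0x8F0.
print(padding_to_boundary(0x8EB))  # 5
```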

Just for reference, in this dump, the invalid routine is…

0008F0: 01                                       ; routine_header locals=1
0008F1:                                          ; label endmenu_loop:
0008F1: B2 14 E5 1C 00 00 05 18 2A 14 C1 28 A6 05 40 13 2D 28 04 2A 69 00 A6 05 45 18 2A 14 C1 28 A7 14 E5 1C A7 15 25 7C 04 5C 8A 13 04 64 86 12 E4 64 B2 14 E5 28 BF 00 96 13 44 38 99 16 45 1C A6 87 C5 ; print '\n\n    *** The End ***\n\n\n\n1) RESTART.\n2) QUIT.\n>'
000932: F6 7F 01 01                              ; read_char 1 -> (local 1)
000936: 41 01 31 40 00                           ; je (local 1) 49 ?do_restart
00093B: 41 01 32 40 00                           ; je (local 1) 50 ?do_quit
000940: B2 11 D3 6C D1 39 20 52 B9 3A 93 16 45 9C A5 ; print 'Invalid option.\n'
00094F: 8C 00 00                                 ; jump ?endmenu_loop
000952:                                          ; label do_restart:
000952: ED 3F FF FF                              ; erase_window -1
000956: B0                                       ; rtrue
000957:                                          ; label do_quit:
000957: BA                                       ; quit

; Symbols
; 0008F0  ending_menu
; 0008F1  endmenu_loop
; 000952  do_restart
; 000957  do_quit

As stated before, UnZ says “Invalid routine”.

  *--------------------------------------------------
008E3 E0 3F 01 1E 01           CALL_VS [Invalid routine: 0x08F0] -> L00
  *--------------------------------------------------

Quick summary… Ignore the “Compiled With: Inform 6.15” line; it’s not a vanilla build, it’s heavily customized by me. I didn’t bother implementing the checksum lol (it shows Calculated checksum: 0xB347, checksum error), but interpreters like Bocfel, Windows Frotz, Hunky Punk, and Fabularium open it fine. Bocfel is the most robust. Before I implemented the file length field, Windows Frotz opened the file but didn’t tolerate it being longer than what the header said, and later crashed when you selected an option (this also stumps txd).
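
For anyone wanting to fill in the checksum, the standard rule is short: the header word at 0x1A holds the file length divided by a version-dependent factor (2 for v1-3, 4 for v4-5, 8 for v6 and later), and the word at 0x1C holds the sum of all bytes from 0x40 up to that length, modulo 0x10000. A minimal sketch follows. The function names are made up for illustration, and falling back to the actual file size when the length field is zero or too large is an assumption of this sketch, not something the header rules prescribe.

```python
import struct


def declared_length(story: bytes) -> int:
    """File length in bytes according to the header word at 0x1A."""
    version = story[0]
    scale = 2 if version <= 3 else 4 if version <= 5 else 8
    return struct.unpack_from(">H", story, 0x1A)[0] * scale


def compute_checksum(story: bytes) -> int:
    """Sum of the bytes from 0x40 to the declared end of file, mod 0x10000.

    Hypothetical helper. If the length field is zero or exceeds the real
    file size, this sketch falls back to the real size (an assumption).
    """
    end = declared_length(story)
    if end == 0 or end > len(story):
        end = len(story)
    return sum(story[0x40:end]) & 0xFFFF


def stored_checksum(story: bytes) -> int:
    """Checksum recorded in the header word at 0x1C."""
    return struct.unpack_from(">H", story, 0x1C)[0]
```

An assembler could write the result of `compute_checksum` into the word at 0x1C as its last step, after the length field is final; the header bytes themselves are not part of the sum.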

As the dumps show, txd deduces the routine at 0x8f0:

Routine 8f0, 1 local

  8f1:  b2 ...                  PRINT           "^^^   *** The End ***^^^^1) RESTART.^2) QUIT.^>"
  932:  f6 7f 01 01             READ_CHAR       #01 -> L00
  936:  41 01 31 80 19          JE              L00,#31 [TRUE] 952
  93b:  41 01 32 80 19          JE              L00,#32 [TRUE] 957
  940:  b2 ...                  PRINT           "Invalid option.^"
  94f:  8c ff a1                JUMP            8f1
  952:  ed 3f ff ff             ERASE_WINDOW    #ffff
  956:  b0                      RTRUE           
  957:  ba                      QUIT            

while UnZ chokes just before that, maybe at the padding around 0x008EB. I like padding.

Padding:
008EB 00 00 00 00 00 


***** STATIC STRINGS (008F0-00B3F, 592 bytes) *****
 
008F0 01 B2 14 E5 1C 00 00 05 18 2A 14 C1 28 A6 05 40  .........*..(..@
00900 13 2D 28 04 2A 69 00 A6 05 45 18 2A 14 C1 28 A7  .-(.*i...E.*..(.

Thanks. I’ll have a look. Seems like it finds a false string start address.

OK, I found the problem… it’s a bit of an edge case. unz fails to find the correct end of the routines (and the start of the strings) because the last opcode is quit, there is no padding after it, and the first byte of the string area is a valid opcode. Now I need to find a trick to identify this…

Heh. :wink: