UnZ - Unpack Z-machine file format information

Oh, I can provide that! At least in broad strokes, but I can look into the details of any part you like.

When Dialog compiles a Z-machine file, it goes:

  • Header
  • IFID
  • Main heap
  • Aux heap
  • Long-term heap
  • Object tree
  • Object property tables (usually empty)
  • “Select table”: one byte for each (select) statement that needs it, used to remember its current state.
  • “Extended attribute table”: byte arrays used to give objects additional attributes beyond the ones the Z-machine provides. These are always static; even though they’re in RAM, Dialog never emits code to change them at runtime.
  • “Globals table”: the usual Z-machine global variables, followed by a word array holding any global variables that don’t fit in the Z-machine’s registers (I think? I’ve never actually hit the Z-machine register limit, I should test this).
  • “Object variables”: each per-object variable is stored as a word array indexed by object number.
  • Abbreviation strings
  • Abbreviation table
  • Scratch space
  • Header extension table
  • Unicode translation table
  • End of RAM
  • “Wordmaps”: for each predicate queried within a (determine object), there’s a table mapping possible words to objects that can emit those words. Common words like “the” don’t get entries, and some predicates can’t be statically analyzed, but this lets the runtime cut down the number of objects that need to be checked by multiple orders of magnitude.
  • “Wordmap data”: each entry in a wordmap is either a special value meaning “too common”, an object number, or a pointer into one of these tables that lists the possible objects.
  • Dictionary
  • End of addressable memory
  • Routines
  • Strings
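
For anyone who wants to find the coarse boundaries of that layout in a compiled file, here is a minimal Python sketch. It assumes the "End of RAM" marker in the list corresponds to the header's static-memory base (word at 0x0E) and that routines begin at the high-memory base (word at 0x04), which is what the UnZ memory map later in this thread suggests. The function name `memory_regions` is made up for illustration; it is not part of Dialog or UnZ.

```python
import struct


def memory_regions(story: bytes) -> dict:
    """Return the three standard Z-machine regions as (first, last) byte addresses."""
    if len(story) < 0x40:
        raise ValueError("a Z-machine header is 64 bytes long")
    # Header words are big-endian.
    (high_base,) = struct.unpack_from(">H", story, 0x04)    # base of high memory
    (static_base,) = struct.unpack_from(">H", story, 0x0E)  # base of static memory
    # Assumption: high memory starts above static memory, as in the files
    # discussed here. The standard also allows the two to overlap.
    return {
        "dynamic": (0x0000, static_base - 1),
        "static": (static_base, high_base - 1),
        "high": (high_base, len(story) - 1),
    }
```

The finer-grained tables in the list (heaps, select table, wordmaps and so on) are not described by the header, so locating them would need knowledge of the compiler's own conventions.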

Excellent!

Some future version of unz may decode some of that. I tried looking in the code, but Linus isn’t big on comments.

Unfortunately not. The ifcomp2025 branch has more comments than the rest, because I’ve annotated what I’m working on, but it’s still pretty sparse. (That one, notably, includes comments explaining the format of wordmaps and wordmap data tables.)

Before I dig into UnZ’s source code (lol), quick question: what could cause UnZ to print this?

0094F E0 3F 01 2B 01           CALL_VS [Invalid routine: 0x0958] -> L00
  *--------------------------------------------------
  |Opcode:
  | Byte  1    VAR:224 (EXT:224)
  |  11100000  0xE0 224
  |  11        variable-operand (extended)
  |    1       VAR
  |     00000  @call_vs / CALL
  |
  |Operand types:
  | Byte  2    Operand types 1-4
  |  00111111  0x3F 63
  |  00        large constant (long immediate)
  |    11      no more operands
  |      11    no more operands
  |        11  no more operands
  |
  |Operands:
  | Byte  3- 4 0x012B 299
  |
  |Store:
  | Byte  5    0x01 1 (L0)
  |
  |Pseudo-code:
  |  local0 = [Invalid routine: 0x0958]();
  *--------------------------------------------------

I’m asking specifically about the “Invalid routine” part… I’m puzzled, heh. Thanks in advance!
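
To see where the 0x0958 in that message comes from, the five bytes can be decoded by hand. The sketch below is an illustration in Python, not UnZ's actual code, and the function names are made up. It handles only `call_vs` with a constant routine operand, and it relies on the standard rule that a v8 packed address is multiplied by 8.

```python
def unpack_routine_address(packed: int, version: int) -> int:
    """Convert a packed routine address to a byte address."""
    if version in (1, 2, 3):
        return packed * 2
    if version in (4, 5):
        return packed * 4
    if version == 8:
        return packed * 8
    # v6 and v7 also add a routine offset taken from the header.
    raise ValueError(f"unsupported version for this sketch: {version}")


def decode_call_vs(code: bytes, version: int) -> dict:
    """Decode one call_vs instruction whose routine operand is a constant."""
    if code[0] != 0xE0:
        raise ValueError("not a call_vs instruction (expected opcode byte 0xE0)")
    type_byte = code[1]
    pos = 2
    operands = []
    for shift in (6, 4, 2, 0):
        kind = (type_byte >> shift) & 0b11
        if kind == 0b11:      # omitted: no further operands follow
            break
        if kind == 0b00:      # large constant, two bytes, big-endian
            operands.append(("large", int.from_bytes(code[pos:pos + 2], "big")))
            pos += 2
        elif kind == 0b01:    # small constant, one byte
            operands.append(("small", code[pos]))
            pos += 1
        else:                 # 0b10: variable number, one byte
            operands.append(("var", code[pos]))
            pos += 1
    if not operands or operands[0][0] == "var":
        raise ValueError("routine operand is not a constant")
    return {
        "routine": unpack_routine_address(operands[0][1], version),
        "args": operands[1:],
        "store": code[pos],   # variable number that receives the result
        "length": pos + 1,
    }


call = decode_call_vs(bytes.fromhex("E03F012B01"), version=8)
print(hex(call["routine"]))
```

So the operand 0x012B is a valid packed address for 0x0958. The "Invalid" part is about whether the disassembler has that address in its list of known routines, not about the encoding of the call.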

Ah, got it. The call is to a routine UnZ doesn’t properly recognize. (Hmm… the routine is there though).

unz collects all routine addresses so it probably fails at finding the end of the previous routine, and therefore misses this routine start. Can you share the file?

Sorry, forgot to post this earlier. Happy to help.

Attached is test.z8; I also threw in dumps from UnZ and txd, plus the program source as regurgitated by my duct-taped custom assembler when I had it emit the assembled hex bytes. Not included in the program source is the small trampoline code prepended by the assembler at the start that calls the real main routine (which begins with the locals header) and then quits, aligning the routine start to an 8‑byte boundary since v8 routines must be at addresses divisible by 8 for packed addressing. Learned the hard way: execution starts in an environment with no local variables (no stack frame), so you can’t return from it, heh.

test.z8 (2.8 KB)
UnZ_dump_test.z8.txt (68.1 KB)
txd_dump_test.z8.txt (5.9 KB)
asm_dump.txt (3.3 KB)
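
The 8-byte alignment mentioned in the post above is easy to sketch. This is only an illustration of the arithmetic (the helper name `pad_to_boundary` is hypothetical, not taken from the assembler in question), using the padding at 0x008EB that shows up in the UnZ dump.

```python
def pad_to_boundary(address: int, alignment: int = 8) -> int:
    """Number of zero bytes to emit so that `address` becomes a multiple of `alignment`."""
    return -address % alignment


code_end = 0x8EB                      # first free byte after the previous code
padding = pad_to_boundary(code_end)   # bytes of padding to insert
routine_start = code_end + padding    # where the next routine header goes
packed = routine_start // 8           # operand a v8 call instruction would use
print(padding, hex(routine_start), hex(packed))
```

With these numbers the packed value comes out as 0x11E, which matches the operand bytes `01 1E` in the call at 0x008E3.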

Just for reference, in this dump, the invalid routine is…

0008F0: 01                                       ; routine_header locals=1
0008F1:                                          ; label endmenu_loop:
0008F1: B2 14 E5 1C 00 00 05 18 2A 14 C1 28 A6 05 40 13 2D 28 04 2A 69 00 A6 05 45 18 2A 14 C1 28 A7 14 E5 1C A7 15 25 7C 04 5C 8A 13 04 64 86 12 E4 64 B2 14 E5 28 BF 00 96 13 44 38 99 16 45 1C A6 87 C5 ; print '\n\n    *** The End ***\n\n\n\n1) RESTART.\n2) QUIT.\n>'
000932: F6 7F 01 01                              ; read_char 1 -> (local 1)
000936: 41 01 31 40 00                           ; je (local 1) 49 ?do_restart
00093B: 41 01 32 40 00                           ; je (local 1) 50 ?do_quit
000940: B2 11 D3 6C D1 39 20 52 B9 3A 93 16 45 9C A5 ; print 'Invalid option.\n'
00094F: 8C 00 00                                 ; jump ?endmenu_loop
000952:                                          ; label do_restart:
000952: ED 3F FF FF                              ; erase_window -1
000956: B0                                       ; rtrue
000957:                                          ; label do_quit:
000957: BA                                       ; quit

; Symbols
; 0008F0  ending_menu
; 0008F1  endmenu_loop
; 000952  do_restart
; 000957  do_quit

As stated before, UnZ says “Invalid routine”.

  *--------------------------------------------------
008E3 E0 3F 01 1E 01           CALL_VS [Invalid routine: 0x08F0] -> L00
  *--------------------------------------------------

Quick summary… Ignore the “Compiled With: Inform 6.15” line; it’s not a vanilla build, it’s heavily customized by me. I didn’t bother implementing the checksum lol (UnZ shows “Calculated checksum: 0xB347, checksum error”), but interpreters like Bocfel, Windows Frotz, Hunky Punk, and Fabularium open it fine. Bocfel is the most robust: before I implemented the file length field, Windows Frotz opened the file but didn’t tolerate it being longer than what the header said, and later crashed when you selected an option (this also stumps txd).
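
For reference, filling in the two header fields discussed here takes only a few lines. This is a rough Python sketch, not code from the assembler in question, and the function names are made up. It uses the standard's rules: the file length at 0x1A is stored divided by 2, 4 or 8 depending on version, and the checksum at 0x1C is the 16-bit sum of every byte from 0x40 to the end of the file.

```python
def file_length_divisor(version: int) -> int:
    """Scale factor applied to the file length stored at header 0x1A."""
    if version <= 3:
        return 2
    if version <= 5:
        return 4
    return 8


def patch_length_and_checksum(story: bytes) -> bytes:
    """Return a copy of `story` with the length and checksum header fields filled in."""
    if len(story) < 0x40:
        raise ValueError("a Z-machine header is 64 bytes long")
    data = bytearray(story)
    divisor = file_length_divisor(data[0])
    # Pad with zeros so the length is exactly representable in the header.
    data.extend(b"\x00" * (-len(data) % divisor))
    scaled = len(data) // divisor
    if scaled > 0xFFFF:
        raise ValueError("file too long for this Z-machine version")
    data[0x1A:0x1C] = scaled.to_bytes(2, "big")
    # Sum starts at 0x40, so the header (including these two fields) is excluded.
    checksum = sum(data[0x40:]) & 0xFFFF
    data[0x1C:0x1E] = checksum.to_bytes(2, "big")
    return bytes(data)
```

Padding with zero bytes does not change the sum, so the order of the two steps only matters for getting the length field right.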

As the dumps show, txd deduces the routine at 0x8f0

Routine 8f0, 1 local

  8f1:  b2 ...                  PRINT           "^^^   *** The End ***^^^^1) RESTART.^2) QUIT.^>"
  932:  f6 7f 01 01             READ_CHAR       #01 -> L00
  936:  41 01 31 80 19          JE              L00,#31 [TRUE] 952
  93b:  41 01 32 80 19          JE              L00,#32 [TRUE] 957
  940:  b2 ...                  PRINT           "Invalid option.^"
  94f:  8c ff a1                JUMP            8f1
  952:  ed 3f ff ff             ERASE_WINDOW    #ffff
  956:  b0                      RTRUE           
  957:  ba                      QUIT            

while UnZ chokes just before that, maybe at the padding around 0x008EB. I like padding.

Padding:
008EB 00 00 00 00 00 


***** STATIC STRINGS (008F0-00B3F, 592 bytes) *****
 
008F0 01 B2 14 E5 1C 00 00 05 18 2A 14 C1 28 A6 05 40  .........*..(..@
00900 13 2D 28 04 2A 69 00 A6 05 45 18 2A 14 C1 28 A7  .-(.*i...E.*..(.

Thanks. I’ll have a look. Seems like it finds a false string start address.

Ok, I found the problem… it’s a bit of an edge case. It fails to find the correct end of the routines (and the start of the strings) because the last opcode is quit, there is no padding, and the first byte of the string area is a valid opcode. Now I need to find a trick to identify this…
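
One possible sanity check for this kind of case, sketched in Python: the string area cannot start at or below an address that some instruction calls as a routine. This is only a guess at a heuristic, not necessarily the trick UnZ ended up using, and the function name is hypothetical.

```python
from typing import Iterable


def string_start_is_plausible(candidate: int, call_targets: Iterable[int]) -> bool:
    """A string-area start is plausible only if every called routine lies below it."""
    return all(target < candidate for target in call_targets)


# In test.z8 the call at 0x008E3 targets 0x08F0, so 0x08F0 cannot be where
# the strings begin. The byte after the final quit (0x0957) is 0x0958.
print(string_start_is_plausible(0x8F0, {0x8F0}))
print(string_start_is_plausible(0x958, {0x8F0}))
```

A check like this only catches routines reached through immediate calls; a routine that is only ever called through a variable would still slip past it.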

Heh. :wink:

I’ve run into a minor annoyance: it chokes on blorb files. It’s no great difficulty to pull out the Z-machine part with blorbtool, but when I’m extracting gametext.txt files from a bunch of games at once, it’s a bit of a hassle.

How hard would it be to include blorb support in a future version? (If it’ll be a problem, I can just write a little wrapper that unblorbs the game first.)

I’ll add that to the list…

A bit late for your project, but the “under development” version (0.17) of unz now works on blorb files with a ZCOD chunk.

Unfortunately it still doesn’t seem to recognize my zblorb.

Unknown z-machine version, 70.
Try ‘unz -h’ for more information.
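
A note on that "70": a blorb is an IFF file that begins with the ASCII bytes `FORM`, and `F` is byte value 70, so a tool that reads byte 0 as the Z-machine version will report exactly this. For the wrapper approach mentioned earlier, here is a minimal Python sketch that pulls the ZCOD chunk out of a blorb. It simply walks the top-level chunks rather than consulting the resource index, which is an assumption that holds for ordinary zblorbs but is not the spec's preferred lookup. The function name is made up.

```python
import struct


def extract_zcode(blorb: bytes) -> bytes:
    """Return the contents of the first ZCOD chunk in a Blorb file."""
    if blorb[0:4] != b"FORM" or blorb[8:12] != b"IFRS":
        raise ValueError("not a Blorb file")
    (form_len,) = struct.unpack_from(">I", blorb, 4)
    end = min(8 + form_len, len(blorb))
    pos = 12  # first chunk follows the 'FORM', length and 'IFRS' fields
    while pos + 8 <= end:
        chunk_id = blorb[pos:pos + 4]
        (size,) = struct.unpack_from(">I", blorb, pos + 4)
        if chunk_id == b"ZCOD":
            return blorb[pos + 8:pos + 8 + size]
        # Chunks are padded to an even length; the pad byte is not counted in size.
        pos += 8 + size + (size & 1)
    raise ValueError("no ZCOD chunk found")
```

Typical use would be reading the .zblorb in binary mode, calling `extract_zcode` on the bytes, and writing the result to a temporary story file before handing it to the disassembler.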

Could you DM me the zblorb?

I’ll do you one better!

Hmm, what version do you use, and on what platform?

I haven’t done any “official” release of 0.17 yet, but the versions under development from the 3rd of April and the 26th of April both work for me. You can build your own from the source on GitHub or use one of the builds I have uploaded to Version 0.17_develop - Google Drive

C:\Users\heasm\Downloads>unz hasawa.zblorb | more

***** ANALYZING *****

Filename:                                  hasawa.zblorb
Compiled With:                             Dialog 1a/01
Z-machine version:                         8
Calculated checksum:                       0xF290, checksum ok
IFID:                                      UUID://E2A83236-8E5B-4D96-887A-0B20EBA9A671//
Object count:                              340
Dictionary word count:                     1366
Scanning for routines from:                0x063F0
Found first routine at address:            0x063F0
Lowest routine address (immediate call) :  0x06458
Highest routine address (immediate call):  0x4B670
Strings start at address:                  0x4B6D8
Highest used global in z-code:             118
Number of used globals in z-code:          107
Number of unique properties:               0

***** MEMORY MAP *****

00000-043DE DYNAMIC MEMORY
   00000-0003F Header table, 64 bytes.
   00000-00001 Abbreviation strings, 2 bytes.
   00002-0003F Unidentified data, 62 bytes.
   00040-0006C IFID, 45 bytes.
   00040-013C7 Main heap, 5 000 bytes.
   013C8-017AF Aux and long-term heap, 1 000 bytes.
   017B0-01D8B Unidentified data, 1 500 bytes.
   01D8C-01E09 Object defaults table, 126 bytes.
   01E0A-030A1 Object tree table, 4 760 bytes.
   030A2-03BC5 Object properties tables, 2 852 bytes.
   03BC6-03C33 Unidentified data, 110 bytes.
   03C34-03D1F Global variables, 236 bytes.
   03C34-03C3F Scratch area, 12 bytes.
   03C34-0433D Unidentified data, 1 802 bytes.
   03C40-04271 Unidentified data, 1 586 bytes.
   04272-03C33 Abbreviation table, -1 598 bytes.
   0433E-04345 Header extension table, 8 bytes.
   04346-043DE Unicode translation table, 153 bytes.

043DF-063EF STATIC MEMORY
   043DF-063ED Vocabulary/Dictionary, 8 207 bytes.
   063EE-063EF Padding, 2 bytes.

063F0-7B697 HIGH MEMORY
   063F0-4B6D7 Z-code, 283 368 bytes.
   4B6D8-7B697 Static strings, 196 544 bytes.


***** HEADER (00000-0003F, 64 bytes) *****

00000 08                      VERSION Z-machine version:         8
00001 00                      MODE    Flags 1:                   0x00
00002 00 67                   ZORKID  Release number:            103
00004 63 F0                   ENDLOD  Base of high memory:       0x63F0
00006 63 F1                   START   Initial value of pc:       0x63F1
00008 43 DF                   VOCAB   Dictionary:                0x43DF
-- More  --

(Just saw that the abbreviation table is reported strangely; I’ll have to look into that…)

Hmm, I wonder what’s going wrong then? I’m using the one from the Google Drive…

I’ll try building from source and see if that does it!

Strange…

Oh, of course it’s something stupid. Firefox didn’t like me downloading anything directly into ~/bin (which, fair enough) and silently quarantined it, so I didn’t realize I was still running my old version of unz. I wish it would at least give me an error message or something when it does that.

Carry on!
