UnZ (v0.12) - Unpack Z-machine file format information

I’ve been working on and off on a tool for analyzing/unpacking the information in compiled Z-machine story files (think ztools). I’ve done it mostly for my own education and amusement, but now feel that it may be time to release something for bug testing and suggestions for further development. There’s a lot of inconsistency in how the information is presented, because different parts were developed at different times (lately I spent an unreasonable amount of time on version 2 of the Infocom grammar format, a format hardly ever used…) and I haven’t really decided what I think looks best.

There are lots of things that I want for the future, for example:

  • Better information on the individual properties on objects.
  • Unpack and identify arrays and their content, now in unidentified data areas.
  • Better support for Dialog.
  • Allow external symbol-file that names identified symbols (attributes, variables, routines, …).
  • GUI that lets you click and follow addresses in, for example, z-code.

Help page to whet the appetite:

UnZ 0.10 (2024-02-17) by Henrik Åsman, (c) 2021-2024
Usage: unz [option] [file]
Unpack Z-machine file format information.

 -a                 Show the abbreviation sections.
 -d                 Show the dictionary section.
 -f                 Show all sections (default).
 -g                 Show the grammar section.
 --gametext         Output only a 'gametext.txt' format of all text in the file.
 -h, --help, /?     Show this help.
 --hexdump          Show raw hexdump before each section.
 --hide             Don't show the abbreviation insertion points in the strings.
 -i                 Show the header section.
 -m                 Show the memory map.
 -o                 Show the objects sections.
 -s                 Show the strings section.
 --syntax 0/txd     Use TXD default syntax for the z-code decompilation. (default)
          1/inform  Use Inform syntax for the z-code decompilation. (txd -a)
          2/ZAP     Use ZAP syntax for the z-code decompilation.
 -u                 Show the unidentified sections.
 -v                 Show the variable section.
 -x                 Show miscellaneous other sections.
 -z                 Show the z-code section.
 -z <hexaddress>    Show the single decompiled z-code routine at <hexaddress>

Report bugs/suggestions to: heasm66@gmail.com
UnZ homepage: https://github.com/heasm66/UnZ

There are precompiled binaries for win-x64, linux-x64 and osx-x64 here.

The source code is a palimpsest of different iterations and test code, and is not for the faint-hearted. Visit the GitHub page at your own risk; you have been warned.

Report bugs/suggestions here, via email or as an issue at the project page.


I have given it a quick but, I think, demanding test:

across.z8 (one of the two largest .z8, using the full 512K z8 memory)
curses-r7.z3 (the oldest, and by Lord Inform’s admission, the most buggy)
curves.z8 (the other full 512K .z8 story file)
dejavu.z3 (the oldest Inform binary known)
Mansion.z5 (a “normal” .z5 story, chosen admittedly at random)
moments.z6 (a story file that makes very heavy use of .z6 features)

Of all six, only curses-r7 crashed UnZ:

***** SYNTAX/GRAMMAR TABLE DATA (028EC-02E5C, 1.393 bytes) *****

Verb 255 'get'/'lift'/'pick'/'take'
028EC 07 00 FF 00 00 00 00 00 00 Unhandled exception. System.NullReferenceException: Object reference not set to an instance of an object.
   at UnZ.Program.DecodeGrammarsInformV1(Int32, DictionaryEntries, Int32, Int32, Int32)
   at UnZ.Program.Main(String[])

So my assessment is that, for a first release, 0.10 is a promising tool that can still be improved.

Thanks for your effort, and
Best regards from Italy,
dott. Piergiorgio.


I now have a fix for Curses-r7. Its ‘Adjective Table’ isn’t formatted as expected. This can also be observed if one tries to dump the information with INFODUMP:

    **** Prepositions ****

  Table entries = 17

239. "about"
240. "with"
241. "under"
242. "inside"
243. "throug"
244. "over"
245. "at"
246. "to"
247. "down"
248. "on"
249. "c  c      1hs "
250. "c  c      1hs "
251. "c  c      1hs "
252. "c  c      1hs "
253. "c  c      1hs "
254. "c  c      1hs "
255. "c  c      1hs "

Release 0.11 is coming soon with the fix. Its output for the same table looks like this:

***** PREPOSITION/ADJECTIVE TABLE (03009-0304E, 70 bytes) *****

03009 00 11                    Number of entries: 17
0300B 30 5D 00 EF              Preposition #239: 'about'
0300F 42 15 00 F0              Preposition #240: 'with'
03013 41 20 00 F1              Preposition #241: 'under'
03017 38 21 00 F2              Preposition #242: 'inside'
0301B 40 78 00 F3              Preposition #243: 'throug'
0301F 3A D6 00 F4              Preposition #244: 'over'
03023 30 FE 00 F5              Preposition #245: 'at'
03027 40 A9 00 F6              Preposition #246: 'to'
0302B 34 CB 00 F7              Preposition #247: 'down'
0302F 3A B3 00 F8              Preposition #248: 'on'
03033 00 FF 00 F9              Preposition #249: 'into' (table is corrupt at this entry, word taken from dictionary.)
03037 00 FF 00 FA              Preposition #250: 'off' (table is corrupt at this entry, word taken from dictionary.)
0303B 00 FF 00 FB              Preposition #251: 'in' (table is corrupt at this entry, word taken from dictionary.)
0303F 00 FF 00 FC              Preposition #252: 'from' (table is corrupt at this entry, word taken from dictionary.)
03043 00 FF 00 FD              Preposition #253: 'invent' (table is corrupt at this entry, word taken from dictionary.)
03047 00 FF 00 FE              Preposition #254: 'up' (table is corrupt at this entry, word taken from dictionary.)
0304B 00 FF 00 FF              Preposition #255: 'out' (table is corrupt at this entry, word taken from dictionary.)
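
The layout visible in the dump above (a 2-byte entry count followed by 4-byte entries, each a 2-byte dictionary address and a 2-byte preposition number) can be sketched in Python. This is only an illustration and not UnZ's actual code: the function name, the dictionary bounds and the trimmed three-entry sample are assumptions, and how UnZ recovers the word from the dictionary is not shown in this thread, so the sketch only flags suspect entries.

```python
import struct

def parse_preposition_table(data, dict_start, dict_end):
    """Parse a 2-byte count followed by 4-byte entries (big-endian).

    Each entry is (dictionary address, preposition number). An entry is
    flagged as corrupt when its address lies outside the dictionary.
    """
    (count,) = struct.unpack_from(">H", data, 0)
    entries = []
    for i in range(count):
        addr, number = struct.unpack_from(">HH", data, 2 + 4 * i)
        corrupt = not (dict_start <= addr < dict_end)
        entries.append((number, addr, corrupt))
    return entries

# Trimmed sample built from the dump: the count is set to 3 (not 17) and
# only the entries for #239, #240 and #249 are included.
sample = bytes([
    0x00, 0x03,
    0x30, 0x5D, 0x00, 0xEF,   # #239 'about'
    0x42, 0x15, 0x00, 0xF0,   # #240 'with'
    0x00, 0xFF, 0x00, 0xF9,   # #249, address 0x00FF is not a dictionary word
])

# Hypothetical dictionary bounds, chosen only so the sample addresses fit.
DICT_START, DICT_END = 0x304F, 0x4400

entries = parse_preposition_table(sample, DICT_START, DICT_END)
for number, addr, corrupt in entries:
    note = " (suspect address)" if corrupt else ""
    print(f"Preposition #{number}: address 0x{addr:04X}{note}")
```

Checking the address against the dictionary range before dereferencing it is one way to avoid the kind of crash reported above, whatever fallback is then used to find the word.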

UnZ is updated to version 0.11 (some small cosmetic fixes and the bugfix discussed above). Binaries and source code at the same places pointed to in the initial post.


I have updated UnZ to version 0.12.

Changelog:

  1. Call without options or file prints help (same as -h).
  2. Cosmetic changes to printing in the objects section.
  3. List calls from property, action and preaction/parsing in z-code routine header.
  4. List possible starting points for arrays in “unidentified data”, collected from opcodes loadb, loadw, storeb, storew and globals.

Example of 3:

Routine: 0x02C30               Called from routine(s) at 0x02C18
                               Called from property #40 at object #12 ("pair of sunglasses")
02C30 01                       1 local
                               (L00)

02C31 FD 87 01 15 12 66        COPY_TABLE L00 0x1512 0x66
02C37 B0                       RTRUE

Routine: 0x02C38               Called from routine(s) at 0x02C78
02C38 02                       2 locals
                               (L00 L01)

02C39 61 01 1B C0              JE L00 G0B [TRUE] RFALSE
02C3D 4A 01 20 40              TEST_ATTR L00 ATTRIBUTE32 [FALSE] RFALSE
02C41 4A 01 1D C0              TEST_ATTR L00 ATTRIBUTE29 [TRUE] RFALSE
02C45 4A 01 18 C0              TEST_ATTR L00 ATTRIBUTE24 [TRUE] RFALSE
02C49 D9 2F 05 8A 02 00        CALL_2S 0x2C50 (L01) -> -(SP)
02C4F B8                       RET_POPPED
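
As a side note on the CALL_2S line above: the operand bytes 05 8A are a packed routine address, and the decompiler prints the unpacked byte address 0x2C50. A minimal Python sketch of that conversion follows. The multipliers are those of the Z-Machine Standard; the function name is made up for illustration, and version 8 is an assumption inferred from the arithmetic (0x058A × 8 = 0x2C50), since the thread does not say which story file the example comes from.

```python
def unpack_routine_address(packed, version, routine_offset=0):
    """Convert a packed routine address to a byte address.

    Multipliers per the Z-Machine Standard: 2 (v1-3), 4 (v4-5),
    4 plus 8 * routine offset (v6-7), and 8 (v8).
    """
    if 1 <= version <= 3:
        return 2 * packed
    if version in (4, 5):
        return 4 * packed
    if version in (6, 7):
        return 4 * packed + 8 * routine_offset
    if version == 8:
        return 8 * packed
    raise ValueError(f"unsupported Z-machine version: {version}")

# Operand bytes 05 8A from the CALL_2S line, read as a big-endian word.
packed = int.from_bytes(bytes([0x05, 0x8A]), "big")
print(f"0x{unpack_routine_address(packed, 8):04X}")  # prints 0x2C50
```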

Example of 4:

***** UNIDENTIFIED DATA (0097D-016D9, 3,421 bytes) *****

00970                                        00 00 00               ...
00980 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00990 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
.
.
.
016B0 00 03 00 02 15 A3 14 F9 00 04 00 02 17 3B 16 32  .............;.2
016C0 16 65 00 03 00 02 15 D2 16 27 00 03 00 02 14 A0  .e.......'......
016D0 16 21 00 03 00 02 15 04 16 89                    .!........

Possible starting points for arrays (from absolute addresses in
opcodes loadw/GET, loadb/GETB, storew/PUT, storeb/PUTB and globals):
     0x0097D
     0x00A09
     0x00A6D
     0x00DFF
     0x00F0A
     0x00F70
     0x00FD6
     0x01644
     0x01648
     0x01654
     0x0165C
     0x01664
     0x0166C
     0x01676
     0x0167E
     0x01686
     0x0168E
     0x01696
     0x0169E
     0x016A6
     0x016B0
     0x016B8
     0x016C2
     0x016CA
     0x016CF
     0x016D2
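
The idea behind this list can be sketched as follows: gather every absolute address that the z-code uses as a table base, plus the values of the globals, and keep those that fall inside the unidentified region. This is a hedged Python illustration, not UnZ's implementation. The function name and inputs are hypothetical, and which sample address comes from an opcode and which from a global is made up; only the region bounds are taken from the dump header.

```python
def candidate_array_starts(table_operands, global_values, region_start, region_end):
    """Return sorted, de-duplicated addresses inside the region (inclusive).

    table_operands: constant base addresses seen as operands of
                    loadw/loadb/storew/storeb (hypothetical input).
    global_values:  values of the global variables (hypothetical input).
    """
    seen = set()
    for addr in list(table_operands) + list(global_values):
        if region_start <= addr <= region_end:
            seen.add(addr)
    return sorted(seen)

starts = candidate_array_starts(
    table_operands=[0x0A09, 0x16B0, 0x0A09, 0x2C50],  # duplicate and out-of-range values are dropped
    global_values=[0x097D, 0x0000, 0x16D2],
    region_start=0x0097D,
    region_end=0x016D9,
)
for addr in starts:
    print(f"     0x{addr:05X}")
```

A global that happens to hold a number inside the region is indistinguishable from a real array pointer with this approach, which is why these are only possible starting points.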

Binaries and source code at the same places pointed to in the initial post.


This is neat, thanks. A disappointment I have with glulx-strings is that it doesn’t show the dictionary.

I hope someone writes the equivalent of this for glulx (so I don’t have to!).
