UnZ - Unpack Z-machine file format information

I bet it only does that if the two-byte word is a valid packed address in the string segment.

By “incorrectly” in my earlier post, I mean that if those XX XX bytes had been treated as an offset into the Vocabulary/Dictionary, the name would have come out right.

Yes, perhaps the code is blindly doing that.

@zarf is correct: it searches for a valid interpretation of the two-byte word, and when the word is ambiguous it sometimes picks the wrong interpretation.

Do you have any examples I could work with, to see if it can make a more “intelligent” choice?

I guess one approach is to look at each property number across every object. If all (or most) of the values of that property are valid dict word addresses, and few are valid string packed addresses, that can bias the decision when a single property value is ambiguous.
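
A minimal sketch of that per-property vote in Python. The function name, the 0.9 threshold and the two predicate callbacks are hypothetical, not UnZ's code; the callbacks stand in for whatever checks the disassembler already has for “is this the start of a dictionary entry” and “is this a valid packed string address”.

```python
from typing import Callable, Iterable


def preferred_kind(values: Iterable[int],
                   is_dict_word: Callable[[int], bool],
                   is_packed_string: Callable[[int], bool],
                   threshold: float = 0.9) -> str:
    """Vote over every 2-byte value one property number has across all objects.

    Returns "dict" or "string" when one reading clearly dominates, else
    "ambiguous"; the caller can use the answer to break ties on single values.
    """
    values = list(values)
    if not values:
        return "ambiguous"
    # Share of values that are valid under each reading (a value can count for both).
    dict_share = sum(map(is_dict_word, values)) / len(values)
    string_share = sum(map(is_packed_string, values)) / len(values)
    if dict_share >= threshold and string_share < dict_share:
        return "dict"
    if string_share >= threshold and dict_share < string_share:
        return "string"
    return "ambiguous"
```

With three values that are all dictionary entries, where only one of them also happens to decode as a string, the vote comes out as "dict", so that one ambiguous value gets the dictionary reading.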

I’d lean toward the exhaustive approach as well.

I inspected your DecodePropertyData routine and worked out why the “41 XX XX” case sometimes produces the wrong output.

Lazy fix (maybe?): prefer dictionary entries for property #1 (object name) when pPropSize = 2.

Good ideas! I’ll have a look.

Property 1 isn’t necessarily name in all games, although it’s common.

I was being too lazy… another approach…

For pPropNum = 1, check the dictionary first (if a dictionary entry exists for the unpacked word, use it), then fall back to string/routine.

(But when both a string and a dictionary entry point to the same packed address, hiccups ensue.)
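
A sketch of that lookup order, assuming lookup tables that map a two-byte value to a dictionary word or a decoded string, plus a set of known packed routine addresses. All names are hypothetical. `name_prop` is a parameter because property 1 is not always `name`, and for the other properties the order string, routine, dictionary, raw is an assumption about the existing behaviour.

```python
def decode_two_byte_value(prop_num: int, value: int,
                          dictionary: dict[int, str],
                          strings: dict[int, str],
                          routines: set[int],
                          name_prop: int = 1) -> tuple:
    """Pick one reading of a 2-byte property value.

    dictionary: byte address of an entry -> dictionary word
    strings:    packed address -> decoded string
    routines:   packed addresses known to start a routine
    """
    if prop_num == name_prop and value in dictionary:
        return ("dict", dictionary[value])     # name property: dictionary wins
    if value in strings:
        return ("string", strings[value])
    if value in routines:
        return ("routine", value)
    if value in dictionary:
        return ("dict", dictionary[value])
    return ("raw", value)                      # nothing matched: show the bytes
```

This does not resolve the hiccup case: for `name_prop` it simply always lets the dictionary win when a value is valid both ways.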

Similarly, I found the same bug with pPropNum = 2. In that case the correct behavior would likely be to show the raw data, but sometimes the two bytes are wrongly interpreted as a packed address and UnZ inserts an unrelated decoded string.

By the way, your Called from property #XX at object #YY ("name_of_object") message is a godsend. I’m tweaking the code so it prints the name of property #XX when it’s known a priori that #XX is a verb ID, and I’m adding a toggle to display global variable indices in hexadecimal instead of decimal.
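
For the hex toggle, a small sketch of how a variable operand maps to a global index. Per the Z-machine Standard, variable 0 is the stack, 1 to 15 are the current routine's locals, and 16 to 255 are globals 0 to 239, each a word at the global table address (header word 0x0C) plus 2 × index. The `sp`/`local`/`G` labels and the `hex_globals` flag are made up for illustration.

```python
def describe_variable(var_num: int, hex_globals: bool = False) -> str:
    """Label a Z-machine variable operand; the labels are illustrative only."""
    if not 0 <= var_num <= 0xFF:
        raise ValueError(f"variable number out of range: {var_num}")
    if var_num == 0:
        return "sp"                       # top of the game stack
    if var_num < 0x10:
        return f"local{var_num}"          # locals 1-15 of the current routine
    index = var_num - 0x10                # globals 0-239
    return f"G{index:02X}" if hex_globals else f"G{index}"


def global_address(var_num: int, globals_base: int) -> int:
    """Byte address of a global: table base (header word 0x0C) + 2 * index."""
    if not 0x10 <= var_num <= 0xFF:
        raise ValueError(f"not a global variable: {var_num}")
    return globals_base + 2 * (var_num - 0x10)
```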

It turns out the blob that UnZ labels “Unidentified data (Class, indiv. prop & symbol table)” was the clue: it shows that prop #1 is “name”. I vaguely remember that infodump could tag the Property Names Table and dump the names, e.g., prop #1 becomes “name”. (I could be wrong.)

Yeah, the “unidentified” areas are mostly arrays. There is more work to be done with arrays, but I haven’t got around to it yet. They are hard because much of the information about how the data in an array is structured is “hidden” in the z-code.

I think I solved this by scanning over all instances of the property: if all of them point to dictionary words, they are treated as dictionary words even when a value is also a valid string or routine address.
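
“Pointing to a dictionary word” means the value lands exactly on the start of an entry. A sketch that builds that set from the dictionary header as laid out in the Standard (§13: separator count, the separator codes, entry length, a signed entry count, then the entries) and applies the all-instances rule. The function names are hypothetical.

```python
from typing import Iterable


def dictionary_entry_addresses(story: bytes, dict_addr: int) -> set:
    """Byte addresses of every entry in the dictionary at dict_addr."""
    n_separators = story[dict_addr]
    pos = dict_addr + 1 + n_separators         # skip the separator codes
    entry_length = story[pos]
    # Entry count is signed; a negative count marks an unsorted dictionary.
    count = int.from_bytes(story[pos + 1:pos + 3], "big", signed=True)
    first_entry = pos + 3
    return {first_entry + i * entry_length for i in range(abs(count))}


def all_point_to_dictionary(values: Iterable[int], entries: set) -> bool:
    """True only if the property has values and every one is an entry start."""
    values = list(values)
    return bool(values) and all(v in entries for v in values)
```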

Old (Zork1):

Object: 234
00BFD 00 00 00 00 FA EB 00 21 B7
  Attributes: 32, 33, 34, 35, 36, 38, 40, 41, 42, 44, 46, 47
  Parent = 250
  Next   = 235
  Child  = 0
  Properties address  = 21B7
    021B7 03 7E 97 42 4E A4 A5
    Description = "zorkmid"
    021BE 2B    4D 3E                   11/2  (PROP#11 Routine at 0x09A7C)
    021C1 2A    3E 29                   10/2  (PROP#10 Routine at 0x07C52)

and new:

Object: 234
00BFD 00 00 00 00 FA EB 00 21 B7
  Attributes: 32, 33, 34, 35, 36, 38, 40, 41, 42, 44, 46, 47
  Parent = 250
  Next   = 235
  Child  = 0
  Properties address  = 21B7
    021B7 03 7E 97 42 4E A4 A5
    Description = "zorkmid"
    021BE 2B    4D 3E                   11/2  (PROP#11 Routine at 0x09A7C)
    021C1 2A    3E 29                   10/2  (PROP#10 ZORKMI)
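
Why 3E 29 is ambiguous: Zork I is a version 3 file, where a packed address is the word times 2, so 0x3E29 unpacks to 0x7C52 (the old “Routine at 0x07C52”), while the new output reads the same word as a byte address into the dictionary (ZORKMI). A sketch listing both readings. V6/V7 are left out because they also need the routine and string offsets from the header, and the high-memory base and dictionary set in the usage lines are made-up placeholders.

```python
def unpack_address(packed: int, version: int) -> int:
    """Byte address of a packed routine/string address (Standard §1.2.3)."""
    if 1 <= version <= 3:
        return 2 * packed
    if version in (4, 5):
        return 4 * packed
    if version == 8:
        return 8 * packed
    # V6/V7 add routine/string offsets from the header; not handled here.
    raise ValueError(f"unsupported version {version}")


def candidate_readings(word: int, version: int, high_mem_base: int,
                       dict_entries: set) -> list:
    """List every plausible reading of a 2-byte property value."""
    readings = []
    if word in dict_entries:          # byte address of a dictionary entry
        readings.append(("dictionary", word))
    addr = unpack_address(word, version)
    if addr >= high_mem_base:         # assumed filter: routines/strings sit in high memory
        readings.append(("packed", addr))
    return readings


if __name__ == "__main__":
    # 0x3E29 * 2 == 0x7C52 and 0x4D3E * 2 == 0x9A7C, as in the listing.
    print(hex(unpack_address(0x3E29, 3)), hex(unpack_address(0x4D3E, 3)))
    print(candidate_readings(0x3E29, 3, 0x4E00, {0x3E29}))
```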

Just saw the issue closed on GitHub… I’m so hyped! Thanks a ton!

Please file it as an issue on GitHub, maybe with some more details about what you have done. I have some unfinished ideas about the objects, but I’m happy for the input.

Just wanted to say I’ve been using UnZ recently in debugging the Dialog compiler and it’s a great help!

Okay. I’ll open a GitHub issue and add the details. Thanks again!

Happy to hear that. I don’t extract much information from Dialog files at the moment. There are big chunks that are unidentified (the predicate data and the big chunk at the start of static memory). I would love to see a technical description of these sometime, so that I could extract more information.

***** MEMORY MAP *****

00000-04653 DYNAMIC MEMORY
   00000-0003F Header table, 64 bytes.
   00040-0006C IFID, 45 bytes.
   00040-00FDF Main heap, 4,000 bytes.
   00FE0-01557 Aux and long-term heap, 1,400 bytes.
   01558-015D5 Object defaults table, 126 bytes.
   015D6-030BD Object tree table, 6,888 bytes.
   030BE-0386D Object properties tables, 1,968 bytes.
   0386E-040D9 Unidentified data, 2,156 bytes.
   040DA-041AD Global variables, 212 bytes.
   041AE-04585 Predicate data, 984 bytes.
   04586-04587 Abbreviation strings, 2 bytes.
   04588-04647 Abbreviation table, 192 bytes.
   04648-04653 Scratch area, 12 bytes.

04654-07CD7 STATIC MEMORY
   04654-06458 Unidentified data, 7,685 bytes.
   06459-07CD5 Vocabulary/Dictionary, 6,269 bytes.
   07CD6-07CD7 Padding, 2 bytes.

07CD8-5CBC7 HIGH MEMORY
   07CD8-4946F Z-code, 268,184 bytes.
   49470-5CBC7 Static strings, 79,704 bytes.
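
Most of the top-level boundaries in a map like this come straight from header words (Standard §11): high memory base at 0x04, dictionary at 0x08, object table at 0x0A, globals at 0x0C, static memory base at 0x0E, abbreviations at 0x18, and the file length at 0x1A, stored divided by 2, 4 or 8 depending on version. A sketch that recovers the three regions, following the convention in the map of showing static memory as ending just before high memory. The function name is hypothetical, and the header in the usage lines is synthetic, filled in from the addresses in the map with version 8 assumed.

```python
def read_word(story: bytes, offset: int) -> int:
    """Big-endian 16-bit word, as used throughout the Z-machine."""
    return int.from_bytes(story[offset:offset + 2], "big")


def memory_regions(story: bytes) -> dict:
    """Top-level dynamic/static/high regions plus a few table pointers."""
    version = story[0x00]
    high_base = read_word(story, 0x04)
    static_base = read_word(story, 0x0E)
    # File length is stored divided by 2 (V1-3), 4 (V4-5) or 8 (V6 and later).
    scale = 2 if version <= 3 else 4 if version <= 5 else 8
    # Some early files leave the length word as 0; fall back to the real size.
    length = read_word(story, 0x1A) * scale or len(story)
    return {
        "dynamic": (0, static_base - 1),
        "static": (static_base, high_base - 1),
        "high": (high_base, length - 1),
        "dictionary": read_word(story, 0x08),
        "objects": read_word(story, 0x0A),
        "globals": read_word(story, 0x0C),
        "abbreviations": read_word(story, 0x18),
    }


if __name__ == "__main__":
    header = bytearray(64)
    header[0x00] = 8
    for offset, value in [(0x04, 0x7CD8), (0x08, 0x6459), (0x0A, 0x1558),
                          (0x0C, 0x40DA), (0x0E, 0x4654), (0x18, 0x4588),
                          (0x1A, 0x5CBC8 // 8)]:
        header[offset:offset + 2] = value.to_bytes(2, "big")
    for name, value in memory_regions(bytes(header)).items():
        print(name, value)
```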