Z-machine dictionary alignment trouble

DavidG · February 4, 2022, 6:26am

How can I, in Inform 6 compiling to Z-machine, print the contents of an arbitrary dictionary entry? For instance, I want to be able to do PrintDictEntry(19) and then see the word outside.

heasm66 · February 4, 2022, 7:59am

I wrote a small routine in ZIL (sorry… I’m not familiar with the exact Inform syntax but it should be easy to translate for someone more fluent in Inform) that does the job.

As it is written the first entry is entry 0.

<ROUTINE PRINT-DICT-ENTRY (N "AUX" DICT-ADDR SEP-CNT WORD-SIZE DICT-START)
	<SET DICT-ADDR <GET 8 0>>	                     ;"Start of dictionary in header"
	<SET SEP-CNT <GETB .DICT-ADDR 0>>				 ;"Number of separator characters"
	<SET WORD-SIZE <GETB .DICT-ADDR <+ .SEP-CNT 1>>> ;"Length of dictionary words"
	<SET DICT-START <+ .DICT-ADDR 4 .SEP-CNT>>       ;"Add offset dictionary header"
	<PRINTB <+ .DICT-START <* .N .WORD-SIZE>>>>

otistdog · February 4, 2022, 12:29pm

Something like:

[ PrintDictEntry entry_num ;
    print (address) No__Dword(entry_num);
];

Note that dictionary entries begin with 0 and using an illegal entry number results in an RTE.

Also, FYI if you’re looking at this stuff, the Glulx version of the StdLib routine No__Dword() does not seem to work, and it does not seem like it worked in 6/11, either.

zarf · February 4, 2022, 3:24pm

Hm… oh, I see what happened. I changed the Glulx library code so that it always tracks dict word addresses, never dict entry numbers. So No__Dword() and Dword__No() are both the identity function.

If you need to do this anyway, I think the address is #dictionary_table + WORDSIZE + index * (7+DICT_WORD_SIZE).

zarf · February 4, 2022, 3:32pm

(This breaks if you use the $DICT_CHAR_SIZE=4 option for a Unicode dictionary. But then so does the rest of the 6/11 parser.)

DavidG · February 5, 2022, 12:37am

I’m looking into this stuff so I can better solve a bug in the Standard Library that only affects Zcode (https://gitlab.com/DavidGriffith/inform6lib/-/issues/124).

DavidG · February 5, 2022, 12:39am

Should I look into fixing this too? Right now, the broken specification problem I described above is the only thing holding up a new release of the 6.12 line.

otistdog · February 5, 2022, 1:13pm

If I’m understanding that bug report correctly, the issue derives from the specification for .grammar() routine return values found on DM4 p. 140 (DM4 §18: Making conversation), and it occurs only when two conditions hold:

the dictionary table includes address $8000 (the 16-bit minimum negative integer value)
there is a word address at $8000

As explained by @fredrik, an example of when the issue might occur is when there are several sequential dictionary words at locations $7FF7 ('take'), $8000 ('tip'), and $8009 ('tote'). The signed value of $7FF7 is 32759. Applying the unary negative operator to that value creates -32759, which when treated as an unsigned value is interpreted as $8009. As a result, there is no way to distinguish whether the return value in this example means -'take' ($8009) or 'tote' ($8009).

Would shifting the dictionary location by one byte under those circumstances solve the issue? To use the above example, the relevant word address values would be $7FF8 ('take'), $8001 ('tip') and $800A ('tote'). In this situation, -'take' would evaluate to $8008, so the ambiguity would not occur. This seems like it would preserve the specification and the validity of that section of the DM4 at the cost of 1 byte from the story file in the relatively rare cases under which this scenario would occur.

fredrik · February 5, 2022, 2:19pm

Yes, shifting the dictionary one byte would solve the issue. That would require a change in the compiler, which the compiler maintainers don’t want to do, because a bug in the library shouldn’t be fixed at the compiler level.

otistdog · February 5, 2022, 3:20pm

Is it really a bug at the library level, though? This seems to be an emergent problem arising under specific conditions, which partly originates in the specification itself, partly in the low-level operations of the Z-machine, partly in the specifics of any particular source code, and partly with the implementation of the parser.

If I’m not mistaken, at least one other alignment-related bug (emergent in a similar manner) has been solved via compiler change in the past. The pattern used there could be applied again. It seems like detecting the trigger conditions for this issue would be easy enough at the point of writing out the dictionary table.

fredrik · February 5, 2022, 3:48pm

The specification of this feature in DM4 is broken. It would have made perfect sense to say the routine should return 'verb' + 1 instead of -'verb' - then we wouldn’t have any problems.

Graham, or someone else, realized the specification is broken but didn’t want to change the specification, so they just added a check in the code to warn for this condition. Unfortunately, the check is only active in DEBUG mode, and the alignment of the dictionary is not likely to be the same as in non-DEBUG mode. And enabling the warning in non-DEBUG mode would mean a production build of a game could start with an ugly warning. A lot of games don’t use the grammar property anyway, so the warning wont’t even serve a purpose in most cases.

zarf · February 5, 2022, 6:39pm

To be clear, this is my call. I’m sticking to it though.

The situation has become worse with the new $ZCODE_LESS_DICT_DATA option. This causes the dictionary to have 8-byte entries (6-byte in v3). Now you can run into this library problem even when the dictionary isn’t aligned on $8000! E.g. with 8-byte entries, if $7FFC is a dictionary word, then $8004 is also a dict word, but these values are negatives of each other.

So the proposed compiler fix isn’t sufficient anyway. We could do more complex math, but it just seems like the wrong road to head down.

otistdog · February 5, 2022, 11:21pm

For what it’s worth, this code change (first if statement is new) for tables.c in 6.36 seems to handle your problem example from the bug report, and it seems to handle the case that zarf mentions as well (using your test code with $ZCODE_LESS_DICT_DATA=1 and the size of dummy at $5F05):

/*  ------------------------- Dictionary ------------------------------- */

// check for dictionary alignment problem possibility and add byte if needed
// note that constant 7 is based on number of dictionary table header bytes from following blocks
if ((mark < 0x8000) && (mark + dict_entries * DICT_ENTRY_BYTE_LENGTH + 7 > 0x8000)) {       // dictionary will span address $8000
    if (                                                                                    
         ((0x8000-(mark+7)) % DICT_ENTRY_BYTE_LENGTH == 0)                                  // Case 1: word entries align so word at $8000
       ||
         ((DICT_ENTRY_BYTE_LENGTH%2 == 0) &&                                                // Case 2: dictionary entries are even-sized
          ((0x8000-(mark+7)) % (DICT_ENTRY_BYTE_LENGTH/2) == 0))                            // and offset from $8000 is half the entry size
       ) {
        printf("ADVISORY: Dictionary moved by one byte.\n");                                // needs improved message
        mark++;                                                                             // move dictionary by one byte
    }
}

dictionary_at=mark;

dictionary[0]=3; dictionary[1]='.';        /* Non-space characters which */
                 dictionary[2]=',';                 /* force words apart */
                 dictionary[3]='"';

dictionary[4]=DICT_ENTRY_BYTE_LENGTH;           /* Length of each entry */
dictionary[5]=(dict_entries/256);                   /* Number of entries */
dictionary[6]=(dict_entries%256);

for (i=0; i<7; i++) p[mark++] = dictionary[i];

for (i=0; i<dict_entries; i++)
{   k = 7 + i*DICT_ENTRY_BYTE_LENGTH;
    j = mark + final_dict_order[i]*DICT_ENTRY_BYTE_LENGTH;
    for (l = 0; l<DICT_ENTRY_BYTE_LENGTH; l++)
        p[j++] = dictionary[k++];
}
mark += dict_entries * DICT_ENTRY_BYTE_LENGTH;

EDIT: For reference, attached is a patch file for the difference in tables.c.
EDIT 2: Code example and patch file updated Apr 2022 to correct error in Case 1 logic (such that the problem of 8-byte dictionary entries aligning at $8000 were being ignored).

dict-align.patch.zip (761 Bytes)

DavidG · February 6, 2022, 12:15am

It seems we can do:

Change the library to have a grammar property return 'verb'+1 instead of -'verb'.
This would necessarily break older games that make use of a grammar property and contradict the DM4 from that point on. This corrects the odd situation of negating an unsigned value and expecting things to work.
Have the author do parser_one = true; before returning a negated verb from a grammar property.
This was suggested by @fredrik as a way to resolve this ambiguity. It works, but I find it clunky .
Change the compiler to be careful when building the dictionary to avoid situations that can lead to the confusion that causes this bug.
Remove the #Ifdef DEBUG; around the check that Graham or whoever put in there. Then add to the warning a suggestion to add a line of Array dummy -> $01; to make it such that the requirements for the bug are no longer present in the compiled code.

Number 1 seems to have the most impact on authors, at least those who make use of grammar properties. As @fredrik points out, few authors seem to use that property at all, so annoyances should be minimal. If someone is going to read up enough on Inform 6 to decide that the grammar property would be fun to use, I’d imagine that the same someone will be keen enough to figure out about this change.

Numbers 2 and 4 can be implemented simultaneously. However, both are clunky.

Number 3 then seems most ideal. I agree with @zarf’s reluctance to address this problem in the compiler because it was caused by an error in crafting the Library. This brings me to @otistdog’s point about another alignment-related bug which was solved via a compiler change. What was this problem? I would like to see the commit that dealt with that. Now back to @zarf’s justification for not doing 3: “We could do more complex math…”. How complex? We could possibly justify this change by adding to the Z-machine standard a stricture saying that the dictionary SHALL NOT be aligned on $8000 or however it works out for 8-byte entries.

For convenient reading, this is the test in the Library that Graham added:

if (dict_start > 0 && dict_end < 0 && ((-dict_start) - dict_end) % dict_entry_size == 0)

It seems to me that this would work for 8-byte dictionary entries falling into this troublesome alignment. If this test is added to the dictionary-generation code in the compiler, would this enforce this proposed alignment requirement?

fredrik · February 6, 2022, 1:57am

I think your summary of the options to resolve this is correct.

I think the code currently in the library to test this is incorrect, even when dictionary entries are always 7 or 9 bytes. I think this is one way to test it that actually works:

if (dict_start > 0 && dict_end < 0 &&
      (((32767 - dict_start + 1) % dict_entry_size == 0) ||
	   (dict_entry_size & 1 == 0 &&
		   ((32767 - dict_start + 1) % dict_entry_size == dict_entry_size / 2))))
        print "** Warning: grammar properties might not work correctly **^";

otistdog · February 6, 2022, 2:50am

I agree with fredrik’s analysis on the bug report thread that the “warning” code does not seem to perform its intended function – both because a) the logic is incorrect and b) it is only shown in debugging builds and the situation that it tries to detect will be altered anyway for a release build. Fixing it is irrelevant; the goal at this point should probably be to remove that code from the library entirely.

Other than that, I’ll only say that option 3 preserves backwards-compatibility for all previous I6 code (minus any that suffer from this issue and do not have room for even one more byte in readable memory) and conforms with author expectations set by the DM4 for all future I6 code. And that I’ve already posted a proof-of-concept of some logic for the compiler that seems to cover the general case elucidated so well by fredrik and zarf.

The logic of the revised “warning” code in fredrik’s post is essentially identical to that in mine. [EDIT: On another review, I see that fredrik’s code above is the same that he posted a few weeks ago on the bug report thread, so credit where it’s due, which is with him. The only difference I can see is that my version uses the compiler source’s idiom. I also see that he anticipated the even-sized dictionary entry case, as well. (Sorry about that, @fredrik. I didn’t read your second version of the “warning” code there as closely as I should have the first time through.)][EDIT 2: I can also see that fredrik reported this issue (https://github.com/DavidKinder/Inform6/issues/130) to the compiler project in August of last year.]

The important question, though, is: Why use it to produce a nonfunctional warning when it can be used to just prevent the problem in the first place? As I once saw it put humorously: “If only I had a computer to handle these troublesome details for me!”

Piergiorgio_d_errico · February 6, 2022, 12:27pm

on no.1 I don’t have delved so deep into standard library’s innards, aside the historical work, but OTOH, I have a complete mirror of IFarchive’s inform games sources, AND *grep(1).

if you can provide some example of code whose can be broken, I can check into that large sample of inform 6 sources via grep/rgrep, so a sort of weighing of the cons of breaking old sources can be done.

Best regards from Italy,
dott. Piergiorgio.

fredrik · February 6, 2022, 1:22pm

Any games that use the grammar property. And library extension too, of course. Of course, most game authors don’t upload their source code to if-archive.

I wouldn’t be so sure that anyone wanting to use the grammar property will read the latest addendums to the Inform docs. DM4 is available in multiple locations, and has been available as a printed book too. The more recent additions to the library documentation are a bit harder to find than the DM4, and there are no pointers in DM4 saying always check site X and Y for the latest changes to the docs.

zarf · February 6, 2022, 4:00pm

The 6/12 library has departed from and extended the DM4 in lots of ways. Even the 6/11 library did, a bit, really.

If someone wants to write an “I6 Library Addendum” document, I’ll add it to inform-fiction.org.

otistdog · February 6, 2022, 5:14pm

I misremembered the details. The pattern that I recalled seeing may or may not have been motivated by bug fixes; it may have been in place since the original implementation.

If you review the construct_storyfile_z() function found in the tables.c source file, you’ll see that there are points where the routine that writes the Z-machine story file checks to see whether certain kinds of alignment will or will not happen and adds padding bytes accordingly. Most of the time it is seeking to make sure that a given alignment does happen in order to prevent problems – in the case at hand, it would seek to make sure that the given alignment doesn’t happen.

An example:

/*  -------------------- Objects and Properties ------------------------ */

/* The object table must be word-aligned. The Z-machine spec does not
   require this, but the RA__Pr() veneer routine does.
*/
while ((mark%2) != 0) p[mark++]=0;

Another:

/*  ----------------- A gap before the code area ----------------------- */
/*  (so that it will start at an exact packed address and so that all    */
/*  routine packed addresses are >= 256, hence long constants)           */
/*  -------------------------------------------------------------------- */

while ((mark%length_scale_factor) != 0) p[mark++]=0;
while (mark < (scale_factor*0x100)) p[mark++]=0;
if (oddeven_packing_switch)
    while ((mark%(scale_factor*2)) != 0) p[mark++]=0;

A third:

/*  ------------------ Another synchronising gap ----------------------- */

if (oddeven_packing_switch)
{   if (module_switch)
         while ((mark%(scale_factor*2)) != 0) mark++;
    else
         while ((mark%(scale_factor*2)) != scale_factor) mark++;
}
else
    while ((mark%scale_factor) != 0) mark++;

Those are the only three that I spotted in a quick sweep of the 6.36 version, but clearly the pattern is well-established at the compiler level, and this case very much seems to fit the same mold as the ones that prompted the above code.