Unless I’m misunderstanding, these routines are used within the library as a complementary pair. In routine ParseToken__(), Dword__No() encodes the index position of the dictionary entry (a nice, low number), and the result is added to the constant REPARSE_CODE as additional information:
! If we've run out of the player's input, but still have parameters to
! specify, we go into "infer" mode, remembering where we are and the
! preposition we are inferring...
if (wn > num_words) {
    if (inferfrom==0 && parameters<params_wanted) {
        inferfrom = pcount; inferword = token;
        pattern-->pcount = REPARSE_CODE + Dword__No(given_tdata); ! <---- HERE
    }
    ! If we are not inferring, then the line is wrong...
    if (inferfrom == 0) return -1;
    ! If not, then the line is right but we mark in the preposition...
    pattern-->pcount = REPARSE_CODE + Dword__No(given_tdata); ! <---- OR HERE
    return GPR_PREPOSITION;
}
o = NextWord();
pattern-->pcount = REPARSE_CODE + Dword__No(o); ! <---- OR HERE
In NounDomain(), the dictionary entry index position is extracted and fed to No__Dword() to obtain the dictionary word’s memory address:
else {
    ! An inferred preposition.
    parse2-->1 = No__Dword(pattern-->j - REPARSE_CODE); ! <---- HERE
    #Ifdef DEBUG;
    if (parser_trace >= 5) print "[Using preposition '", (address) parse2-->1, "']^";
    #Endif; ! DEBUG
}
Likewise in PrintCommand():
[ PrintCommand from i k spacing_flag;
    if (from == 0) {
        i = verb_word;
        if (LanguageVerb(i) == 0)
            if (PrintVerb(i) == 0) print (address) i;
        from++; spacing_flag = true;
    }
    for (k=from : k<pcount : k++) {
        i = pattern-->k;
        if (i == PATTERN_NULL) continue;
        if (spacing_flag) print (char) ' ';
        if (i ==0 ) { print (string) THOSET__TX; jump TokenPrinted; }
        if (i == 1) { print (string) THAT__TX; jump TokenPrinted; }
        if (i >= REPARSE_CODE)
            print (address) No__Dword(i-REPARSE_CODE); ! <---- HERE
        else
            if (i in compass && LanguageVerbLikesAdverb(verb_word))
                LanguageDirection (i.door_dir); ! the direction name as adverb
            else
                print (the) i;
      .TokenPrinted;
        spacing_flag = true;
    }
];
So long as the two routines reverse each other’s mappings (without causing other issues like signed integer overflow), they will function in their current use within the library. That’s why the current Glulx implementations (just the identity function) don’t cause a problem. However, if these are to be true library routines, they should have a consistent functional meaning across Z-machine and Glulx, even if the implementation details differ. That’s why I’m saying that the Glulx implementations should be modified.
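The round-trip property that the library depends on can be spot-checked directly. This is only a minimal sketch, assuming it is compiled against the library so that Dword__No() and No__Dword() are in scope; the routine name CheckDwordRoundTrip and the choice of 'take' are purely illustrative and not part of the library:

```inform6
! Hypothetical helper, not part of the library.
[ CheckDwordRoundTrip w n;
    w = 'take';               ! any dictionary word will do
    n = Dword__No(w);         ! word address -> encoded number
    if (No__Dword(n) == w) rtrue;
    print "Round trip failed for '", (address) w, "'^";
    rfalse;
];
```

Note that the identity implementations pass this check too, even though n is then an address rather than an index, which is presumably why the inconsistency has gone unnoticed.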
Routine Dword__No() should:
- take the dictionary word’s memory address as input (call it word_addr)
- determine the correct starting memory address for the first word in the dictionary (call it first_word_addr)
- determine the correct memory size of each dictionary entry (call it entry_len)
- use the difference between word_addr and first_word_addr, divided by entry_len, to determine the word’s dictionary index position
Routine No__Dword() should:
- take the dictionary word’s index position as input (call it word_index)
- determine the correct starting memory address for the first word in the dictionary (call it first_word_addr)
- determine the correct memory size of each dictionary entry (call it entry_len)
- use the product of word_index and entry_len, added to first_word_addr, to determine the word’s memory address
There are many places in the library where work is minimized by using constant expressions that mask informational details of what’s going on. As the compiler changes, some of the assumptions on which these expressions rely are becoming untrue. The new $ZCODE_LESS_DICT_DATA setting breaks the assumption that entry_len is always 9. (I would suppose that those using Z3 format, which uses only 4 bytes of character storage and has an entry size of 7, have already run into this trouble if they ever tried to use Dword__No() for its ostensible function.) I’m pointing out that the offset between the dictionary’s start address and the first word’s address is also assumed to always be 7, which is not a valid assumption if a non-standard number of word-separators is used.
I’m not sure that it’s even possible to modify the number and values of the word-separator characters in Inform right now; the current list of three may just be hardcoded in the compiler. I do know that the Z-machine format allows for lists of different sizes. It seems entirely possible that future compilers may allow the author to modify these from the current default. Since you’re modifying the routines to take into account the correct entry size (which, happily, would fix the Z3 issue), it makes sense to me to future-proof them while you’re at it.
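For Z-code, both values can be read from the dictionary header instead of being assumed. What follows is a rough sketch based on the dictionary layout in the Z-Machine Standard (a count byte n, then n separator bytes, an entry-length byte, a two-byte entry count, then the entries). The routine names are hypothetical, and 0-->4 is the header word holding the dictionary address:

```inform6
#Ifdef TARGET_ZCODE;
! Hypothetical helpers; the names are illustrative only.
[ ZDictEntryLen dict;
    dict = 0-->4;                  ! dictionary table address, from the header
    return dict->(dict->0 + 1);    ! entry-length byte follows the separator list
];
[ ZDictFirstWord dict;
    dict = 0-->4;
    ! skip the count byte, the separators, the length byte and the entry count
    return dict + dict->0 + 4;
];
#Endif;
```

With the default three separators, ZDictFirstWord() returns the dictionary address plus 7, matching the hard-coded offset, and it stays correct if the separator count ever changes.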
Using Inform 6.36 and StdLib 6.12.5, I’ve unit-tested the following for Z-machine, with and without $ZCODE_LESS_DICT_DATA=1 specified, and for Glulx, with and without $DICT_CHAR_SIZE=4 specified:
#Ifdef TARGET_ZCODE;
[ Dword__No word_addr;
    ! takes advantage of work done by ZZInitialise() to reduce overhead
    return (word_addr-dict_start)/dict_entry_size;
];
[ No__Dword word_index;
    ! takes advantage of work done by ZZInitialise() to reduce overhead
    return dict_start + (dict_entry_size * word_index);
];
#Endif;
#Ifdef TARGET_GLULX;
[ Dword__No word_addr first_word_addr entry_len;
    first_word_addr = #dictionary_table + WORDSIZE; ! See Glulx Inform Technical Reference section 4
    ! following assumes only legal values for DICT_CHAR_SIZE are 1 and 4;
    ! also assumes number of dict_parN fields is fixed at 3
    entry_len = (DICT_WORD_SIZE * DICT_CHAR_SIZE) + 7; ! 7 = 2 per dict_parN short plus 1 type byte
    if (DICT_CHAR_SIZE == 4) entry_len = entry_len + 5; ! additional short and 3 extra bytes in this case
    return (word_addr-first_word_addr)/entry_len;
];
[ No__Dword word_index first_word_addr entry_len;
    first_word_addr = #dictionary_table + WORDSIZE; ! See Glulx Inform Technical Reference section 4
    ! following assumes only legal values for DICT_CHAR_SIZE are 1 and 4;
    ! also assumes number of dict_parN fields is fixed at 3
    entry_len = (DICT_WORD_SIZE * DICT_CHAR_SIZE) + 7; ! 7 = 2 per dict_parN short plus 1 type byte
    if (DICT_CHAR_SIZE == 4) entry_len = entry_len + 5; ! additional short and 3 extra bytes in this case
    return first_word_addr + (entry_len * word_index);
];
#Endif;
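One way to exercise both routines over a range of entries is to round-trip indices rather than addresses. This is a sketch only and not the test harness used for the results above; the routine name and the n_entries parameter (how many dictionary entries to check, supplied by the caller) are hypothetical:

```inform6
! Hypothetical helper, not part of the library.
[ CheckIndexRoundTrip n_entries i;
    for (i=0 : i<n_entries : i++) {
        ! index -> address -> index should give back the same index
        if (Dword__No(No__Dword(i)) ~= i) {
            print "Mismatch at dictionary index ", i, "^";
            rfalse;
        }
    }
    rtrue;
];
```

For example, CheckIndexRoundTrip(10) checks the first ten entries. Since an identity mapping would also pass a pure round-trip test, a stricter check might additionally confirm that No__Dword(0) is the address of the first dictionary entry.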
Note that this thread is quite long and involves two distinct and essentially unrelated topics: the first (and original) is about conversion of dictionary entry indexes to word addresses and vice versa, and the second is about how to resolve the ambiguous return value of the .grammar() routine identified by fredrik. Doesn’t it make sense to split these into two threads at this point?