Confusion about 'read text_array parse_array'

Jaek · April 15, 2022, 2:46am

I’ve just started looking at Inform 6 as I think it suits me more as a programmer than Inform 7 does. I’m working my way through the DM4 and have run into some confusion regarding the read function. I’m referring specifically to the example code for read text_array parse_array; at DM4 §2: The state of play.

The part that’s confusing me is that the code seems to get two different values from any given element of parse_array depending on the accessor used (-> or -->).

Taking word position (w) 1 as an example, the line:

dict = parse_array-->(w*2-1);

resolves to array element 1 (1*2-1). When referenced with --> I find that this does in fact contain the correct dictionary key for word 1, as the manual suggests.

However:

parse_array->1;

which explicitly accesses element 1 (as used in the for loop), returns the number of words parsed (again, as the manual suggests).

How can I be getting two distinct (correct) values from the same element of the array depending on the use of --> or ->?

I did some experimenting involving creating an array and populating/accessing it using the two different accessors and got odd results, as I would expect.

Furthermore, the compiler itself gives a warning about using parse_array--> in the example code:

Using '-->' to access a -> or string array

I realise that I will probably never need to use read in this way, but it’s bothering me as to why this works. Can anyone shed any light on this?

zarf · April 15, 2022, 3:06am

The format of the parse_array in Z-code is pretty confusing, yeah.

The first byte is the maximum number of words that can be parsed. The second byte is the number of words that were actually found. The rest of the array is two-byte values, containing word values.

So that’s parse_array->0, parse_array->1, and then parse_array-->N for N >= 1. We’re using the array address in two ways, but the values don’t overlap.

(If you access parse_array-->0, or parse_array->N for N > 1, you get nonsense. Well, not nonsense, but useless values.)

Dannii · April 15, 2022, 3:13am

This is because the Z-Machine parses input into an array using both bytes and (16-bit) words. The Z-Machine Standard describes the read opcode like this:

Next, lexical analysis is performed on the text (except that in Versions 5 and later, if parse-buffer is zero then this is omitted). Initially, byte 0 of the parse-buffer should hold the maximum number of textual words which can be parsed. (If this is n, the buffer must be at least 2 + 4*n bytes long to hold the results of the analysis.)

The interpreter divides the text into words and looks them up in the dictionary, as described in §13. The number of words is written in byte 1 and one 4-byte block is written for each word, from byte 2 onwards (except that it should stop before going beyond the maximum number of words specified). Each block consists of the byte address of the word in the dictionary, if it is in the dictionary, or 0 if it isn’t; followed by a byte giving the number of letters in the word; and finally a byte giving the position in the text-buffer of the first letter of the word.

That results in a parse-buffer that looks like this:

Byte index  0         1        2-3   4    5       6-7   8    9
Word index  0                  1     2            3     4
            Capacity  # Words  Word 1---------->  Word 2---------->
                               Addr  Len  Offset  Addr  Len  Offset
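
As a rough illustration of the table above, here is a small Python model (not Inform 6, and not interpreter code) that packs a parse buffer the way the quoted passage describes. The function name build_parse_buffer and the sample dictionary addresses, lengths and positions are invented for the example; the 2-byte address is stored big-endian, as Z-machine words are.

```python
import struct

def build_parse_buffer(capacity, words):
    """Pack a parse buffer: 2 header bytes, then one 4-byte block per word.

    `words` is a list of (dict_addr, length, text_pos) tuples; the values
    used below are invented for illustration.
    """
    kept = words[:capacity]                 # stop at the maximum word count
    buf = bytearray(2 + 4 * capacity)       # minimum size: 2 + 4*n bytes
    buf[0] = capacity                       # byte 0: maximum number of words
    buf[1] = len(kept)                      # byte 1: number of words found
    for i, (addr, length, pos) in enumerate(kept):
        # ">HBB" = big-endian 16-bit address, then two single bytes
        struct.pack_into(">HBB", buf, 2 + 4 * i, addr, length, pos)
    return bytes(buf)

# Two words: one found in the dictionary, one not (address 0).
buf = build_parse_buffer(10, [(0x1A2B, 4, 2), (0, 5, 7)])
print(len(buf))                                # 42
print(" ".join(f"{b:02x}" for b in buf[:10]))  # 0a 02 1a 2b 04 02 00 00 05 07
```

With a capacity of 10 the buffer comes out at 42 bytes, which matches the 2 + 4*n rule in the quoted text.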

In Inform 6, the operator -> reads an array using a byte index, while --> reads an array using a word index.

So to read the number of parsed words you need to use parse_array->1. To get the dictionary address of the first word you use parse_array-->1, and to get its offset in the text buffer you use parse_array->5. To get the dictionary address of the second word you use parse_array-->3, and to get its offset in the text buffer you use parse_array->9.
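
The same index arithmetic can be checked outside Inform. The sketch below is a Python model, with hypothetical helpers byte_at and word_at standing in for -> and -->, and with invented buffer contents; it assumes 2-byte big-endian words, as on the Z-machine.

```python
def byte_at(arr, i):
    """Model of arr->i: the single byte at byte offset i."""
    return arr[i]

def word_at(arr, i):
    """Model of arr-->i: the 16-bit big-endian word at byte offset 2*i."""
    return (arr[2 * i] << 8) | arr[2 * i + 1]

def word_entry(arr, w):
    """Fields of the w-th parsed word (w counts from 1)."""
    return {
        "dict": word_at(arr, 2 * w - 1),   # bytes 4w-2 and 4w-1
        "len": byte_at(arr, 4 * w),
        "pos": byte_at(arr, 4 * w + 1),
    }

# Invented contents: capacity 10, 2 words found, then two 4-byte blocks.
parse_array = bytes([10, 2, 0x1A, 0x2B, 4, 2, 0x00, 0x00, 5, 7])

print(byte_at(parse_array, 1))     # 2: number of words
print(word_at(parse_array, 1))     # 6699: bytes 2-3, i.e. 0x1A2B
print(word_entry(parse_array, 2))  # {'dict': 0, 'len': 5, 'pos': 7}
print(word_at(parse_array, 0))     # 2562: capacity and count fused, not useful
```

In this model, index 1 means byte 1 for byte_at but bytes 2 and 3 for word_at, which is why parse_array->1 and parse_array-->1 return different values without overlapping.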

zarf · April 15, 2022, 3:16am

Whoops – Dannii is more complete than me. When I said “The rest of the array is two-byte values,” I was forgetting some of the details.

Jaek · April 15, 2022, 3:34am

Thank you both for your swift responses. This now makes perfect sense.