So how exactly does ZIL's primary and secondary word type system work?

Per the Internal Secrets document (page 22), ZIL links vocab words[1] to verb, adjective, noun, preposition, and direction IDs in a truly arcane way. My understanding is:

  • Each word in the dictionary has one byte indicating what type(s) of word it is (“type byte”), and two bytes indicating its ID in each of those types (“primary ID byte” and “secondary ID byte”).
  • The type byte assigns a word any number of “primary” types and exactly one “secondary” type.
  • Possible primary types are noun, verb, adjective, direction, preposition, and other. (A word can have any subset of these, including all or none.)
  • Possible secondary types are noun, verb, adjective, and direction. (A word always has exactly one of these, never zero, never multiple.)
  • The primary ID byte is the word’s ID number in its primary type(s), which can be zero to mean “same as the secondary ID byte”.
  • The secondary ID byte is the word’s ID number in its secondary type, which cannot be zero.
  • The routine to check a word’s type is WT?. This can be called in two ways.
  • WT?(Word, X) returns TRUE if Word has X among its primary types, and FALSE otherwise.
  • WT?(Word, X, Y) returns the secondary ID byte if Word has secondary type Y, and the primary ID byte otherwise. X is ignored.

My question is: how is this useful? How do you effectively use this system to check something like “is this word a verb, and if so, what’s its ID” or “is this word a preposition, and if so, what’s its ID”—both things that the parser needs to do?

It seems like I’m not the only one to find this confusing, but if ZILF works in the same way, surely ZILF maintainers and programmers must have a good understanding of this. Is there a better way to explain it that will make more sense? (Because my explanation here doesn’t even make sense to me…)


  1. Since these routines operate on bytes, “word” in this post always means “dictionary entry”, not “16-bit value”.


The answer to both is “Call WT?”, right? I’m not entirely clear on the compiler’s logic, but I can see that the way WT? is used in the parser is consistently

<WT? .WRD ,PS?ADJECTIVE ,P1?ADJECTIVE>

…to get the adjective number. (The constants are 0x20, 0x02.) To get the verb number it’s

<WT? .WRD ,PS?VERB ,P1?VERB>


X isn’t ignored. WT? does the type-check against type X first; it returns false if the word isn’t that type. If the word is that type, the return value depends on whether Y was provided. If it wasn’t, the function returns true (so it’s just a boolean test). If Y was provided, the function returns the adjective number, verb number, or whatever.

You have to pass the correct Y value for the type X you are testing. Otherwise you get garbage. This is baroque but not really any harder to use.
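The behaviour described above can be modelled in a few lines of Python. This is only a sketch of the semantics as explained in this thread, not Infocom's code: the `(flags, slot1, slot2)` tuple, the function name `wt`, and the example word are made up for illustration. The adjective constants (0x20, 0x02) are the ones quoted earlier; the verb constants and the two-bit indicator mask are assumptions.

```python
# Part-of-speech bits (X) and first-slot indicators (Y).
# The adjective pair is quoted in the thread; the rest are assumed values.
PS_VERB = 0x40        # assumed
PS_ADJECTIVE = 0x20
PS_DIRECTION = 0x10   # assumed
P1_VERB = 0x01        # assumed
P1_ADJECTIVE = 0x02
P1_MASK = 0x03        # assumed: the indicator lives in the low two bits


def wt(entry, ps, p1=None):
    """Model of WT?. `entry` is a hypothetical (flags, slot1, slot2) tuple."""
    flags, slot1, slot2 = entry
    if not flags & ps:
        return False      # the word is not of type X at all
    if p1 is None:
        return True       # two-argument form: a plain boolean test
    # Three-argument form: the indicator says whose value sits in slot 1.
    return slot1 if (flags & P1_MASK) == p1 else slot2


# Hypothetical word: verb value 200 in the first slot,
# adjective value 17 in the second slot.
word = (PS_VERB | PS_ADJECTIVE | P1_VERB, 200, 17)

print(wt(word, PS_VERB, P1_VERB))            # verb value
print(wt(word, PS_ADJECTIVE, P1_ADJECTIVE))  # adjective value
print(wt(word, PS_DIRECTION))                # not a direction
```

Passing a Y that does not belong to X, such as `wt(word, PS_ADJECTIVE, P1_VERB)`, passes the type check but returns the verb's value, which is the "garbage" case mentioned above.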


Ah, if X isn’t ignored this makes much more sense. The Secrets document says that if Y doesn’t match, “the primary ID is returned regardless if it is a valid word type for the primary ID”, which made me think X wasn’t checked in that case.

So the idea is something like: the “primary types” are meant to be a list of all types this word can be, with the “secondary type” being a somewhat-arbitrarily-chosen “main” type, the “secondary ID” being the ID of the “main” type, and the “primary ID” being the ID of any other type that’s not the “main” one? That explains why the primary ID can be zero but the secondary ID can’t be.

In which case, WT? Word Adjective Adjective returns FALSE if it’s not an adjective at all, the secondary ID if the secondary type is Adjective, and the primary ID if the secondary type is not Adjective. That makes sense! And then a word can have any number of types, though all but one of them have to share an ID.

Unfortunately, I’m still not sure how you’d do “is this word a preposition, and if so, what’s its ID” with this system, since prepositions aren’t a valid secondary type. Something like FROM isn’t a noun, verb, adjective, or direction, so what secondary type would you give it? And how would you make a WT? call that always returns the primary ID, no matter what secondary type the word might have?

In most games, you could probably get away with just assuming that prepositions will never have any other word types, so by convention you always give them the “noun” secondary type, and then WT? Word Preposition Verb will always return the primary ID. But that doesn’t seem like a safe assumption to make in general. BEHIND can be either a preposition or a noun, for example, while OUT can be either a preposition (in the sense of “a fixed word in a grammar line”) or a direction.


The primary/secondary type thing seems like a confusing way to frame it. Here’s how I think of it:

  • Each word has some number of types (“parts of speech”). Any word can have one or two; some combinations allow more, depending on exactly which types are involved.
  • Some of those types have associated values. Specifically, verbs, directions, prepositions, and (in V3) adjectives.
  • There are only two slots for those values. This is where the limit on types comes from, and why it depends on the exact types: a word can’t be a verb, direction, and preposition all at once, because only two of the values could be stored.
  • Each word also carries an indicator saying which of its values is stored in the first slot.
  • Those indicators only exist for verb, adjective, and direction. The location of the preposition value (if present) is inferred: it’s always in the second slot, except when there’s no other value to put in the first slot.

If the PS?PREPOSITION bit is set in the flags byte, then it’s a preposition. To find its preposition value, mask off the type flags and look at the indicator: if it’s 0, return the value in the first slot; otherwise the second.

FROM is only a preposition, so its flags byte would just be PS?PREPOSITION. Since it has no other types, it has no “which comes first” indicator, so its preposition value is stored in the first slot.
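As a rough illustration of the two-slot model, here is a Python sketch that decodes a flags byte and two value bytes into a type-to-value mapping. The constant values other than the adjective pair quoted earlier are assumptions, and the function name `decode`, the example entries, and their numeric values are invented for this example.

```python
# Type bits and first-slot indicators. Only the adjective pair (0x20, 2)
# appears in this thread; the other numbers are assumed for illustration.
PS_VERB, PS_ADJECTIVE, PS_DIRECTION, PS_PREPOSITION = 0x40, 0x20, 0x10, 0x08
P1_VERB, P1_ADJECTIVE, P1_DIRECTION = 1, 2, 3
P1_MASK = 0x03

# Types that carry a value, with the indicator that means "mine is first".
# Prepositions have no named indicator, so zero stands in for one.
VALUED_TYPES = {
    "verb": (PS_VERB, P1_VERB),
    "adjective": (PS_ADJECTIVE, P1_ADJECTIVE),
    "direction": (PS_DIRECTION, P1_DIRECTION),
    "preposition": (PS_PREPOSITION, 0),
}


def decode(flags, slot1, slot2):
    """Return {type name: value} for every valued type the word has."""
    indicator = flags & P1_MASK
    values = {}
    for name, (ps, p1) in VALUED_TYPES.items():
        if flags & ps:
            values[name] = slot1 if indicator == p1 else slot2
    return values


# FROM: preposition only, so no indicator and the value is in slot 1.
from_word = (PS_PREPOSITION, 5, 0)             # value 5 is made up
# A hypothetical verb + preposition word with the verb value first.
verb_prep = (PS_VERB | PS_PREPOSITION | P1_VERB, 130, 9)

print(decode(*from_word))
print(decode(*verb_prep))
```

For the preposition-only entry the indicator is zero, so the lookup lands on the first slot; for the verb-first entry the indicator is nonzero, so the preposition value is read from the second slot.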

Mu. All types are morally equal. You can check which type’s value is stored in the first slot, by masking off the type flags and looking at the indicator, but the answer doesn’t tell you anything except (at most) which definition the compiler encountered first.


That terminology does make a lot more sense! Just to make sure I understand right: you’re saying the way to get the preposition number of a word is to bypass WT? entirely and just look at its data bytes directly; WT? is a useful convenience for certain things, but it’s not meant to be the only way to access a word’s features?

You can still use WT?. ZILF’s equivalent is a little more readable but works the same way:

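<ROUTINE CHKWORD? (W PS "OPT" (P1 -1) "AUX" F)
    ;"A zero word pointer matches nothing."
    <COND (<0? .W> <RFALSE>)>
    ;"Fetch the flags byte: type bits plus the first-slot indicator."
    <SET F <GETB .W ,VOCAB-FL>>
    <SET F <COND (<BTST .F .PS>
                  ;"No P1 given: this is a plain type test."
                  <COND (<L? .P1 0>
                         <RTRUE>)
                        ;"Indicator matches P1: return the first value byte."
                        (<==? <BAND .F ,P1MASK> .P1>
                         <GETB .W ,VOCAB-V1>)
                        ;"Otherwise return the second value byte."
                        (ELSE <GETB .W ,VOCAB-V2>)>)>>
    .F>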

To get the verb value: <CHKWORD? .W ,PS?VERB ,P1?VERB>
To get the preposition value: <CHKWORD? .W ,PS?PREPOSITION 0>
To test for the presence of a type and ignore the value: <CHKWORD? .W ,PS?OBJECT>

So I guess an even more straightforward (and more accurate) way to look at it is that there is a “which comes first” indicator for prepositions, but it’s zero instead of a named constant. That means the preposition value can come first even when the word has another value. I hadn’t expected to see that in practice, but BACK in Beyond Zork (below) has the preposition value first.


Speaking of which, there is a difference between “word” in a dictionary and “word” in arrays, as in 2 bytes rather than 1 byte, right? I keep looking at documentation and I get really confused when they say “word” and then use it in reference to strings or objects.

Yes. I try to say “dict word” where the context is unclear, but it’s sometimes confusing anyway.


We should also note that the constants ,P1?VERB, ,PS?VERB, and so on are hardwired in Infocom’s compiler. They are the same for every game (except the V6s, which I haven’t looked at, and Ko says something about Sherlock being different).

The Inform parser avoids this whole mess by not storing most of that information in the dict table. Only the verb number is stored, and that’s always in the first slot. Other type-of-speech info lives in the grammar table.


I suppose you could simplify the call with a macro. Define <CHKWORDVALUE W ,PS?VERB> so that it converts to <CHKWORD? .W ,PS?VERB ,P1?VERB>, and so on.

Infocom’s parser only calls <WT? X Y> a handful of times, though.

Apologies if I’m just being dense at this point, but:

Since 0 is the code for “noun first”, wouldn’t <CHKWORD? .W ,PS?PREPOSITION 0> return the wrong value if a word is both a noun and a preposition? Does ZILF just assume that never happens, and caution authors to never put a THROUGHWAY in their Z3 game?

Sherlock is the only game to use COMPACT-VOCABULARY? mode, which, according to my notes, “saves one byte per word in the vocab table, and allows all parts of speech to coexist except for verb + direction. It also reduces the preposition table entries from 4 bytes to 3, but increases the number of entries by including all synonyms.”

It works by moving preposition values out of the vocab table, leaving verb and direction as the only types whose values are recorded there, and reducing the number of slots from two to one. Preposition values are instead recorded in the preposition table.

The preposition table already existed before this, being used to write disambiguation questions (“What do you want to put your left foot in?”), but COMPACT-VOCABULARY? records the value as a byte instead of a word and stores preposition synonyms there as well.

Conveniently, the only forbidden type combination is one that the parser couldn’t handle anyway: if “down” were both a verb and a direction, then the command “down” would have to move you earthward and search the room for something drinkable.

Indeed, most calls actually go through this macro:

<DEFMAC WORD? ('W 'T)
    `<CHKWORD? ~.W
        ,~<PARSE <STRING "PS?" <SPNAME .T>>>
        ,~<PARSE <STRING "P1?" <SPNAME .T>>>>>

e.g. <WORD? .W VERB>


Nouns don’t have meaningful values in the first place: they’re always 1, and in some (most?) games, the 1 isn’t recorded. (This is NEW-VOC? mode.)

I do see that Zork calls <WT? .WRD ,PS?OBJECT ,P1?OBJECT>. For a word that is both a noun and a preposition, this actually returns the preposition value, not the noun (object) value. But it works, because preposition values are always nonzero, and a nonzero result is all the game ever checks for objects.

For instance, Beyond Zork has a word (BACK) that’s a noun, verb, and preposition. It has the preposition value in the first slot, verb value in the second slot, and an indicator of zero (P1?OBJECT):

In Seastalker, OFF is a noun and a preposition, with the preposition value in the first slot, a 1 (O?ANY) in the second slot that I believe never actually gets read, and an indicator of zero:

In fact, these days we can go right to the source and see how the code above was written by ZILCH. The numbers in square brackets correspond to the PS? and P1? constants (note that zero is used for multiple types):

(To understand the implementations below, it also helps to note that the ordering in WTYPETBL corresponds to descending order of PS? values and ascending order of P1? values.)

ADD-WORD tries to use the first slot first and then the second slot (whichever one doesn’t already have a nonzero value in it), and when it uses the first slot, it sets the indicator along with the type bit (lines 719-721):

And WTYPE-PRINT prints symbolic names in the assembly output, using the indicator as an index into WTYPETBL (which is why we see P1?OBJECT for prepositions):

This implementation would have disallowed the three-type word in Beyond Zork, but I suspect ZILCH had already been changed to store 0 for objects instead of O?ANY (1) by the time Beyond Zork came out.
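The slot allocation described above can be sketched in Python. This is a reading of the description in this thread, not a transcription of ZILCH: the `Entry` class, the `add_word` signature, the numeric constants, and the example values 12 and 201 are all assumptions, as is the rule that a zero value needs no slot.

```python
from dataclasses import dataclass

# Assumed constant values; prepositions have no named indicator, so 0 is used.
PS_OBJECT, PS_VERB, PS_PREPOSITION = 0x80, 0x40, 0x08
P1_OBJECT, P1_VERB = 0, 1
O_ANY = 1  # the noun value mentioned in the thread


@dataclass
class Entry:
    flags: int = 0
    slot1: int = 0
    slot2: int = 0


def add_word(entry, ps, p1, value):
    """Record one part of speech, taking the first free slot for its value."""
    if value == 0:
        entry.flags |= ps            # assumption: nothing to store
    elif entry.slot1 == 0:
        entry.slot1 = value
        entry.flags |= ps | p1       # first slot: set the indicator too
    elif entry.slot2 == 0:
        entry.slot2 = value
        entry.flags |= ps            # second slot: type bit only
    else:
        raise ValueError("both value slots are already in use")


def build_back(noun_value):
    """Build a BACK-like entry: preposition, then verb, then noun."""
    back = Entry()
    add_word(back, PS_PREPOSITION, 0, 12)     # 12 is a made-up value
    add_word(back, PS_VERB, P1_VERB, 201)     # 201 is a made-up value
    add_word(back, PS_OBJECT, P1_OBJECT, noun_value)
    return back


print(build_back(0))       # fits: preposition first, verb second, indicator 0
try:
    build_back(O_ANY)      # the noun's 1 would need a third slot
except ValueError as err:
    print("rejected:", err)
```

Under these assumptions, storing 0 for nouns lets a noun, verb, and preposition share one entry, while storing O?ANY (1) runs out of slots, which matches the suspicion above about when the three-type word became possible.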


Ahh, that’s the key piece I was missing—I didn’t realize noun meanings didn’t have IDs! That all makes sense, then. Thank you!