Help wanted writing I6 library tests

I’ve finished looking into the DROP ROCKS issue.

TL;DR: It’s not a bug, it’s a feature.

Read on for the gory details, some actual bugs, and some thoughts on inform6lib development.

What’s the issue?

When we run the test program for issue 1885, we see:

When all of the rocks are on the ground and we try to drop a single rock, parsing succeeds and we receive a response from the Drop action telling us that the rock is already here.

When we try to drop multiple rocks, parsing fails and no action is generated. Instead, we receive an error message from the parser.

When we order the girl to drop the rocks, this error is transformed into a NotUnderstood action, sent to the girl’s orders, and redirected to her life, where it ultimately produces the suboptimal default response that we see.

In this post, we’ll focus on the simpler DROP ROCKS command with the player as the actor.

What’s causing it?

At parser trace level 1, we can see the big picture:

There are 3 grammar lines for DROP, and “drop rocks” fails to match any of them. Line 0 (* multiheld → Drop) is one that we’d expect to match our input and the one that we’ll focus on here.

The grammar requires the word ‘drop’ followed by text matching a multiheld token. The DM4 describes multiheld as “one or more held objects”, but, as we can see from our success parsing DROP ROCK, this isn’t a hard and fast rule; the reality is a little more nuanced.

If we crank up the trace level to 5, we can see more detail as the parser tries to parse “rocks” as a multiheld token:

>drop rocks
...
[line 0 * multiheld -> Drop]
 [line 0 token 1 word 2 : multiheld]
  [Object list from word 2]
  [Calling NounDomain on location and actor]
   [NounDomain called at word 2]
   seeking definite object
...
   [NounDomain made 4 matches]
   [Adjudicating match list of size 4 in context 3]
   indefinite type: plural
   number wanted: all
   most likely GNAs of names: 4095
   Scoring match list: indef mode 1 type 8, satisfying 0 requirements:
     The rock (27) in the Somewhere : 156 points
     The rock (28) in the Somewhere : 156 points
     The rock (29) in the Somewhere : 156 points
     The rock (30) in the Somewhere : 156 points
   Best guess the rock (27)   
   Rejecting it
   Best guess the rock (28)
   Rejecting it
   Best guess the rock (29)
   Rejecting it
   Best guess the rock (30)
   Rejecting it
   Best guess ran out of choices
   Made multiple object of size 0]
...

This trace is mostly produced by NounDomain and Adjudicate in parserm.h. The call chain that got us there is:
InformLibrary::play → InformParser::parse_input → Parser__parse (Part G) → ParseToken__ (Part D) → NounDomain → Adjudicate

The parser calls ParseToken__ to match a multiheld token (with input “rocks”), which calls NounDomain to find the best matching object or objects in the actor’s scope.

Notice that, when NounDomain is first called, we’re “seeking definite object”, but, by the beginning of Adjudicate, we’re looking for all of an indefinite group of objects (“indefinite_type: plural, number wanted: all”). This happens because, when NounDomain calls SearchScope to find matching objects in the actor’s scope, TryGivenObject finds that a rock matches the plural dictionary word ‘rocks//p’. This causes the parser to switch into indefinite plural mode. We look for “all” (represented by 100) because no specific number has been requested (we didn’t type DROP TWO ROCKS):

            if (dict_flags_of_noun & DICT_PLUR) {
                if (~~allow_plurals) k = 0;
                else {
                    if (indef_mode == 0) {
                        indef_mode = 1; indef_type = 0; indef_wanted = 0;
                    }
                    indef_type = indef_type | PLURAL_BIT;
                    if (indef_wanted == 0) indef_wanted = 100;
                }
            }

In contrast, DROP ROCK does not trigger this code and remains in definite mode, accounting for the difference in results that we’ll see.

After NounDomain finds multiple objects that match “rocks” (“NounDomain made 4 matches”), it calls Adjudicate to score them and disambiguate among them. This process depends on the actor, action_to_be, the token that we’re trying to match, whether we’re in definite or indefinite mode, and the state of the world model at the time. It’s described in some detail in §33 of the DM4 on pages 240-242, about which more below.

During disambiguation, we reach a block of code specific to indefinite plural mode, and (spoiler) it rejects all four of our rocks! (For I7 people: this code is analogous to the deciding whether all includes activity.):

    if (indef_mode == 1 && indef_type & PLURAL_BIT ~= 0) {
        if (context ~= MULTI_TOKEN or MULTIHELD_TOKEN or MULTIEXCEPT_TOKEN
                     or MULTIINSIDE_TOKEN) {
            etype = MULTI_PE;
            return -1;
        }
        i = 0; offset = multiple_object-->0; sovert = -1;
        for (j=BestGuess() : j~=-1 && i<indef_wanted && i+offset<63 : j=BestGuess()) {
            flag = 1;
            if (j has concealed && j has worn) flag = 0;
...
            if (context == MULTIHELD_TOKEN or MULTIEXCEPT_TOKEN && parent(j) ~= actor)
                flag = 0;
...
            n = ChooseObjects(j, flag);
            if (n == 0) n = LibraryExtensions.RunWhile(ext_chooseobjects, 0, j, flag);
            switch (n) {
              2: flag = 0;  ! forcing rejection
              1: flag = 1;  ! forcing acceptance
             !0:            ! going with parser's decision
            }
            if (flag == 1) {
                i++; multiple_object-->(i+offset) = j;
                #Ifdef DEBUG;
                if (parser_trace >= 4) print "   Accepting it^";
                #Endif; ! DEBUG
            }
            else {
                i = i;
                #Ifdef DEBUG;
                if (parser_trace >= 4) print "   Rejecting it^";
                #Endif; ! DEBUG
            }
        }
 ...
        return 1;
    }

The for loop iterates over all candidate objects and accepts or rejects each one in turn based on a series of tests. For DROP ROCKS, all of the rocks are rejected, resulting in a multiple object of size 0 which percolates up the parser and triggers a parse error.

The specific test that rejects the rocks is " if (context == MULTIHELD_TOKEN or MULTIEXCEPT_TOKEN && parent(j) ~= actor)".

So, that’s what’s happening.

In indefinite plural mode, for a multiheld token, an object is rejected unless the actor is holding it, even when there are no other candidates.

In definite mode, an object might be disfavored for multiheld because it’s not held, but it’s not rejected outright when there are no other candidates.

This code raises a couple issues that I’ll discuss later in this post.

  • Why is MULTIEXCEPT also included in this test when it ostensibly has nothing to do with held objects but simply excludes the indirect object?
  • This indef plural code differs from the 6/11 version, with the initial value of flag reversed. The 6/12 version of the test for concealed and worn is incorrect.

Notice also that an author can provide a ChooseObjects function that can override the parser’s decision (stored in flag). See DM4 §33 page 239.

Is this behavior intended?

Yes.

It’s mentioned in the DM4’s explanation of disambiguation:

Section (iii) makes it clear that this is intended behavior.

For additional evidence, in issue 1488, filed against I7 about a multiexcept issue, but stemming from the exact same line of code, GN says:

(The I7 does the player mean rules are roughly analogous to ChooseObjects.)

Ok. It’s intended behavior. Why?

So that this happens:

instead of this:

(Similarly for DROP ALL.)

The idea behind the indef plural disambiguation is that we want to decide what should reasonably be included in “all” or a group of objects based on the context of the actor, the action, and the world.

The particular rule that caused our issue says that, when an actor is performing an action that prefers held objects, those are the objects that the parser should choose preferentially. It only breaks down when there are no candidate objects that meet the criteria, as in DROP ROCKS when all rocks are on the ground.

So what can we do?

If we want the best of both worlds, we can write a ChooseObjects routine for our test program that will allow the parser to reject rocks on the ground when there are better candidates, while forcing it to accept rocks on the ground when there are none. (This also handles Take which suffers from a similar issue.)

[ ChooseObjects obj code x;
  if (code == 2) rfalse;
  if (action_to_be == ##Drop && obj ofclass Rock && obj notin actor) {
      objectloop (x in actor) {
          if (x ofclass Rock) rfalse;
      }
      rtrue;
  }
  if (action_to_be == ##Take && obj ofclass Rock && obj in actor) {
      objectloop(x ofclass Rock) {
          if (TestScope(x, actor) && x notin actor) rfalse;
      }
      rtrue;
  }
  rfalse;
];

With this ChooseObjects in place, we get these results:

Can we do more?

We could change the parser to relax this rejection filter when there is an undifferentiated set of candidates while still using it to disambiguate among candidates with differing characteristics.

However, it would take a bit of restructuring since the indef plural loop rejects candidates one at a time in decreasing BestGuess order. It’s only later in Adjudicate that objects are grouped into equivalence classes.

Also, it’s not clear that everyone would agree that the behavior in the transcript above is desirable. To quote the DM4’s introduction to ChooseObjects:

My conclusion: use ChooseObjects.

Here ends the discussion of DROP ROCKS. Next up are a couple of side issues, and then some thoughts about I6 library development moving forward.

Side issue: MULTIEXCEPT

When looking at the test that caused our issue:

if (context == MULTIHELD_TOKEN or MULTIEXCEPT_TOKEN && parent(j) ~= actor)

and reading code elsewhere in the parser, it wasn’t clear to me why MULTIEXCEPT was so often grouped with MULTIHELD. The DM4 summarizes it as “one or more in scope, except the other object” and later describes it in more depth, saying:

What does “everything but a particular object” have to do with held objects? The reference to a player’s possessions is suggestive, but it was only when I looked at grammar.h that it became clear.

These are the verbs in 6/12 that use MULTIEXCEPT:

Verb 'drop' 'discard'
    * multiheld                                 -> Drop
    * multiexcept 'in'/'into'/'down' noun       -> Insert
    * multiexcept 'on'/'onto' noun              -> PutOn;
...
Verb 'insert'
    * multiexcept 'in'/'into' noun              -> Insert;
...
Verb 'put'
    * multiexcept 'in'/'inside'/'into' noun     -> Insert
    * multiexcept 'on'/'onto' noun              -> PutOn
    * 'on' held                                 -> Wear
    * 'down' multiheld                          -> Drop
    * multiheld 'down'                          -> Drop;

All of the grammar lines that use the multiexcept token involve Insert or PutOn actions. In other words, placing held objects somewhere. The purpose of the “except” part of multiexcept is to exclude the indirect object so that things are not placed in or on themselves, which would cause cycles in the object tree (making it not a tree (well, not a forest)).

So, MULTIEXCEPT really should be thought of as MULTIHELDEXCEPT. I don’t know what the original intention was, but that’s what it’s become in practice today.

Side issue: changes to “rule 7ip” indef plural code since lib 6/11

The indef plural code in Adjudicate that we quoted above differs from the lib 6/11 version.

Here’s 6/11:

        for (j=BestGuess() : j~=-1 && i<indef_wanted && i+offset<63 : j=BestGuess()) {
            flag = 0;
            if (j hasnt concealed && j hasnt worn) flag = 1;

Here’s 6/12:

        for (j=BestGuess() : j~=-1 && i<indef_wanted && i+offset<63 : j=BestGuess()) {
            flag = 1;
            if (j has concealed && j has worn) flag = 0;

In both cases, the meaning of the flag variable is the same. flag == 1 means accept an object, and flag == 0 means reject it.

In 6/11 the flag is initialized to 0 (reject), and various tests set it to accept object j if it’s a good choice in context. In 6/12, the flag is initialized to 1 (accept) and various tests clear it to reject object j if it’s not a good choice.

In 6/11, we see that j is provisionally accepted if it’s not concealed and it’s not worn. This makes sense in the context of commands like TAKE ALL (where we wouldn’t want to include concealed objects) or DROP ALL (where we wouldn’t want to drop things that the actor is wearing), and the excerpt from the DM4 that we quoted above is consistent with this: “mark each as accept unless (i) it has worn or concealed”.

In 6/12, because the default value of the flag was reversed, this test also had to be reversed. Instead of accepting if j isn’t concealed and isn’t worn, we want to reject j if it is concealed or is worn. However, the code, as written, rejects it only if it’s both concealed and worn. This is a bug.

The code should read:

if (j has concealed or worn) flag = 0;

Inform 6 Library Development and Inform 7’s I6 Template Code

In an earlier post in this thread, I described the source of an error in how the parser constructs its results after calling an actor’s grammar routine and provided a patch for it along with some concerns (multiple objects, noun = 0 and second ~= 0).

Let’s look at the history of this issue.

In 6/11, we have this:

        i = RunRoutines(actor, grammar);
...
        if (i == 1) {
            results-->0 = action;
            results-->1 = noun;
            results-->2 = second;
            rtrue;
        }
...

This is clearly a bug, because the parser uses the first four words of results to store the action, the number of parameters (0, 1, or 2), the noun, and the second respectively. Here, someone forgot to place the number of params in results–>1 and instead shifted both noun and second one word closer to the beginning of the array. This creates a bad num params value based on the obj id of the noun, ignores the noun, treats the second as the noun, and leaves the second unset.

In 6/12, this fix was applied:

       if (i == 1) {
            results-->0 = action;
            results-->1 = 2;            ! Number of parameters
            results-->2 = noun;
            results-->3 = second;
            rtrue;
        }

However, this wasn’t quite correct, because it unconditionally sets num params to 2 and so doesn’t handle cases where the grammar routine sets noun or second to 0.

Then we have my recent proposed fix:

       if (i == 1) {
            results-->0 = action;
            results-->1 = 0;            ! Number of parameters
            results-->2 = noun;
            if (noun ~= 0) (results-->1)++;
            results-->3 = second;
            if (second ~= 0) (results-->1)++;
            rtrue;
        }

This ensures that num params matches the number of nonzero parameters, but, as I said, I had concerns about some cases. In considering this, I decided to see what Inform 7’s I6 template code does.

I7 does this (translated to inform6lib-speak):

        if (i == 1) {
            results-->0 = action;
            results-->1 = 0;
            results-->2 = noun;
            results-->3 = second;
            if (noun) results-->1 = 1;
            if (second) results-->1 = 2;
            rtrue;
        }

This looks like the way to go for 6.12.1. Though I’ll note that it allows grammar to return noun as a multiple object but not second. I’m not sure if we can ever have actions with a single noun and a multiple second (DRAW MUSTACHE ON ALL STATUES). Maybe it’s not a big deal, because we could reverse noun and second and effectively DECORATE ALL STATUES WITH MUSTACHE.

So, looking at this history, I wonder how many other fixed bugs are sitting in the I6 template code and not incorporated back into the regular I6 lib or vice versa?

Should we consider the I7 template layer permanently forked from inform6unix? The template layer docs suggest so:

Would it be possible to share code between the two projects – in particular, the parser code? One of the main differences seems to be the use of activities to allow the game to intercede instead of the traditional entry points and LibraryExtensions hooks. What if we implemented rulebooks in the I6 lib (we have I7’s generated intermediate code as proof of concept) and used #ifdefs to conditionally define BeginActivity, ForActivity, and EndActivity?

If the two codebases won’t be able to converge, should we allow them to diverge freely? Does I7 depend in any way on I6 lib remaining the same? After the automated tests are in place, should we decruft/refactor the library? Should we add features to it?