Rewriting the multiple-command part of the parser

Draconis · June 13, 2019, 11:04pm

The parser machinery for multiple commands has caused quite a lot of headaches over time, since it’s very difficult to tap into it from the I7 level, and any preprocessing code has to happen before the commands are split up.

I’d like to make an extension to change this. But I’m curious if there are any potential pitfalls I need to look out for: it would basically split “reading a command” into two activities, “reading an input” and “extracting a command”, then flow something like this:

Call “reading an input”, which should fill in buffer1
Reset doneflag
While doneflag is not set:
- Call “extracting a command”, which should fill in buffer2 and optionally set doneflag
- Call the actual parser on buffer2

Now input modifications like Punctuation Removal would happen in “after reading an input”, while input modifications like adverb recognition would happen in “after extracting a command”.
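For anyone who wants to poke at the proposed flow outside of Inform, here is a minimal Python sketch of it. It is not I6 or I7 code: the names `read_input`, `extract_command` and `run_turn` are made up for illustration, the "parser" is just a callback, and the extractor only splits on periods and the word "then" (commas are deliberately ignored).

```python
import re
from typing import Callable, Tuple

# Hypothetical separator set: a period or the whole word "then".
SEPARATOR = re.compile(r"\.|\bthen\b", re.IGNORECASE)


def read_input(raw: str) -> str:
    """Stand-in for "reading an input": whole-line preprocessing goes here."""
    return raw.strip().lower()


def extract_command(buffer1: str) -> Tuple[str, str, bool]:
    """Stand-in for "extracting a command".

    Returns (buffer2, remaining buffer1, doneflag).
    """
    parts = SEPARATOR.split(buffer1, maxsplit=1)
    command = parts[0].strip()
    rest = parts[1].strip() if len(parts) > 1 else ""
    return command, rest, rest == ""


def run_turn(raw: str, parse: Callable[[str], None]) -> None:
    """Fill buffer1 once, then extract and parse commands until done."""
    buffer1 = read_input(raw)
    done = False
    while not done:
        buffer2, buffer1, done = extract_command(buffer1)
        if buffer2:  # skip empty commands left by trailing separators
            parse(buffer2)
```

Calling `run_turn("Jump. Take lamp then go north.", print)` hands "jump", "take lamp" and "go north" to the callback one at a time, so a per-command hook would slot in just before `parse(buffer2)`.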

Are there any major flaws in this design?

Draconis · June 13, 2019, 11:07pm

This would also help deal with this:

Inform parses a command like “ALICE, JUMP. BOB, JUMP.” as “asking alice to try jumping” followed by “answering alice that ‘bob, jump’”

Which “precedence” to give to commas versus periods could be handled by the “extracting a command” activity.

matt_weiner · June 14, 2019, 12:10am

I’m not sure that extracting a command can be separated from the parser while preserving the current behavior, because of the ambiguity of the comma. In order to figure out whether “bow, go north” is a request or a chained command to bow and go north, the extractor will need to know whether “bow” is a verb or a name (in case Clara Bow is around).

OTOH that behavior is pretty erratic anyway–it only works if the command before the comma is a single word. “north, jump” or “i, jump” will be processed as chained commands, “go north, jump” and “take inventory, jump” won’t. So it’s probably not too high a cost to mess up chained commands with commas in some cases.

matt_weiner · June 14, 2019, 12:12am

I mean who chains commands with commas anyway.

drpeterbatesuk · June 14, 2019, 9:16am

I assume this ability to chain (some) commands with commas was an effort by Graham Nelson to replicate the behaviour of the Infocom parser in that respect? I wonder if this is an archaism worth the headache of preserving. As Matt asks, does anybody do this? I’ve played text adventures since the 1980s and didn’t know (or had forgotten) it was even a thing.

drpeterbatesuk · June 14, 2019, 9:41am

I guess if you are going to try to replicate the current parser’s interpretation of periods and commas you would need to start by digging into and building a flowchart of the relevant parser logic and entry points… Good Luck!

drpeterbatesuk · June 15, 2019, 2:56am

`With a bit of preliminary digging I have so far come to the following conclusions:
1. The parser starts by substituting 'then' tokens (THEN1__WD) for any periods in the command line and 'comma' tokens (comma_word) for any commas.
2. If it doesn't recognise the first word in a command as a verb:
     2.1.	It looks for some special cases (again, undo, and directions, which it reinterprets as 'go direction').
     2.2.	Otherwise, it looks to match [text] comma against a talkable or animate object in scope.
          2.2.1.	If it can do this, it changes the actor to said object and reparses the command, starting from the first word after the comma, as a command (or sequence of commands) to said object.
          2.2.2.	Parsing of this command (or sequence of commands) will succeed or fail according to the usual parsing rules.
          2.2.3.	If parsing of the command (or sequence of commands) fails, the parser reinterprets the entire command line as ‘Answer object [topic]’ topic being all the text after the comma
3. If the first word is recognised as a verb acting on something, the parser interprets further words after a comma as multiple further objects to be acted on by the same verb, as in:
take pot, kettle, skillet
pot: taken
kettle: taken
skillet: taken
    3.1.	If the word(s) after the comma cannot be interpreted as additional object(s) to be successfully acted on, the command is rejected and suitable errors generated. This might be for example ‘You can’t use multiple objects with that verb’ as in ‘examine pot, kettle’ or ‘You can’t see any such thing’ as in ‘drop yeti, sasquatch’
    3.2.	This is why ‘drop pot, go east’ fails with ‘You can’t see any such thing’, as the parser is trying to interpret ‘go east’ as something else to drop.
4. However, should the parser successfully match a grammar line without encountering the possibility of the comma indicating a talkable/animate object to be spoken to or a list of multiple objects to be handled, it treats ‘then words’ (‘then’ or period) and commas the same: as indicating the start of a further command.
    4.1.	In practice, on account of its preference for interpreting commas as indicating lists of objects, this only ever happens when the comma follows a verb which acts on nothing (usually a single-word command such as ‘e’ or ‘jump’), so only commands of this sort can be chained together with commas.
5.	Multiple commands are handled by moving the word parsing marker to the beginning of the command after the demarcating ‘then word’/‘comma’, blanking out the command line up to the ‘then word’/‘comma’, then re-tokenising the command line (stripping away these leading blanks in the process) and starting the parsing process anew.
`
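The comma rules summarised above can be boiled down to a toy classifier. This is a rough Python sketch of the decision order only, not a port of the I6 parser: `classify_comma` is a hypothetical name, the verb and people sets are supplied by the caller, directions are assumed to be listed among the verbs, and "verb followed by anything" stands in for "a verb acting on something".

```python
from typing import Set


def classify_comma(line: str, verbs: Set[str], people: Set[str]) -> str:
    """Guess how the first comma in a command line would be read."""
    head, sep, _tail = line.lower().partition(",")
    if not sep:
        return "no comma"
    words = head.split()
    if not words:
        return "error"
    if words[0] not in verbs:
        # Step 2: not a verb, so try to match the text before the comma
        # against someone who can be addressed.
        return "order" if head.strip() in people else "error"
    if len(words) > 1:
        # Step 3: the verb has a noun, so the comma continues an object list.
        return "object list"
    # Step 4: a bare verb, so the comma acts like a 'then word'.
    return "chained command"
```

With verbs {"take", "drop", "jump", "e", "go"} and people {"alice"}, this gives "order" for "alice, jump", "object list" for both "take pot, kettle" and "drop pot, go east" (which is why the latter fails), and "chained command" for "e, jump".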

drpeterbatesuk · June 16, 2019, 1:22pm

I think the fundamental issue here is whether you wish to replicate the current parser’s handling of commas- it’s the ambiguity of commas that makes it difficult to ‘chop up’ a multiple command line ahead of time.

The current parser doesn’t finally decide what it thinks about a comma (or series of commas) until the middle of command processing, when it is potentially matching grammar lines including tokens representing multiple-objects which might be represented in the command line by a comma-delimited list of objects.

The dilemma for positioning an entry point to interfere with the initial command in a multi-command line is that it’s not clear whether to treat commas as ‘then-words’ until command-processing is performed, but paradoxically command-processing might proceed differently if we had already interfered before it started…

The parser isn’t very smart about this since when parsing a verb with multi-token grammar, if it notices that it can’t match the text after a comma as continuing a valid multi-token it bails and generates an error rather than going back to consider whether instead the player intended the comma as a ‘then word’. The latter is far more likely to be the case if the text after the comma and before the next comma/‘then word’/end-of-line can be matched as oops/undo/again, a compass direction, a talkable/animate object or text beginning with a valid verb (this being the sequence the parser goes through sequentially when trying to match the very beginning of any command).

Any routine to redesign the multi-command behaviour of the parser would want to improve on rather than replicate this, I guess, as well as the current assumption that all commands after an ‘NPC, command’ are additional commands to be issued to the same NPC even if (as in your example) they start by addressing a different NPC.

Draconis · June 16, 2019, 5:28pm

Yes indeed; in this hypothetical redesign, commas would not be allowed as command separators (only THEN*__WD). The current behavior strikes me as a bug more than a feature.

drpeterbatesuk · June 16, 2019, 5:47pm

I’ve had a quick play with Suspect, and the Infocom parser in that game doesn’t show the same behaviours, so the present Inform parser doesn’t even have the merit of historical accuracy. Given the substantial complications of using commas as THEN*__WDs and the very doubtful benefits, I would vote for simplicity: restricting commas to use in lists of objects and introducing conversation with NPCs.

drpeterbatesuk · June 16, 2019, 5:54pm

I guess if you wanted to retain the convenience of being able to issue a sequence of commands to an NPC without having to type ‘NPC,’ each time, these could be separated by semicolons, as in ‘Alice, ask the Mad Hatter about the teapot; drink some tea; wake the dormouse’

Draconis · June 16, 2019, 6:07pm

That could work! Currently I believe Inform does that with periods, but the convenience of being able to distinguish “Alice, X; Y” from “Alice, X. Bob, Y” would be worth adding semicolons to the parser.
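To make the distinction concrete, here is a small Python sketch of the idea, assuming semicolons keep the current addressee and periods reset it. The function name `assign_actors` and the caller-supplied set of addressable people are invented for the example; nothing here reflects how Inform actually stores actors.

```python
from typing import List, Set, Tuple


def assign_actors(line: str, people: Set[str]) -> List[Tuple[str, str]]:
    """Pair each command with the actor it is addressed to."""
    result: List[Tuple[str, str]] = []
    for sentence in line.split("."):
        actor = "player"  # a period resets the addressee
        head, sep, tail = sentence.partition(",")
        if sep and head.strip().lower() in people:
            # "NAME, ..." addresses everything up to the next period.
            actor, sentence = head.strip().lower(), tail
        for command in sentence.split(";"):
            if command.strip():
                result.append((actor, command.strip().lower()))
    return result
```

For "Alice, jump; sing. Bob, jump. look" with people {"alice", "bob"}, this yields alice jumping, alice singing, bob jumping, and then the player looking. A comma that does not follow a known name (as in "take pot, kettle") is left inside the command for the object-list grammar to deal with.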

drpeterbatesuk · June 16, 2019, 6:17pm

Continuing the theme of avoiding complications and ambiguity for the parser without inconveniencing the player, it’s also a bad design decision to allow commas for introducing conversation when a colon or a dash could be used:
Alice: take the biscuit, teacup and Victoria sponge; ask the Hatter about the Hare; wake the dormouse
Alice- take the biscuit, teacup and Victoria sponge; ask the Hatter about the Hare; wake the dormouse

drpeterbatesuk · June 16, 2019, 6:30pm

The above scheme would allow a simplicity in getting started in parsing a command line that is very evidently absent in the current parser code required by present conventions:

find the first period or semicolon and use that to demarcate the first command
look for a colon or dash and if found, attempt parsing as an attempt at an order or conversation
else attempt parsing as a command starting with a verb
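Those three steps are short enough to sketch directly. The following Python is only an illustration of the proposed conventions, with hypothetical names (`first_command`, `interpret`) and one added assumption: a dash only counts as an addressee marker when it is followed by whitespace, so hyphenated words are left alone.

```python
import re
from typing import Optional, Tuple

# "NAME:" or "NAME- " at the start of a command marks an addressee.
ADDRESS = re.compile(r"^(?P<who>[^:]+?)\s*(?::|-\s)\s*(?P<what>.+)$")


def first_command(line: str) -> Tuple[str, str]:
    """Cut the line at the first period or semicolon."""
    parts = re.split(r"[.;]", line, maxsplit=1)
    command = parts[0].strip()
    rest = parts[1].strip() if len(parts) > 1 else ""
    return command, rest


def interpret(command: str) -> Tuple[str, Optional[str], str]:
    """Classify a single command as an order to someone or a plain command."""
    match = ADDRESS.match(command)
    if match:
        return ("order", match.group("who").strip(), match.group("what").strip())
    return ("command", None, command)
```

Given "Alice: take the biscuit, teacup and Victoria sponge; ask the Hatter about the Hare", `first_command` returns the part before the semicolon, and `interpret` on that part returns an order for "Alice" whose text still contains its commas, which can then only mean an object list.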

matt_weiner · June 16, 2019, 6:43pm

Right now any chain of commands after a comma will be interpreted as chained requests. So “jump. Bob, jump. x me then go north” is interpreted as the player jumping, then Bob jumping, Bob examining the player, Bob going north.

If you try “Bob, jump. Alice, jump” the actions are Bob jumping and then answering Bob that “Alice, jump.” So it’s as if you’d typed “Bob, jump” and then “Bob, Alice, jump.”

So again there’s nothing to be lost by eliminating commas as command separators.

About using semicolons or dashes or whatever–that could be nice but also seems like something the game would have to explain explicitly if it were being used. The thing about the “Alice, jump” and period/then syntaxes is that it is well established so people will use it and we want to leave it the way it is as much as possible. And then the thing with commas as command separators is that it doesn’t seem as well established and people don’t use it so it’s dispensable.

drpeterbatesuk · June 16, 2019, 6:58pm

I think that’s right, and these (sequences of multiple commands etc.) are probably all too infrequent uses of the parser to become new conventions unless the authors of Inform were in due course to take them up and publicise them.

That said, the difficulties presented to the parser by the present conventions are illustrated by the point that despite two decades of development & refinement by as frighteningly intelligent a bunch as Mr Nelson and his assistants, it still doesn’t perform as well in these areas as the Infocom parser.

Building an extension introducing the possibility of using colons, semicolons and dashes- which don’t currently feature in the recognised typed input of the parser- would not necessarily need to compromise the ability of players to continue using the existing rather poorly understood and partially-implemented system of commas and periods simultaneously…

bikibird · June 16, 2019, 7:16pm

Are the rules for parsing input documented anywhere, for example in BNF notation, either for Inform 7 or Infocom? I’ve seen vague descriptions, but never a detailed spec.

Draconis · June 16, 2019, 7:17pm

If so, I’ve never seen them. Everything I know I know from looking at the parser internals.

drpeterbatesuk · June 16, 2019, 8:39pm

Are the rules for parsing input documented anywhere, for example in BNF notation, either for Inform 7 or Infocom?

I think the closest I have come across (for Inform) is the Inform Designer’s Manual, which is both incomplete and out-of-date.

Apart from that, like Daniel, just reverse-engineering the code.

dfremont · October 29, 2019, 3:55pm

In case anyone doesn’t already know about this, the literate programming version of the parser (and the template layer more generally) is somewhat more documented and easier to read than the raw I6 code. The “sources” page doesn’t seem to be on the Inform 7 website anymore, but you can find an archived version here: check out Woven/index.html, or Woven/B-parst.pdf for the parser in particular.