Fair enough, but having had a look at Counterfeit Monkey itself, it’s clear that the root cause of the ‘curly apostrophe/?’ trouble is the inbuilt extension Punctuation Removal by Emily Short, which consists of numerous phrases that invoke VM_Tokenise(buffer, parse) then refresh players_command but not num_words. The relevant code used by Counterfeit Monkey is in the Presentation Details modular extension:
Presentation Details by Counterfeit Monkey begins here.
Chapter 3 - Typographical Conveniences
Section 2 - Input Editing
[There are a number of situations -- when typing a question with a ? at the end, in a command like TYPE "BROWN" ON COMPUTER, etc. -- where we want to throw away any punctuation that the player has added to the input, and no situations in which we want to keep it (except for full stops, which might indicate the beginning of another command). So we edit those out.]
Include Punctuation Removal by Emily Short.
First after reading a command (this is the remove stray punctuation rule):
remove stray punctuation.
There might perhaps be merit to a failsafe here, but the reason ParseTokenStopped() is misbehaving is that num_words is stale when it is called. As long as num_words is fresh, ParseTokenStopped() correctly returns GPR_FAIL.
Ah, you found the first domino! Still, looking around, I see other places in parser code where VM_Tokenise() is called without a paired update of both num_words and players_command (VM_ReadKeyboard() in Glulx.i6t, SetPlayersCommand() in Text.i6t, TestKeyboardPrimitive() in Tests.i6t, Keyboard() in Parser.i6t), just as you spotted in SpliceSnippet(), so that’s not the only error creating hostages of fortune.
Is there any situation in which one would legitimately want to call VM_Tokenise() but not update those two globals? It does seem worthwhile to make the code to update them part of workings of VM_Tokenise() itself – especially if, as seems to be the case, even the people who are literally the world’s most familiar with the code base are prone to forget about this detail.
One doesn’t immediately occur to me. I think it’s worth at least putting it to @GrahamNelson. The parser is undergoing a major overhaul for the next release to make it fully Unicode compliant, which will inevitably break a lot of existing parser-extension code, so this might be a good opportunity to make some other changes to render it more robust. Personally, I would incline to your suggestion of retiring num_words altogether, possibly players_command also. @zarf (or others) : any thoughts?
I took a crack at retiring them yesterday as an experiment. It was about an hour’s work to eliminate num_words entirely from the 6M62 templates. It’s not possible to eliminate players_command without modifying the I7 compiler, because (as seen above) it outputs the variable name as a hardcoded string in snippet-related functions. (I’m not sure whether the same happens for num_words anywhere. It doesn’t happen in a minimal project.) Other than that, there was no obstacle that I could see.
In doing so, I noticed that in many of the places where num_words is read, it’s being done as part of a loop continuation clause, so – perhaps ironically – I started to see the value of having it cached somewhere in the spirit of minimizing wasted work. Arguably, this could be done locally to each function using it, but in places like the core parser routine or NounDomain() locals are scarce.
Binding the update of these globals more tightly to the operations that modify the state that they’re supposed to be caching may be good enough to eliminate problems like this one, and to allow other code to rely on these globals as fresh. It’s certainly less disruptive.
For reference, here’s the (zipped) story.ni from my test project for removing num_words (6M62 style). I don’t know how to put it through the standard battery of regression tests – is that public?