Parentheses and Punctuation Removal

severedhand · October 21, 2022, 2:12am

(6M62)

Unless my command parsing debug statements aren’t showing me what I think they are, the I7 parser accepts parentheses in the player’s command if the player types them.

Second, as far as I can tell, the parser can't match parentheses in a normal grammar token. I mean, if you write

Understand "door (big)" as big-door.

it will just never match, even if the player types that exactly.

I have some room names I'd like to present in this parenthesised style. My idea for matching them when the player types them is to have the parser strip parentheses from the input, then use grammar tokens that don't include parentheses.

I could hack one routine from Emily Short's Punctuation Removal to strip a left parenthesis, and another to strip the right, but not being I6y myself, I don't know the syntax for a single routine that strips both.

For instance, here’s the routine that strips question marks:

Include (-

[ Questionstripping i;
	for (i = WORDSIZE : i <= (buffer-->0)+(WORDSIZE-1) : i++)
	{ 
		if ((buffer->i) == '?') 
		{	buffer->i = ' ';  
		}
	}
	VM_Tokenise(buffer, parse);
];

-)

Assuming all my other observations have been correct, could someone please tweak this to remove both left and right parentheses in a single blow? Thanks.

-Wade

Draconis · October 21, 2022, 2:26am

Include (-

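[ ParensStripping i;
	! Walk the characters of the command the player typed.
	for (i = WORDSIZE : i <= (buffer-->0)+(WORDSIZE-1) : i++)
	{
		! "or" lets one comparison test against both characters.
		if ((buffer->i) == '(' or ')')
		{	buffer->i = ' ';
		}
	}
	! The buffer has changed, so rebuild the parse table from it.
	VM_Tokenise(buffer, parse);
];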

-).

To strip parentheses: (- ParensStripping(); -).

zarf · October 21, 2022, 2:44am

I think that counts as a bug (in 6M62; I haven't tried version 10). The I6-level tokenizer only treats the double quote, comma and full stop (" , .) as single-character words, but the I7-level Understand statement disagrees.

Draconis · October 21, 2022, 3:35am

As a side note, because I find little quirks like this interesting: I6 has an unusual feature, as far as programming languages go, where you can include multiple options on the right side of a comparison operator (x == 1 or 3 or 5).

This dates from the earliest versions of Inform, when it was really just a glorified Z-machine assembler, because it’s a feature of (some of) the Z-machine’s comparison opcodes: @je x 1 3 5 jumps if x is equal to any of the other values.

However, it's generally not recommended for anything except equality. It's supported, and for a test like < it works the "logical" way, testing whether x is less than any of the other values. But the Z-machine doesn't have instructions to test whether a number is ≥ another, or ≠ another, for example. Instead, it has a feature to invert the result of any comparison opcode, and this means the "logical" way for or to work actually ends up being an and, because the whole comparison gets inverted: x <= y or z actually means not (x > y or z), that is, x is less than or equal to both y and z.

As a result, or is seldom used for anything except checking if a variable matches a list of constants. Like in this case, checking if the current character in the buffer is either a ( or a ).

severedhand · October 21, 2022, 5:13am

Well, you already know what my I6 is like (i.e. officially non-existent, but I can sometimes hack pre-existing blobs by looking at what's adjacent). I was just patting myself on the head for understanding, for the first time, the == in the original bit of the extension I posted, because I've recently been watching some Roblox scripting tutorials to help my nephew. They're in a version of Lua, so they taught me about the syntax of ==, <= etc.

You'd think it might have been obvious. After all, in most versions of BASIC, of which I am a veteran, >= means greater than or equal to. But in BASIC you just use = for equals. That's why the double == had always confused me until I was explicitly told 'that's equals'.

-Wade

Draconis · October 21, 2022, 4:18pm

Yeah, for some reason the convention that caught on across a wide swath of programming languages is = for assignment, == for equality (and sometimes even === for “stricter” equality). I’ve always thought = for equality and := for assignment would be clearer, but that’s just how it is sometimes with conventions.

Even more fun, since a lot of Inform 6's syntax was developed ad hoc as Graham hacked more and more features into his early assembler, it ends up violating a lot of these usual conventions. For example, he decided early on that ! should mark comments (or "remarks", as BASIC calls them), which meant he later couldn't use != for ≠, or ! for logical negation. So "not equals" is ~= and logical negation is ~~.

The language is full of these little idiosyncrasies, which I find really cool because they're like hidden artifacts telling the story of how the language developed. But they can also make it frustrating to dip your toes in (to write just a couple of little routines for an I7 extension) without learning the whole thing. (And I still always forget the semicolon after a routine definition.)

Zed · October 21, 2022, 4:27pm

I’d like to see languages finally bust through the ASCII ceiling and use ← for assignment.

Draconis · October 21, 2022, 4:32pm

I think Haskell was on the right track, letting people define their own operators using any Unicode punctuation characters. The question is how many people are willing to install a special keyboard layout to type ←…

Zed · October 21, 2022, 4:45pm

Admittedly, that bit’s a problem. There’s always <-.

To (v1 - a value of kind K) <- (v2 - a K): (-
  if (KOVIsBlockValue({-strong-kind:K})) {
    BlkValueCopy({-by-reference:v1},{v2});
  }
  else {
    {-by-reference:v1} = {v2};
  }
-).