Better guessing of what object is intended

Warrigal · February 6, 2021, 12:50am

I was conscious of that, but if you do the maths, I’m pretty sure that when you only provide a noun, the object without adjectives will always provide a better match than the object with adjectives.
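
To put rough numbers on "do the maths", here is a small sketch in Python rather than Inform 6, so that it can be run on its own. The scoring rule (words typed divided by words in the object's name list), the function names and the object table are all assumptions made up for illustration; none of this is library code.

```python
def match_score(typed_words, name_words):
    """Fraction of an object's name words that the player actually typed."""
    typed = set(typed_words)
    names = set(name_words)
    if not typed <= names:
        # The player typed a word this object does not answer to.
        return 0.0
    return len(typed) / len(names)


# Hypothetical objects: an adjective-free key and three keys with adjectives.
OBJECTS = {
    "iron key": ["key", "iron"],
    "silver key": ["key", "silver"],
    "key": ["key"],
    "ruby key": ["ruby", "key"],
}


def best_matches(typed_words):
    """Return the objects with the highest non-zero score."""
    scores = {obj: match_score(typed_words, names)
              for obj, names in OBJECTS.items()}
    top = max(scores.values())
    return [obj for obj, score in scores.items() if top > 0 and score == top]


print(best_matches(["key"]))
print(best_matches(["iron", "key"]))
```

With only the noun typed, the plain key scores 100% and each key with an adjective scores 50%, so the plain key wins without a disambiguation question.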

Perhaps this is a good case for providing a list of adjectives and a list of nouns for each object instead of the stupid list of names that we currently have and the need to write lots of little parse_name routines. If we had that, then, once again, the object without adjectives will provide a better match than the object with adjectives.

Anyway, that’s not the immediate issue. Can you think of a technique that @DavidG can use to prevent the disambiguation question being repeated in this situation in a general-purpose way?

DavidG · February 6, 2021, 4:16am

Your 100% versus 50% matching idea is pretty much what I’m after. What eludes me is exactly how to go about it and that’s why I started looking into comparing a string contained in an address versus one in an array. This strikes me as sloppy, but effective enough for what I want. If someone can help me figure out the percentage approach, that would be much better. Adjudicate() seems to be the place to do it.

Here’s my test game:


Constant INFIX;
Constant DEBUG;

Constant Story "AMBIGUOUS OBJECTS";
Constant Headline "^An Interactive Bug Reproduction^";

Include "parser.h";
Include "verblib.h";

Object room "Room"
	with description "You are in a room.",
	e_to otherroom,
	has light;

Object ironkey "iron key" room
	with name 'key' 'iron';

Object silverkey "silver key" room
	with name 'key' 'silver';

Object key "key" room
	with name 'key';

Object rock "rock" room
	with name 'rock',
	description "Just a rock.";

Object polkadotkey "ruby key" room
	with name 'ruby' 'key';

Object otherroom "Other Room"
	with description "You are in a different room.",
	w_to Room,
	has light;

[ Initialise;
	location = room;
];

Include "grammar.h";

Warrigal · February 6, 2021, 5:52am

It looks like the solution is in the AskPlayer routine in parser.h. By the time we get there, the matching objects are in match_list. It prints the disambiguation question in L__M(##Miscellany, 45) or L__M(##Miscellany, 46), followed by the list of matching objects in match_list. It gets the answer to the question using answer_words = Keyboard(buffer2, parse2);, checks whether the answer is ‘all’ or starts with a verb and, if it doesn’t, it inserts the answer into the original input buffer and reparses it.

In my simplified view of the world, it looks like you just need to jump in before those last two steps, and compare the object in the answer in parse2 against the matching objects in match_list, rather than reparsing.

Am I on the right track, or should I shut up and crawl back into my hole?

fredrik · February 6, 2021, 11:58am

Not sure if you consider this sufficient. If all other attempts to single out the best match have failed, this checks whether there’s an object whose name is a single word that corresponds exactly to what was typed (up to nine characters, to account for the dictionary resolution).

At line 3660 in parser.h, just after if(flag) { add this:

			if(match_length == 1) {
				k = 0;
				for (i=1 : i<=number_of_classes : i++) {
					while ((match_classes-->k) ~= i or -i) k++;
					j = match_list-->k;
					if(MatchWord(j, match_from)) return j;
				}
			}

Also add the MatchWord routine:

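[ MatchWord p_obj p_word_no len_name len_input start_input i;
	p_word_no--;	! word numbers are 1-based; make this one 0-based
	! Capture the object's short name in StorageForShortName
	@output_stream 3 StorageForShortName;
	print (name) p_obj;
	@output_stream -3;
	len_name = StorageForShortName --> 0;	! number of characters captured
	if(len_name > 9) {
		! No match if there is a space at or after the tenth character
		for(i = 9 : i < len_name : i++) if(StorageForShortName -> (WORDSIZE + i) == 32) rfalse;
		len_name = 9;	! compare only up to the dictionary resolution
	}
	! Z-machine parse table: byte 2 of each 4-byte entry is the word length
	len_input = (parse + 2) -> (4 * p_word_no + 2);
	if(len_input > 9) len_input = 9;
	if(len_name ~= len_input) rfalse;

	! Byte 3 of the entry is the word's position in buffer
	start_input = (parse + 2) -> (4 * p_word_no + 3);
	for(i = 0 : i < len_name : i++) {
		if(StorageForShortName -> (WORDSIZE + i) ~= buffer -> (start_input + i))
			rfalse;
	}
	rtrue;
];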
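
For anyone who wants to check the rule without building a game, the comparison MatchWord makes can be modelled in a few lines of Python. This is only a sketch of the logic, assuming the typed word arrives already lowercased (as it does in the input buffer); the function name match_word is made up for the model.

```python
DICT_RESOLUTION = 9  # characters compared, as in the Inform 6 routine


def match_word(short_name, typed_word):
    """Model of MatchWord: does a one-word short name equal the typed word?"""
    if len(short_name) > DICT_RESOLUTION:
        # A space at or after the tenth character means a multi-word name.
        if " " in short_name[DICT_RESOLUTION:]:
            return False
        short_name = short_name[:DICT_RESOLUTION]
    typed_word = typed_word[:DICT_RESOLUTION]
    # Comparing the strings covers both the length check and the
    # character-by-character loop of the original.
    return short_name == typed_word


print(match_word("key", "key"))
print(match_word("iron key", "key"))
```

A name such as "iron key" fails on length alone, so only the object whose whole short name is the typed word gets picked.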

mirality · February 6, 2021, 12:05pm

Meanwhile all of this is solved in exactly one line of I7 code…

fredrik · February 6, 2021, 12:20pm

…which translates to a lot more lines of I6 code.

How is this reply helping the discussion?

DavidG · February 8, 2021, 5:09am

This one appears to do the trick. Thanks! Now I’m working on making sure it works for Glulx.

auraes · February 8, 2021, 5:37am

There is a problem if you want to determine the length of a word in the dictionary with accented characters. For example, the length of the word ‘élève’ returned by StorageForShortName -->0 is 3 and not 5.

Edit:
I’m not sure that this is a problem here, by the way.

Edit:
Since you are using the buffer array, it is quite possible that this is a problem.

DavidG · February 8, 2021, 6:54pm

I’m having trouble figuring out how Glulx handles the input and parse buffers differently from the Z-machine. Could I get some assistance, @zarf? The relevant code is in the simple2 branch of David Griffith’s inform6lib repository on GitLab.

zarf · February 9, 2021, 11:03pm

Pretty sure it should be

len_input = parse --> ((p_word_no+1) * 3 - 1);

start_input = parse --> ((p_word_no+1) * 3);

Allowing for the fact that you’ve decremented p_word_no, and start_input is supposed to be an offset from buffer. But that’s just eyeballing it and comparing to the definitions of WordAddress() and WordLength(). I haven’t tested it.
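
A quick way to sanity-check the index arithmetic without a compiler is to model it in Python. The sketch below assumes the library definitions are WordLength = parse->(wordnum*4) and WordAddress = buffer + parse->(wordnum*4+1) on the Z-machine, and parse-->(wordnum*3-1) and buffer + parse-->(wordnum*3) on Glulx; that is from memory, so treat it as an assumption. The function names are invented for the model.

```python
def z_matchword(word_no):
    """(length, position) byte indices from parse, as MatchWord computes them."""
    n = word_no - 1  # MatchWord does p_word_no-- first
    # (parse + 2) -> (4*n + 2) and (parse + 2) -> (4*n + 3)
    return (2 + 4 * n + 2, 2 + 4 * n + 3)


def z_library(word_no):
    """Assumed Z-machine WordLength / WordAddress indices (byte array)."""
    return (word_no * 4, word_no * 4 + 1)


def glulx_suggested(word_no):
    """(length, position) word indices from the two lines suggested above."""
    n = word_no - 1  # same decrement as in MatchWord
    return ((n + 1) * 3 - 1, (n + 1) * 3)


def glulx_library(word_no):
    """Assumed Glulx WordLength / WordAddress indices (word array)."""
    return (word_no * 3 - 1, word_no * 3)


for w in range(1, 4):
    print(w, z_matchword(w), z_library(w), glulx_suggested(w), glulx_library(w))
```

If those definitions are right, both pairs agree for every word number, which is all this shows. It says nothing about where the text starts inside buffer on each platform.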

DavidG · February 10, 2021, 5:08am

It seems I got the code to work by doing this instead for start_input:

start_input = parse --> (((p_word_no+1) * 3)) - 2;

I tested this with the base noun varying from 1 to 9 characters. It worked fine for 2 to 9, but for a noun of one character, I got something bizarre. For Glulx, the game would compile without comment. For Z-machine, I got warnings from the compiler (not the library) like this:

Inform 6.34 for Unix (21st May 2020)
line 14: Warning:  'name' property should only contain dictionary words
>  with name 'v'
line 17: Warning:  'name' property should only contain dictionary words
>  with name 'v'
line 20: Warning:  'name' property should only contain dictionary words
>  with name 'v'
line 23: Warning:  'name' property should only contain dictionary words
>  with name 'v'
Compiled with 4 warnings

For both Z-machine and Glulx, the game would respond like this:

Room
You are in a room.

You can see an iron v, a silver v, a v, a ruby v and a rock here.

>TAKE V
You can't see any such thing.

This behavior predates the changes discussed. Why does the compiler complain about objects with a single character in a name property? Why are the complaints printed only for Z-machine builds?

It seems like Inform6 was never able to deal with single-character nouns. If my guess is correct, that’s unfortunate because we often talk about putting an ‘A’ in a particular place when talking about words and their construction. It would make a Sesame Street game rather difficult to code.

fredrik · February 10, 2021, 5:20pm

Inform and the Z-machine handle single-character nouns fine. You have to add // to tell the compiler that it’s a dictionary word and not a character, like this: 'v//'

zarf · February 10, 2021, 5:28pm

I’m surprised the compiler doesn’t show the same warning when compiling on Glulx. I must have taken it out when I added the Glulx back end, but I can’t remember why or think of a valid reason to suppress it.

EDIT: Maybe I was planning a special case where single-letter 'x' literals are interpreted as dict words when in a name property? There’s already a special case where double-quoted "foo" literals are interpreted that way in a name property… but if that was the idea, I never got around to it.

drpeterbatesuk · February 10, 2021, 6:32pm

See DM4 §1.4 for an explanation… and the confession ‘The handling of dictionary words is probably the single worst-designed bit of syntax in Inform’

DavidG · February 11, 2021, 8:54pm

I tried compiling the test game with élève as the base name. Inform didn’t like that:

line 14: Error:  Character can be printed but not input: (ISO Latin1) $ffffffa9
>  with name 'élève' 'iron'
line 14: Error:  Character can only be used if declared in advance as part of 'Zcharacter table': (ISO Latin1) $a9, i.e., '©'
>  with name 'élève' 'iron';
line 14: Error:  Character can only be used if declared in advance as part of 'Zcharacter table': (ISO Latin1) $a8, i.e., '¨'
>  with name 'élève' 'iron';

fredrik · February 12, 2021, 12:21am

Try -Cu to set the character encoding to UTF-8?

DavidG · February 12, 2021, 6:06am

Now that I have the test game compiled, it seems to work fine.

You can see an iron élève, a silver élève, a élève and a rock here.

>TAKE élève
Taken.

>I
You're carrying:
  a élève

>

fredrik · February 12, 2021, 7:51am

This is according to specification.

The dictionary uses six bytes to store a dictionary word in z5. Six bytes can hold nine five-bit codes. a-z all take a single five-bit code each. An accented character takes four five-bit codes. So for your example word:

é: 4 five-bit codes
l: 1 five-bit code
è: 4 five-bit codes
v: 1 five-bit code
e: 1 five-bit code

After ‘élè’ you have used nine five-bit codes and the word gets truncated.

You can change this by putting accented characters into the alphabet table.

(None of this matters for the solution I proposed, since it doesn’t use the dictionary)
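
The count above can be reproduced with a short Python sketch. It is a simplified model, not the compiler's encoder: it assumes the default alphabet, treats every character outside a-z as costing four codes, and only keeps characters that fit completely within the nine codes. The names are made up for the example.

```python
DICT_ZCHARS = 9                      # six bytes hold nine five-bit codes
ROW1 = "abcdefghijklmnopqrstuvwxyz"  # default first alphabet row


def zchar_cost(ch):
    """One code for a-z, four for anything else (simplification)."""
    return 1 if ch in ROW1 else 4


def dictionary_prefix(word):
    """Return the characters that fit entirely, and the codes they use."""
    used, kept = 0, ""
    for ch in word:
        cost = zchar_cost(ch)
        if used + cost > DICT_ZCHARS:
            break
        used += cost
        kept += ch
    return kept, used


print(dictionary_prefix("élève"))
```

For 'élève' the model stops after 'élè' with all nine codes used, matching the breakdown in the list.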

auraes · February 12, 2021, 9:16am

Using the “Zcharacter” directive can change the size of a special character, but I don’t know if the program code can detect it. How do I know if ‘é’ has a length of 4 or 2?
I had this problem when I wanted to reproduce the Tokenise instruction for version 3, because I had to compare a word entered in the buffer with the words in the dictionary. In the end, I did it another way.
But this is not the topic here.

fredrik · February 12, 2021, 9:26am

The details are in chapter V, §36 in DM4.

Zcharacter 'é'; puts this character in the third row of the alphabet table, giving it a cost of 2 five-bit codes.

I prefer using this form instead, to get full control:

Zcharacter "abcdefghijklmnopqrstuvwxyz"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "0123456789!$&*():;.,<>@{386}";

For Swedish, I replaced qwz with åäö, since qwz are all very rare in Swedish.

All characters on row 1 cost one code, while the characters on rows 2 and 3 cost two codes each. Characters not in the table cost 4 codes.
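
To see what the alphabet table buys you, here is a rough Python sketch of those costs. The function word_cost and the way rows are passed in are invented for illustration, and the Swedish row is only a guess at the layout (å, ä, ö dropped into the slots of q, w, z).

```python
DEFAULT_ROW1 = "abcdefghijklmnopqrstuvwxyz"


def word_cost(word, row1=DEFAULT_ROW1, rows2_3=""):
    """Total five-bit codes needed for a word under a given alphabet table."""
    total = 0
    for ch in word:
        if ch in row1:
            total += 1      # row 1: one code
        elif ch in rows2_3:
            total += 2      # rows 2 and 3: shift code plus character code
        else:
            total += 4      # not in the table: four codes
    return total


# Hypothetical Swedish row 1: q, w, z replaced by å, ä, ö.
SWEDISH_ROW1 = (DEFAULT_ROW1.replace("q", "å")
                            .replace("w", "ä")
                            .replace("z", "ö"))

print(word_cost("élève"))
print(word_cost("élève", rows2_3="éè"))
print(word_cost("smörgås"))
print(word_cost("smörgås", row1=SWEDISH_ROW1))
```

Under this model, declaring both é and è brings 'élève' down from 11 codes to 7, so the whole word fits within the nine codes of a z5 dictionary entry.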