Dead code in PrefaceByArticle()?

otistdog · April 11, 2024, 3:08pm

I was looking more closely at the PrefaceByArticle() routine, and I noticed something that looks like a bug going back to at least StdLib 6/11.

There’s a section

artform = LanguageArticles
    + 3*WORDSIZE*LanguageContractionForms*(short_name_case + i*LanguageCases);

#Iftrue (LanguageContractionForms == 2);
if (artform-->acode ~= artform-->(acode+3)) findout = true;
#Endif; ! LanguageContractionForms
#Iftrue (LanguageContractionForms == 3);
if (artform-->acode ~= artform-->(acode+3)) findout = true;
if (artform-->(acode+3) ~= artform-->(acode+6)) findout = true;
#Endif; ! LanguageContractionForms
#Iftrue (LanguageContractionForms == 4);
if (artform-->acode ~= artform-->(acode+3)) findout = true;
if (artform-->(acode+3) ~= artform-->(acode+6)) findout = true;
if (artform-->(acode+6) ~= artform-->(acode+9)) findout = true;
#Endif; ! LanguageContractionForms
#Iftrue (LanguageContractionForms > 4);
findout = true;
#Endif; ! LanguageContractionForms

the purpose of which seems to be to compare the “contraction forms” of articles in the LanguageArticles table for a given article case, and, if a difference is found, set a flag findout to true. As an example, the LanguageArticles array looks like this for English:

Array LanguageArticles -->

 !   Contraction form 0:     Contraction form 1:
 !   Cdef   Def    Indef     Cdef   Def    Indef

	 "The " "the " "a "      "The " "the " "an "          ! Articles 0
	 "The " "the " "some "   "The " "the " "some ";       ! Articles 1

Engish has LanguageContractionForms == 2, so the applicable line for determining the value of the findout flag would be

if (artform-->acode ~= artform-->(acode+3)) findout = true;

This would compare, say, the first and fourth values of row “Articles 0” ("The " and "The ", which are the same), or the third and sixth values of the same row ("a " and "an ", which differ).

If findout is set, then later code will print the relevant portion of the noun phrase to a buffer to determine which of the differing values should apply in this case.

However, the condition (artform-->acode ~= artform-->(acode+3)) can never evaluate to true given the way that Inform 6 stores strings. The two strings "The " (column 1) and "The " (column 4) in the first row of LanguageArticles will have different string addresses even though they have the same text. As a result, findout is always set to true.

Obviously, it works, but it’s doing more work than it theoretically could in many cases, and more work than the code seems to intend. In the absence of a compiler feature to notice identical strings and set them to the same address, the only fix is to use string constants in the LanguageArticles table. That works but is a little ugly to look at.

Side note: I also noticed that short_name_case seems to be totally unused. It’s not mentioned in DM4 at all, so presumably it’s a fossil from an earlier iteration of the system.

zarf · April 11, 2024, 3:48pm

It looks like it’s allowing for the language file (which might not be English.h) to use string constants if it wants. It’s not dead code, it’s just that English.h isn’t taking advantage of the fast comparison.

I agree that using string constants in English.h would be nice. Although that’s completely untested in I6 history, so you’d definitely want to be careful making that change.

I also noticed that short_name_case seems to be totally unused.

I don’t remember what’s up with that.

otistdog · April 11, 2024, 4:17pm

I tried this out

Constant	art_CThe = "The ";	! compiler ignores capitalization of identifiers
Constant	art_the = "the ";
Constant	art_a = "a ";
Constant	art_an = "an ";
Constant	art_some = "some ";

Array LanguageArticles -->

 !   Contraction form 0:        Contraction form 1:
 !   Cdef     Def     Indef     Cdef     Def     Indef

	 art_CThe art_the art_a     art_CThe art_the art_an          ! Articles 0
	 art_CThe art_the art_some  art_CThe art_the art_some;       ! Articles 1

and it seems to work OK.

zarf · April 11, 2024, 5:00pm

Did anybody ever pick up my attempt to build all the DM4 exercises as a regression test suite? (https://github.com/erkyrath/Inform6Lib-Testing) That’s what I would consider “being careful”. Recompile a hundred or so test cases that exercise all documented parts of the library. Plus Advent, Balances, maybe Bronze. Verify that each one produces the same output as before.

fredrik · April 11, 2024, 7:50pm

This may have been used for a feature inspired by Swedish Inform, where short_name routines receive information on whether the library means to print the object name with an initial capital or not, and can adhere to this, or change it. One way to use this is to create an object which is never capitalized.

In Swedish, the definite article is omitted unless the object name is preceded by one or more adjectives, so a short_name routine needs to know if a capital version of the name is required in any case.

This feature is descibed (in Swedish, hopefully translatable and still comprehensible) at Instruktioner för Swedish Inform v1.6

I do recall discussing this feature on a popular forum, but I don’t remember if it was decided to include this in the library, and I can’t find the discussion now.

otistdog · April 12, 2024, 3:29pm

If that’s the intent, I’m having trouble understanding the semantics of the way that short_name_case is used.

It’s a global, without assignment by default so initializing at 0. There are two uses of it.

In the first use, it determines the correct entry in a table stored in a property array articles. The key block is:

if (obj provides articles) {
	artval=(obj.&articles)-->(acode+short_name_case*LanguageCases);
	if (capitalise)
		print (Cap) artval, " ";
	else
		print (string) artval, " ";
...

It looks to me like obj.&articles is intended to be an array of strings, structured like a single row of LanguageArticles. Based on the usage, the purpose of short_name_case would then be to identify a specific “contraction form” (which is an article form driven by the first letter of the word).

In the second use, it is part of code determining the correct row in LanguageArticles, i.e. the one that contains all of the contraction forms for a given case. The key block is:

i = LanguageGNAsToArticles-->i;

artform = LanguageArticles
    + 3*WORDSIZE*LanguageContractionForms*(short_name_case + i*LanguageCases);

Here it is added to the term i*LanguageCases, where i is the contraction form specified by the the GNA code of the object, and LanguageCases corresponds to the number of rows in LanguageArticles. Based on the usage, the purpose of short_name_case would be to act as a row offset from the determined contraction form.

Maybe I’m just not sufficiently enlightened to see the sense of it today. Is there something I’m missing?