I6 compiler bug, line error reference

8bit_era · May 13, 2023, 10:01am

Probably one for @zarf. I actually noticed this some months ago but it didn’t really bother me until I started developing Ghosts of Blackwood Manor.

It seems that the line error reference calculation in the compiler is wrong. When it notices an error, the line number it prints is actually the expected number ×2 (sometimes rounded, so off by one). Looks like it could be a problem with 32-bit / 64-bit integer conversion but it might be something else as well. Here is an example.

As you can see here, the compiler claims an error found in line 1517.

While what it means is the missing semicolon in line 759.

I am using the “in development” compiler and I compile it myself, but it’s just a regular compilation with no unusual flags involved, just as you would compile a plain C program with either GCC or Clang. And this is probably important to know as well: the bug exists on my main development system, which is 64-bit Debian based (using GCC), but also on my mobile workstation, an Apple M1 MacBook Pro running macOS Ventura, where I am using Clang to compile the compiler itself.

Piergiorgio_d_errico · May 13, 2023, 2:12pm

I confirm the issue, also in inform 4 unix 6.41-r5, compiled here with GCC 3.2.0:

file /usr/local/bin/inform-6.41 
/usr/local/bin/inform-6.41: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9080ccf40f75a9ed5119b07d482b10d9dd12cbbc, for GNU/Linux 3.2.0, stripped

This machine is another 64bit Debian:

uname -a
Linux Duilio 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

I have created the same type of error, a missing semicolon at the end of a line, chosen because it was conveniently line 70, but it seems that the error isn’t exactly off by one:

inform -s turkthings.inf coo.z5
Inform 6.41 for Linux (22nd July 2022)
line 137: Error:  Expected comma but found rfalse
>   rfalse

the involved code:

    before [;
	  Go: if (noun == e_obj) {
		print 
			"You follow the familiar windings of the path, to the"
		rfalse; 
			}
 ],

Line 70, where I recreated the bug condition, is the "You follow… line, but the error was flagged at the next line, 71, and it was reported as line 137, not, as Stefan surmised, somewhere between 139 (70 times 2, minus 1) and 143 (71 times 2, plus 1).

Best regards from Italy,
dott. Piergiorgio.

zarf · May 13, 2023, 4:00pm

Is it possible that the .inf file has DOS linebreaks (\r\n) rather than Unix linebreaks (\n)?

8bit_era · May 13, 2023, 4:58pm

You know what? That’s actually it. I am using VSCode for development and for some reason this and the other file I used for testing had CRLF linebreaks, which are the Windows standard. I changed this to LF now and it works again.

My files were definitely created in a Unix environment, but my Debian is running inside WSL2 on a Windows machine, and when I am developing my IF in Debian, I am using VSCode on the Windows machine to remotely connect to the Debian console. It might be that it defaulted to CRLF when I created a copy of my template for my new game. And the Mac just kept the CRLF linebreaks when accessing my work-in-progress game via Git.

zarf · May 13, 2023, 8:06pm

So this is a bit of a question.

The I6 compiler opens source files with fopen(filename, "r") – that is, text mode. It also converts \r to \n when parsing. So if you run it on Unix or Mac, a DOS line break gets converted to two line breaks and this throws off the error line count. On the up side, the archaic Mac line break (\r alone) is handled correctly.
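
The double counting can be sketched in a few lines of Python. This is only an illustrative model, not the compiler's actual C code: `naive_line_number`, `make_source` and `offset_of_line` are hypothetical helpers, and the model assumes that every \r and every \n is counted as its own line break, as described above.

```python
FILLER = b"! filler"  # dummy content for every line of the test file


def naive_line_number(data: bytes, offset: int) -> int:
    """Line number of the byte at `offset` when each CR and each LF
    is counted as a separate line break."""
    breaks = sum(1 for byte in data[:offset] if byte in (0x0D, 0x0A))
    return breaks + 1


def make_source(line_count: int, eol: bytes) -> bytes:
    """Build a dummy source file with `line_count` identical lines."""
    return b"".join(FILLER + eol for _ in range(line_count))


def offset_of_line(line: int, eol: bytes) -> int:
    """Byte offset at which 1-based `line` starts in a make_source() file."""
    return (line - 1) * (len(FILLER) + len(eol))


crlf_file = make_source(800, b"\r\n")
lf_file = make_source(800, b"\n")

print(naive_line_number(lf_file, offset_of_line(759, b"\n")))      # 759
print(naive_line_number(crlf_file, offset_of_line(759, b"\r\n")))  # 1517
```

Under this model a real line n in a CRLF file comes out as 2n − 1, which is consistent with line 759 being reported as 1517 earlier in the thread.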

If you run the compiler on Windows, fopen() converts DOS line breaks invisibly so the error line count isn’t wrong. But maybe that’s a problem? There’s no solid reason to assume that the source file is in the platform’s native text format. Your WSL setup demonstrates that.

Maybe the compiler should use fopen(filename, "rb") and do its own CRLF conversion? This would be a modest pain in the ass, but it would ensure that it behaved exactly the same on all machines.
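
A minimal sketch of that idea, written in Python rather than the compiler's C and using hypothetical helper names: read the file as raw bytes (the equivalent of "rb") and fold \r\n, a lone \r and a lone \n into a single \n before any line counting happens.

```python
def normalize_line_breaks(data: bytes) -> bytes:
    """Fold CRLF and lone CR into LF. CRLF is replaced first;
    otherwise each CRLF would become two line breaks."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def read_source(path: str) -> bytes:
    """Read a source file in binary mode, so that no platform-specific
    newline translation happens, then normalize the line breaks."""
    with open(path, "rb") as handle:
        return normalize_line_breaks(handle.read())


def line_number(data: bytes, offset: int) -> int:
    """1-based line number of the byte at `offset` in normalized data."""
    return data.count(b"\n", 0, offset) + 1


mixed = b"first\r\nsecond\rthird\nfourth"
clean = normalize_line_breaks(mixed)
print(clean)                                      # b'first\nsecond\nthird\nfourth'
print(line_number(clean, clean.index(b"third")))  # 3
```

Because the conversion is done by the program itself rather than by the C library, the result would be the same on every host, whatever line-break convention the file uses.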

(I’m pretty sure that the line break difference only affects error messages, not the compiled game. There’s nowhere in the language where a double newline behaves differently from a single newline. I think.) (Yes, I tested line breaks inside strings.)

8bit_era · May 13, 2023, 11:12pm

There is probably no easy answer to that. I have one, but there is a philosophical aspect to it. The Inform compiler is written in portable C, with the actual portability aspect being an important feature. In that context, the compiler should work as expected in whatever environment you throw it into.

The source files can be just as diverse as the host systems the Inform compiler runs on: varying encodings, from DOS code page 437 to Unicode, and differing line breaks. And as you said, there is no reason to assume that the source file is in the native text format of the platform. I completely agree with you here. It’s not just about fancy setups like WSL; you also have to consider that the Inform compiler has a long history. The user may be on macOS or Linux today, but may want to try a historic Inform source file that someone made in MS-DOS many moons ago.

There are good reasons for the compiler not to assume that the source file is in the native format of the platform. For usability reasons, it should understand its host as well as the source file. So doing your own CRLF conversion is probably the best way to ensure that the source format itself won’t be a problem in any scenario.

ramstrong · May 14, 2023, 4:07am

If it’s me, I’d just throw out a warning

CRLF detected! Convert to \n?

or whatever format it’s supposed to be.

Piergiorgio_d_errico · May 14, 2023, 11:36am

Harry, a warning on non-native EOL introduces the further complexity of having to know, at compile time, which environment the compiler is running in (e.g. #ifdef __UNIX__). On top of that, a warning that asks for user input, as in your example, is guaranteed to defeat the logic of shell scripting, which is the basis of editor/compiler/debugger integration… But a compiler throwing a non-interactive warning about non-native EOL could be helpful for editor/compiler integration scripting, perhaps?

Later I’ll test Zarf’s solution…

Best regards from Italy,
dott. Piergiorgio.

Piergiorgio_d_errico · May 14, 2023, 11:59am

Tested Zarf’s solution. The EOL issue is confirmed, but there’s still an off-by-two:

The test setup is the same as above and the error is the same: a missing ; on line 70, with the error flagged at the rfalse on the line just below, line 71, which is reported as line 69:

inform -s turkmess.inf coo.z5
Inform 6.41 for Linux (22nd July 2022)
line 69: Error:  Expected comma but found rfalse
>   rfalse

I suspect that, when counting lines, Inform 6 doesn’t count the !% lines at the very top of the source, of which there are indeed two in this case:

      1 !% -~S
      2 !% $OMIT_UNUSED_ROUTINES=1

Best regards from Italy,
dott. Piergiorgio.

EDIT: The plot thickens: after deleting the two directive lines, the line count is still off by two:

inform -s turkmess.inf coo.z5
Inform 6.41 for Linux (22nd July 2022)
line 67: Error:  Expected comma but found rfalse
>   rfalse

     65     before [;
     66           Go: if (noun == e_obj) {
     67                 print 
     68                         "You follow the familiar windings of the path, to the"
     69                 rfalse; ! <-- continue with the regular action
     70                         }
     71  ],

Now I’m officially at a loss as to the cause of the off-by-two, and I must point out that the fencepost error is my bête noire, so I’ll stop investigating the remaining issue here.

Best regards from a perplexed
Dott. Piergiorgio.

zarf · May 14, 2023, 2:10pm

If you can detect a CRLF, you can count it correctly, and then the user doesn’t need to change their file.

zarf · May 14, 2023, 2:10pm

Can you post a complete example of a file that shows this?

Piergiorgio_d_errico · May 15, 2023, 4:55pm

I’ll provide this later, because the test file I used (because of the convenient line number…) is rather embarrassing.

To explain gracefully: here in Italy “cose turche” (Turkish things) is a colloquialism meaning “strange, absurd, illogical things”, so illogical and absurd that notionally they can happen only… in Turkey (the country astride the Mediterranean and the Black Sea, not the US bird & dish). It is one of the very few “non-PC” terms in common Italian usage. The filename came about because I do use this .inf for poking at and fooling around with strange bugs (admittedly, a doubled and off-by-two line count IS a strange bug, a “cosa turca” for the average Italian…). The issue is that the tests in there are often themed around, well… ahem, perhaps highly questionable jokes about Turkey and Turks; the code around line 70 is not only convenient in line number but also one of the more presentable lines… and I guess that a bowdlerised edition would defeat the purpose of your request.

I’ll look around my …/inf6/messing directory and I hope to find a good (and PC) replacement…

My apologies (also to Turkish people…) from Italy,
dott. Piergiorgio.

DavidG · May 15, 2023, 5:47pm

In the US, there’s a similar expression: “It’s Greek to me”

nephar · May 15, 2023, 6:22pm

Apology accepted.
Now I am wondering about the circumstances that gave rise to that “cose turche” expression.

8bit_era · May 15, 2023, 9:25pm

There is also a similar expression in German: “Das kommt mir spanisch vor”. Sincerely sorry to all the wonderful people in Spain.

DavidG · May 16, 2023, 12:07am

Perhaps all languages have expressions for this concept that dump on some other language. At least all the ones I can speak/understand are like this.

zarf · May 16, 2023, 1:47am

Tracing that chain is a well-known linguistic hobby.

Chinese is a very popular dumping ground, for good reasons. Apparently in China they say “That’s Martian to me” or, per Wikipedia, “ghost language”? Which is great.

zarf · May 16, 2023, 1:56am

Anyway, getting back to the off-by-two error, I tried this:

[ Main val;

	switch (val) {
		1: print "Message 1";
			return;
		2: print
			"Message 2"
			return;
	}
];

Indeed it reports an error on line 6, which is 2: print.

Turns out, Inform is careful to report the error line number as where the offending statement starts. I think this is just out of caution. A syntax error at the start of a line could cause the compiler to go down a garden path of mistaken parsing, and not notice a problem until many tokens later. So, conservatively, the report goes back to the earliest the error could have been, which is right after the last successfully parsed statement.

In this case, the missing semicolon means that the print statement never ended. So the error is reported where that statement starts.
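
The gap between where a statement starts and where the problem is noticed can be illustrated with a toy model in Python. This is not Inform's parser: the tokenizer, the `unterminated_print` helper and its tiny "grammar" (a print keyword, one string, then ; or ,) are assumptions made purely for illustration.

```python
import re

# A quoted string, an identifier, or any other single non-space character.
TOKEN = re.compile(r'"[^"]*"|[A-Za-z_]\w*|\S')


def tokenize(source: str):
    """Yield (line_number, token) pairs, with line numbers starting at 1."""
    for line_number, line in enumerate(source.split("\n"), start=1):
        for match in TOKEN.finditer(line):
            yield line_number, match.group()


def unterminated_print(source: str):
    """Find the first `print "..."` that is not followed by ';' or ','.

    Returns (statement_start_line, detection_line), or None if every
    print statement in this toy grammar is properly terminated.
    """
    tokens = list(tokenize(source))
    for index in range(len(tokens) - 2):
        start_line, token = tokens[index]
        argument = tokens[index + 1][1]
        detection_line, follower = tokens[index + 2]
        if (token == "print" and argument.startswith('"')
                and follower not in (";", ",")):
            return start_line, detection_line
    return None


EXAMPLE = "\n".join([
    "[ Main val;",
    "",
    "\tswitch (val) {",
    '\t\t1: print "Message 1";',
    "\t\t\treturn;",
    "\t\t2: print",
    '\t\t\t"Message 2"',
    "\t\t\treturn;",
    "\t}",
    "];",
])

print(unterminated_print(EXAMPLE))  # (6, 8)
```

In this model the unexpected token turns up on line 8, but the statement it belongs to began on line 6, which is the line number reported for the example above.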

Warrigal · May 17, 2023, 8:32am

That might explain why I’ve been seeing some weird things lately. I normally expect the error to be reported where the error was detected, not where the error occurred. If this backing-up behaviour has always been there, I certainly haven’t noticed it before.

zarf · May 17, 2023, 1:19pm

It’s always been there, or at least since Inform 6.0.