Does modern Inform 7 handle Unicode in object names?

rileypb · February 12, 2023, 8:27pm

Station α is a room.

Welcome
An Interactive Fiction
Release 1 / Serial number 230212 / Inform 7 build 6M62 (I6/v6.34 lib 6/12N) SD

Station

>

I can solve this by setting the printed name, of course, but I would really like to have everything.

Draconis · February 12, 2023, 8:32pm

Sadly no, Inform 7 does not support Unicode outside of quoted strings. You’re limited to Latin-1.

This is something that I hope will change now that it’s open-source, though.

zarf · February 13, 2023, 7:06am

I’m surprised that it doesn’t report an error when you try this. The non-Latin1 characters seem to be entirely ignored in the source.

Piergiorgio_d_errico · February 13, 2023, 10:46am

You can use “the printed name of […] is”:

stationB is a room. The printed name of stationB is "Station ß".

(DISCLAIMER: I’m typing from memory)

Best regards from Italy,
dott. Piergiorgio.

fredrik · February 13, 2023, 11:40am

Possibly, this is what OP was referring to when he wrote “I can solve this by setting the printed name, of course”

Zed · February 13, 2023, 9:14pm

It looks like characters above Latin-1 in identifiers are taken to be spaces that are then treated like any other whitespace in unquoted source: excess leading or trailing whitespace is ignored; multiple spaces are converted to a single space.

This compiles:

lab is a room.

αbααx is a number that varies.

when play begins:
  now αbααx  is 12;
  now bααx  is 13;
  now αbαx  is 14;
  [ showme αbx; ]
  [ showme bx; ]
  showme bαx;
  showme b x;

and produces:

“b x” = number: 14

“b x” = number: 14

The commented out lines cause compiler errors if they’re uncommented.
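
A rough way to picture the rule described above, as a Python sketch rather than anything taken from the Inform compiler. The function name `normalize_identifier` and the U+00FF cutoff are assumptions made for illustration; the sketch only mimics the behaviour seen in this test.

```python
import re

LATIN1_MAX = 0xFF  # assumed cutoff: code points above this are treated as spaces


def normalize_identifier(source: str) -> str:
    """Model the observed behaviour: characters above Latin-1 become spaces,
    runs of whitespace collapse to one space, and the ends are trimmed."""
    spaced = "".join(ch if ord(ch) <= LATIN1_MAX else " " for ch in source)
    return re.sub(r"\s+", " ", spaced).strip()


if __name__ == "__main__":
    # Spellings taken from the test source above.
    for name in ["αbααx", "bααx", "αbαx", "bαx", "αbx"]:
        print(repr(name), "->", repr(normalize_identifier(name)))
```

Under this model the three assignments all target the same variable “b x”, so the last value (14) wins, while “αbx” collapses to “bx”, which was never declared.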

zarf · February 13, 2023, 9:22pm

Yeah, that’s not great.

It’s reasonable for the compiler to treat non-printing characters as whitespace. (Ask me about U+202D: LEFT-TO-RIGHT OVERRIDE creeping into text.) Even Unicode punctuation and symbols could be ignored. But when letters and digits are involved, I’d like the compiler to either accept them or explain that they’re not accepted.

Zed · February 13, 2023, 9:25pm

Outside of identifiers, it’s weirder: some characters are converted into Latin-1 characters (like alpha to a and beta to b) and thus cause compilation errors, but others are treated per my description of unicode in identifiers. So adding this, it still compiles and prints 16.

let q be b x plus ☃︎2⚗;
say q;

On the bright side: more opportunities to drop unicode snowmen in your code!

(Edited: Now I can’t reproduce the converted-to-Latin-1 case, but I swear I saw a compilation fail where the error message showed the α → A, β → B conversion.)
(Further edited: never mind, now I think I was just confused.)

Zed · February 13, 2023, 11:12pm

Looks like if strings of above-Latin-1 Unicode characters are space-separated, they can become parts of identifier names that aren’t objects.

This:

xyz ß is a number variable.

to say ß (t - a text): say "ß [t].";

when play begins:
  now xyz ß is 3 + xyz ß;
  say ß "[xyz ß]";

outputs “ß 3” but if you add “now xyz is 4 + xyz;”, it won’t compile.