Does modern Inform 7 handle Unicode in object names?

Station α is a room.

Welcome
An Interactive Fiction
Release 1 / Serial number 230212 / Inform 7 build 6M62 (I6/v6.34 lib 6/12N) SD

Station

>

I can solve this by setting the printed name, of course, but I would really like to have everything.

Sadly no, Inform 7 does not support Unicode outside of quoted strings. You’re limited to Latin-1.

This is something that I hope will change now that it’s open-source, though.

I’m surprised that it doesn’t report an error when you try this. The non-Latin-1 characters seem to be entirely ignored in the source.

You can use “the printed name of […] is”:

stationB is a room. The printed name of stationB is "Station ß".

(DISCLAIMER: I’m typing from memory)

Best regards from Italy,
dott. Piergiorgio.

Possibly, this is what OP was referring to when he wrote “I can solve this by setting the printed name, of course” :wink:

It looks like characters above Latin-1 in identifiers are taken to be spaces that are then treated like any other whitespace in unquoted source: excess leading or trailing whitespace is ignored; multiple spaces are converted to a single space.

This compiles:

lab is a room.

αbααx is a number that varies.

when play begins:
	now αbααx  is 12;
	now bααx  is 13;
	now αbαx  is 14;
	[ showme αbx; ]
	[ showme bx; ]
	showme bαx;
	showme b x;

and produces:

“b x” = number: 14

“b x” = number: 14

The commented out lines cause compiler errors if they’re uncommented.
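
The behaviour described above can be mimicked with a short Python sketch. This is only a model of what the experiment suggests, not how the Inform 7 compiler is actually implemented: the function name `normalize_identifier` and the "anything above U+00FF becomes a space" rule are assumptions drawn from the observations in this thread.

```python
LATIN1_MAX = 0xFF  # highest code point in Latin-1 (ISO 8859-1)


def normalize_identifier(source: str) -> str:
    """Hypothetical model: characters above Latin-1 behave like spaces."""
    # Assumption from the thread: every character above U+00FF is treated
    # as whitespace in unquoted source.
    spaced = "".join(ch if ord(ch) <= LATIN1_MAX else " " for ch in source)
    # Runs of whitespace collapse to one space; leading and trailing
    # whitespace is dropped.
    return " ".join(spaced.split())


if __name__ == "__main__":
    for name in ["αbααx", "bααx", "αbαx", "bαx", "αbx", "Station α"]:
        print(repr(name), "->", repr(normalize_identifier(name)))
```

Under this model the three assignments and both working showme lines all name the same variable, "b x", while "αbx" reduces to "bx", which was never declared. That would be consistent with the commented-out lines failing to compile and with "Station α" being printed as "Station".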

Yeah, that’s not great.

It’s reasonable for the compiler to treat non-printing characters as whitespace. (Ask me about U+202D: LEFT-TO-RIGHT OVERRIDE creeping into text.) Even Unicode punctuation and symbols could be ignored. But when letters and digits are involved, I’d like the compiler to either accept them or explain that they’re not accepted.

Outside of identifiers, it’s weirder: some characters are converted into Latin-1 characters (like alpha to a and beta to b) and thus cause compilation errors, but others are treated per my description of Unicode in identifiers. So with this added, it still compiles and prints 16.

let q be b x plus ☃︎2⚗;
say q;

On the bright side: more opportunities to drop Unicode snowmen in your code!

(Edited: Now I can’t reproduce the converted-to-Latin-1 case, but I swear I saw a compilation fail where the error message showed the α → A, β → B conversion.)
(Further edited: never mind, now I think I was just confused.)

Looks like if strings of above-Latin-1 Unicode characters are space-separated, they can become parts of identifier names that aren’t objects.

This:

xyz ß is a number variable.

to say ß (t - a text): say "ß [t].";

when play begins:
  now xyz ß is 3 + xyz ß;
  say ß "[xyz ß]";

outputs “ß 3”, but if you add “now xyz is 4 + xyz;”, it won’t compile.