Which Unicode characters are "safe" in Glulx?

Is there a sort of consensus on which non-ASCII Unicode characters have widespread (or even universal) support on interpreters?

I understand that the basic issue boils down to “sure, you can put it in your code, but if the font the 'terp is using doesn’t have that character, you get a question mark, hollow box, or other replacement character,” so I suppose what I’m really asking is which characters are known to exist in as many fonts as possible for many interpreters across as many operating systems as possible.

I have a complex machine I’m designing, with analog readouts intended to be incomprehensible to the PC, so I’m tempted to use astrological and alchemical symbols, musical notes, and the like, but I also want that not to break for individual players, so I’m curious how to balance these concerns.

ASCII has been extended multiple times in many different ways. Depending on how backwards compatible you want to be, not even all ASCII characters are safe, but anything modern enough to run a Glulx terp should be fine.

I don’t think there’s any official survey of character availability, but guessing from historical knowledge, I’d expect the tiers to be:

You could make that work if you made a bound game with Lectrote and bundled it with relevant fonts. But it’s hard to see how it would be viable if you wanted to just distribute a blorb… I’d worry that half the commentary on the game would end up being people complaining about font issues.

Wikipedia has a Unicode version history list (Unicode - Wikipedia). macOS and Windows have typically supported each Unicode release within a year.

For interpreters, though, it depends. With a more traditional interpreter, it really comes down to what font it uses. But with a web-based one (including Lectrote) you pretty much don't need to worry: the HTML engine will go looking for an alternative font for any character the current font lacks. Unless it's a really new Unicode character, it should be able to find a font for almost all characters.

Thanks for the detailed input, everyone. Since I’m attached to the current blorb-based packaging decision, it’s not central to the plot of the story, and I’d hate for anyone to have font problems, I suppose I’ll stick with words instead of symbols.

Again, thanks, everyone.

I believe that native OS font rendering also goes looking for alternative fonts. However, the way the interpreter works is probably relevant. Gargoyle, in particular, uses an unusual rendering system (SDL graphics) so it might run into more problems.

This is correct. Gargoyle renders fonts directly using FreeType, after finding the font files themselves using a platform-specific method. Adding a substitution feature would be great, but wouldn’t be straightforward.

One possible example of that is that I could never get Scroll Thief to display all Unicode characters correctly in Gargoyle. E.g. when reading the illuminated scroll in the second room, most of them would come out as question marks. Lectrote, on the other hand, would run the game without any problems albeit a tiny bit slower.

I’m not saying Gargoyle can’t do it, just that it wasn’t obvious to me how.

It seems that the game tries to check if “fullwidth letters” and “script letters” are available before printing them, but apparently this isn’t foolproof?
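For illustration, the check-before-printing pattern might look roughly like the sketch below. A real game would ask the interpreter through Glk's character-output gestalt query; here a hypothetical `can_print` callback stands in for that query, and the function name, the two sample interpreters, and the `?` fallback are all assumptions made up for this example. It also shows why the check is only as good as the interpreter's answer.

```python
from typing import Callable


def print_with_fallback(
    text: str,
    can_print: Callable[[str], bool],  # hypothetical stand-in for the interpreter's answer
    fallback: str = "?",
) -> str:
    """Return text with every character the interpreter disclaims replaced."""
    return "".join(ch if can_print(ch) else fallback for ch in text)


def latin1_only(ch: str) -> bool:
    """A made-up interpreter that honestly reports Latin-1 support only."""
    return ord(ch) < 0x100


def claims_everything(ch: str) -> bool:
    """A made-up interpreter that says yes to every character."""
    return True


sample = "\uff21b"  # FULLWIDTH LATIN CAPITAL LETTER A, then a plain "b"

# The honest interpreter gets the fallback character.
print(print_with_fallback(sample, latin1_only))

# The over-optimistic one is sent the original text, whether or not
# its font can actually draw it.
print(print_with_fallback(sample, claims_everything))
```

If the interpreter answers "yes" for everything, the game has no way to know it should have used the fallback.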

I didn’t see this question when you initially posted it, sorry! You’ve possibly been following the Gargoyle Unicode discussion here, and the quick answer is yes, this isn’t foolproof, as Gargoyle just lies and says it can print anything.

As noted in that other discussion, though, I’ve been working on font substitutions, which should largely alleviate this problem. With the current test code I’ve got, the illuminated scroll you mentioned now looks like this:

Going back to OP's question, one of the things I've been considering is which Unicode characters can be relied upon across terps, which is a similar concept.

I think someone should suggest a standard.

For me, the multi-byte characters that need support are:

open and close double quotes
open and close single quotes
long hyphen
ellipsis

Any other suggestions, i.e. for a bare minimum?
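As a rough sketch of what a terp (or an author's build step) could do when even this minimum set is not available, the characters above can be mapped to ASCII stand-ins. The table and function names are made up for this example, and it assumes "long hyphen" means the em dash (U+2014).

```python
# Hypothetical fallback table for the bare-minimum set listed above.
# Assumption: "long hyphen" is the em dash, U+2014.
ASCII_FALLBACKS = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (also the curly apostrophe)
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
})


def to_ascii_punctuation(text: str) -> str:
    """Replace the listed typographic characters with ASCII stand-ins."""
    return text.translate(ASCII_FALLBACKS)


print(to_ascii_punctuation("\u201cWait\u2026 it\u2019s open\u201d"))
```

Everything not in the table passes through unchanged, so this only covers the short list and nothing else.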

Are you talking about a list of characters that authors can rely on, or a list of characters that interpreters should try to support?

If the former, you don’t get to pick. You have to go test all the interpreters (all the ones you care about) and find out what the answer is.

If the latter, the answer is “as many as possible”. It doesn’t really make sense to mandate curly quotes specifically. Any interpreter that can handle those can almost certainly handle the rest of the punctuation block (0x20??). Any interpreter that can’t, well, you writing a list won’t make it possible.
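To make the "punctuation block" point concrete, here is a small sketch that checks where the curly quotes, the long dash, and the ellipsis live. It uses the General Punctuation block range U+2000 to U+206F; the helper name is made up for this example.

```python
import unicodedata

# The General Punctuation block spans U+2000 through U+206F.
GENERAL_PUNCTUATION = range(0x2000, 0x2070)


def in_general_punctuation(ch: str) -> bool:
    """True if the single character ch falls in the General Punctuation block."""
    return ord(ch) in GENERAL_PUNCTUATION


candidates = ["\u2018", "\u2019", "\u201c", "\u201d", "\u2014", "\u2026"]

for ch in candidates:
    # Print the code point, the official character name, and block membership.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), in_general_punctuation(ch))
```

All six land in the same block, which is why support for one tends to come with support for the others, at least as far as the interpreter's own character handling goes. Whether the font has the glyphs is a separate question.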

I think the former and the latter are the same, i.e. the set that authors could rely upon is the same as the set that terps try to support.

I don’t agree that interpreters can somehow handle the rest of the block. Unicode is so wide that it’s sensible to load only the relevant glyphs into memory. You also need the bold and italic variants, and you need to check that your fonts have the glyphs you require.

What happens when authors’ text includes curly quotes or apostrophes, for example? (I get this a lot from authors using modern editors.)

If these symbols don’t work, why not?

You just marked my previous answer as the solution… but I wouldn’t say the same thing now that I did 2 years ago.

As of the latest release, 2023.1, Gargoyle now bundles Unifont. So far as I know, Gargoyle was the only major interpreter app that wouldn’t automatically draw upon system fonts.

It may be the case now that about the only people left who would be expected to have troubles with Unicode symbols are people deliberately running a minimal Linux and they know how to install a font if they want to. So… I finally think it’d be pretty reasonable to count on being able to use the Basic Multilingual Plane.
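If you want to hold yourself to the Basic Multilingual Plane, a quick check over your game text might look like the sketch below (the function name is made up for this example). One thing worth knowing for the original question: the common astrological and musical-note symbols sit inside the BMP, but the Alchemical Symbols block starts at U+1F700, which is outside it.

```python
def outside_bmp(text: str) -> list[str]:
    """Return the characters in text whose code points lie above U+FFFF."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


readout = (
    "\u2644"       # SATURN, in the Miscellaneous Symbols block (BMP)
    "\u266a"       # EIGHTH NOTE, in the Miscellaneous Symbols block (BMP)
    "\U0001f700"   # first code point of the Alchemical Symbols block (not BMP)
)

for ch in outside_bmp(readout):
    print(f"U+{ord(ch):04X} is outside the Basic Multilingual Plane")
```

Being inside the BMP is no guarantee that a given font has the glyph; it is only a reasonable first filter.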

Of related interest, the (unscheduled) next release of Inform will be able to handle Unicode input (on Glulx) and won’t impose an artificial limit on what characters can be output or used in texts.

Good new info. Thanks, Zed! I’ve marked your new response as the answer.
