How to use Unicode characters

Apologies if this is a naive question; I am quite new to TADS and have never worked with character sets before.

I’ve been trying to figure out how to include a Unicode character in my game, but all my attempts result in it simply displaying a question mark. I can insert some characters using the HTML notation, but only up to U+0080 (after that, every character just displays as U+0080). I understand that it should be possible to include any Unicode character, but I haven’t figured out how. Is there a simple example somewhere of how to include arbitrary Unicode characters? I’m using TADS Workbench, if that’s relevant.

1 Like

In the preamble at the top of the file, use

#charset "utf-8"

And make sure your source file is actually saved in that encoding.

You can do this on a file by file basis.
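If you want to double-check the encoding outside your editor, a small script can do it. This is only a rough sketch in Python (my choice of tool, nothing TADS requires), and `describe_encoding` is a made-up helper name:

```python
def describe_encoding(data: bytes) -> str:
    """Classify raw file bytes as 'ascii', 'utf-8', or 'not utf-8'."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes that are not valid UTF-8, e.g. a file saved in a
        # single-byte code page that contains accented letters.
        return "not utf-8"
    # Pure ASCII is also valid UTF-8, so report it separately.
    return "ascii" if text.isascii() else "utf-8"
```

To use it, read the source file in binary mode (`open(path, "rb").read()`) and pass the bytes in. A result of "not utf-8" on a file that declares `#charset "utf-8"` would suggest the file needs to be re-saved.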

2 Likes

The syntax for including Unicode characters in a string literal is documented in the TADS 3 System Manual.

Can you include a code sample of what you’re trying to do? Even if it’s just the string you’re attempting to print, that would help.

2 Likes

My source files are still us-ascii, but I’ve used \uxxxx codes to get characters with accents, diaereses, etc. (\u00e9 is an “e” with an acute accent.)
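If typing the codes by hand gets tedious, something like the following could generate them. It's a hedged sketch in Python, not part of TADS: `to_u_escapes` is a hypothetical helper, and it assumes the four-hex-digit \uxxxx form used in this thread, so it refuses anything above U+FFFF.

```python
def to_u_escapes(text: str) -> str:
    """Replace non-ASCII characters with \\uxxxx escapes, keeping ASCII as-is."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            # Plain ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Four hex digits, lowercase, matching the \u00e9 style above.
            out.append(f"\\u{cp:04x}")
        else:
            # Assumption: code points beyond U+FFFF are out of scope here.
            raise ValueError(f"U+{cp:X} does not fit in four hex digits")
    return "".join(out)


print(to_u_escapes("caf\u00e9"))
```

The output can be pasted into a string literal in an ASCII source file. Note that it does not escape quotes or backslashes already present in the text.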

2 Likes

I have the charset set to utf-8 and I confirmed that that is indeed how the file is encoded. I was trying to use the \uxxxx syntax before, but I’ve played around with it a little more. For example, simply evaluating

"\u00B6 \u00FC \u00FF \u0118 \u0126 \u266A"

will print the first three symbols correctly, print the fourth as a plain “E” when it should have a diacritic, and print the last two as question marks. The limit seems to be \u00FF. Is there some extra work that needs to be done to get past this limit, or is it a fundamental limitation of the charset or the compiler or something?

2 Likes

What interpreter are you testing on? Different interpreters might handle printed characters differently, though I cannot confirm this for Unicode characters, specifically. I do know that some interpreters can handle <li>-based lists, even if I doubt it’s technically something that (HTML-)TADS is meant to support.

Try displaying these characters with the QTADS interpreter, if you haven’t already. If QTADS prints them correctly, then it might be another difference between HTML and non-HTML interpreters, or might be something that QTADS specifically can handle.

If QTADS doesn’t print them correctly either, then it might be the compiler, because that could mean the characters are being altered before the compiled game is written out. Even then, a lot of interpreters change how TADS games are printed by filtering, removing, or replacing certain things, so there’s still a chance the compiler is actually fine.

I’m only recommending testing this with QTADS because it also handles <li>, <h1>, and other markup that I would normally expect to work only in a web browser, not in a TADS interpreter. It seems to be quite flexible with the data you throw at it, so it is the best candidate for ruling out the interpreter as the issue here.

4 Likes

My reading comprehension suffers from lack of sleep, it seems. :grin: I assume this means you’re also using the standard HTML TADS interpreter. I would still recommend testing this on QTADS!

1 Like

Ah ha! Thanks for this! Yes, it is an interpreter issue. I was indeed using the standard HTML interpreter, but QTADS (and a few of the online interpreters I then tried) all seem to render the characters no problem.
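For anyone who lands here later: the cutoff at \u00FF matches the end of the Latin-1 range, so one plausible explanation (a guess, not something confirmed in this thread) is that the standard interpreter maps text into a single-byte character set before displaying it. A rough Python sketch to see which characters from the test string fit in Latin-1; `fits_latin1` and `report` are made-up names:

```python
import unicodedata

# The same test string as above, written with Python's \u escapes.
SAMPLE = "\u00B6 \u00FC \u00FF \u0118 \u0126 \u266A"


def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded as a single Latin-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


def report(text: str):
    """List (code point, Unicode name, fits-in-Latin-1) for each non-space character."""
    rows = []
    for ch in text:
        if ch == " ":
            continue
        rows.append((f"U+{ord(ch):04X}", unicodedata.name(ch), fits_latin1(ch)))
    return rows


for code, name, ok in report(SAMPLE):
    print(code, name, "fits Latin-1" if ok else "outside Latin-1")
```

The three characters that displayed correctly are the ones that fit, and the three that did not are the ones outside the range. That is consistent with the guess above but does not prove it.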

3 Likes