I’m pretty sure this is a bug, though I can’t figure out where it comes from.
When compiling for the Z-machine, Dialog is fine outputting Unicode characters, but it won’t accept them in input. So it’s fine to have a word like there’s (with a fancy apostrophe) in output, but it can’t appear in anything that’s used for parsing (i.e. in a (collect words)).
However, it also complains if they appear in a closure. Here’s a minimal example:
(program entry point)
	(query {there’s})
To cause the error, compile like so:
dialogc mwe.dg stdlib.dg -t z5
The result:
Error: Unsupported character U+2019 in dictionary word '@there’s'.
And now here’s the bizarre part. This bug disappears if you don’t include the standard library, even though no code from the standard library ever gets run.
My guess (knowing absolutely nothing about how the compiler works) is that the compiler just adds every word in every closure to the dictionary as long as there is a (collect words) somewhere, without trying to check whether those words can actually be collected?
It seems like a difficult/impossible problem to solve in general. Your example is simple, so there’s definitely some improvement that could be made, but what if that query failed and the program proceeded with the standard library code?
Or in an unrelated case, every variable could potentially contain a closure, and so on. So I guess the compiler simply took the easy path.
The way the Z-machine’s I/O works, there’s a 10-bit character encoding that’s somewhat configurable by the game (letting you put a certain number of non-ASCII characters into the charset), used for most input and output, and also a special opcode to print any 16-bit Unicode character, separate from the string-encoding system.
Currently Dialog doesn’t use any of that. If a non-ASCII character is output, it’s replaced with an ASCII equivalent (usually ?); if a non-ASCII character appears in a (collect words), the compiler rejects it.
Here’s the problem: if any closure ever appears inside a (collect words) or similar construction, Dialog considers every closure to be potentially used inside a (collect words), because internally, all closures are compiled together into a big routine.
And in the Standard Library, well, that’s always true, because (the full $) is used to parse disambiguation questions, and that uses closures in the list-printing mechanism.
That means this fails:
(program entry point)
	*(repeat forever)
	(line)
	(get input $Words)
	(collect $Obj)
		(determine object $Obj)
			*(object $Obj)
		(from words)
			*(dict $Obj)
		(matching all of $Words)
	(into $List)
	$List
	(query {ašdf})
	(empty $Words) %% Fail and repeat if not empty

#apple
(dict *) apple red fruit

#pear
(dict *) (query {pear})
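
To make the all-or-nothing behaviour concrete, here is a toy model in Python. It is not the compiler's actual code: `Closure`, `dictionary_candidates` and `unsupported` are names invented for this sketch, and "non-ASCII" stands in for "unsupported in a Z-machine dictionary word", as described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Closure:
    """Hypothetical stand-in for one closure in the source program."""
    words: tuple                 # words that appear literally inside the closure
    under_collect_words: bool    # True if it can be reached from word collection


def dictionary_candidates(closures):
    """All-or-nothing rule: one closure used for parsing taints them all."""
    if any(c.under_collect_words for c in closures):
        return {w for c in closures for w in c.words}
    return set()


def unsupported(words):
    """Words containing a character outside ASCII (assumed unsupported here)."""
    return sorted(w for w in words if not w.isascii())


# The {ašdf} closure is never used for parsing.
user_code = [Closure(words=("ašdf",), under_collect_words=False)]
# The {pear} closure sits inside a (dict *) rule, so it is used for parsing.
parsing_code = [Closure(words=("pear",), under_collect_words=True)]

print(unsupported(dictionary_candidates(user_code)))                 # []
print(unsupported(dictionary_candidates(user_code + parsing_code)))  # ['ašdf']
```

In this model the {ašdf} closure is only flagged once some other closure is used for parsing, which matches the observation that the error goes away when the standard library is left out.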
I’ve made a pull request to change this error to a warning. Now, it will instead say:
Warning: Unsupported character U+0161 in dictionary word '@ašdf'. This word will never be recognized by the parser.
So if the author can guarantee that the {ašdf} closure will never be used in parsing—which I can, in this example—then there’s nothing to worry about!
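
For anyone curious what the error-to-warning change amounts to, here is a small Python sketch of the two policies. It is a model under assumptions, not code from the pull request: `check_dictionary_word` and its `strict` flag are made up for illustration, and any non-ASCII character is treated as unsupported.

```python
def check_dictionary_word(word, strict):
    """Return a diagnostic string for the first unsupported character, or None.

    strict=True models the old behaviour (compilation stops with an error);
    strict=False models the new behaviour (a warning, compilation continues).
    """
    for ch in word:
        if not ch.isascii():
            detail = (f"Unsupported character U+{ord(ch):04X} "
                      f"in dictionary word '@{word}'.")
            if strict:
                raise ValueError("Error: " + detail)
            return ("Warning: " + detail +
                    " This word will never be recognized by the parser.")
    return None


print(check_dictionary_word("pear", strict=False))
print(check_dictionary_word("ašdf", strict=False))
```

The point of the non-strict path is that the word is still reported, but the decision about whether it matters is left to the author.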