In the Z-machine, DICT_WORD_SIZE is the number of bytes in a dictionary word. It’s always 6 (unless you go back to v3 or earlier, where it’s 4).
Z-machine encoding is variable-width. Those six bytes can store nine letters, but only four punctuation marks (if I remember this correctly). So “aliphatic” fits in a word but “moon-calf” does not.
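To make the counting concrete, here is a small Python sketch of the cost rule described above. It assumes the default alphabet table from the Z-machine standard (v2 and later) and only counts Z-characters; it does not do any real encoding. The function names are made up for the illustration.

```python
# Default Z-machine alphabet rows (v2 and later), per the Z-machine standard.
A0 = "abcdefghijklmnopqrstuvwxyz"      # one Z-character each
A2 = "0123456789.,!?_#'\"/\\-:()"      # shift + character = two Z-characters


def zchar_cost(word):
    """Count the Z-characters needed to store a lowercase word."""
    total = 0
    for ch in word:
        if ch in A0:
            total += 1      # plain lowercase letter
        elif ch in A2:
            total += 2      # digit or punctuation from row A2
        else:
            total += 4      # anything else needs the four-slot ZSCII escape
    return total


def fits(word, limit=9):
    """True if the whole word fits in a six-byte (nine Z-character) entry."""
    return zchar_cost(word) <= limit


print(zchar_cost("aliphatic"), fits("aliphatic"))   # 9 True
print(zchar_cost("moon-calf"), fits("moon-calf"))   # 10 False
```

The hyphen costs two slots, which is what pushes “moon-calf” over the limit.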
(The Z-machine parser code never uses DICT_WORD_SIZE for anything, anyhow. The dictionary is handled by the interpreter directly.)
In Glulx, DICT_WORD_SIZE is treated slightly differently. It’s a number of characters, and characters are normally stored in the dictionary one-per-byte, so it’s also the number of bytes. But if you use the experimental setting DICT_CHAR_SIZE=4, then characters are stored in the dictionary one-per-word. So then a dict word has 4*DICT_WORD_SIZE bytes.
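As a tiny illustration of that arithmetic (Python, with a made-up function name): it only covers the text portion of a dictionary entry, and the value 9 used for DICT_WORD_SIZE is just an example, not something taken from the compiler.

```python
def glulx_word_text_bytes(dict_word_size, dict_char_size=1):
    """Bytes used by the text of one Glulx dictionary word.

    dict_char_size is 1 (one byte per character) or 4 (one 32-bit word
    per character, the experimental DICT_CHAR_SIZE=4 setting).
    """
    if dict_char_size not in (1, 4):
        raise ValueError("DICT_CHAR_SIZE must be 1 or 4")
    return dict_word_size * dict_char_size


print(glulx_word_text_bytes(9))      # 9
print(glulx_word_text_bytes(9, 4))   # 36
```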
I know this is a little clumsy. It makes the library code simple, though.
(The standard I6 parser does not support DICT_CHAR_SIZE=4 yet.)
Those six bytes can store nine letters, but only four punctuation marks (if I remember this correctly).
And each accented character takes the place of four characters. There is not much room left if you have an accented character in your word. So we don’t put accents in the dictionary words.
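A rough Python sketch of what that means in practice, again assuming the default alphabet table. It keeps only whole characters that fit in nine Z-characters; a real encoder truncates the Z-character stream and may keep part of an escape sequence, so treat this as an approximation. The helper names are hypothetical.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"      # one Z-character each
A2 = "0123456789.,!?_#'\"/\\-:()"      # two Z-characters each


def cost(ch):
    """Z-characters needed for one character under the default alphabet."""
    if ch in A0:
        return 1
    if ch in A2:
        return 2
    return 4    # accented letters and everything else


def dictionary_prefix(word, limit=9):
    """Return the leading part of the word that fits in `limit` Z-characters."""
    used = 0
    kept = []
    for ch in word:
        c = cost(ch)
        if used + c > limit:
            break
        used += c
        kept.append(ch)
    return "".join(kept)


print(dictionary_prefix("aliphatic"))   # aliphatic
print(dictionary_prefix("éléphant"))    # élé
print(dictionary_prefix("élégant"))     # élé
```

Under these assumptions “éléphant” and “élégant” end up with the same three significant characters, so the parser could not tell them apart.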
The source code of the games can be encoded in UTF-8, and modern interpreters handle UTF-8 input and output well, but a Z-code dictionary word remains limited to nine Z-characters.
I am sure that David Griffith, when he translates these Inform 6.12 libraries into Spanish, with voices, tenses and á, é, í, ó, ú in words, will understand what I mean.
Since the dictionary is the interpreter’s job, is it complicated to increase its capacity?
If so, forget what I just said.
You could define a new Alphabet Table (section 3.5 of the Z-machine standard). Since you never need to store uppercase letters in the dictionary anyway, those 26 codepoints could be used to encode accented lowercase letters using two slots instead of four. Meanwhile, uppercase letters in your string constants will occupy more space, but again the accented lowercase letters will occupy less, so the net effect is probably still a decrease in size.
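Here is a hedged Python sketch of the saving. The second row below is a hypothetical selection of accented letters (shorter than 26 for brevity), not the contents of any real library file, and the function only counts Z-characters.

```python
ROW0 = "abcdefghijklmnopqrstuvwxyz"
ROW1 = "àâçéèêëîïôùûü"                  # hypothetical: accents replace A-Z
ROW2 = "0123456789.,!?_#'\"/\\-:()"
DEFAULT_ROW1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def zchar_cost(text, row0=ROW0, row1=ROW1, row2=ROW2):
    """Count Z-characters for text under the given alphabet rows."""
    total = 0
    for ch in text:
        if ch in row0:
            total += 1      # first row: no shift needed
        elif ch in row1 or ch in row2:
            total += 2      # shift + character
        else:
            total += 4      # four-slot ZSCII escape
    return total


print(zchar_cost("élégant"))                      # 9  (custom table)
print(zchar_cost("élégant", row1=DEFAULT_ROW1))   # 13 (default table)
print(zchar_cost("A"))                            # 4  (uppercase now costs more)
```

With this table “élégant” fits in nine Z-characters, at the price of uppercase letters taking four slots each in ordinary strings.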
The Zcharacter directive used in this way makes these characters available in ZSCII, but not part of the alphabet table. This means each of these characters uses up four times as much space as the cheapest characters (normally lowercase a-z and space). You may want to create your own alphabet table to fix this.
You should perform all Zcharacter directives at the very beginning of the game, before any strings are declared, including Story and Headline. Otherwise, mayhem ensues.
For the Swedish translation of the library, I created a file called SweAlpha.h, to be included at the very start of the game source.
It seems that there is a difference between the Zcharacter directive in libraries and the Zcharacter table in the source code. And it seems that with the Zcharacter directive, I’m only allowed 10 accented characters. I’ll check it out.
If I’m not misreading the docs, Inform will let you specify 26 characters to be encoded with a single Z-character, 49 to be encoded with 2 Z-characters, and the rest with 4 Z-characters. As this matters most for the dictionary, it makes sense for row 1 of the alphabet table to be only lower case, but it can be any lower case characters. For works in other languages it would pay to consider carefully the frequency distribution of the alphabet for the language. You could swap some of the less used non-accented characters in row 1 for other more frequently used accented characters.
So from this site, maybe a good alphabet table row 1 for French would be eastirnulodmcpévhgfbqjàxèê. Once you’ve determined which ones you want, they can be listed in any order; you don’t have to keep them in order of frequency.
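If you want to derive such a row from your own game text, a minimal Python sketch might look like this. The function name and the sample sentence are made up; it simply ranks the letters in whatever text you feed it, and ties are broken by code point so the result is repeatable. It also checks that the French row suggested above has exactly 26 distinct characters.

```python
from collections import Counter


def pick_row0(sample, size=26):
    """Return the `size` most frequent letters in `sample`, lowercased."""
    counts = Counter(ch for ch in sample.lower() if ch.isalpha())
    # Most frequent first; ties broken by code point for a stable result.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    return "".join(ranked[:size])


FRENCH_ROW0 = "eastirnulodmcpévhgfbqjàxèê"   # the row suggested above

print(len(FRENCH_ROW0), len(set(FRENCH_ROW0)))   # 26 26
print(pick_row0("Été et thé", 3))                # tée
```

Running it over the complete text of a game (or the library's messages) would give a ranking tailored to that game rather than to the language in general.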
For Swedish, I ended up putting the accented characters in the same positions as the characters I removed from the first row. In this way, interpreters which can’t handle custom alphabet tables will still display something that can be read (kinda), and a few strings that are encoded so early that you can’t stop them from getting it wrong (like “Class” and “Object” IIRC) will still display correctly or almost correctly.
The maximum word length for v3 is 6 characters, but each entry seems to take 7 bytes in the dictionary.
I use the No__Dword() function from the standard library, which I modified to work with v3, but for the result to be correct, the value 9 must be replaced by 7, not 6:
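As a rough Python sketch of the arithmetic behind that observation (not the library function itself): it assumes each entry holds the encoded text followed by three data bytes, and that the dictionary header lists three word separators. Both are assumptions about how Inform lays out the table, and all names are made up.

```python
DATA_BYTES = 3   # assumption: three data bytes follow the text of each entry


def text_bytes(version):
    """Bytes of encoded text per dictionary word."""
    return 4 if version <= 3 else 6


def max_zchars(version):
    """Every two bytes hold three 5-bit Z-characters."""
    return text_bytes(version) * 3 // 2


def entry_length(version):
    """Total bytes per dictionary entry: text plus data bytes."""
    return text_bytes(version) + DATA_BYTES


def entry_address(dict_base, n, version, num_separators=3):
    """Address of entry n, counting from 0."""
    # Header: separator count, the separators, entry length byte, 2-byte count.
    header = 1 + num_separators + 1 + 2
    return dict_base + header + entry_length(version) * n


print(max_zchars(3), entry_length(3))   # 6 7
print(max_zchars(5), entry_length(5))   # 9 9
```

Under these assumptions the 7 is the 4 bytes of text plus 3 data bytes, while the 6 is the number of Z-characters those 4 bytes can hold.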