[I6] Word length in dictionary Zcode & Glulx

auraes · September 5, 2014, 9:10am

How can I find out the length of a word in the dictionary (I6: Zcode or Glulx)? DICT_WORD_SIZE in Zcode returns 6 instead of 9:

!% -v8
[ Main key;
  print DICT_WORD_SIZE, "^";
  @read_char 1 -> key;
];

And what about Glulx?
http://www.eblong.com/zarf/glulx/technical.txt

Draconis · September 5, 2014, 1:19pm

In Glulx the dictionary resolution can be modified with DICT_WORD_SIZE. In the Z-machine (iirc) it’s always 9.

zarf · September 5, 2014, 2:37pm

In the Z-machine, DICT_WORD_SIZE is the number of bytes in the dictionary word. It’s always 6 (unless you go back to v3 or earlier, where it’s 4).

Z-machine encoding is variable-width. Those six bytes can store nine letters, but only four punctuation marks (if I remember this correctly). So “aliphatic” fits in a word but “moon-calf” does not.
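To make that arithmetic concrete, here is a rough Python model (not the compiler’s actual encoder) of what a word costs under the default v2+ alphabet table: characters in row A0 take one Z-character, those in rows A1 and A2 take two (a shift plus the character), and anything else takes four (shift, escape, then a 10-bit ZSCII code split across two Z-characters). The 9-Z-character budget for v4+ and 6 for v3 are taken from the Z-machine Standard; the function names are made up for illustration.

```python
# Default alphabet rows for v2+ (Z-machine Standard, section 3.5). A2 leaves out
# its first two slots, which are the ZSCII escape and the newline.
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "0123456789.,!?_#'\"/\\-:()"

DICT_ZCHARS_V3 = 6   # stored in 4 bytes
DICT_ZCHARS_V4 = 9   # stored in 6 bytes


def zchar_cost(ch):
    """Z-characters needed to encode one character with the default alphabet."""
    if ch == " " or ch in A0:
        return 1          # a single unshifted Z-character
    if ch in A1 or ch in A2:
        return 2          # shift + Z-character
    return 4              # shift, escape, then two 5-bit halves of a ZSCII code


def word_cost(word):
    return sum(zchar_cost(ch) for ch in word)


def fits_in_dictionary(word, limit=DICT_ZCHARS_V4):
    return word_cost(word) <= limit


for w in ("aliphatic", "moon-calf"):
    print(w, word_cost(w), fits_in_dictionary(w))
# aliphatic 9 True
# moon-calf 10 False
print(word_cost("!?.,"))  # 8: four punctuation marks still fit in 9
```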

(The Z-machine parser code never uses DICT_WORD_SIZE for anything, anyhow. The dictionary is handled by the interpreter directly.)

In Glulx, DICT_WORD_SIZE is treated slightly differently. It’s a number of characters, and characters are normally stored in the dictionary one per byte, so it’s also the number of bytes. But if you use the experimental setting DICT_CHAR_SIZE=4, then characters are stored in the dictionary one per 4-byte word. So then a dict word has 4*DICT_WORD_SIZE bytes.
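The Glulx byte count is just a product of the two settings. A trivial sketch, assuming the compiler default DICT_WORD_SIZE of 9; `glulx_dict_text_bytes` is a hypothetical name and it only covers the text part of an entry, not any flag fields:

```python
def glulx_dict_text_bytes(dict_word_size=9, dict_char_size=1):
    """Bytes taken by the text of one Glulx dictionary word.

    DICT_WORD_SIZE counts characters; DICT_CHAR_SIZE is 1 (one byte per
    character) or 4 (one 32-bit word per character).
    """
    if dict_char_size not in (1, 4):
        raise ValueError("DICT_CHAR_SIZE must be 1 or 4")
    return dict_word_size * dict_char_size


print(glulx_dict_text_bytes())        # 9  (default settings)
print(glulx_dict_text_bytes(9, 4))    # 36 (DICT_CHAR_SIZE=4)
```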

I know this is a little clumsy. It makes the library code simple, though.

(The standard I6 parser does not support DICT_CHAR_SIZE=4 yet.)

auraes · July 24, 2019, 7:49am

zarf wrote: “Those six bytes can store nine letters, but only four punctuation marks (if I remember this correctly).”

And each accented character takes up the space of four letters, so there isn’t much room left once a word contains one. That’s why we don’t put accents in dictionary words.
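A small Python sketch of what survives truncation, under a default-alphabet cost model (one Z-character for a-z, two for shifted characters, four for an escaped accented letter) and a 9-Z-character budget; `dictionary_prefix` is a hypothetical helper. It also shows a side effect: two different words can collapse to the same dictionary entry.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "0123456789.,!?_#'\"/\\-:()"


def zchar_cost(ch):
    """Z-characters needed for one character in the default alphabet."""
    if ch == " " or ch in A0:
        return 1
    if ch in A1 or ch in A2:
        return 2
    return 4  # ZSCII escape: shift, escape, two 5-bit halves


def dictionary_prefix(word, budget=9):
    """The leading characters of `word` stored whole within `budget` Z-characters."""
    kept, used = [], 0
    for ch in word:
        cost = zchar_cost(ch)
        if used + cost > budget:
            break  # this character (or part of it) is cut off
        kept.append(ch)
        used += cost
    return "".join(kept)


for w in ("elephant", "éléphant", "élégant"):
    print(w, "->", dictionary_prefix(w))
# elephant -> elephant
# éléphant -> élé
# élégant -> élé
```

With two accented letters the budget is spent after three characters, so the parser could not tell “éléphant” from “élégant”.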

The games’ source code can be encoded in UTF-8, and modern interpreters handle UTF-8 input and output well, but the Z-code dictionary is still limited to nine Z-characters.
I am sure that David Griffith, when he translates these Inform 6.12 libraries into Spanish, with its voices, tenses and á, é, í, ó, ú in words, will understand what I mean.
Since the dictionary is the interpreter’s job, would it be complicated to increase its capacity?
If so, forget what I just said.

lft · July 24, 2019, 10:38am

You could define a new Alphabet Table (section 3.5 of the Z-machine standard). Since you never need to store uppercase letters in the dictionary anyway, those 26 codepoints could be used to encode accented lowercase letters using two slots instead of four. Meanwhile, uppercase letters in your string constants will occupy more space, but again the accented lowercase letters will occupy less, so the net effect is probably still a decrease in size.
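A rough sketch of that trade-off, assuming a hypothetical A1 row that holds 18 accented lowercase letters plus 8 capitals as filler (a real table would need a deliberate choice there). Accented letters drop from four Z-characters to two, while capitals left out of every row climb from two to four:

```python
A0 = "abcdefghijklmnopqrstuvwxyz"
A2 = "0123456789.,!?_#'\"/\\-:()"
DEFAULT_A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Hypothetical French A1 row: 18 accented lowercase letters, padded with
# 8 capitals so the row keeps its 26 slots. Not a recommended table.
FRENCH_A1 = "àâäçèéêëîïôöùûüÿæœ" + "ABCDEFGH"


def word_cost(word, a1):
    """Z-characters for `word` when row A1 is `a1` (A0 and A2 left as default)."""
    total = 0
    for ch in word:
        if ch == " " or ch in A0:
            total += 1          # unshifted
        elif ch in a1 or ch in A2:
            total += 2          # shift + character
        else:
            total += 4          # ZSCII escape sequence
    return total


assert len(FRENCH_A1) == 26

for w in ("élève", "Zut"):
    print(w, word_cost(w, DEFAULT_A1), word_cost(w, FRENCH_A1))
# élève 11 7
# Zut 4 6
```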

Dannii · July 24, 2019, 11:01am

For reference, here’s how to define the alphabet table in I6: https://www.inform-fiction.org/manual/html/s36.html#p270

auraes · July 24, 2019, 12:01pm

Thanks.
French accent table:

!% -Cu

Zcharacter 'à';
Zcharacter 'â';
Zcharacter 'ä';
Zcharacter 'ç';
Zcharacter 'è';
Zcharacter 'é';
Zcharacter 'ê';
Zcharacter 'ë';
Zcharacter 'î';
Zcharacter 'ï';
Zcharacter 'ô';
Zcharacter 'ö';
Zcharacter 'ù';
Zcharacter 'û';
Zcharacter 'ü';
Zcharacter 'ÿ';
! Zcharacter 'æ';
! Zcharacter 'œ';

I need to find the right place in the libraries to put it.
The DM4 Table 2B (higher ZSCII character set) says that the escape for ç is ‘@,c’, but it’s actually ‘@cc’.

fredrik · July 30, 2019, 8:20am

The Zcharacter directive used in this way makes these characters available in ZSCII, but not part of the alphabet table. This means each of these characters uses up four times as much space as the cheapest characters (normally lowercase a-z and space). You may want to create your own alphabet table to fix this.

You should perform all Zcharacter directives at the very beginning of the game, before any strings are declared, including Story and Headline. Otherwise, mayhem ensues.

For the Swedish translation of the library, I created a file called SweAlpha.h, to be included at the very start of the game source.

auraes · July 30, 2019, 8:46am

I put the Zcharacter directives in the library, at the beginning of frenchU.h (the equivalent of english.h), and it works. That’s better than putting them in the game’s source code.

auraes · July 30, 2019, 9:25am

It seems that there is a difference between the Zcharacter directive used in the library and a Zcharacter table in the game’s source code. And it seems that, with the Zcharacter directive, I only get 10 accented characters. I’ll check it out.

auraes · July 30, 2019, 9:47am

I put accented letters in the name property of an object and displayed the dictionary:

àxxxxxxx
âxxxxxxx
äxxxxxxx
çxxxxxxx
èxxxxxxx
éxxxxxxx
êxxxxxxx
ëxxxxxxx
îxxxxxxx
ïxxxxxxx
ôxxxxxxx 
öxxxxxxx
ùxxxxxxx
ûxxxxx
üxxxxx
ÿxxxxx
æxxxxx
œxxxxx

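It seems I can use 13 different accented characters with the Zcharacter directive: the first 13 words above keep seven x’s after the accented letter (so it costs two Z-characters), while the last five keep only five (so it costs four). Unless I’m talking nonsense.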

13*2 = 26: that’s exactly what Linus Åkesson (lft) seemed to be saying. I should have read it more carefully.

Dannii · July 30, 2019, 11:39am

If I’m not misreading the docs, Inform will let you specify 26 characters to be encoded with a single Z-character, 49 to be encoded with 2 Z-characters, and the rest with 4 Z-characters. As this matters most for the dictionary, it makes sense for row 1 of the alphabet table to be only lower case, but it can be any lower case characters. For works in other languages it would pay to consider carefully the frequency distribution of the alphabet for the language. You could swap some of the less used non-accented characters in row 1 for other more frequently used accented characters.

So from this site, maybe a good row 1 of the alphabet table for French would be eastirnulodmcpévhgfbqjàxèê. Once you’ve determined which ones you want, however, they can be listed in any order; you don’t have to keep them in order of frequency.
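A quick check of such a row, as a sketch: confirm it has 26 distinct characters (the size described above for row 1) and list which plain letters it gives up and which accented letters it gains. The dropped letters would need a slot in another row, or else they fall back to the four-Z-character escape.

```python
import string

# Row 1 proposed above for French (frequency order, though order does not matter).
proposed = "eastirnulodmcpévhgfbqjàxèê"

assert len(proposed) == 26 and len(set(proposed)) == 26, "row must hold 26 distinct characters"

dropped = sorted(set(string.ascii_lowercase) - set(proposed))
added = [ch for ch in proposed if ch not in string.ascii_lowercase]

print("dropped from row 1:", "".join(dropped))  # kwyz
print("added to row 1:", "".join(added))        # éàèê
```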

fredrik · July 30, 2019, 12:03pm

For Swedish, I ended up putting the accented characters in the same positions as the characters I removed from the first row. In this way, interpreters which can’t handle custom alphabet tables will still display something that can be read (kinda), and a few strings that are encoded so early that you can’t stop them from getting it wrong (like “Class” and “Object” IIRC) will still display correctly or almost correctly.

This is what I ended up using:

Zcharacter "abcdefghijklmnopårstuväxyö"
           "ABCDEFGHIJKLMNOPÅRSTUVÄXYÖ"
              "012345.,!?'/-:()wqzWQZé";

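The fallback fredrik describes can be checked mechanically: an interpreter that ignores the custom table decodes each Z-character with the default row, so every customised position shows the letter it replaced. A sketch in Python using the rows of the Swedish table above (the 26/26/23 row lengths are read off that table, not verified against the compiler; `fallback_map` is a hypothetical helper):

```python
DEFAULT_A0 = "abcdefghijklmnopqrstuvwxyz"
DEFAULT_A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Rows of the Swedish table above.
SWEDISH_A0 = "abcdefghijklmnopårstuväxyö"
SWEDISH_A1 = "ABCDEFGHIJKLMNOPÅRSTUVÄXYÖ"
SWEDISH_A2 = "012345.,!?'/-:()wqzWQZé"


def fallback_map(custom, default):
    """Custom character -> what an interpreter using the default row would show."""
    return {c: d for c, d in zip(custom, default) if c != d}


print(fallback_map(SWEDISH_A0, DEFAULT_A0))  # {'å': 'q', 'ä': 'w', 'ö': 'z'}
print(fallback_map(SWEDISH_A1, DEFAULT_A1))  # {'Å': 'Q', 'Ä': 'W', 'Ö': 'Z'}
print(len(SWEDISH_A0), len(SWEDISH_A1), len(SWEDISH_A2))  # 26 26 23
```

The displaced q, w and z (and their capitals) move to the third row, where they still cost two Z-characters instead of four.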
fredrik · July 30, 2019, 12:05pm

Zcharacter can do a few different things, using different syntax.

This is a good read: http://www.firthworks.com/roger/informfaq/aa20.html

auraes · January 17, 2021, 10:18am

Word length in the dictionary for the z3 version.

The maximum word length for v3 is 6 characters, but each word seems to take 7 bytes in the dictionary.
I use the No__Dword() function from the standard library, which I modified for v3, but for the result to be correct, the value 9 must be replaced by 7, not by 6:

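! v4+ (standard library): skip the 7-byte dictionary header, then 9 bytes per entry
[ No__Dword n; return HDR_DICTIONARY-->0 + 7 + 9*n; ];
! v3: same header, but 7 bytes per entry
[ No__Dword n; return HDR_DICTIONARY-->0 + 7 + 7*n; ];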
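The 7 added in both versions is the dictionary header described in section 13 of the Z-machine Standard: a separator count, the separators themselves (as far as I know Inform emits three: ‘.’, ‘,’ and ‘"’), the entry length, and a 2-byte entry count. Rather than hard-coding both the 7 and the entry size, one can read them from the header. A Python sketch over a raw story-file image; `dict_entry_address` is a hypothetical name and the tiny memory image is made up for the demo:

```python
def dict_entry_address(memory, n):
    """Address of the n-th dictionary entry, read from the story-file header."""
    dict_addr = int.from_bytes(memory[0x08:0x0A], "big")  # header word at $08
    num_separators = memory[dict_addr]
    entry_length = memory[dict_addr + 1 + num_separators]
    # Skip: separator count (1), separators, entry length (1), entry count (2).
    first_entry = dict_addr + 1 + num_separators + 1 + 2
    return first_entry + n * entry_length


# Made-up v3-style image: dictionary at $20, three separators, 7-byte entries.
mem = bytearray(64)
mem[0x08:0x0A] = (0x20).to_bytes(2, "big")
mem[0x20:0x27] = bytes([3, ord("."), ord(","), ord('"'), 7, 0, 2])
print(dict_entry_address(bytes(mem), 0), dict_entry_address(bytes(mem), 1))  # 39 46
```

With three separators this reduces to the dictionary address + 7 + entry_length*n, matching the v3 formula above.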

zarf · January 17, 2021, 8:59pm

The length of a dictionary entry is 7 or 9 bytes(*). Part of that is the word text (6 or 9 Z-characters, but they fit into fewer bytes than that). The rest is various flags.

(* Actually the Z-machine allows them to be longer, but Inform always generates 7 bytes for v3 games and 9 for v4+.)
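How 6 or 9 Z-characters fit into 4 or 6 bytes: per sections 3.2 and 3.7 of the Z-machine Standard, each 16-bit word holds three 5-bit Z-characters, the dictionary form is truncated or padded with Z-character 5, and the last word has its top bit set. A Python sketch of that packing (`pack_zchars` is a hypothetical helper), adding the 3 flag bytes Inform uses to get the 7 and 9 above:

```python
def pack_zchars(zchars, version):
    """Pack Z-characters (ints 0-31) into a dictionary entry's text field."""
    limit = 6 if version <= 3 else 9           # Z-characters kept per dictionary word
    zs = (list(zchars) + [5] * limit)[:limit]  # truncate, or pad with Z-character 5
    out = bytearray()
    for i in range(0, limit, 3):
        word = (zs[i] << 10) | (zs[i + 1] << 5) | zs[i + 2]
        if i + 3 == limit:
            word |= 0x8000                      # bit 15 marks the final word
        out += word.to_bytes(2, "big")
    return bytes(out)


INFORM_DATA_BYTES = 3  # flag/data bytes Inform adds after the text (per the note above)

# "cat" in the default alphabet: a-z are Z-characters 6..31, so c=8, a=6, t=25.
cat = [ord(ch) - ord("a") + 6 for ch in "cat"]
for version in (3, 5):
    text = pack_zchars(cat, version)
    print(version, len(text), len(text) + INFORM_DATA_BYTES)
# 3 4 7
# 5 6 9
```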