Extract from an Inform 7 game all the words it can understand?

I’m using a speech recognition layer implemented outside Inform 7 to feed input into my Inform 7 game. I’d like to bias my speech recognition algorithm towards the words that my game can understand. Is there any way to just get a list of all those words?

I don’t think there’s an officially supported way to do this. However, you could try applying the following regular expression to the generated Inform 6 source:

(?<!!.*)'(.[^&<>=|,.+'-\s]+)'(?!.*")

That should give you all the single-quoted dictionary words (like 'this'), which are all the ones the parser can even attempt to understand.
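Python's built-in `re` module rejects variable-length lookbehind like the one in that expression, so here is a minimal sketch of the same idea using a small scanner instead of the regex above. It assumes the I6 source has been read into a string; the function names are made up for illustration, and this is not a full I6 lexer (a `'!'` character constant, for example, would be misread as the start of a comment).

```python
import re

# Anything between single quotes with no whitespace inside.
QUOTED = re.compile(r"'([^'\s]+)'")

def strip_strings_and_comments(source: str) -> str:
    """Drop "double-quoted strings" and ! comments, keep everything else."""
    out = []
    in_string = in_comment = False
    for ch in source:
        if in_comment:
            if ch == "\n":          # comments run to the end of the line
                in_comment = False
                out.append(ch)
        elif in_string:
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "!":
            in_comment = True
        else:
            out.append(ch)
    return "".join(out)

def dictionary_words(source: str) -> set[str]:
    """Collect single-quoted dictionary words from Inform 6 source text."""
    words = set()
    for token in QUOTED.findall(strip_strings_and_comments(source)):
        # 'lanterns//p' carries flags after //, and 'a//' forces a dictionary word.
        word, marker, _flags = token.partition("//")
        if not marker and len(word) == 1:
            continue                # 'a' on its own is a character constant
        word = word.replace("^", "'").lower()   # ^ stands for an apostrophe
        if word and word[0].isalnum():          # skip punctuation entries
            words.add(word)
    return words
```

Calling `dictionary_words(open("auto.inf").read())` would then give a set of candidate words; the file name is an assumption, so use whatever your build actually produces.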

In theory, one should be able to add:

Include (- Message "----------"; Trace dictionary; -) after "Output.i6t".

and then see the complete list of words in the story’s dictionary as part of compilation output on the Results/Progress tab.

In practice, this seems to work when compiling for Z-machine, but not for Glulx (though Glulx output does tell how many words are in the dictionary and is formatted in a way that suggests it is intended to output the list of words).

Note that this shows the dictionary entries as encoded, which means words can be truncated, especially those including special characters.
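If the truncation gets in the way, one option is to match the truncated entries back against full-length words collected some other way (from the source text, say). A rough sketch, assuming a 9-character limit (check what your own build uses) and ignoring the extra room that special characters take up in the encoding; `expand_truncated` and its parameters are hypothetical names:

```python
def expand_truncated(dict_entries, full_words, limit=9):
    """Map each dictionary entry to the full words that truncate to it."""
    by_prefix = {}
    for word in full_words:
        by_prefix.setdefault(word[:limit], set()).add(word)

    expanded = {}
    for entry in dict_entries:
        # Fall back to the entry itself when no longer spelling is known.
        expanded[entry] = sorted(by_prefix.get(entry, {entry}))
    return expanded
```

Several full words can collapse to the same entry, which is why each entry maps to a list rather than a single word.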

The Glulx version of “Trace dictionary” only got implemented this year, so it’s not available in the current I7 release.

In the meantime, you can add the following to your game (Z-Machine or Glulx):

Include (-

#Ifdef TARGET_ZCODE;

[ ListAllDictWords    i wsc da des dec;
    da = HDR_DICTIONARY-->0;
    wsc = da->0; ! word separator count
    des = (da+wsc+1)->0; ! dictionary entry size (in bytes)
    dec = (da+wsc+2)-->0; ! dictionary entry count

    print "Words recognized by this story (", dec, "):^^";
    for (i = da+wsc+4: i<da+wsc+4+(des*dec): i=i+des)
        print "    ", (address) i, "^";

    new_line; new_line;
];

#Endif;

#Ifdef TARGET_GLULX;

[ ListAllDictWords    da dec ce;
    da = #dictionary_table;
    dec = da-->0;

    print "Words recognized by this story (", dec, "):^^";
    for (ce = da+WORDSIZE: dec>0: ce=ce+1+DICT_WORD_SIZE+2+2+2) {
        print "    ", (address) ce, "^";
        dec--;
    }

    new_line; new_line;
];

#Endif;

-).

To list all known dictionary words:
    (- ListAllDictWords(); -).

When play begins: [or set up your own debugging verb]
    list all known dictionary words.
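To get that listing into an external speech recognition layer, the transcript can be parsed outside the game. A small sketch, assuming the output is captured as plain text in exactly the format printed above (a header line with the count, then one indented word per line); `parse_word_listing` is a made-up name:

```python
import re

HEADER = re.compile(r"Words recognized by this story \((\d+)\):")

def parse_word_listing(transcript: str) -> list[str]:
    """Pull the dictionary words out of a captured game transcript."""
    lines = transcript.splitlines()
    for index, line in enumerate(lines):
        match = HEADER.search(line)
        if match:
            break
    else:
        return []                   # no listing found in the transcript

    expected = int(match.group(1))  # the count printed in the header
    words = []
    for line in lines[index + 1:]:
        if len(words) == expected:
            break
        if line.strip():            # skip the blank line after the header
            words.append(line.strip())
    return words
```

Stopping at the count from the header keeps the banner text that follows the listing out of the result.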

I suspect that if you’re tuning speech recognition, you’re going to want to divide this list up further by hand. You’ve got

  • Words that you expect the player to use
  • Words that you threw in because the player might use them (off-the-wall synonyms); you don’t want them getting in the way of primary vocabulary
  • Words that I7 throws into every game which might not be relevant. (Every number from “one” to “thirty”, for example. “Lit”, “lighted”, and “unlit”.)
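One way to keep those three groups apart is to tag each word with a tier and give each tier its own bias weight. This is a sketch only: the tier names, the weights, and the `bias_weights` helper are all invented here, and how a weight actually gets applied depends entirely on the speech recognition library in use.

```python
# Hypothetical weights; tune them against your own recognizer.
TIER_WEIGHTS = {"primary": 1.0, "synonym": 0.3, "builtin": 0.1}

def bias_weights(all_words, primary, builtin):
    """Assign a weight to every word based on a hand-made split.

    primary: words you expect the player to use
    builtin: words I7 adds to every game (numbers, "lit", "unlit", ...)
    Everything else is treated as an off-the-wall synonym.
    """
    weights = {}
    for word in all_words:
        if word in primary:
            tier = "primary"
        elif word in builtin:
            tier = "builtin"
        else:
            tier = "synonym"
        weights[word] = TIER_WEIGHTS[tier]
    return weights
```

Only the `primary` and `builtin` sets need to be curated by hand; anything left over defaults to the synonym tier.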

I’m now up to two extensions “by Otis T Dog” in my external dir (the other being Unavailable Things).

Thanks @ArdiMaster, that regexp could well be useful, but @otistdog wow you nailed it, thanks!

I was a little pleased to see my game understands more than 1,000 words.

Dividing the list up by hand is totally a good idea. A fair amount of work but probably worth doing.