Is the world ready for a new Inform6 grammar format?

zarf · March 5, 2024, 3:33pm

I think that if you want to use this feature at all, you should never set the meta flag on the verb. That’s a fallback to prevent surprises in older game code.

Thus, the negative flag isn’t necessary.

heasm66 · March 6, 2024, 2:04pm

Funny thing… Theoretically I calculated that GV3 would save 648 bytes on my test case. In reality it saved… tada!.. 648 bytes.

There’s a working branch (first effort) of Inform6 with GV3 here. At first glance it looks right, but I don’t have a library yet that can run it. And for those who are worried: it still produces GV2 tables that are identical to the GV2 output of Inform version 6.42.

heasm66 · March 6, 2024, 2:49pm

The current specification is:

GV3 is a variant of GV2 with a more compact data structure.  GV3 uses only
2 bytes for each token and removes the need for the ENDIT marker.  In GV3
an individual grammar table has the format:

    <number of grammar lines>          1 byte

followed by that many grammar lines.  A grammar line has the form:

    <action number>  <token 1> ... <token N>
    ----2 bytes----  -2 bytes-     -2 bytes-

The action number is actually contained in the bottom 9 bits of the word
given first: the top five contain the number of tokens in this grammar
line, which leaves

    action_number & $400

as a bit meaning "reverse the order of the first and second parameters
if this action is to be chosen", and

    action_number & $200

as a bit meaning "this is a meta action".
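
For anyone writing a decoder, the header word described above could be packed and unpacked roughly like this. It is a Python sketch of the draft layout only, not compiler code; the names `LineHeader`, `pack_header` and `unpack_header` are made up for illustration.

```python
from typing import NamedTuple


class LineHeader(NamedTuple):
    token_count: int  # top five bits (0..31)
    reverse: bool     # $400: swap first and second parameters
    meta: bool        # $200: this is a meta action
    action: int       # bottom nine bits (0..511)


def pack_header(token_count, action, reverse=False, meta=False):
    """Build the 16-bit word that starts a GV3 grammar line (draft layout)."""
    if not 0 <= token_count <= 31:
        raise ValueError("token count must fit in five bits")
    if not 0 <= action <= 511:
        raise ValueError("action number must fit in nine bits")
    word = (token_count << 11) | action
    if reverse:
        word |= 0x400
    if meta:
        word |= 0x200
    return word


def unpack_header(word):
    """Split the 16-bit word back into its four fields."""
    return LineHeader(
        token_count=(word >> 11) & 0x1F,
        reverse=bool(word & 0x400),
        meta=bool(word & 0x200),
        action=word & 0x1FF,
    )


word = pack_header(token_count=4, action=37, reverse=True)
header = unpack_header(word)
print(hex(word), header)
```

Because the token count travels in the header, a reader knows how many two-byte tokens follow without needing an end marker.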

There can be anything from 0 to 31 tokens, and each occupies two bytes, 
arranged as:

    <token type>   <token data>
    -- byte ----   --- byte ---

Token type bytes are divided into the top two bits, the next two and the
bottom four.

The "next two bits" are used to indicate alternatives.  In a sequence of
tokens

    T1 / T2 / T3 / ... / Tn

then T1 will have $$10 in its "next two bits", and each of T2 to Tn will
have $$01.  Tokens not inside lists of alternatives always have $$00.  (Note
that at present only prepositions are allowed as alternatives, but the
format is designed to open the possibility of extending this to all tokens.)

The bottom four are the "type" of the token.  The top two indicate what kind
of data is contained in the token data.  Strictly speaking this could be
deduced from the bottom six bits, but it's convenient for making backpatching
GV3 tables a simple matter within the compiler.

    Type  Means                       Data contains              Top bits
    0     illegal (never compiled)
    1     elementary token            0   "noun"                 00
                                      1   "held"
                                      2   "multi"
                                      3   "multiheld"
                                      4   "multiexcept"
                                      5   "multiinside"
                                      6   "creature"
                                      7   "special"
                                      8   "number"
                                      9   "topic"
    2     'preposition'               adjective number           01
    3     noun = Routine              parsing-routine-number     10
    4     attribute                   attribute number           00
    5     scope = Routine             parsing-routine-number     10
    6     Routine                     parsing-routine-number     10
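
To make the bit layout of the token type byte concrete, here is a small Python sketch that splits a token into its three fields using the table above. The function name `decode_token` and the label strings are my own invention; only the bit positions and the type numbers come from the draft.

```python
ELEMENTARY = ["noun", "held", "multi", "multiheld", "multiexcept",
              "multiinside", "creature", "special", "number", "topic"]
TYPES = {1: "elementary", 2: "preposition", 3: "noun=Routine",
         4: "attribute", 5: "scope=Routine", 6: "Routine"}
DATA_KINDS = {0b00: "plain value", 0b01: "adjective number",
              0b10: "parsing-routine number"}
ALTERNATIVES = {0b00: "not in a list", 0b10: "first alternative",
                0b01: "later alternative"}


def decode_token(type_byte, data_byte):
    """Decode one two-byte GV3 token into a readable dictionary."""
    top = (type_byte >> 6) & 0b11    # what kind of data the data byte holds
    alt = (type_byte >> 4) & 0b11    # position in a T1 / T2 / ... list
    ttype = type_byte & 0b1111       # the token type proper
    info = {
        "type": TYPES.get(ttype, "illegal"),
        "data_kind": DATA_KINDS.get(top, "unknown"),
        "alternative": ALTERNATIVES.get(alt, "unknown"),
        "data": data_byte,
    }
    if ttype == 1 and data_byte < len(ELEMENTARY):
        info["data"] = ELEMENTARY[data_byte]  # name the elementary token
    return info


print(decode_token(0x42, 5))  # a stand-alone preposition, adjective number 5
print(decode_token(0x62, 0))  # first preposition in a list of alternatives
print(decode_token(0x01, 1))  # the elementary token "held"
```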

GV3 identifies a particular preposition or parsing-routine using a numbering system.
GV3 numbers parsing-routines upwards from 0 to 255, in order of first use.  
A separate table translates these into routine packed addresses: the 
"preactions" table.  The preactions table is a simple --> array.
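
The "numbered in order of first use" idea can be sketched as a small interning table. This is Python for illustration only: the class `FirstUseTable` and the routine names are hypothetical, and the real compiler stores routine addresses rather than strings.

```python
class FirstUseTable:
    """Hand out numbers 0, 1, 2, ... in order of first use, up to a limit."""

    def __init__(self, limit=256):
        self.limit = limit
        self.numbers = {}   # item -> number already assigned
        self.entries = []   # number -> item, i.e. the table that gets written out

    def number_for(self, item):
        if item in self.numbers:
            return self.numbers[item]          # reuse costs nothing extra
        if len(self.entries) >= self.limit:
            raise OverflowError("more than %d entries" % self.limit)
        number = len(self.entries)
        self.numbers[item] = number
        self.entries.append(item)
        return number


preactions = FirstUseTable()
first = preactions.number_for("ParseDirection")    # hypothetical routine name
second = preactions.number_for("IsLightSource")    # hypothetical routine name
again = preactions.number_for("ParseDirection")
print(first, second, again, preactions.entries)
```

The one-byte token data is what sets the limit of 256 entries: a token stores only the number, and the table supplies the full two-byte value.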

Prepositions are also identified by their "adjective number".  Adjective
numbers count upwards from 0 to 255, in order of first use.  They are 
translated back into dictionary words using the "adjectives table".

The adjectives table starts with two bytes containing the number of
"adjectives" in the table. Each "adjective" entry is then two bytes:

    <dictionary address of word>
    ----2 bytes-----------------

The constant #adjectives_table refers to this table.


As in GV2, fake actions in GV3 are numbered from 4096 upwards.


Note that although GV3 reintroduces the preaction and adjective tables,
the omission of the ENDIT marker and the use of two-byte tokens instead
of three-byte ones should produce a more economical grammar table.

Comparison table between the different grammar versions:

                                        Limit in:
                                        GV1    GV2          GV3

    Prepositions per game               76     unlimited    256
    Parsing routines (general ones,
       noun= filters, scope= routines
       all put together) per game       32     unlimited    256
    Tokens per grammar line             6      unlimited    31
    Actions per game                    256    1024         512
    Inform verbs per game               256    256          256

A meta-flag on a verb propagates down to each individual grammar line, but not the other way around.
No meta-flag on grammar lines for Glulx, GV2.
No negative flags.
No reverse-flag in verb definition.

I’m currently pondering two modifications to the format…

Repurpose top bits for preposition and parsing-routine and use them for numbering, setting the max number of each to 1024.
Only use 4 bits for number of tokens (limit them to max 15) and leave room for a, yet undetermined, bit to use as a grammar line flag.

zarf · March 6, 2024, 3:57pm

Neat! I will take a look at the code when I have a chance.

Only use 4 bits for number of tokens (limit them to max 15) and leave room for a, yet undetermined, bit to use as a grammar line flag.

Hadean Lands goes up to 8-token grammar lines:

Verb 'look' * 'inside' / 'in' / 'into' / 'within' / 'through' noun 'at' noun

15 tokens would be a lot but it’s not unimaginable.

zarf · March 7, 2024, 5:05am

This looks good at first glance. I haven’t done any testing, other than to verify that GV1/2 games are unaffected.

I have a slightly modified branch here: GitHub - erkyrath/Inform6 at grammar_version_3

I think you need to copy the meta flag in the extend_verb() case. See this commit: https://github.com/erkyrath/Inform6/commit/f1fbc0a9e020ff939238c0695812879ffa12236d

I also dropped the total -= 2 * no_adjectives bit in rough_size_of_paged_memory_z(), since that is supposed to be an overestimate.

In make_adjective_v3(), I see you just call dictionary_add() rather than going through the dictionary_prepare() check. What was your reasoning there?

heasm66 · March 7, 2024, 9:54am

The thinking around make_adjective_v3 was that the GV2 code that makes/retrieves the dictionary address and stores it in the token_data only makes a call to dictionary_add:

                 bytecode = 0x42;
                 if (grammar_version_number == 3)
                     wordcode = make_adjective_v3(token_text);
                 else
                     wordcode = dictionary_add(token_text, PREP_DFLAG, 0, 0);

The only difference, as I see it, is that make_adjective_v3 stores the value in the adjectives_table instead of the token_data.

Am I missing something?

You are correct about the meta flag when extending a verb, I missed that.

I have a modified library 6.12.6 that handles GV3. I need to test it a bit further, but it only seems to require some small changes to the routines AnalyseToken and UnpackGrammarLine in parser.h to work.

heasm66 · March 7, 2024, 9:58am

Changes to AnalyseToken and UnpackGrammarLine:

#Iftrue (Grammar__Version == 3);

[ AnalyseToken token;
    found_ttype = (token->0) & $$1111;
    found_tdata = (token->1);
    if (found_ttype == PREPOSITION_TT)
        found_tdata = #adjectives_table-->found_tdata;
    if (found_ttype == ROUTINE_FILTER_TT or GPR_TT or SCOPE_TT)
        found_tdata = #preactions_table-->found_tdata;
];

[ UnpackGrammarLine line_address i tokens;
    for (i=0 : i<32 : i++) {
        line_token-->i = ENDIT_TOKEN;
        line_ttype-->i = ELEMENTARY_TT;
        line_tdata-->i = ENDIT_TOKEN;
    }
    action_to_be = 256*(line_address->0) + line_address->1;
    action_reversed = ((action_to_be & $400) ~= 0);
    meta = ((action_to_be & $200) ~= 0);
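    ! Token count is the top five bits. Reading the high byte avoids a signed
    ! 16-bit division, which goes negative once the count reaches 16.
    tokens = (line_address->0) / 8;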
    action_to_be = action_to_be & $1ff;
    params_wanted = 0;
    for (i=0 : i<tokens : i++) {
        line_address = line_address + 2;
        line_token-->i = line_address;
        AnalyseToken(line_address);
        if (found_ttype ~= PREPOSITION_TT) params_wanted++;
        line_ttype-->i = found_ttype;
        line_tdata-->i = found_tdata;
    }
    return line_address + 2;
];

#Endif; ! Grammar__Version 3
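
A note on extracting the token count on the Z-machine: as far as I know, arithmetic there is signed 16-bit, so masking a word with $f800 and dividing by 2048 gives a negative result once the count reaches 16. The Python sketch below simulates that and compares it with dividing the unsigned high byte by 8 instead. The helper names are made up, and `z_div` is only my approximation of the signed division.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def z_div(a, b):
    """Signed 16-bit division that truncates toward zero."""
    a, b = to_signed16(a), to_signed16(b)
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient
    return to_signed16(quotient)


def count_by_word_division(word):
    """Mask the top five bits of the whole word, then divide by 2048."""
    return z_div(word & 0xF800, 2048)


def count_by_high_byte(word):
    """Divide the high byte (always 0..255) by 8 instead."""
    return (word >> 8) // 8


for count in (15, 16, 31):
    word = count << 11
    print(count, count_by_word_division(word), count_by_high_byte(word))
```

With at most 15 tokens per line both forms agree, so the difference only shows up for lines of 16 tokens or more.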

heasm66 · March 7, 2024, 12:39pm

I also changed version.h

System_file;

Constant LibSerial       "240307";
Constant LibRelease      "6.12.7";
Constant LIBRARY_VERSION  612;

! This constant is defined by the compiler to GV1 (Z-code) or GV2 (Glulx).
! This library defaults to GV2 if nothing else is explicitly defined before.
! If GV1 is really desired it has to be changed here, otherwise GV3 can be
! defined before the inclusion of "parser.h".
#Iftrue (Grammar__Version == 1);
Constant Grammar__Version 2;
#Endif;

Is it only a historical artefact that the compiler still sets the Grammar__Version to GV1 for Z-code if the constant is not defined in the story-file or the library?

zarf · March 7, 2024, 4:06pm

Okay. I was misled by the partial parallelism between make_adjective() and make_adjective_v3(). I might rearrange that code to be clearer.

Yes, but we’re keeping historical artifacts unless there’s a very strong reason to change them.

heasm66 · March 8, 2024, 1:08pm

I played Curses (more complex commands than in Adventure) in both GV2 and in GV3 using a walkthrough and compared the transcripts, and everything looks alright. Source and binaries are here.

I got a new idea for a small change that may be helpful: create a new null-token (token_type = 0, currently illegal) and allow it to be used in preposition-chains.

Example (from Curses, maybe a bit extreme):

Verb 'wash'
    * 'my' 'mouth' 'with' held                          -> Wash
    * 'my' 'mouth' 'out' 'with' held                    -> Wash
    * 'mouth' 'with' held                               -> Wash
    * 'mouth' 'out' 'with' held                         -> Wash;

In GV2 this costs 60 bytes (4 actions * 3 + 16 tokens * 3), currently in GV3 it costs up to 48 bytes (4 actions * 2 + 16 tokens * 2 + 4 prepositions * 2). GV3 cost is probably a bit lower because the prepositions are likely reused in multiple actions on other verbs.

This could be replaced with:

Verb 'wash'
    * 'my'/null 'mouth' 'out'/null 'with' held          -> Wash;

or maybe:

Verb 'wash'
    * 'my'/'' 'mouth' 'out'/'' 'with' held              -> Wash;

The cost will now only be up to 24 bytes (1 action * 2 + 7 tokens * 2 + 4 prepositions * 2).
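
The byte counts above can be rechecked with a few lines of Python. This is just the arithmetic from this post: it ignores the one-byte line count at the start of each grammar table, and it charges GV3 the full two bytes for every preposition, which is the worst case. The function names are made up.

```python
def gv2_cost(token_counts):
    """GV2: 2-byte action word + 3 bytes per token + 1-byte ENDIT per line."""
    return sum(2 + 3 * tokens + 1 for tokens in token_counts)


def gv3_cost(token_counts, new_prepositions):
    """GV3: 2-byte action word + 2 bytes per token, plus adjective table entries."""
    return sum(2 + 2 * tokens for tokens in token_counts) + 2 * new_prepositions


four_lines = [4, 5, 3, 4]   # tokens in each of the four 'wash' lines
one_line = [7]              # 'my'/null 'mouth' 'out'/null 'with' held

print(gv2_cost(four_lines))                      # 60
print(gv3_cost(four_lines, new_prepositions=4))  # 48
print(gv3_cost(one_line, new_prepositions=4))    # 24
```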

Worthwhile?

zarf · March 8, 2024, 3:57pm

I’m not sure. You’d have to add in the cost of the library code needed to deal with that possibility. PrepositionChain() would have to return a new value meaning “word didn’t match, but this preposition is optional so keep going”; then the caller would have to deal with that.

On the compiler side, '' is currently invalid at the lexer level and null is a library symbol. It would take work either way. You could use '//', which is currently legal but useless, but then the verb declaration looks like line noise.

I dunno. Adding this feels like a lot of complication for both the parser and the compiler.

heasm66 · March 8, 2024, 4:31pm

Hmm, // or nothing could be an option…

I’ll have a look at the library to see how complicated it would be.

otistdog · March 8, 2024, 5:12pm

I only sniff around the edges of compiler stuff, but I’m observing this thread with interest. May I ask why there is a desire to have the meta feature be part of the grammar at all?

I understand that it’s historically attached to the Verb in I6, and that moving it to the grammar line level would help prevent some ambiguity, but why not let setting this flag be handled solely by action routines? Isn’t that the true place where the logic of the system should be deciding whether an action is in-world or out-of-world?

zarf · March 8, 2024, 5:24pm

If a new library version wants to work this way, that’s great. But we have to support backwards compatibility as much as possible. Even in a new grammar version, old code may want to make only the minimum possible change.

fredrik · March 12, 2024, 5:18pm

One reason is that putting it in the grammar line should make story files smaller, which is perfectly in line with Henrik’s reason for proposing this new grammar version.

If there was a nice way to attach this flag to actions or action routines, that would perhaps be even (slightly) better, but I can’t see a good way to do that. And the grammar line level is definitely an improvement over the verb level.

zarf · March 13, 2024, 12:35am

So here’s what I think overall:

I’d like to separate the meta-flag-in-grammar-line idea from the GV3 idea. GV3 works as described above.

The meta-flag feature would be optional (and opt-in), with a compiler setting like $GRAMMAR_META_FLAG=1.

The meta flag would go in the grammar line:

action_number & $800 in Z-code GV2
action_number & $800 in Z-code GV3
flag & $02 in Glulx GV2

Note that this leaves four bits for token-count in GV3. As I said, this might be a wee bit cramped. But then this is an optional feature. You’d be able to specify Grammar__Version=3, $GRAMMAR_META_FLAG=0 and get five bits for the token-count. It’s a tradeoff, like all of these byte-squishing options.

(The library would need to check #ifdef GRAMMAR_META_FLAG to handle both cases.)
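
To show what that check amounts to for the GV3 header word, here is a Python sketch of the two readings. It only covers the token count and the meta bit, since those are the parts stated above; the function names and the example word are made up.

```python
META_BIT = 0x800


def token_count(word, grammar_meta_flag):
    """Top four bits if the meta flag is enabled, top five bits otherwise."""
    bits = 4 if grammar_meta_flag else 5
    return (word & 0xFFFF) >> (16 - bits)


def is_meta(word, grammar_meta_flag):
    """The $800 bit only means 'meta' when the option is switched on."""
    return bool(grammar_meta_flag and word & META_BIT)


word = 0x8825  # an arbitrary example header word
print(token_count(word, True), is_meta(word, True))    # 8 True
print(token_count(word, False), is_meta(word, False))  # 17 False
```

The same word reads as 8 tokens plus a meta flag under one setting and as 17 tokens under the other, so anything that decodes the table has to know which setting the game was compiled with.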

The null-preposition idea (token-type 0) is still up in the air as far as I’m concerned. But it can also be separate. It too works equally well for GV2 and GV3.

heasm66 · March 13, 2024, 9:33am

I agree with everything @zarf writes. Using $800 is better than $200 because it opens up the meta-flag on grammar lines for GV2 too. 15 tokens are probably enough for most cases, and there’s always the option to revert to GV2 or not use the new meta-flag.

I did some preliminary research on the null-preposition but have not reached a working solution. But it is, as said, a separate issue.

On the subject of separate issues and optional features, I got a few ideas…

$OMIT_PAGE_PADDING option to not pad the story-file with $00-bytes to an even 512, but instead only emit the actual story-bytes.
$ZCODE_GRAMMAR_VERSION an option to specify the default (currently 1) Grammar__Version constant on the command line (not limited to z-code per se, but Glulx only supports GV2 at the moment, as I understand it).
$IGNORE_INLINE_OPTIONS only use options specified at command line, ignore definitions (inside story-file) of DEBUG, Grammar__Version, all !% definitions, and maybe a few more. As of now I think the last one defined gets precedence, but in my mind at least it is a bit unclear what will happen when there’s a conflict.
When using a custom alphabet there’s no way to apply the alphabet to the first four objects (Class, Object, Routine and String); the descriptions are always done in the default alphabet for these objects. I guess it’s because they are defined before a single line of the story-file is read. Not very important, but it is a bit annoying.

(I see that the Grammar__Version already has an issue at the project page, but I’ll add the others with maybe some code suggestions…)

heasm66 · March 13, 2024, 6:33pm

For purely egoistic reasons (to make it easier for UnZ, Infodump and similar tools) I don’t think it’s necessary to allow switching between 4 or 5 bits for token-count. 15 tokens is enough for most cases, and should the need for more arise, a grammar line could probably be split into two lines, and in the worst case there’s always GV2.

(Do you want to create an issue on the project page of the meta-flag, or should I?)

zarf · March 13, 2024, 10:21pm

I did: Meta flag for grammar/action lines · Issue #278 · DavidKinder/Inform6 · GitHub

Note followup comment.

zarf · April 1, 2024, 12:19am

I have a new branch demonstrating the GV3 format.

GitHub - erkyrath/Inform6 at newgv3

This is based on your test branch, but I’ve reworked verbs.c quite a bit to support the $GRAMMAR_VERSION and $GRAMMAR_META_FLAG options. (Those are already committed.) So I had to reimport your changes.

Could you look it over and make sure everything still makes sense?