Truncation of long words in input and BlkValueWrite error

Hi, it hasn’t anything per se to do with player input (as you can see from the minimal example).

tldr: set a text variable to “” before setting it equal to a word extracted from a longer text.

OK, deep breaths, here we go…

The I6 template function the bug arises in, TEXT_TY_BlobAccessI(), is a moderately complex one implementing a finite state machine that provides various functions related to counting/extracting/replacing parts of a text. TEXT_TY is shorthand for I7’s text type; ‘blobs’ is how the template refers to smaller parts of one of I7’s dynamically allocated block values, of which texts are an example. Other examples of block values include lists and stored actions. In the case of texts, blobs can for example be characters, words, punctuated words, unpunctuated words, lines or paragraphs: ideas manifest in the various phrases used in I7 to search and modify texts.

In your original code and my minimal example, TEXT_TY_BlobAccessI() is (indirectly) called in order to extract an enumerated word from a longer text. In your code, the longer text is derived from the player’s input. In mine it’s a simple local text variable.

The problem potentially arises when TEXT_TY_BlobAccessI() has to return a text value. This is supplied by writing to one of the parameters of the function, ctxt, a text block value. Simplifying things slightly, block values comprise two parts. The first is a short header (called the short block), whose address is directly referenced by the (for example) text variable. The short block contains information describing the block value and a reference to the address of the first of a doubly-linked list of data blocks (called long blocks), potentially scattered through memory. Each long block consists of a header referencing the addresses of the previous and next long blocks in the chain, followed by (in this case text) data.

Long blocks are dynamically allocated and of variable size and number, but their size in bytes (including header) is always a power of 2. The maximum data storage of a block value is therefore the sum of the sizes of its chain of long blocks, less the room taken up by their headers. The actual amount of data storage used may be a little less than the maximum, with padding from there to the end of the final long block in the chain. In the case of texts, the end of the data can be found by stepping through the data in each long block in the chain until reaching a zero. Ultimately then, block value texts are a storage format for dynamically-allocated zero-terminated strings.
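To make that layout concrete, here is a toy model in Python. It is only a sketch of the description above, not the template's I6 code: the class names, the HEADER_CELLS figure and the one-cell-per-character assumption are all made up for illustration, and a plain Python list stands in for the doubly-linked chain.

```python
HEADER_CELLS = 4  # hypothetical header size; the real template's figure may differ


class LongBlock:
    """One link in the chain: a power-of-two-sized block, header included."""

    def __init__(self, size_log2):
        self.total_size = 2 ** size_log2
        # The data area is what remains after the header, zero-filled (padding).
        self.data = [0] * (self.total_size - HEADER_CELLS)


class ShortBlock:
    """What a text variable refers to; it leads to the long block chain."""

    def __init__(self, chain):
        self.chain = chain  # a Python list stands in for the doubly-linked list

    def capacity(self):
        # Maximum data storage: total block sizes less the headers.
        return sum(len(block.data) for block in self.chain)

    def write_text(self, text):
        codes = [ord(ch) for ch in text] + [0]  # zero terminator
        if len(codes) > self.capacity():
            raise ValueError("text does not fit in the chain")
        cells = iter(codes)
        for block in self.chain:
            for i in range(len(block.data)):
                block.data[i] = next(cells, 0)  # anything left over is padding

    def read_text(self):
        # Step through each long block in turn until reaching a zero.
        chars = []
        for block in self.chain:
            for code in block.data:
                if code == 0:
                    return "".join(chars)
                chars.append(chr(code))
        return "".join(chars)


text = ShortBlock([LongBlock(4), LongBlock(3)])  # blocks of 16 and 8 cells
text.write_text("frotz")
```

Here text.capacity() is 16 (12 + 4 data cells once the two hypothetical headers are subtracted), while "frotz" plus its terminator uses only 6 of them; the rest is padding.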

For more complete information about block values, you can copy the BlockValues.i6t and Flex.i6t files to be found in the /Internal/I6T/ folder of your Inform 7 install directory. Open these copies in a text editor to reveal partly-annotated versions of the I6 template functions used to work with block values.

The above description illustrates the possibility that the same data (i.e. chain of long blocks) can in theory be referenced by two different short blocks, i.e. two different I7 text variables. For example, ‘let commandWord be “frotz”; let magicWord be commandWord’ creates two text variables both now pointing to the same long block data, “frotz”. If one of these two variables is then changed, e.g. ‘now commandWord is “xyzzy”’, the new text “xyzzy” can’t simply be written to the existing long block; otherwise magicWord, which points to the same long block data, would become “xyzzy” too, rather than remaining as “frotz”. Inform deals with this by keeping a count of how many variables are referencing a given chain of long block data. If Inform needs to change long block data referenced by two or more variables, it first makes a new copy of that data, so that in this case there are now two long block copies of “frotz”: one pointed to by commandWord and the other pointed to by magicWord. This is called making commandWord ‘mutable’. It then overwrites the “frotz” pointed to by commandWord with “xyzzy”, so we end up with commandWord as “xyzzy” while magicWord remains as “frotz”.
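The reference counting and copy-before-write behaviour just described can be sketched in a few lines of Python. Again this is a toy model under my own assumptions, not the template code: LongData, TextVar and their methods are hypothetical names, and a Python string stands in for the long block chain.

```python
class LongData:
    """Stands in for a chain of long blocks plus its reference count."""

    def __init__(self, text):
        self.text = text
        self.refs = 0


class TextVar:
    """Stands in for an I7 text variable (i.e. a short block)."""

    def __init__(self, text):
        self.data = LongData(text)
        self.data.refs = 1

    def assign_from(self, other):
        # 'let magicWord be commandWord': share the data, bump the count.
        self.data.refs -= 1
        self.data = other.data
        self.data.refs += 1

    def make_mutable(self):
        # If anyone else references the data, take a private copy first.
        if self.data.refs > 1:
            self.data.refs -= 1
            self.data = LongData(self.data.text)
            self.data.refs = 1

    def overwrite(self, text):
        self.make_mutable()
        self.data.text = text


command_word = TextVar("frotz")
magic_word = TextVar("")
magic_word.assign_from(command_word)
shared_before = command_word.data is magic_word.data  # both point at one "frotz"
command_word.overwrite("xyzzy")                       # copies first, then writes
```

After the overwrite, command_word holds “xyzzy” in its own private copy while magic_word still holds the original “frotz”, which now has a reference count of 1.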

To quote the BlockValues.i6t template, ‘Subtle and beautiful bugs can occur as a result of making a value mutable…’ This is an example of one of those bugs. At the start of the function, TEXT_TY_BlobAccessI() records the current maximum data storage available in the long block chain allocated to its return text variable parameter, ctxt, in a local variable, csize. When writing to ctxt, it keeps an eye on csize and if it realises it is going to end up needing more data storage than will fit, it dynamically reallocates more storage to ctxt’s long block chain to make room. This works fine, except in the case where ctxt’s long block chain is also referenced by another variable. In this case, as soon as TEXT_TY_BlobAccessI() tries to write to ctxt, the template code notices this and, before writing, makes ctxt ‘mutable’ by making for ctxt its own copy of its long block chain data. Unfortunately, although the long block created is sized to fit that data, it is not guaranteed to have exactly the same size and structure as the long block chain it was copied from. For example (to oversimplify), “frotz” would fit equally well within a block of 16 bytes or one of 32 bytes. Consequently, the maximum data storage capacity of ctxt may change when it is made ‘mutable’, but TEXT_TY_BlobAccessI() does not notice this and update csize accordingly. The end result is that if making ctxt mutable leaves it with less storage capacity than it had before, TEXT_TY_BlobAccessI() will happily try to write beyond the end of ctxt’s reduced data storage, and will only notice a problem if and when it tries to write beyond the limit defined by the original csize.

Your original example and my minimal case both create an edge case where this can happen. When commandWord is reused in the 2nd iteration of the repeat loop, the temporary I6 variable ctxt created as a parameter for TEXT_TY_BlobAccessI() is pointing to the same long block data as commandWord. After certain sequences of past-and-present ‘blobs’, ctxt is now pointing to a long block chain with significant ‘padding’ beyond the actual data, such that when ctxt is made ‘mutable’ the copy made of its long block data is ‘economically’ created with a smaller-capacity long block chain, containing the same data but less ‘padding’. If the new data being written to ctxt is sufficiently longer than that in commandWord to overwrite not only the old data but also that shorter padding and beyond, the error will occur.

In my example, on the second iteration of the repeat loop ctxt starts with a maximum data storage of 44 characters in its long block, consisting of 22 text characters, a terminating zero, and 21 characters of ‘padding’. When TEXT_TY_BlobAccessI() tries to write the first character to ctxt, the template makes ctxt mutable and in doing so creates a long block with a maximum data storage of 28 characters: more than enough to hold the existing copy of the data, but nothing like enough to hold the 33 characters plus terminating zero TEXT_TY_BlobAccessI() is about to write. TEXT_TY_BlobAccessI() is oblivious to this, still thinking (because csize is 44) that there is plenty of room. As soon as the 29th character is written (to index 28 in the data block, which is zero-indexed) the errors start, and are repeated through to the terminating zero being written to index 33.
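The arithmetic above can be replayed with a small Python simulation. This is a sketch under stated assumptions rather than a port of the I6 code: Storage, Text and blob_access are hypothetical names, one list cell stands for one character, and the size of the ‘economical’ copy (28) is simply passed in from the figures above instead of being computed as the real allocator would.

```python
class Storage:
    """Stands in for a long block chain: data cells plus a reference count."""

    def __init__(self, capacity, refs=1):
        self.cells = [0] * capacity
        self.refs = refs


class Text:
    def __init__(self, storage):
        self.storage = storage

    def capacity(self):
        return len(self.storage.cells)

    def make_mutable(self, copy_capacity):
        # If the data is shared, take a private copy that may have less padding.
        if self.storage.refs > 1:
            old = self.storage
            old.refs -= 1
            new = Storage(copy_capacity)
            used = old.cells.index(0) + 1  # existing text plus its terminator
            new.cells[:used] = old.cells[:used]
            self.storage = new

    def write(self, index, value, copy_capacity):
        self.make_mutable(copy_capacity)
        if index >= self.capacity():
            return False  # models the out-of-range write error
        self.storage.cells[index] = value
        return True


def blob_access(ctxt, word, copy_capacity):
    csize = ctxt.capacity()  # recorded once, never refreshed: the bug
    errors = []
    for i, code in enumerate([ord(ch) for ch in word] + [0]):
        if i >= csize:
            ctxt.storage.cells.append(0)  # crude stand-in for growing the chain
            csize += 1
        if not ctxt.write(i, code, copy_capacity):
            errors.append(i)
    return errors


def first_word_storage(refs):
    s = Storage(44, refs)
    s.cells[:22] = [ord(ch) for ch in "1234567890123456789012"]
    return s


second_word = "123456789012345678901234567890123"  # 33 characters

# Shared with commandWord (2 references): the copy shrinks to 28 cells.
errors = blob_access(Text(first_word_storage(refs=2)), second_word, 28)

# commandWord pointed elsewhere first (1 reference): no copy, still 44 cells.
errors_with_fix = blob_access(Text(first_word_storage(refs=1)), second_word, 28)
```

In this model, errors comes out as the indices 28 to 33, matching the run of errors from the 29th character through to the terminating zero, while errors_with_fix is empty because no copy is made and the capacity really is still 44.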

The error occurs due to a combination of slightly unusual circumstances in specific and possibly not-very-commonly-invoked template code. I’m not sure how widespread such edge cases might be elsewhere in the template. However, for this circumstance there appears to be a simple fix: set a text variable to “” before making it equal to a word or other type of ‘blob’. This should ensure that ctxt does not start out with a dangerously extensively-padded long block chain that in being copied might end up being shortened.

e.g.

Lab is a room.
 
When play begins:
	let C be "1234567890123456789012 123456789012345678901234567890123";
	let commandWord be "xyzzy";
	repeat with CWI running from 1 to the number of words in C:
		now commandWord is "";
		now commandWord is word number CWI in C;
		say "Word [CWI] is:  [commandWord].";

EDIT: looking closer at this, the reason this simple example I7 fix works is not what I originally thought. What it actually does is make ctxt mutable by removing the reference that commandWord was holding, from the 1st iteration of the loop, to the same long block data pointed to by ctxt.

Without this fix, after the first iteration of the loop the I6 temporary variable ctxt (represented in the I7 code by ‘word number CWI in C’) and the I7 local variable commandWord are left both pointing to shared long block data, representing a copy of the 1st word of C, “1234567890123456789012”.

When the second iteration starts, ‘now commandWord is “”’ points commandWord elsewhere, to “”, reducing the number of variables referencing ctxt’s long block data from 2 to 1. That leaves ctxt already mutable, so no copy is made and the bug is not triggered.
