Inform6 "print"-like string concatenation

sarashinai · November 22, 2020, 10:43pm

@otistdog After integrating the WORDSIZE change (I’d seen that all over and not understood what it was for, much clearer now, thank you), I had to change the code to what’s shown below to make it work.

[ Initialise val;
   ! reduced test statements for readability

    val = BuildString("The answer is ", 42);

    print val-->0, "^"; ! still outputs 16, all good

    print (char) val->(0 + WORDSIZE), "^"; ! had to change the index to 0 as 2 printed the third character 'e'
];

[ StringOrArray str i;
    if (str ofclass String) {
        print (string) str;
    } else {
        for (i = 0: i < str-->0: i++) {
            print (char) str->(i + WORDSIZE);
        }
    }
];

The call print (StringOrArray)val; now works in Initialise as long as the changes to StringOrArray shown above are made.

There seems to be an inconsistency somewhere regarding the -->0 value. Your explanation makes sense but then doesn’t seem to be the case in the code.

otistdog · November 22, 2020, 10:50pm

You got it. 2 is the value of WORDSIZE under Z-machine (which I had incorrectly assumed you were using, and which I used in my test). Arrays are zero-indexed. So, 0+WORDSIZE is the first byte of the string data in a buffer array (i.e. just past word -->0, which contains the length of the string contents), regardless of the virtual machine used. Note that --> is the word access operator; this also adjusts automatically to the virtual machine in use.

sarashinai · November 22, 2020, 10:59pm

Okay, so now that we’ve got all that sorted out, I hit the next string problem. I have this…

Array AppName string "HappyAppy";

That could very well be declared incorrectly but it compiles and seems to fit the DM. Now, that doesn’t respond well to -->0 (a huge number is the result) and my StringOrArray obviously doesn’t work because it gets caught in the loop for thousands of iterations, so…

sarashinai · November 22, 2020, 11:07pm

Brought istring.h into my project. I’m glad you’re clearing things up for me, but why should I do all this work if someone has already done it, and probably better? Is there an inclusion order trick? I’m getting these compile errors:

istring.h(128): Error:  Expected an opcode name but found read_char
istring.h(133): Error:  Expected an opcode name but found read_char
istring.h(144): Error:  Expected an opcode name but found output_stream
istring.h(146): Error:  Expected an opcode name but found output_stream

My includes are:

#include "infglk.h";
Include "parser";
Include "verblib";
Include "grammar";
Include "istring";

otistdog · November 22, 2020, 11:36pm

There is a difference in function between the array subtypes string and buffer. The string subtype is covered in DM4 (p. 43, part of the section at https://inform-fiction.org/manual/html/s2.html#s2_4 online). The buffer subtype was introduced post-DM4, and it is covered in the release notes. Read those sections carefully – they will probably make more sense given the points discussed in this thread.

The number that you got reading -->0 of a string subtype array makes sense if you consider the differences between a string array and a buffer array, the difference between the -> and --> operators, and the differences in WORDSIZE (and maximum integer values) between Z-machine and Glulx.
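As a worked illustration of that point (a sketch, assuming the array from the earlier post and that both virtual machines store words big-endian): a string array keeps its length in the single byte ->0, so AppName->0 is 9. Reading AppName-->0 instead pulls in a whole word starting at that byte. With a 2-byte word that is the bytes 9 and 'H' (72), giving 9 × 256 + 72 = 2376. With a 4-byte word it is 9, 'H' (72), 'a' (97), 'p' (112), giving $09486170 = 155738480. The routine name ShowLengths below is illustrative only, and the compiler may warn about the deliberate --> access to a byte array.

```inform6
Array AppName string "HappyAppy";

! Illustrative only: compare a byte read and a word read of the same address.
[ ShowLengths;
    ! Byte access: the real length of a string array (9 here).
    print "Byte read ->0:  ", AppName->0, "^";

    ! Word access: the length byte combined with the first character bytes,
    ! so the result depends on WORDSIZE and is not a usable length.
    print "Word read -->0: ", AppName-->0, "^";
];
```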

Unfortunately, the istring.h library was written pre-Glulx, so it probably won’t work for your immediate purposes. (My apologies for the confusion; again, I had incorrectly assumed that you were compiling to Z-machine.) However, you may be able to gain some inspiration/insight by looking over its source code. (Just glancing at it now, it looks like it pretty much sticks to the kinds of techniques that you’ve already explored.) Everything in it should be adaptable to Glulx with a modicum of effort.

sarashinai · November 22, 2020, 11:49pm

@otistdog Well, here’s my first question. Am I looking at the wrong release notes because the description of buffer I found in the Directives section of Inform - Support - Inform63 makes it seem like they behave the same as string.

Array…buffer is similar to Array…string and Array…table.

Array array buffer N ;
Array array buffer expr1 expr2 … exprN ;
Array array buffer " string ";

This creates a hybrid array of the form used by string .print_to_array and the new library routine PrintToBuffer( ), in which the first word array -->0 contains N and the following N bytes contain the specified expression values or string characters.

A directive such as Array myArray buffer 100; initialises the first word myArray-->0 to 100 and the following 100 bytes myArray->WORDSIZE through myArray->(WORDSIZE+99) to zero. This behaviour is consistent with the handling of string and table arrays, as defined in the DM4.

Note that Strict mode generates a warning if you use “->” to address an array of words, or “-->” to address an array of bytes. No such warnings are generated when addressing a buffer array, making this a useful declaration for any data structure which mixes byte and word values.

otistdog · November 23, 2020, 4:39am

You have the right release notes. Remember what I said about “string arrays” not being the same as “strings”? Some preconceptions that you might have from other programming languages could be causing confusion. The text you cited is clear to me, but not long ago I would have found it quite confusing myself, so I’m sympathetic.

Since the nature of arrays in I6 has a lot of nuance compared to arrays in other languages (and since this may be helpful to other newcomers in the future), here’s a summary of what I’ve learned about them:

There are several different types of arrays in Inform 6. Expectations of content and usage differ by type.
There are two major types: “word arrays” and “byte arrays”.
Word arrays expect access via --> (the word access operator), and the compiler will warn if it sees access via -> (the byte access operator).
Byte arrays expect access via the -> operator, and the compiler will warn if it sees access via -->.
Arrays can be declared either empty (by specifying a number of entries), or pre-populated (by specifying a list of values). If providing a list, a single entry should be enclosed in square brackets, i.e. “[” and “]”. (Use of square brackets for single-entry lists is scheduled to start being enforced in the 6.35 release.)
Aside from the two basic types, there are three specialized subtypes of arrays: table, string, and buffer. The buffer subtype was invented after DM4 was written; it is documented in the release notes for Inform version 6.31 (and should be included in the release notes of the current version).
A table array is a word array in which -->0 holds the number of entries (aside from the 0th); a run-time error (RTE) results if you try to write to the -->0 entry.
A string array is a byte array in which ->0 holds the number of entries (aside from the 0th); an RTE results if you try to write to the ->0 entry. Since ZSCII characters have one-byte codes, string arrays are useful for holding strings of characters, but they are not the same as strings. They can also be declared with a string literal, a convenience that can be misleading to newcomers. [Note: To save memory, string literal text is encoded in a compressed form called “packed text” on the Z-machine, which is why a byte array is not the same as a general string. It’s also why a string array can’t be printed with print (string) ... which is designed to work with packed text. When a string array is declared with a string literal, the characters in the string literal are stored as a byte array of ZSCII character values, not in their usual form of packed text.]
A buffer array is a little weird. It’s primarily a byte array, but the first “entry” (i.e. the first word’s worth of bytes) is designed to be read as a word. Thus, -->0 tells how many characters to expect, but individual characters should be accessed via ->. The compiler-defined constant WORDSIZE is set to the number of bytes in one word on the virtual machine in use (2 on Z-machine, 4 on Glulx), so entry ->WORDSIZE will be the first stored character. For a buffer array, the compiler will not complain about mixed access via both --> and ->, nor will trying to modify the -->0 entry trigger an RTE. The particular structure of a buffer array is well-suited to use with the String class’s .print_to_array() method and also to use with the Standard Library’s PrintToBuffer() routine.

Here are some array declarations to illustrate typical variations:

Array my_word_array --> 20; ! entries -->0 through -->19 all contain zero
Array powers_of_three -> 3 9 27 81 243; ! entry ->0 contains 3, ->1 contains 9, ... ->4 contains 243
Array barnyard_animals --> cow pig goat; ! entries -->0 through -->2 contain objects

Array my_table table 20; ! entry -->0 contains 20, entries -->1 through -->20 contain zero
Array barnyard_table table cow pig goat; ! entry -->0 contains 3, and entries -->1 to -->3 contain objects
Array single_entry_table table [15]; ! entry -->0 contains 1, and entry -->1 contains 15.

Array not_a_string string 30; ! entry ->0 contains 30, and entries ->1 to ->30 contain zero
Array really_just_bytes string "illusion"; ! entry ->0 contains 8, and entries ->1 to ->8 contain chars 'i' ... 'n'

Array my_buffer buffer 30; ! entry -->0 contains 30, and entries ->(0+WORDSIZE) to ->(29+WORDSIZE) contain zero
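To make the three length conventions concrete, here is a minimal sketch with one print routine per subtype. The array and routine names are made up for illustration and are not library routines; the only thing that changes between them is where the length lives and which operator reads the entries.

```inform6
Array demo_table  table  3 9 27;        ! -->0 is 3; values in -->1 to -->3
Array demo_string string "illusion";    ! ->0 is 8; chars in ->1 to ->8
Array demo_buffer buffer "illusion";    ! -->0 is 8; chars start at ->WORDSIZE

! Table: word-sized length, word-sized entries, entries start at index 1.
[ PrintTableEntries t i;
    for (i = 1: i <= t-->0: i++) print t-->i, " ";
];

! String array: byte-sized length, byte-sized entries, entries start at index 1.
[ PrintStringArray s i;
    for (i = 1: i <= s->0: i++) print (char) s->i;
];

! Buffer: word-sized length, byte-sized entries, entries start after the length word.
[ PrintBufferArray b i;
    for (i = 0: i < b-->0: i++) print (char) b->(i + WORDSIZE);
];
```

Each routine only works on the subtype it was written for; passing the wrong kind of array reads the length from the wrong place.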

Here are some incorrect array declarations:

Array powers_of_three -> 3 9 27 81 243 729; ! not allowed because 729 is too large a number to fit in one byte
Array keywords table 'plugh'; ! compiler treats word literal as high value and makes giant blank table

Hopefully, that all makes sense. (… and all of it is correct.) If not, then perhaps someone else will come along that can do a better job of clarifying this material.

sarashinai · November 23, 2020, 9:19am

Let’s assume that you’ve got every detail correct. This is a rather complicated system that, I suspect, is prone to user error. Are there reliable means for determining one type from the other?

E.g. is it possible to write a single function that can take any type of array as an argument, determine its makeup and know the size of the array either from -->0 / ->0 or from some other built-in, and then use the correct accessor (->, -->), so all the internal data can be accessed correctly?

mirality · November 23, 2020, 12:27pm

No. Array accesses are conventions rather than actual types; they can be mixed if you’re careful (or not careful enough). And Inform isn’t really designed for generic programming anyway. It’s designed for a restricted memory space, so you write the minimum amount of code because you already know the correct types.

Similarly I don’t think there’s any way to tell whether a particular memory address is a dictionary value, an object, or an array – you just have to know that in advance. You can distinguish strings (actual strings, not string arrays) and routines from other kinds of value, though.

zarf · November 23, 2020, 5:33pm

sarashinai:

[ StringOrArray str i;
    if (str ofclass String) {
        print (string) str;
    } else {
        for (i = 2: i <= str-->0: i++) {
            print (char) str->i;
        }
    }
];

This code was quoted above, but I don’t recommend doing it this way. The value of x ofclass String is actually undefined when x is an array address. It’s not guaranteed to be false.

(This goes along with what mirality is saying above: “Array accesses are conventions rather than actual types.”)

If you want to print a buffer, use a routine that does only that.

Also, your handling of the array length still isn’t correct. This is why you were confused above about WORDSIZE.

This is the correct implementation:

[ PrintFromBuffer str i len;
    len = str-->0;
    for (i = WORDSIZE: i < WORDSIZE+len: i++) {
        print (char) str->i;
    }
];

Demonstration (works in Z-code/Glulx):

Array testbuf buffer 100;

[ TestExample;
	PrintToBuffer(testbuf, 100, "This is a starting message.^");

	PrintFromBuffer(testbuf);
];

With all of that said, it should be clear that string concatenation is a giant nuisance in I6. The VM, the language, and the library are all built around the assumption that you should never have to do it. It will always be simpler and more flexible to rearrange your code so that ordinary print statements and functions do all the work.

When I say “more flexible”, I mean that string concatenation has obvious limitations. You need to know the maximum length in advance. Also, an array buffer can’t store style changes; the output of PrintFromBuffer() cannot contain italics or boldface.

If you want to construct an output from several strings, it’s easiest to write a function that does the work and then call that. You could even call the function inside the print statement, using a dummy argument:

! This function could be arbitrarily complicated. The argument is ignored.
[ namefunc dummy;
	print "Name";
	print " of ";
	print "Person";
];

	! ... and then ...
	print "Hello, ", (namefunc) 0, ". I hope you are well.^";

zarf · November 23, 2020, 5:37pm

(You might want to write

! Don't do this:
print "Hello, ", namefunc(), ". I hope you are well.^";

…but this produces a spurious “1”. Use the slightly awkward format above. Or, of course, you could write several lines:

print "Hello, ";
namefunc();
print ". I hope you are well.^";

Whatever style you like.)
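The argument to a print rule does not have to be a dummy: whatever expression follows the parenthesised routine name is passed in as the routine's first argument. A minimal sketch, assuming the standard library is included (for the (the) and (The) print rules); OwnerOf and DescribeOwner are hypothetical names, not library routines:

```inform6
! Hypothetical print rule that uses its argument instead of ignoring it.
[ OwnerOf obj;
    if (parent(obj) == nothing) print "nobody";
    else print (the) parent(obj);
];

! The object is passed to OwnerOf through the print statement itself.
[ DescribeOwner obj;
    print (The) obj, " belongs to ", (OwnerOf) obj, ".^";
];
```

This keeps the whole sentence in one print statement without building an intermediate string.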

Draconis · November 24, 2020, 5:34am

Yeah, one lesson to take away here is that I6’s syntax is kind of a mess. It evolved from a glorified assembler over the course of many years and a lot of things would probably be done differently if it were redesigned from scratch today, but can’t be changed now without breaking existing code.

Some of these eccentricities, like “strings in ROM are packed in complicated ways that make them different from arrays of characters” or “buffers for receiving captured text are a strange mixture of bytes and words”, are fundamental to the target virtual machines, so even a language redesign wouldn’t be enough to fix them. Unfortunately, the designers of the Z-machine just didn’t expect to need complicated string manipulation. So there’s not much that can be done about those parts.

zarf · November 24, 2020, 6:02am

And then the Inform language and library were designed to not need it. So the designer of the Glulx VM didn’t put in the capability because nothing needed it…

sarashinai · November 24, 2020, 8:52am

I feel like this should be nominated for the Simultaneously Most Helpful and Most Frustrating Thread Award

sarashinai · December 11, 2020, 10:15pm

Just to round off the discussion, I came up with this:

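[ BuildStringStart;
    ! ~~ is logical NOT; a single ~ is bitwise NOT, which is nonzero for both true and false
    if (~~isBuildingString && EnableMemoryStream(doubleHugeBuffer, DOUBLE_HUGE_LENGTH)){
        isBuildingString = true;
        rtrue;
    }
    rfalse; ! without this, falling off the end of the routine returns true
];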

[ BuildStringFinish;
    if (isBuildingString){
        DisableMemoryStream();

        isBuildingString = false;

        if (doubleHugeBuffer-->0 == DOUBLE_HUGE_LENGTH) {
            print (UtilsRuntimeError)"BuildString -> combined length too long; please increase DOUBLE_HUGE_LENGTH in JSCommunication.h";
        }

        return doubleHugeBuffer;
    }

    return NULL;
];

which is part of a group of string routines that I’m going to put up on GitHub soon.
It lets you do things like this:

if (BuildStringStart()){
    print "JSReturnedValueAs -> Trying to convert return value of type ", (JSCommType)rType, " into a string.";
                        
    print (UtilsRuntimeError) BuildStringFinish();
}

or this:

if (BuildStringStart()){
    print "console.log('";
    print "Event: type -> ";                print ev-->0;
    print ", window -> ";                   print ev-->1;
    print ", e2 -> ";                       print ev-->2;
    print ", e3 -> ";                       print ev-->3;
    print ", context -> ";                  print context;
    print ", input length -> ";             print inputBuffer-->0;

    if (inputBuffer-->0 == 0 || inputBuffer-->0 >= 256){ ! compare the stored length, not the array address
        print ", inputBuffer -> empty"; 
    } else {
        print ", inputBuffer -> ";          print (StringOrArray)inputBuffer; 
    }

    print "');";

    JSCommunication.RunCode(BuildStringFinish());
}