Inform 7: memory overruns occurring in Z-machine and not Glulx

So my big question is what the title says. I’m 99% sure it’s something with Inform, but I’m also 99% sure there may be a constant I can set so that things don’t blow up.

Below is my list of tables. I suspect the names aren’t important, but I want to give some background to the problem.

Array TableOfTables --> TheEmptyTable T0_final_question_options T1_locale_priorities T2_ordinary_status T3_fri_finds T4_fri_milestones T5_edu_finds T6_edu_milestones T7_sup_finds T8_sup_milestones T9_peo_finds T10_peo_milestones T11_stu_finds T12_stu_milestones T13_las_finds T14_las_milestones T15_fri_near_misses T16_edu_near_misses T17_sup_near_misses T18_peo_near_misses T19_stu_near_misses T20_scenery_3 T21_scenery_4 T22_scenery_5 T23_scenery_6 T24_scenery_7 T25_scenery_8 T26_scenery_9 T27_scenery_10 T28_silly_randoms T29_elevensies T30_dirmatches T32_silly_jokes 0 0;

outside-area is a privately-named room. printed name of outside-area is "Sector [your-sec]". "[outside-rand]"

to say outside-rand:
	unless set to unabbreviated room descriptions:
		continue the action;
	else:
		say "[one of]You feel as if you both should and shouldn't know this area[or]There's a [one of]smaller[or]larger[at random] than usual crowd by the [one of]teleports[or]vertical tubes[at random] here. Well, it can't always be constant[or]The sidewalks go from too crowded to too empty to unremarkable as you walk around[or]You think you see someone from a few blocks ago, coming the opposite way again, but you can't just go up and ASK them[or]Suddenly you remember how, as a kid, you wanted to visit every single sector one day[or]Someone complains their map-tracker has led them the wrong way[or]You stop to wonder how confusing old times must have been, when city subsectors had curvy bent borders[or]You wonder what living in a smaller city, one where diagonal streets were legal, would be like[in random order].[no line break]";

Now, after running certain test scripts repeatedly in Z-machine, I get an error like:

[** Programming error: tried to read from ->-4 in the array “T26_scenery_9”, which has entries 0 up to 31 **]

But this array/table is nowhere near the code above! In fact it’s only accessed in very specific instances I didn’t get close to testing.

So it looks like something is getting overwritten here. I tried this code, which worked for Threediopolis:

Use maximum indexed text length of at least 3000. [2000 is usual]

Use dynamic memory allocation of at least 16384. [8192 is usual]

But I am still getting the errors, and I can’t increase the dynamic memory allocation any more.

Is there anything else I can reasonably try to get a z-machine binary? Should I maybe try converting tables to lists, or something? Or am I just stuck with glulx?

(Note: it would be possible to reorganize everything from tables but that’s more code-munging than I’m willing to undergo at the moment. It’d be nice but not critical to have a z8/zblorb binary.)

Thanks!

What does To say your-sec: do?

Oops … it’s just a stub I forgot to get rid of.

Long story short, the game is only one room programmatically, but it tracks going north, south and so on, and changes the sector accordingly, e.g. 100 is above 000, 010 is north of it and 001 is east of it.

I’ve run some additional tests and it seems like z-machine gets overwhelmed a bit even if I convert tables to lists. It’s odd because the [one of] [or] structure doesn’t have any fancy text in it, but somehow it winds up reading another area.

For instance if I put the silly text in a list and recompiled, it threw the same errors but with the T25 table instead.

Are there any debug commands that could be used to suss this all out? I’m not very good with TRACE etc but would be willing to track stuff down.

What version of I7 are you using for this? Do you have any I6 inclusions already? What makes you certain that it’s this particular code you’ve posted that is causing the error?

The operation being performed when the error occurs ({table}-->N) should be about trying to get the array holding a particular table column’s data. The RTE’s complaint is that the array index is out of bounds. Based on what you posted, there is an attempt to access the array at index -4, which doesn’t really make a lot of sense.

It’s possible that there’s some integer overflow happening – there was an odd bug a while ago from someone who was getting a stack pointer problem in Z-Machine due to a routine with no guards against overflow. I don’t see any reason (yet) that explains why that would be happening here.
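To make the overflow idea concrete, here is a minimal Python sketch of how 16-bit signed words wrap around. It only illustrates the arithmetic; the helper name `to_int16` is made up for this example, and nothing here shows that a wraparound is what actually produced the -4 in this game.

```python
def to_int16(value: int) -> int:
    """Wrap an arbitrary integer into the signed 16-bit range (-32768..32767)."""
    value &= 0xFFFF                      # keep only the low 16 bits
    return value - 0x10000 if value >= 0x8000 else value


# One past the largest positive value wraps to the most negative value.
print(to_int16(32767 + 1))   # -32768

# The bit pattern $FFFC, read as a signed word, is -4.
print(to_int16(0xFFFC))      # -4
```

So any unguarded calculation whose result lands on the bit pattern `$FFFC` would show up as an index of -4.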

Note that the message that you’re getting looks like it’s coming out of the I6 veneer code, but even that doesn’t make sense, because the veneer routine most likely to be associated with that error has been modified in I7 specifically to avoid producing it. EDIT: Definitely not that; see below.

Also note that I am assuming that you missed a dash when transcribing the error message. If it really says from ->-4 in the array instead of from -->-4 in the array, then something even weirder is going on.

This may be one of those ones that you need to get down to a minimum example to post. It would be very hard to diagnose without some code to inspect.

I’d say the cause of the error could be anywhere. I’m guessing it’s some kind of array overrun, although otisdog’s suggestion of a 16-bit integer overflow is also a possibility. Either way, it could be happening absolutely anywhere in the code; the RTE shows up later when the corrupted values trip something else up.

This is not the easiest kind of bug to track down. :/

(In an ideal world, the kit/library code would print a RTE before overflowing or overrunning or whatever is going on. That error check is clearly missing. It would be nice to diagnose this simply so we can get that check in place.)
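As a toy illustration of how an overrun in one place can surface as an error somewhere else, the Python sketch below models two arrays packed side by side in one flat memory. Everything in it is an assumption made for illustration (the layout, the names, and the choice of -4 as the stray value); it is not the real Z-machine table layout and not a diagnosis of this game.

```python
def simulate_overrun() -> str:
    """Write one cell past array A, then let later code trust array B's header."""
    memory = [0] * 16
    a_base, a_len = 0, 8          # array A occupies cells 0..7
    b_base, b_len = 8, 8          # array B follows directly; cell 8 is its header

    memory[b_base] = 3                          # B's header: index of its last used entry
    memory[b_base + 1:b_base + 4] = [10, 20, 30]

    # The bug: an off-by-one loop with no bounds check writes a_len + 1 cells.
    for i in range(a_len + 1):
        memory[a_base + i] = -4

    # Much later, unrelated code reads B's header and uses it as an index.
    index = memory[b_base]                      # expected 3, actually -4
    if not 0 <= index < b_len:
        return (f"tried to read from -->{index} in array B, "
                f"which has entries 0 up to {b_len - 1}")
    return f"read {memory[b_base + index]}"


print(simulate_overrun())
# tried to read from -->-4 in array B, which has entries 0 up to 7
```

The failing read is on array B, but the faulty code only ever meant to touch array A, which is why the array named in the message need not point at the culprit.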

The only place I know of where that kind of output can be produced is via RT__Err() in the veneer code:

[ RT__Err crime obj id size p q;
	print "^[** Programming error: ";	! <-- 1st part
	...
	if (crime < 32) {
	    print "tried to ";				! <-- 2nd part
	    if (crime >= 28) {
	        if (crime == 28 or 29)
	            print "read from ";
	        else
	            print "write to ";
	        if (crime == 29 or 31)
	            print "-";
	        print "->", obj, " in the";	! <-- 3rd part
	        switch (size & 7) {
	            0,1:    q=0;
	              2:    print " string"; q=1;
	              3:    print " table";  q=1;
	              4:    print " buffer"; q=WORDSIZE;
	        }
	        if (size & 16)
	            print " (->)";
	        if (size & 8)
	            print " (-->)";
	        " array ~", (string) #array_names_offset-->p, "~, which has entries ", q, " up to ",id," **]";	! <-- 4th part
	    }
	...

I was thinking of RT__ChLDW() as the likely culprit, but I was wrong. That routine uses crime code 25 in the normal I6 veneer, and (as zarf once pointed out to me) the version from ZMachine.i6t doesn’t call RT__Err() at all. So it’s not the cause.

It looks like the function calls for array bounds violations are inlined when the -S flag is set for compiling. I think that’s why zarf is saying that it could be from anywhere. (At least, anywhere that there’s an array read.)

The bad news is that the inlined code can’t be modified except at the compiler itself. The good news is that, unlike an I7 RTP, an I6 RTE should be printed out immediately. That means it’s possible to find out when the bad read happens, though it might not be obvious what’s causing it. To do this it may be necessary to activate tracing for every function call, which can be done by compiling the auto.inf file with the -g switch:

<path>/inform6 -gwxE2kSDv8 $huge auto.inf output.z8

Note that to get the resulting file down to a legal size for z8, it will be necessary to use:

Use OMIT_UNUSED_ROUTINES of 1.

in your I7 source, and it will almost certainly be necessary to trim the source down to a minimal example, as an empty 6M62 game will compile to a size of about 450K (leaving only about 70K for your own content).

Thanks! Yeah, it’s hard to get a minimal example, because, well, if I just take the table/list I’m using on its own, it works okay. And if I remove certain chunks of the code, it still blows up.

My current auto.inf-s are compiled with 6G60. If the problem here is the old version of I7, and there’s no real way to do maintenance on it, that is okay, and I’ll just use Glulx.

c:\games\inform\fourdiopolis.inform\Build>"C:\Program Files\Inform-10-1-2\Compilers\inform6.exe" -gwxE2kSDv8 $huge auto.inf new-output.z8

But if there is error checking that I6 should have, or a genuine place where the I6 code is wrong, then I’d be glad to help with that.

I do have a compiled version of new-output.z8 which clocks in at 0x705c8 bytes (460,232 in decimal).

compiling details
Dynamic +---------------------+   00000
memory  |       header        |
        +---------------------+   00040
        |    abbreviations    |
        + - - - - - - - - - - +   00042
        | abbreviations table |
        +---------------------+   00102
        |  header extension   |
        +---------------------+   0010a
        |  property defaults  |
        + - - - - - - - - - - +   00188
        |       objects       |
        + - - - - - - - - - - +   005b0
        | object short names, |
        | common prop values  |
        + - - - - - - - - - - +   01445
        | class numbers table |
        + - - - - - - - - - - +   01475
        | symbol names table  |
        + - - - - - - - - - - +   01b27
        | indiv prop values   |
        +---------------------+   024cd
        |  global variables   |
        + - - - - - - - - - - +   026ad
        |       arrays        |
        +=====================+   0d76c
Readable|    grammar table    |
memory  + - - - - - - - - - - +   0de47
        |       actions       |
        + - - - - - - - - - - +   0df5b
        |   parsing routines  |
        + - - - - - - - - - - +   0df5d
        |     adjectives      |
        +---------------------+   0df5d
        |     dictionary      |
        +=====================+   0ecb8
Above   |       Z-code        |
readable+---------------------+   56728
memory  |       strings       |
        +---------------------+   705c8
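For anyone reading the hex offsets, this short Python sketch turns a few of them into sizes. The offsets are copied from the map above; the two limits (512K for a z8 story file, and dynamic plus static memory fitting within the first 64K) are assumptions taken from the Z-Machine Standard rather than from this thread, and the constant names are made up for the example.

```python
# Offsets copied from the memory map above (hexadecimal).
ARRAYS_START = 0x026AD
DYNAMIC_END = 0x0D76C         # end of dynamic memory, start of the grammar table
HIGH_MEMORY_START = 0x0ECB8   # where Z-code begins
FILE_END = 0x705C8

# Assumed limits (from the Z-Machine Standard, not stated in this thread).
Z8_MAX_STORY_BYTES = 512 * 1024
LOW_MEMORY_LIMIT = 64 * 1024


def summarize() -> dict:
    """Return the sizes and headroom implied by the offsets above, in bytes."""
    return {
        "arrays_bytes": DYNAMIC_END - ARRAYS_START,
        "dynamic_bytes": DYNAMIC_END,
        "low_memory_headroom": LOW_MEMORY_LIMIT - HIGH_MEMORY_START,
        "file_bytes": FILE_END,
        "file_headroom": Z8_MAX_STORY_BYTES - FILE_END,
    }


for name, value in summarize().items():
    print(f"{name:>20}: {value:>7,} bytes")
```

Under those assumptions, arrays take up about 45K of the roughly 55K of dynamic memory, and less than 5K remains below the 64K mark. That may be why the dynamic memory allocation cannot be raised any further, though that reading is a guess.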

So far when I run new-output.z8, it gives a lot of similar debug text. I’ve held down the space bar to get through (more) but after a minute, still nothing. Should I just keep waiting/going? Is it useful to copy and paste what I have?

Thanks again!

There are a lot of function calls in an I7 game. It will eventually reach the command prompt, at which point new input will be accepted. It will be difficult to separate game output from debug output, so it would be best to have a set sequence of commands that reliably generate the error. If it’s short enough, you could enter the whole sequence at the first command prompt, e.g. >N. LOOK. JUMP. TAKE WIDGET. Making the first command TRANSCRIPT ON may be helpful to capture the output to a file.
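Since the trace output will be long, a small script can pull out just the lines leading up to the error from a saved transcript. This Python sketch makes no assumption about the trace format; it only looks for the `[** Programming error:` marker quoted earlier in the thread. The function name and the transcript filename are hypothetical.

```python
from collections import deque

RTE_MARKER = "[** Programming error:"


def context_before_rte(lines, keep=5):
    """Return (the last `keep` lines before the first RTE, the RTE line itself).

    Returns ([], None) if no RTE marker is found.
    """
    recent = deque(maxlen=keep)          # rolling window of the most recent lines
    for line in lines:
        text = line.rstrip("\n")
        if RTE_MARKER in text:
            return list(recent), text
        recent.append(text)
    return [], None
```

Usage would be along the lines of `with open("transcript.txt") as f: context, rte = context_before_rte(f, keep=20)`, after which `context` holds the last 20 lines of output printed before the error.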

Once the context of the RTE (i.e. last routine called before it shows up) is identified, you can switch modes. Look in the auto.inf file for that specific routine, and put an asterisk (*) between the routine name and any arguments in its first line. For example:

[ SomeI6Routine a b c ;

would become

[ SomeI6Routine * a b c ;

That sets up debug tracing for the individual routine, even if it is compiled without the -g switch. At that point you can look for routines that call the trouble routine by searching for “SomeI6Routine(” in the rest of the code, and add tracing to all of those functions. Then when you run the game, you will be able to tell which routine called the one issuing the RTE, say ParentRoutine(). You can then turn off tracing for everything but SomeI6Routine() and ParentRoutine() and add tracing to anything that calls ParentRoutine(). Repeat as necessary until you can see the call chain leading to the error.

It can be a real hassle to do this, but if there is a low-level problem somewhere, it would be worth it to identify the root cause. If your auto.inf will already compile without trying to reduce its size, you could post it here (or via PM), and I would take a look.
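The search for “SomeI6Routine(” described above can be partly automated. The Python sketch below is a naive text scan, not a real I6 parser: it assumes each routine starts on a line beginning with `[ Name` and ends on a line ending with `];`, and it cuts each line at the first `!`, so a `!` inside a string will hide the rest of that line. The function name is made up for the example.

```python
import re

# A routine header: optional whitespace, "[", then the routine name.
ROUTINE_START = re.compile(r"^\s*\[\s*([A-Za-z_][A-Za-z0-9_]*)")


def find_callers(source: str, target: str) -> list:
    """List the routines in I6 source text that contain a call to `target`."""
    call = re.compile(r"\b" + re.escape(target) + r"\s*\(")
    callers = []
    current = None                              # routine we are currently inside
    for line in source.splitlines():
        code = line.split("!", 1)[0]            # drop I6 comments (naively)
        match = ROUTINE_START.match(code)
        if match:
            current = match.group(1)
        if (current and current != target
                and call.search(code) and current not in callers):
            callers.append(current)
        if code.rstrip().endswith("];"):        # end of the routine body
            current = None
    return callers
```

Calling `find_callers(open("auto.inf").read(), "SomeI6Routine")` would give the list of routines to mark with an asterisk for the next round of tracing. Recursive self-calls are deliberately skipped.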

No. I meant some other code is messing up the table layout, and that could be happening anywhere. The RTE is merely a symptom which turns up an unknown amount of time later.

Finding the RTE is fairly easy – relatively easy, anyway. Just put an I6 print statement before every line that accesses T26_scenery_9 (or wherever the message has got to.) (You need to grab a copy of the auto.inf which compiles to the errant game file and do your debugging on that.)

Unfortunately, as I said, I don’t think that’s the root cause at all.

That is indeed very old. A lot of Inform bugs have been fixed since 6G60.

I am somewhat torn here. I don’t want to ignore bugs. And there’s some chance that the bug still exists when compiling for Glulx; it just doesn’t show up the same way.

On the other hand, there’s a pretty good chance that we could spend a lot of time tracking down the bug and then discover it was fixed in 6M62. As you say, we’re not doing maintenance on old Inform compilers.

If I’m understanding correctly, the parameters for the call to RT__Err() (including upper boundary value and index for array name within #array_names_offset) are hardcoded at compilation. These are the only source of RT__Err(29,...) calls that I know of.

I hadn’t realized that the array name parameter was hardcoded via inlining, too. To me that implies that the call must be originating with a genuine attempt to access that particular array, since the inlined routine code (resident in packed memory on Z-machine) can’t be altered at run-time. The only piece of information that’s not hardcoded in the output of the RTE is the index value to be read from the array.

@zarf, I see how that means that the search for the RTE trigger can be limited to attempts to access the array cited by the RTE. I don’t see what you mean about how some other code could be messing up the table layout in a way that could produce the RTE later, though. Absent I6 tinkering, any attempt to access a table’s array will usually be done by a loop that iterates over the range 1 to {table}-->0. (Most often this is via TableFindCol().) I’m not sure how such a loop could ever return a -4 value for the column index, to cause the RTE that aschultz is seeing.

I’m curious enough to try to track down the root cause even if it is from a fixed bug. If the flaw is something in the template layer, it’s possible that the fix could be backported to 6G60 via an inclusion. If the root cause looks like something that persists to v10 (or the current version of I6), I can raise the red flag.

Thanks for following up! I had been trying to reproduce the bug, but it’s tricky … my main recourse is to try to create a command that consistently loops & I was tinkering with what would be best.

I can give more data with a longer test run – the big problem is just giving my computer time to run as I hold the space bar down or something.

That’s correct.

“I don’t see what you mean about how some other code could be messing up the table layout in a way that could produce the RTE later, though.”

Hm, I was thinking that more of the run-time-checking structure was in writable memory (as opposed to compiled into the code segment). You may be right. But I have faith that overrunning arrays can mess up runtime behavior in more ways than you thought possible. :)

(For that matter, the I6 that shipped with 6G60 also had a fair number of bugs.)