Glulxe fatal error: Memory access out of range (2519010C)

capmikee · September 29, 2014, 8:43pm

I’m always pushing limits here!

I’ve been running a big suite of tests in Kerkerkruip and I just got this error:

Are there any settings I can increase to avoid it?

zarf · September 29, 2014, 9:10pm

That’s not a running-out-of-memory error, that’s a bug that accesses a garbage memory address.
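
For a sense of scale, here is a minimal sketch that converts the address from the error message to decimal and compares it with a memory size. The 16 MiB figure is a placeholder chosen purely for illustration; the game's real memory size is not given in this thread. The address works out to roughly 594 MiB, which suggests a stray pointer rather than a heap that simply filled up.

```python
# Convert the address from the Glulxe error message and compare it with an
# ASSUMED memory size. The 16 MiB value is hypothetical, not taken from the thread.
address = int("2519010C", 16)
assumed_memory_size = 16 * 1024 * 1024  # hypothetical placeholder


def describe(addr, memory_size):
    """Report whether addr falls inside a memory map of memory_size bytes."""
    if addr < memory_size:
        return "in range"
    return f"out of range by {addr - memory_size} bytes"


print(address)
print(describe(address, assumed_memory_size))
```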

Have you looked at inform7.com/mantis/view.php?id=1390 ?

capmikee · September 30, 2014, 2:24am

That’s kind of a relief. It was crashing at the same point every time, so maybe it’s something I can actually fix.

capmikee · September 30, 2014, 4:52pm

Okay, I think I found the point where the crash happens, but it’s weird. I’m not sure the code is always failing at the same point, but it often fails on the line “blank out the whole of…”

A last Standard AI rule for a person (called P) (this is the select an action and do it rule):
	log "select an action and do it for [P]"; [this log entry USUALLY shows up]
	blank out the whole of the Table of AI Action Options;
	log "blanked out table of AI Action Options"; [if there is a crash within this rule, this log entry NEVER shows up]
	...

The Table of AI Action Options contains stored actions. Could that be causing a problem somehow?

Table of AI Action Options
Option	Action Weight
a stored action	a number
with 20 blank rows

zarf · September 30, 2014, 5:34pm

What version of I7? The bugs I linked to were in 6L02; should be fixed in 6L38; but perhaps you’ve found another case that was missed.

capmikee · September 30, 2014, 5:39pm

This is the updated Mac version of 6L38. From the About box:

Do you have any suggestions for how to replicate the bug on a smaller scale? I don’t even know where to begin at this point.

capmikee · September 30, 2014, 8:04pm

While looking for a workaround, I did learn that repeating through the table is ok, but trying to blank out an individual row also causes the error. And I also learned that “choose row 1” does not choose the first nonblank row. Isn’t there a phrase to do that?

[code]To cautiously blank out (contents - a table name):
	while the number of filled rows in contents > 0:
		choose a random row in contents;
		log "blanking out [option entry]: [action weight entry][line break]";
		blank out the whole row;

A last Standard AI rule for a person (called P) (this is the select an action and do it rule):
	log "select an action and do it for [P] - [number of filled rows in table of ai action options] rows";
	cautiously blank out Table of AI Action Options;
[/code]

…
select an action and do it for zombie toad - 4 rows
blanking out the blood ape teleporting: -995

blanking out the blood ape attacking you: -20

blanking out the blood ape waiting: -19

blanking out the blood ape concentrating: 102

blanked out table of AI Action Options

Now there are 4 action selections
done standard AI for the zombie toad
The new main actor is you .
checking if zombie toad cowered
zombie toad cowered 0 against a target of 0 percent
checking if you cowered
you cowered 0 against a target of 0 percent
checking if blood ape cowered
blood ape cowered 0 against a target of
19 percent
done testing effects of cower-counting until next turn
done taking a player action
The new main actor is the blood ape
The new main actor is you .
checking if zombie toad cowered
zombie toad cowered 0 against a target of 0 percent
checking if you cowered
you cowered 0 against a target of 0 percent
checking if blood ape cowered
blood ape cowered 0 against a target of
19 percent
done testing effects of cower-counting until next turn
done taking a player action
The new main actor is you .
checking if zombie toad cowered
zombie toad cowered 0 against a target of 0 percent
checking if you cowered
you cowered 0 against a target of 0 percent
checking if blood ape cowered
blood ape cowered 0 against a target of
19 percent
done testing effects of cower-counting until next turn
done taking a player action
starting standard AI for you
Got this far
select an action and do it for zombie toad - 4 rows
blanking out the zombie toad attacking the blood ape: -18
[crashes here]

Draconis · September 30, 2014, 10:14pm

The method used in some of the example code is “repeat through [table]: [do stuff]; break.” If it gets past the break statement, you’re out of non-blank rows.

capmikee · October 1, 2014, 2:43am

Wow, that seems like a really ugly hack.

Dannii · October 1, 2014, 2:59am

I’m unable to reproduce the error with a short test case.

capmikee · October 1, 2014, 3:03am

Yeah, me too. I guess I should commit my code so other people can mess around with it. But even the test that crashes doesn’t always do it - it seems to depend on certain starting conditions that I can’t identify. It’ll have to wait until tomorrow, though.

Dannii · October 1, 2014, 3:13am

When you do, commit it to a branch, please. (Actually, it would probably be good to keep all the test stuff in a branch until you’re completely finished.)

capmikee · October 1, 2014, 6:55pm

For those of you not following on GitHub, here’s a link to the relevant file on the bugfix branch:

github.com/i7/kerkerkruip/blob/ … 20Core.i7x

To replicate the bug, you’ll need to clone the entire branch, run Kerkerkruip, start a game, and enter this command:

queue test dreadful-presence-test

If that doesn’t replicate the problem, you can try tweaking the random seed… ask me if you need more information about that. But for me, right now, the VM crashes with every seed, although it happens after an unpredictable number of table accesses.

capmikee · October 1, 2014, 8:10pm

I’m reading the discussion on the Mantis bug reports and checking that against Tables.i6t. I noticed this comment for ForceTableEntryBlank:

I wonder how this interacts with the stored action variable “the main actor’s action,” which is set from an entry in the Table of AI Action Options. If I’m reading this code correctly, that setting is done by copying and not by reference, so there should be no problem. And there is not a problem most of the time. It’s only in this one test that I’ve ever seen the crash. The test does involve some hijacking of the normal Kerkerkruip AI behavior (dreadful presence stops people from acting sometimes), but not in a way that I could imagine being the cause. I’m not sure what else is special about the test, but there could be something I’m missing.

Edit: Maybe it could be the cause. By diverting the normal sequence of AI rules, it might bypass the line that updates the main actor’s action, leaving an action from an earlier turn still in there. I still don’t know why that would affect the table, but at least it’s something to check.

But anyway, perhaps a close reading of ForceTableEntryBlank might shed some light on this. I’ll look at it, but I don’t know if I have enough understanding of the code to really see it.

capmikee · October 2, 2014, 4:49pm

Here’s something really weird:

I copied the code for “Force Entry Blank” into an Include block in my project, planning to insert some debug messages. Then I ran the code to check that it still worked. Lo and behold, no VM crash! But I did get this when the game restarted itself after the test:

[** Programming error: tried to read from ->2144093 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144093 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144093 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144093 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144094 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144222 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to read from ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

[** Programming error: tried to write to ->2144095 in the array “TB_Blanks”, which has entries 0 up to 328 **]

After some more investigation, it looks like a garbage value is being written into the TB_Blanks address in the locale description priority column of the Table of Locale Priorities. I don’t know how. Maybe I copied something over wrong from Tables.i6t, but I don’t know how that could have happened either. In fact, I tried copying it again to be sure, and I got the same result. I even checked all the whitespace in case there were some anomalies in the copy/paste operation. None that I could find.

capmikee · October 3, 2014, 3:33pm

Okay, I’ve found the point where the Table of Locale Priorities gets corrupted. It’s this rule:

Last when play begins (this is the create shimmering items rule):
	repeat with guy running through people:
		unless guy is the player:
			repeat with item running through things held by guy:
				if item is a weapon and item is not a natural weapon:
					if item is readied:
						let new-weapon be a new object cloned from item;
						now new-weapon is shimmering;
						now shimmer-owner of new-weapon is guy;
				if item is clothing:
					if guy wears item:
						let new-cloth be a new object cloned from item;
						now new-cloth is shimmering;
						now shimmer-owner of new-cloth is guy.

I assume from this that there’s a problem with dynamic objects.

Oh wait… now I know why copying the relevant sections of Tables.i6t changed the behavior: Dynamic Tables is included by Dynamic Objects, and it also replaces these sections of Tables.i6t. I was actually undoing some of the work done by Dynamic Tables! Now back to the drawing board…

capmikee · October 3, 2014, 7:06pm

Fresh start now.

I believe I’ve copied all the code correctly this time before inserting print statements. Now it looks like the original crash, and the crash is happening in FlexFree. I don’t think I understand how this is supposed to work, so let me just show you what I did:

[code]include (-
[ FlexFree block fromtxb ptxb memsize;
	@getmemsize memsize;
	print "FlexFree ", block, " memsize=", memsize, "^";
	if (block == 0) return;
	if ((block->BLK_HEADER_FLAGS) & BLK_FLAG_RESIDENT) return;
	if ((block->BLK_HEADER_N) & $80) return; ! not a flexible block at all
	if ((block->BLK_HEADER_FLAGS) & BLK_FLAG_MULTIPLE) {
		print "Block is multiple^";
		if (block-->BLK_PREV ~= NULL) (block-->BLK_PREV)-->BLK_NEXT = NULL;
		fromtxb = block;
		for (:(block-->BLK_NEXT)~=NULL:block = block-->BLK_NEXT) {
			print "current block is ", block, ", next=", block-->BLK_NEXT, ", previous=", block-->BLK_PREV, "(NULL=", NULL, ")^";
		}
		while (block ~= fromtxb) {
			print "Freeing component block ", block, "^";
			ptxb = block-->BLK_PREV; FlexFreeSingleBlockInternal(block); block = ptxb;
		}
	}
	print "Freeing original block ", block, "^";
	FlexFreeSingleBlockInternal(block);
];

! The rest of this section is unmodified…

[ FlexFreeSingleBlockInternal block free nx;
	block-->BLK_HEADER_KOV = 0;
	block-->BLK_HEADER_RCOUNT = 0;
	block->BLK_HEADER_FLAGS = BLK_FLAG_MULTIPLE;
	for (free = Flex_Heap:free ~= NULL:free = free-->BLK_NEXT) {
		nx = free-->BLK_NEXT;
		if (nx == NULL) {
			free-->BLK_NEXT = block;
			block-->BLK_PREV = free;
			block-->BLK_NEXT = NULL;
			FlexMergeInternal(block);
			return;
		}
		if (UnsignedCompare(nx, block) == 1) {
			free-->BLK_NEXT = block;
			block-->BLK_PREV = free;
			block-->BLK_NEXT = nx;
			nx-->BLK_PREV = block;
			FlexMergeInternal(block);
			return;
		}
	}
];
-) instead of "Deallocation" in "Flex.i6t".
[/code]

And the output:

I’m going to assume there was a stream mixup and that programming error actually happens after the last print statement. So it looks like the trouble begins when the object’s first child is the 0 object (yourself?)… or maybe I’m way off. This is just weird-looking.

zarf · October 3, 2014, 8:34pm

Don’t assume that. Probably not true.

Can you compile the version of Glulxe in the branches github.com/erkyrath/cheapglk/tree/debugger ? This would let you set a breakpoint on the RT__Err() function and look at the stack trace.

capmikee · October 3, 2014, 9:04pm

Maybe next week… that would have a bit of a learning curve for me.

capmikee · October 3, 2014, 9:14pm

Oh, you’re right. It must be this line that produces the programming error:

		if (block-->BLK_PREV ~= NULL) (block-->BLK_PREV)-->BLK_NEXT = NULL;

According to the output, block-->BLK_PREV = 0. I guess you can’t write to 0-->BLK_NEXT…?
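
One way to picture why the guard on that line does not help: in Inform 6, NULL is -1, not 0, so a BLK_PREV of 0 passes the `~= NULL` test and the write goes to an address near 0. The toy model below is only a sketch of that mismatch. The field offsets, the `Memory` class, and the `RAM_START` boundary are all hypothetical and are not taken from Flex.i6t or from Glulxe.

```python
# Toy model of the unlink step in FlexFree. Offsets and memory layout are
# made up for illustration; only the NULL-versus-0 distinction matters here.
NULL = 0xFFFFFFFF   # Inform 6's NULL is -1 (all bits set), not 0
BLK_NEXT = 1        # hypothetical word offsets within a block header
BLK_PREV = 2
RAM_START = 0x100   # hypothetical: addresses below this are treated as read-only


class Memory:
    """A flat array of words with a crude write-protection check."""

    def __init__(self, size_words):
        self.words = [0] * size_words

    def read(self, addr, field):
        return self.words[addr + field]

    def write(self, addr, field, value):
        if addr == NULL or addr < RAM_START:
            raise MemoryError(f"write to bad address {addr:#x}")
        self.words[addr + field] = value


def unlink_unguarded(mem, block):
    # Mirrors: if (block-->BLK_PREV ~= NULL) (block-->BLK_PREV)-->BLK_NEXT = NULL;
    prev = mem.read(block, BLK_PREV)
    if prev != NULL:
        mem.write(prev, BLK_NEXT, NULL)


def unlink_guarded(mem, block):
    # Same step, but a zeroed link is also treated as "no predecessor".
    prev = mem.read(block, BLK_PREV)
    if prev != NULL and prev != 0:
        mem.write(prev, BLK_NEXT, NULL)
```

With BLK_PREV left at 0, `unlink_unguarded` raises in this model while `unlink_guarded` does nothing. The extra check would only hide the symptom, though: the open question in the thread is still how a block's BLK_PREV ended up as 0 in the first place.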