Under the hood, the compiled game has a big table mapping vocabulary words to the objects they match. I’m guessing this table is eating up too much space, which leads to the crash.
Unfortunately there’s no obvious way to fix that except to reduce your game’s vocabulary.
And yes, I would love to see Dialog compile to a 32-bit virtual machine that doesn’t have these limitations. Glulx is the obvious choice, but Linus was also talking about making a separate 32-bit Å-machine alongside the 16-bit one.
I’ve looked at the Dialog source code, and someday I might try to adapt it…but it’s a huge amount of C code with very few comments, so that’s not a task I’m ready to take on until I know I have a lot of free time to burn.
I’ve commented out nine of a certain kind of second-level scenery object, and I’m hoping that’ll give me a bit of safety margin. I don’t like the idea of being too close to the point where I noticed things breaking, in case there are more subtle or complex situations where the game still gets pushed above a limit and breaks.
it’s unfortunate - my current project is a straightforward parser game and i’ve gone back to inform (i prefer 6), which is comparatively painful. but if i proceed with dialog i can’t know ahead of time that i won’t wind up overtaxing the z-interpreter and starting to get these stupid heap space errors. if i get these with inform i can just compile to glulx and still have the stand-alone file i want.
i start to get frustrated but then remember that no one is getting paid for any of this and that this is simply brilliant people doing amazing things for the love of it.
I’ve been trying to come up with a (relatively) minimal reproducible example of this, with mixed success.
Generating a basic game with a very large number of dictionary words (an object with slightly more than 400 synonyms will do it) gives a heap error when you try to interact with it. However, it gives the same error whatever word you use to refer to it, unlike the example from @Pacian where only certain synonyms trigger the bug. I also can’t reproduce the behaviour where the heap consumption expands - in all my examples, increasing the heap size removes the error (at least until you add more synonyms).
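In sketch form, here’s how I’m generating that minimal case (the `#gizmo` object and `syn…` words are invented placeholders; you’d still need a room and the usual scaffolding around it to make a playable game):

```python
def object_with_many_synonyms(n):
    # Emit a single Dialog object whose (name *) rule contributes n
    # vocabulary words; slightly more than 400 of them is enough to
    # trigger the heap error described above.
    words = " ".join(f"syn{i:03d}" for i in range(n))
    return f"#gizmo\n(name *)\t{words}\n"
```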
I know that both I and @cfmoorz2 have a lot of (on every tick) rules, and, for me at least, those rules often use (exhaust) loops. I’m not sure if that could have an impact? I also have a lot of per-object flags, and two per-object variables.
Both heap overflow issues turned up in the library’s (parse object name $Words as $Result $All $Policy) predicate. Is that what you are seeing in your attempt?
So this is how that library predicate is implemented:
(parse object name $Words as $Result $All $Policy)
    (filter $Words into $Filtered)
    (nonempty $Filtered)
    (collect $Obj)
        %% catch expressions like ‘open the eastern window’
        ($Filtered = [$Head | $Tail])
        (nonempty $Tail)
        (parse direction [$Head] $Dir)
        (current room $Room)
        (determine object $Obj)
            *(from $Room go $Dir to object $Obj)
            ~(direction $Obj)
            ~(relation $Obj)
            (words $Tail sufficiently specify $Obj)
        (from words)
            *(dict $Obj)
        (matching all of $Tail)
    (or)
        %% this is the normal case
        (determine object $Obj)
            *($Obj is in scope)
            ~(direction $Obj)
            ~(relation $Obj)
            (words $Filtered sufficiently specify $Obj)
        (from words)
            *(dict $Obj)
            (or)
            *(plural dict $Obj)
        (matching all of $Filtered)
    (into $Candidates)
    (nonempty $Candidates)
    (apply policy to $Candidates $Policy $CleanList)
    (if)
        %% Optimize for the common case.
        ($CleanList = [$])
    (or)
        %% E.g. get all wooden.
        ($All = 1)
    (or)
        %% A plural word causes all matching objects to be returned.
        *($PluralWord is one of $Filtered)
        (determine object $PObj)
            *($PObj is one of $CleanList)
        (from words)
            *(plural dict $PObj)
        (matching all of [$PluralWord])
    (then)
        ($Result = $CleanList)
    (else)
        %% Backtrack over each matching object, for disambiguation.
        (if) (fungibility enabled) (then)
            (strip-fungible $CleanList $UniqueList)
            *($Obj is one of $UniqueList)
            ($Result = [$Obj])
        (else)
            *($Obj is one of $CleanList)
            ($Result = [$Obj])
        (endif)
    (endif)
Notably, this is the main place that the special (determine object $) (from words) (matching all of $) code is used. Since the problem seems to come from having too many dictionary words, I suspect the way that construction is implemented on the back-end is to blame.
Those words have always been words that could potentially require disambiguation.
But they still trigger a crash even in situations where they are unambiguous or even do not refer to any objects in reach of the player.
It isn’t all words that could require disambiguation, just a subset - perhaps only one or two words, though it’s hard to tell exactly how many.
The number of words that trigger an issue seems to increase as the game pushes further past the limit.
Getting further past the limit also triggers illegal object errors instead of heap errors, which I’m not sure how to pin down to a specific predicate.
When I’m not finalising an IFComp entry, maybe I’ll have a crack at reproducing it myself. Perhaps a Python script to generate Dialog objects with synonyms from a randomized pool of words would do it…
A Python script to generate Dialog source code with large numbers of objects/synonyms is exactly what I’m trying at the moment. But the only way I’ve managed to produce heap errors is by giving a single object 400+ synonyms (which exhausts the heap in the query to *(dict $Obj)). Adding large numbers of each-turn rules and per-object variables doesn’t seem to help. And I certainly can’t work out how the parser can crash on specific synonyms for an object which isn’t even in scope …
My Python script made objects from a list of a hundred random words and assigned them a flag based on a coin flip. Then I give all the objects with that flag a synonym.
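A sketch of that kind of generator (the flag name `(marked *)` and shared synonym `doohickey` here are invented stand-ins, not my actual script):

```python
import random

def generate_objects(word_pool, seed=0):
    # For each word in the pool, make an object named after it; flip a
    # coin to decide whether the object gets the flag, and give every
    # flagged object the same extra (dict *) synonym.
    rng = random.Random(seed)
    lines = []
    for i, word in enumerate(word_pool):
        lines.append(f"#obj{i}")
        lines.append(f"(name *)\t{word}")
        if rng.random() < 0.5:               # coin flip: flag this object
            lines.append("(marked *)")       # per-object flag
            lines.append("(dict *)\tdoohickey")  # shared synonym
    return "\n".join(lines) + "\n"
```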
The tricky part is figuring out which of those words will trigger a crash. I randomly found out that “jail” does it (although in this case only when the objects are in scope…)
I compiled against the standard library and I’ve attached the .z8 file in case the results are non-deterministic somehow…
It’s also very possible not all the parts of this example are necessary to reproduce the issue, but figuring out when it has triggered is hard. Is there an easy way to paste a hundred commands into an interpreter?
Edit: Hmm, or is it just that it’s the words which match a lot of objects… Which isn’t quite what I encountered with my WIP…
Edit 2: And increasing the heap size also fixes the issue. OK, I got ahead of myself.
Here’s my current best attempt: for me, this reliably crashes on “x wing” even though there are only three “wing” objects in the game and none are in the starting room… BUT it only crashes if I compile with the maximum heap space size.
Once the IFComp deadline is past and I’ve properly moved into my new apartment and started my teaching for the semester (this week is a bit insane), I’ll see if I can make a version of the compiler that prints diagnostics about the dictionary when in verbose mode. There’s a bunch of tracing stuff in the source but it’s commented out in the current version.
When @Pacian originally described the symptoms, my hypothesis was that dialogc was producing a malformed dictionary entry when the size of the z-code file exceeded a certain threshold, and the invalid data was somehow causing the interpreter to enter an infinite loop allocating heap memory. But on inspection, Dialog z-code files (unlike those generated by Inform) don’t actually attach any additional data to dictionary words at all, so there’s nothing that could even be malformed. Inspecting the binary confirms that the dictionary in explode.z8 contains all of the expected vocabulary words, correctly z-encoded, in the correct order.
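For anyone who wants to repeat that inspection: the dictionary layout is fixed by the Z-Machine Standard (the header word at offset 0x08 gives the dictionary’s byte address), so a dump takes only a few lines. This sketch returns the raw encoded entries rather than decoding the text:

```python
import struct

def dump_dictionary(story: bytes):
    """Return (entry_length, raw_entries) for a story file's main dictionary.

    Layout per the Z-Machine Standard: the table begins with a count of
    word-separator characters and the separators themselves, then an
    entry-length byte, a 2-byte entry count, and the entries.
    """
    addr = struct.unpack_from(">H", story, 0x08)[0]
    n_seps = story[addr]
    pos = addr + 1 + n_seps
    entry_len = story[pos]
    count = struct.unpack_from(">H", story, pos + 1)[0]
    pos += 3
    entries = [story[pos + i * entry_len : pos + (i + 1) * entry_len]
               for i in range(count)]
    return entry_len, entries
```

The first 4 (v1–3) or 6 (v4+) bytes of each entry hold the z-encoded word, and entries are required to be sorted, so checking the ordering and spotting a malformed entry is straightforward.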
I managed to partially automate testing which words cause crashes when used; the first two-thirds of the alphabet is fine, but after a certain point, about one-third to one-half of words cause a stack overflow when used in a noun context (you can type RELEVANCE, but X RELEVANCE crashes the game).
I thought it was the case that none of the vocab defined by the standard library caused a crash, but X VERSION does it.
Partial list of vocab words which cause a crash - many of the ones which come alphabetically after this also do, but after finding the point where words start to cause crashes relatively easily, I had to start testing individual words manually, so I didn’t do the whole dictionary:
fatal error: object 4980 out of range (pc = 0x1cd72)
receipt
fatal error: call stack too deep: 1025 (pc = 0x9ebd)
I had a quick look at explode.dg, but couldn’t see any obvious link between how the words are used in the source and which ones cause crashes (e.g. #Object11 has dictionary words organize and wonder; the former can be used fine but the latter causes a crash).
I guess the next step is to try tracing the execution of the code itself somehow …
I tried it, why not! It does run, and reproduces the error. The log for just the one-move repro is already 100k lines, so I won’t attach the whole thing; but here’s the final stretch before the restore_undo:
The log format is pretty condensed, but it’s roughly <memory address>: <opcode name> <operands> (-> <stores>)? (?<branch>)?
You may notice that the initial section repeats quite a bit. The actual log has several hundred repetitions. Not sure what’s up with that. Maybe it’s indicative?
If you add Dialog tracing to this, I wonder if we can find what happens after the last query is made?
I’m not sure. I recognize some of these routines from the Dialog runtime, but I don’t know what they do, because the only documentation for the runtime is the routine names and parameter lists.
What I was really hoping to track down is where this run starts to differ from one that doesn’t produce the error. I’m not sure when I will have time to try it, but my plan was to start with two non-bugged words of the same length, compare their traces somehow to check what varies only because of a different word being used, then compare again to one of the bugged words and see where the trace starts to diverge. This would probably need some ad-hoc tooling at the least, of course.
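If the log format is as described above, the operand differences can be normalized away by keeping only the opcode names, after which finding the divergence point is easy. A sketch (the regex assumes the `<memory address>: <opcode name> <operands>` shape from the earlier post):

```python
import re

def opcode_sequence(trace_lines):
    # Strip addresses, operands and stores, keeping only opcode names,
    # so two runs differ only where control flow actually diverges.
    ops = []
    for line in trace_lines:
        m = re.match(r"\s*[0-9a-fA-F]+:\s+(\S+)", line)
        if m:
            ops.append(m.group(1))
    return ops

def first_divergence(trace_a, trace_b):
    # Index of the first differing opcode, or None if identical.
    a, b = opcode_sequence(trace_a), opcode_sequence(trace_b)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))
```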
(Does TXD work on Dialog-produced z-code, by the way? I think I had gathered that it was supposed to, but I got an error the one time I tried.)
TXD should work with Dialog, but it is missing some of the new opcodes from Z-machine Standard 1.1 relating to Unicode characters. See this thread for more info. I’ve made an updated version on my GitHub (link in thread).
EDIT: Also, TXD doesn’t know Dialog’s grammar format, so you have to run it with the switch that skips the decoding of grammar.
Since the Z-machine dictionary is stored sorted alphabetically, my current best guess is that Dialog is using some kind of recursion to search through it, which eventually exhausts the heap. That would explain why it’s specifically words at the end of the alphabet causing the problem.
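To illustrate why that would bite alphabetically late words: if the lookup recurses once per entry it passes over, a word near the end of a sorted thousand-word dictionary needs on the order of a thousand activation records, which lines up with the `call stack too deep: 1025` error quoted earlier. A toy model of the hypothesis (this is an illustration, not Dialog’s actual lookup code):

```python
def find_recursive(sorted_words, target, i=0):
    # One stack frame per entry scanned: words later in the sorted
    # dictionary need proportionally deeper recursion before they match.
    if i >= len(sorted_words):
        return None
    if sorted_words[i] == target:
        return i
    return find_recursive(sorted_words, target, i + 1)
```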
I’m no closer to a solution for this, but in case it helps anyone, here’s the routine R_COLLECT_MATCH_ALL which is called at the end of a (determine object) construction. I’ve used approximately Inform 6 syntax since that’s what I’m used to, but with a couple modifications (e.g. indicating explicitly which local variables are parameters and which aren’t).
[ R_COLLECT_MATCH_ALL input / current keywords oldtop iter word tmp ;
    oldtop = top; ! `top` is a global that holds the top of the heap (grows upwards)
    R_COLLECT_END(keywords); ! presumably finalizes the (collect), yielding the keyword list
  l2:
    ! outer loop: take the next element of the input list
    input = R_DEREF(input);
    tmp = input & $E000;
    if(tmp ~= $C000) jump l3; ! not a pair reference, so we hit the end of the list
    current = input - $4000; ! ref to head element
    input = current + 1; ! ref to tail element
    current = R_DEREF(current);
    if(keywords == nil) jump l1;
    iter = keywords;
  l4:
    ! inner loop: compare the current input word against each collected keyword
    iter = iter & nil; ! presumably masks off tag bits, leaving a heap address
    word = memory-->iter;
    iter = memory-->(iter+1);
    tmp = R_WOULD_UNIFY(word, current);
    if(tmp) jump l2; ! matched: move on to the next input word
    if(iter ~= nil) jump l4;
  l1:
    @throw fail; ! `fail` is a global holding a throw destination for a predicate failing
  l3:
    top = oldtop; ! release temporary allocations by resetting the heap top
    if(input ~= nil) jump l1; ! the input wasn't a proper list: fail
    rfalse;
];
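Ignoring the tag and heap bookkeeping, the control flow of that routine boils down to something like this (a Python model for readability; `would_unify` is a stand-in for R_WOULD_UNIFY, and the exception models `@throw fail`):

```python
class PredicateFailed(Exception):
    """Stand-in for the Z-machine @throw fail: the predicate fails
    and execution backtracks."""

def would_unify(a, b):
    # Stand-in for R_WOULD_UNIFY; plain equality suffices for the model.
    return a == b

def collect_match_all(input_words, keywords):
    # Succeed iff every input word would unify with at least one of the
    # collected keywords; otherwise fail (backtrack).
    for word in input_words:
        if not any(would_unify(k, word) for k in keywords):
            raise PredicateFailed(word)
```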