I’m not sure what exactly you’re looking for. Part of why I opened this topic is that I’m worried about custom allocator shenanigans becoming entrenched and sticking around even after they stop solving an actual issue. However, all signs point to them being useful today, if your heap is going to contain more than a few thousand objects.
There have been a couple of threads over the years where the Flex layer was (partially) blamed for slow performance. At one point, @eu1 did some template hackery to try and estimate the effect of using @malloc/@mfree instead. Results were… mixed, though there are several confounding factors (e.g., an optimization in BlkSize that is entirely unrelated to using @malloc) and the benchmark is very artificial. One interesting observation is that the terps (then and now, the relevant code hasn’t changed in forever) slow down considerably as the heap fills up with more blocks, though the Flex layer does even worse.
This is easy to reproduce, if you aren’t opposed to artificial micro-benchmarks. A trivial I6 program that just allocates N = 10k blocks in a loop and then exits takes ca. 100 ms to run on my machine with the Git and Glulxe builds I have lying around, ca. 400 ms for N = 20k, and ca. 900 ms for N = 30k. I swear I didn’t fudge these numbers, it’s just a textbook case of doing N² work (going through n extant blocks before finding a free block to use for the nth malloc).
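Something like this, for reference (the block size is my arbitrary pick, and the store operand that the inline shorthand elsewhere in this thread omits is spelled out):

```inform6
! Minimal sketch of the micro-benchmark: allocate N blocks, never free
! them, then exit. Each @malloc forces the terp to walk its block list.
Constant N 10000;
Constant BLOCK_SIZE 16;   ! arbitrary; any small size shows the effect

[ Main i addr;
    for (i = 0 : i < N : i++) {
        @malloc BLOCK_SIZE addr;   ! addr receives the new block's address
    }
];
```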
At least the simple linear-list-of-blocks approach used by current terps can slow down to the point of being completely unusable (see above). Beating that with a pure-Glulx allocator isn’t hard, even if it’s a very conventional general-purpose allocator – it’s just Flex specifically that’s even worse. Using intrusive linked lists and inline object headers, the common path for (de-)allocation can be less than ten Glulx instructions, which is probably pretty competitive even with a smarter terp.
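To sketch what I mean by intrusive lists, here’s a toy single-size version (not the actual allocator; FastAlloc/FastFree are made-up names for illustration):

```inform6
! Toy sketch, not from any existing terp or library: one intrusive free
! list of fixed-size blocks. A free block's first word holds the address
! of the next free block, so the list needs no storage of its own;
! allocated blocks reuse that word as payload.
Constant BLOCK_SIZE 16;
Global free_head = 0;

[ FastAlloc addr;
    if (free_head == 0) {
        @malloc BLOCK_SIZE addr;   ! miss: get a fresh block from the terp
        return addr;
    }
    addr = free_head;
    free_head = addr-->0;          ! pop: follow the intrusive link
    return addr;
];

[ FastFree addr;
    addr-->0 = free_head;          ! push the block onto the free list
    free_head = addr;
];

[ Main a b;
    a = FastAlloc();
    b = FastAlloc();
    FastFree(a);
    a = FastAlloc();   ! fast path: reuses the block just freed
    FastFree(a); FastFree(b);
];
```

The common case is just a compare, a load, and a store or two, which is where the “less than ten Glulx instructions” figure comes from.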
Better algorithms are definitely possible, but they’re constrained or complicated by saved game states. To stick with the alignment example: if you start with an empty heap and do @malloc 1; @malloc 4; then all currently existing terps will put the first allocation at the heap start and the second one right after, and they have to remain there until they’re freed. So whatever a smarter terp does, it can’t simply assume that all block sizes or block addresses have any particular alignment. It may ensure that for its own allocations, but it has to be able to deal with exceptions somehow. I’ve come to the conclusion that it’s better to just choose implementation strategies that don’t need such assumptions in the first place.
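For concreteness, the scenario as a complete program (the address relationship in the comments assumes the address-ordered placement of current terps described above):

```inform6
[ Main a b;
    @malloc 1 a;   ! first block: placed at the very start of the heap
    @malloc 4 b;   ! placed immediately after, so b == a + 1: a 4-byte
                   ! block at an odd address, which breaks any alignment
                   ! assumption a smarter terp might want to make
];
```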
It’s true that many programs mostly allocate objects of a few distinct sizes. This, together with other trends in program behavior, makes it feasible to have fast and good (with respect to fragmentation) memory allocators in the first place. The question is what you mean by “naive implementation”. Currently, all interpreters implement the address-ordered first-fit policy, which actually seems pretty great at limiting fragmentation (see The Memory Fragmentation Problem: Solved? by Johnstone and Wilson). The specific data structure the terps use, however, scales very badly as the heap gets larger. On the other hand, going all-in on the assumption that most objects have the same fixed size may lead you to completely separate free lists per size class or object type. This also gives a simple implementation, and it’s easy to make (de-)allocation very fast, but the simple ways of doing it can easily lead to enormous fragmentation (again, see Johnstone and Wilson).
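A sketch of that segregated-free-lists approach, to make the trade-off concrete (the class granularity and all names here are made up for illustration; note that a freed block only ever goes back to its own class’s list):

```inform6
! Toy sketch of segregated free lists: one intrusive list head per size
! class, where class c holds blocks of (c+1)*4 bytes. This only covers
! requests up to NUM_CLASSES*4 bytes; freed blocks are never split or
! merged, which is exactly where the fragmentation risk comes from.
Constant NUM_CLASSES 8;
Array class_head --> NUM_CLASSES;

[ SizeClass size;
    return (size + 3) / 4 - 1;   ! round up to a whole number of words
];

[ ClassAlloc size c addr;
    c = SizeClass(size);
    addr = class_head-->c;
    if (addr == 0) {
        size = (c + 1) * 4;      ! round the request up to the class size
        @malloc size addr;       ! miss: fall back to the terp's heap
        return addr;
    }
    class_head-->c = addr-->0;   ! pop from this class's list
    return addr;
];

[ ClassFree addr size c;
    c = SizeClass(size);
    addr-->0 = class_head-->c;   ! push back onto this class's list only
    class_head-->c = addr;
];

[ Main p q;
    p = ClassAlloc(6);    ! rounds up to the 8-byte class
    q = ClassAlloc(16);   ! 16-byte class; a separate list entirely
    ClassFree(p, 6);
    ClassFree(q, 16);
];
```

Both paths are trivially fast, but a block freed in one class can never serve a request from another, so memory held by idle classes is simply wasted.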