If we malloc a memory range on glulx, can we dynamically check its extent?

Is it possible to read Glulx’s memory map from within a program running on Glulx? I’d like to know the extent of malloc-ed blocks without maintaining my own separate records.

You can use @getmemsize to see the total extent of memory, and compare it to the ENDMEM value stored in the header to see how much of it has been mallocked instead of being there from the beginning. But you can’t see the internal structure of the heap unless you keep your own records.
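
For illustration, a minimal I6 sketch of that comparison (ENDMEM is the word at byte offset 16 of the header, which starts at address 0):

```
! How much has memory grown past the original ENDMEM?
[ HeapGrowth total;
    @getmemsize total;       ! current total extent of memory
    return total - (0-->4);  ! 0-->4 is the word at byte address 16: ENDMEM
];
```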

For example, it’s possible that you malloc 32 bytes, but the interpreter chooses to expand memory by 256 bytes; it gives you a pointer to the start of that space, and it’s your responsibility not to write past the first 32 bytes. If you write a 33rd byte into that space, you might be overwriting nothing, or you might be overwriting something else you mallocked later.

(Side note: unlike in C, it is legal to do that. The state of the heap is explicitly stored outside the heap itself, so you can legally write anywhere within the extent specified by @getmemsize without wrecking anything the interpreter cares about. You just run the risk of ruining your own data when you do that.)

It is legitimate to read or write any memory address in the heap range (from ENDMEM to the end of the memory map). You are not restricted to extant blocks. [The VM’s heap state is not stored in its own memory map. So, unlike the familiar C heap, you cannot damage it by writing outside valid blocks.]


It’s possible to save the game state into a file or memory stream, then parse the Quetzal data to get the MAll chunk (which contains the start address and size of every allocated heap block), and search that chunk for the block you’re interested in. Of course, this is not very practical, at least not if you expect to do it frequently.
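
As a rough, untested sketch of the idea (the routine and buffer names are mine; it assumes the infglk wrappers, a buffer big enough for the save file, and the MAll layout from the Glulx spec: heap start, block count, then one address/length pair per block):

```
Constant SAVEBUF_SIZE 65536;   ! must be big enough to hold the whole save file
Array savebuf -> SAVEBUF_SIZE;

! Return the allocated size of the heap block at addr, or 0 if not found.
[ HeapBlockSize addr  str res pos chunklen numblocks rec;
    str = glk_stream_open_memory(savebuf, SAVEBUF_SIZE, filemode_Write, 0);
    if (str == 0) rfalse;
    @save str res;
    glk_stream_close(str, 0);
    if (res ~= 0) rfalse;      ! 1 = save failed, -1 = we were just restored

    pos = 12;                  ! skip the 12-byte FORM header
    while (pos < SAVEBUF_SIZE) {
        chunklen = (savebuf+pos+4)-->0;
        if (chunklen < 0) rfalse;   ! corrupt or truncated data
        if (savebuf->pos == 'M' && savebuf->(pos+1) == 'A'
            && savebuf->(pos+2) == 'l' && savebuf->(pos+3) == 'l') {
            ! MAll data: heap start, block count, then (address, length) pairs
            numblocks = (savebuf+pos+12)-->0;
            rec = savebuf + pos + 16;
            @linearsearch addr 4 rec 8 numblocks 0 0 rec;
            if (rec == 0) rfalse;   ! addr isn't the start of an extant block
            return (rec+4)-->0;     ! second word of the pair is the block size
        }
        pos = pos + 8 + chunklen;
        if (chunklen % 2) pos++;    ! chunks are padded to even length
    }
    rfalse;
];
```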

Since the interpreter has to maintain this information anyway (for save/restore, if nothing else) it would be possible to add an opcode / gestalt for directly querying the size of a block, but nothing like that exists today.


Ah, the tried and true method of Glulx introspection! That’s how I found the locals count in the Data Structures Closures kind.

What are you trying to do with it? This is something that could make sense for a Glulx extension, if it would see a lot of use. (With the Quetzal method as a fallback.)

Actually with the search opcodes it wouldn’t even be very slow! New opcodes might not even be worth bothering with.

I was playing around with string-handling. Besides current length, I want maximum extent so I can avoid unnecessary realloc-ing when modifying one. But really it’s more of a practice run to get to know dynamic memory handling in Glulx.

Finding the MAll chunk is cheap, and a single @linearsearch over it isn’t too bad either, as long as the number of blocks is in the hundreds or thousands. But generating the Quetzal blob in the first place is never cheap, and its cost scales badly (at least linearly) with the total size of RAM and the number of extant heap blocks. Thus, if this operation is used somewhat frequently, it could easily grind any game above a certain size/complexity threshold (which could be as low as a few MB of RAM) to a halt. IE-0008 (replacing Flex with @malloc and @mfree on Glulx) will compound the problem, since it’ll greatly increase the number of Glulx heap blocks.

That sounds right. If it’s the sort of thing you do once a turn or so, it would be fine. But automatically getting the length of lots of short-lived allocations would be bad.

@Zed can you override the allocating function to record the length?

yeah, that’d work. I suppose I’d need a binary tree (or relative thereof) for reasonably efficient lookup and storage. Hmm. That’s sounding like more effort than my original notion of just having 2 words of metadata per string.

Removing the need to think about these trade-offs would be another reason for a new Glulx opcode/gestalt. I’ve spent a lot of time thinking about efficient and compact terp-side data structures for managing the Glulx heap - it’s not a trivial problem. There’s no reason to duplicate that effort in Glulx code if terps have to tackle the same problem (with slightly different constraints) anyway.

However, I do suddenly feel an urge to implement a B-tree specialised for integer keys and values in I6/Glulx… could be an interesting exercise.


I assume you’re not using I7 block values? Because they store their own length.

And likewise, if you’re using your own system, then it might be better to just store a header in the block, rather than having a separate map?


Right.

yeah, that’s what I was concluding above… two words of metadata, one for current length, one for extent. (When you suggested doing it in the allocation function, my mind went to maintaining a memory map, though that wasn’t really an inevitable conclusion outside of my mind…)


One minor benefit of a separate map is that it’s easier to just skip that map when running under a hypothetical future terp that gives access to heap block sizes (without requiring all terps to support it). If it’s a field in the per-block header, you either waste space by always having that field but not using it when it’s unnecessary, or you make the header size dynamic and complicate a bunch of code to cope with that.

You’d still need a header for the current usage. The cost of tracking the capacity would be basically negligible.
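
Something like this untested sketch, say (the names are illustrative, not from any existing extension):

```
Constant STR_HEADER 8;       ! two words: current length, capacity

! Allocate a string block with room for 'capacity' bytes of data.
[ NewString capacity  blk total;
    total = capacity + STR_HEADER;
    @malloc total blk;
    if (blk == 0) rfalse;    ! allocation failed
    blk-->0 = 0;             ! word 0: bytes currently in use
    blk-->1 = capacity;      ! word 1: maximum extent, so we can skip needless reallocs
    return blk;
];

[ StringData blk; return blk + STR_HEADER; ];  ! where the characters live
[ FreeString blk; @mfree blk; ];
```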

I was trying to work out if there was any point to trying to use actual Glulx string objects. But if there’s any advantage to them outside of the context of distinguishing 'em from functions while decoding compressed strings, I couldn’t see it.

Is that true, or are there other use cases?

Oohh. Yeah I’m not sure. I don’t think I’ve ever really used the Glulx object types, but that doesn’t mean there wouldn’t be a more practical use for them.

For strings in particular, the only things (as far as I know) that depend on the E0/E1/E2 byte are the streamstr opcode, indirect references encountered during string decoding, passing arguments to certain Glk functions, and arguably the Z__Region veneer routine / accelfunc (and everything that builds on it, e.g., I6 ofclass). None of these seems terribly relevant:

  • You probably don’t want I6 to confuse your string type with its own built-in notion of strings; that will only lead to bugs.
  • Indirect references in encoded strings are very niche, but if you want to use them, you can also reference a function that prints the string (which could be passed in as an argument; there’s a string-decoding node type for that).
  • Most Glk functions that want Glulx strings can be substituted with equivalents that accept arrays (e.g., glk_put_string → glk_put_buffer).
  • In the unlikely case that you want to call a Glk function that doesn’t have an array equivalent (e.g., glk_fileref_create_by_name), you can still copy the string data to a temporary buffer that has the type byte and a terminating 0, make the call with that as the argument, and then release the buffer (sketched below).
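
That last trick might look something like this untested sketch (it assumes infglk’s glk_fileref_create_by_name wrapper and the fileusage_Data constant):

```
! Wrap raw character data in a temporary E0-type string for a Glk call.
[ FilerefFromChars buf len  size tmp i fref;
    size = len + 2;          ! one byte for the type, one for the terminator
    @malloc size tmp;
    if (tmp == 0) rfalse;    ! allocation failed
    tmp->0 = $E0;            ! type byte: unencoded Latin-1 string
    for (i = 0 : i < len : i++) tmp->(i+1) = buf->i;
    tmp->(len+1) = 0;        ! terminating zero
    fref = glk_fileref_create_by_name(fileusage_Data, tmp, 0);
    @mfree tmp;
    return fref;
];
```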

So I’d suggest ignoring the type byte business. As the spec says,

Of course, not every byte in memory is the start of a legitimate object. It is the program’s responsibility to keep track of which values validly refer to typable objects.

… so the type byte convention is at best a gentle suggestion, except where enforced by Glulx opcodes.

For what it’s worth, it still feels to me like adding additional opcodes is more complexity than it’s worth. Keeping an in-memory record of what’s going on is some effort, but not really any harder than introspecting the VM’s record.

And I suspect you don’t want to be tied to VM allocations anyway. Chances are good that you’ll wind up wanting to allocate some “arenas” (big blocks that you subdivide for a particular purpose, e.g. a large number of fixed-size blocks).
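
For instance, a free-list arena of fixed-size cells could be sketched like this (untested; all the names are mine):

```
Constant CELL_SIZE 16;       ! must be at least one word, for the free-list link
Constant ARENA_CELLS 256;

Global arena_free;           ! head of the free list (0 when empty)

[ ArenaInit  arena i size;
    size = CELL_SIZE * ARENA_CELLS;
    @malloc size arena;
    if (arena == 0) rfalse;  ! allocation failed
    ! Thread every cell onto the free list.
    for (i = 0 : i < ARENA_CELLS - 1 : i++)
        (arena + i*CELL_SIZE)-->0 = arena + (i+1)*CELL_SIZE;
    (arena + (ARENA_CELLS-1)*CELL_SIZE)-->0 = 0;
    arena_free = arena;
    rtrue;
];

[ CellAlloc  cell;           ! grab a cell, or return 0 if the arena is full
    cell = arena_free;
    if (cell) arena_free = cell-->0;
    return cell;
];

[ CellFree cell;             ! put a cell back on the free list
    cell-->0 = arena_free;
    arena_free = cell;
];
```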


If we malloc a memory range on glulx…

Unrelated, but when I saw this topic I immediately thought it was an excerpt from The Gostak.

Now that you quoted that bit in isolation, it made me think of If You Give a Moose a Muffin.

Good point, I hadn’t thought ahead that far.

Seems we’re all forgetting about Leviticus 18:21, which specifically says not to trust malloc…
