Glulx to Webassembly

tsanderdev · August 30, 2024, 10:51pm

I don’t know if there is such a thing already, but a Z machine or Glulx to webassembly compiler should be pretty doable. That would give very good performance on the web.

Dannii · August 30, 2024, 11:08pm

The Z-Machine/Glulx formats don’t clearly mark the beginning and end of functions, meaning that it’s not possible to reliably ahead-of-time decompile it, which is what you’d need to then compile it into a wasm. You can really only safely do so if you have the Inform debug file, which you have to know about to switch on.

So instead of ahead-of-time interpreters, the JS interpreters are just-in-time - which mean they compile a Glulx function only when it actually gets called. They also then get to take advantage of the massive JS engine performance improvements of the past decade, which almost make up for still being JS.

There are two things that would improve interpreter performance even more (which are my long term goals): implementing a relooper algorithm, to turn the VM’s branch and jump statements into true JS loops, and a function safety analysis system that would in effect make the majority of functions accelerated, rather than only the dozen or so that are listed in the Glulx spec.

Inform 7’s C output mode could however be fairly easily compiled into a wasm (with RemGlk or RemGlk-rs).

jkj_yuio · August 31, 2024, 12:30am

This is what i was thinking. emit C and compile to WASM.

zarf · August 31, 2024, 3:33pm

I’m glad you mentioned that. I was about to post “WASM? Should be no harder than (JIT-) compiling to JS, which we already do!” But I forgot about the ahead-of-time vs just-in-time distinction.

I’m all in favor, obviously.

The current I7-to-C stage generates pretty verbose C, as I understand it. Adding a I7-to-WASM stage might be doable.

(Note that we’re not in desperate need of performance improvements for the web interpreter. This is a tangent to the original discussion.)

Hanna · August 31, 2024, 4:15pm

I don’t want to take the tangent much further, but this is not actually a problem. JIT compilation to wasm is not fundamentally harder than JIT compilation to JS, it just needs a bit more plumbing (lower-level target, binary encoding vs. source code). See just-in-time code generation within webassembly — wingolog for an overview and small demo. Less demo-y examples include WAForth and v86.

Mewtamer · August 31, 2024, 4:32pm

I have no idea what wasm is, but I was under the impression most play in browser IF is just using a web-based interpreter rather than being actual browser games. Though, I for one prefer running an offline interpreter in the Linux console and hope downloadable, self contained story files don’t go anywhere any time soon even if we eventually end up with something that is to glulxe what glulxe is to zmachine.

Though, the mention of Inform outputting to C reminds me of my curiosity if there’s a way to compile C++ to zcode, glulxe, etc, or at least combine existing C++ code with one of the IF-specific languages(aside from C++ being the programming language I’m most comfortable with, I have an old pet project I’d like to dust off that has functional backend code and broken interface code(an attempt to switch from a terminal app using cout and cin for the interface to using ncurses did not go well), so it would be nice to combine the existing backend with a parser based interface without having to rewrite everything from scratch.

tsanderdev · August 31, 2024, 5:45pm

Webassembly is basically a virtual CPU architecture that serves as a sort of common denominator for CPUs. It’s designed to get compiled down to target-specific instructions, while maintaining the sandboxing present in the browser. E.g. LLVM IR allows the code to just access memory randomly, Webassembly operates on it’s own block of memory and can’t access outside it (if the compiler to the target code has done a good job securing it). The goal is to get native performance while running untrusted code.

I’ll have to agree with @inventor200’s (deleted for some reason?) comment: You can absolutely go overboard with your world model, calculations and other stuff, such that performance does become a problem. But you’re right, most games probably don’t really need it.

Believe it or not, there are Webassembly runtimes targeting embedded machines, and of course a wasm2c converter. So unless the retro hardware doesn’t support C, it’d be just as portable as Z code.

So does the code just jump to addresses and and executes until it hits a return or something? You could analyze the jump instructions to get the functions, as long as the destination isn’t computed at runtime. Otherwise, yeah, you’d need debug info or something to properly do it. A build tool could probably take the work of compiling Inform code with the debug file and turn that into a wasm file without the user having to configure something, right?

A quick google search revealed there is a Glulx backend for LLVM made by someone, which can probably be compiled and plugged into Clang to compile C++.

I had a brief look at the Glk interface, and I’d be surprised if there wasn’t a C library for it you can use. Seems like pretty straightforward definitions for me.

Maybe this’d be worth splitting into a separate thread?

inventor200 · August 31, 2024, 9:11pm

Sorry, after I posted that, it felt like I kicked the door down and started bragging and dragged another user into it, and I didn’t mean for it to sound like that.

I guess for the context of future explorers, my post was a “hold my beer” joke, made in response to the trend of IF not typically reaching complexities where performance becomes an issue. @jbg and I have frequently traded notes to keep TADS from grinding to a halt or wrecking RAM when performing massive amounts of calculations.

Dannii · August 31, 2024, 11:09pm

Glulx does allow dynamic branches as well as branches into other functions, but those are thankfully not used in practice. So you can generally find the end of a function. Finding the beginning is another challenge. If it was compiled by Inform then all the functions will be next to each other, but if another compiler was used then the functions and texts and data could all be interspersed with each other and you couldn’t tell for sure which ones were functions. With a debug file you can do it safely, but you do still need a relooper. This all means that you can convert it to wasm, but it’s an alternative way of publishing, not an alternative approach for a general interpreter. Inform’s C output is just simpler as you wouldn’t need a relooper.

(I think this topic should be split, but where from…?)

Hanna · September 1, 2024, 12:38pm

One thorny issue is save/restore/undo. It’s straightforward for an interpreter or for a Quixe-style JIT compiler that is designed with it in mind, but the Inform C backend does not meaningfully support it. The host program embedding the game can take a “snapshot” of the current Inform state, but that only covers memory, global variables, etc. and not “what code is currently running and needs to be resumed when this snapshot is restored”. Thus, game-managed save/restore/undo won’t work at all. The usual way of doing interpreter-managed autosaves (hooking glk_select calls) doesn’t work either.

I can see several ways around this, but none of them are exactly simple. The path of least resistance is probably using emscripten’s asyncify feature (with all its downsides) to ensure that no Inform/wasm frames are “on the call stack” whenever you need to take a snapshot. However, I’m not sure if emscripten actually supports suspending at one program point and then resuming at an entirely different program point. The underlying “asyncify transform” should enable that sort of thing in theory, but the other layers on top of it may not play nice with it. There’s bound to be reentrancy issues, too.

In any case, the end result won’t be interoperable with saved games in the Glulx Quetzal format, the one that all Glulx interpreters support. For that, you really need to wrangle the Glulx stack and program counter in its full glory (like Quixe does, despite not technically being an “interpreter”).

In light of the above, I wonder whether “needing a relooper” would really be such a bad thing. Norman Ramsey’s “Beyond Relooper” approach is rather elegant. While it relies on some textbook compiler ingredients (control flow graph, dominance, RPO numbering, eliminating irreducible control flow), implementing those doesn’t sound so bad compared to all the other work that would be needed for a serious Glulx-to-wasm JIT compiler. In fact, if I was writing one, I’d definitely want most of those ingredients anyway to do various optimizations before emitting wasm.

Dannii · September 1, 2024, 1:10pm

Oh, I wasn’t aware that Inform’s C output didn’t model a Glulx stack at all. That does limit its usefulness.

The Beyond Relooper paper is on my reading list. Hopefully its approach will be a little simpler for me to make sense of.

Mike_G · September 1, 2024, 2:35pm

I’m no expert, but I do believe these exist.

Z-code is like this. There is no end-of-routine marker. It could hit a quit, a throw, one of several return or jump instructions, or any number of branching instructions.

I don’t know Glulx. I pretty much just stick to Z-machine, but it makes me nervous when people say we can assume things because “this never happens”. It always triggers an impulse in me to go write some code that does precisely the thing that never happens.

Again, I can’t speak for Glulx and I know it can be much larger than Z-machine, but is all the work of recompiling Z-code even necessary for speed? I’ve written a straight-up Z-machine interpreter for web assembly and it pretty well screams even on not-so-great hardware.

Thinking on this some more: It’d be cool if there were Z-code files designed to be used for benchmarking. Something that exercised the stack, object op-codes, etc. Perhaps separate files for specific tests and maybe a more comprehensive combined one. Something that could run to completion without player input would be best. I know there are test files designed to test functionality and correctness for interpreters, but none I know of designed solely for performance testing.

zarf · September 1, 2024, 2:53pm

What Dannii means here is “You have to handle that case, but you don’t have to handle it efficiently.” :)

In the past, when I’ve wanted to do performance measuring for Z-code, I’ve loaded up Reliques of Tolti-Aph. It does a lot of setup before the first turn, so the startup time is a half-decent benchmark.

(For Glulx, I just load up glulxercise.ulx and type “all”.)

Mike_G · September 1, 2024, 2:55pm

Ah!

Good to know, thanks. I think I might work on some benchmarking files for Z-code.

zarf · September 1, 2024, 9:25pm

I took a look at that blog post. It seems like this is still pretty hacky territory for wasm.

Hanna · September 1, 2024, 9:55pm

More so than the way Quixe generates Javascript source code and gives it access to its internal helper functions? This is not intended as a gotcha, it’s entirely possible I just spent too much time in wasm land and this whole thing looks crazypants to normal people.

Dannii · September 1, 2024, 10:44pm

Doing a page at a time makes sense for an x86 emulator, but not as much for a Glulx one. Functions aren’t guaranteed to not cross any page-like boundaries. But what you could do is batch functions and then compile 100 or so of them together. I’m not sure it would make that much difference than just using carefully written asm.js style Javascript.

Another issue is that the JS-WASM boundary is still quite slow. If other functions are still in JS (whether that’s VM functions or runtime functions), the boundary crossing penalty might more than offset any benefit that WASM gave in the first place.

tsanderdev · September 3, 2024, 9:13am

Are there raw computational benchmarks for Glulx or Z machine interpreters? Something like fibonacci numbers, n-body simulation or something like that? How much slower is interpreting than native code even?

zarf · September 3, 2024, 1:59pm

That’s what Mike is considering in this thread: Benchmarking Z-code

As Hanna points out, the mix of opcodes in a “realistic” game load isn’t easy to nail down. A traditional FLOPS benchmark isn’t going to tell you much; Glulx’s handling of floats and doubles is inefficient, but most games do very little of that kind of math.

Mike_G · September 3, 2024, 2:58pm

It’s clear nothing will fully substitute for testing how actual games run, but to be useful you need to have a good idea of which games are representative of ‘worst case’ behavior.

I do think testing individual opcodes or sets of opcodes would give benchmarks that interpreter authors could use to detect regressions in performance as they make changes and perhaps aid in coarse identification of performance bottlenecks.