I figured that this would be faster. However, I’m getting mixed results. For Glulxercise (the VM unit test), it runs about 9% faster (in both Firefox and Safari). However, Reliques of Tolti-Aph (our favorite I7-generated torture test) is about 9% slower on Safari. For Firefox, RoTA is about 5% faster when compiling code; it’s more or less a wash when executing cached code.
(I have not yet tested this with iOS Safari (mobile WebKit) – I really should, since that’s the real-life low-end platform.)
I guess the diagnosis here is that making main memory a byte array is faster when you’re running code that accesses main memory a lot. (Glulxercise does this, because it exercises every VM opcode on main memory, local variables, and the stack.) However, that’s not normal game code; the I7/I6 compiler is good about keeping most accesses in locals and on the stack. So in real life, the gain is outweighed by the overhead (presumably the shifting and masking needed to assemble 32-bit values out of individual bytes).
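To make the trade-off concrete, here’s a rough sketch of what a 32-bit read and write look like against a flat byte array. The names (memmap, read32, write32) and the memory size are illustrative, not Quixe’s actual identifiers; Glulx is big-endian, so every word access has to assemble or scatter four bytes:

```javascript
// Hypothetical sketch: Glulx main memory as a flat Uint8Array.
const memmap = new Uint8Array(1024);

// Glulx stores words big-endian, so a 32-bit read is four byte
// fetches plus shifts.
function read32(addr) {
    return ((memmap[addr] << 24)
        | (memmap[addr + 1] << 16)
        | (memmap[addr + 2] << 8)
        | memmap[addr + 3]) >>> 0;  // >>> 0 forces an unsigned result
}

function write32(addr, val) {
    memmap[addr]     = (val >>> 24) & 0xFF;
    memmap[addr + 1] = (val >>> 16) & 0xFF;
    memmap[addr + 2] = (val >>> 8) & 0xFF;
    memmap[addr + 3] = val & 0xFF;
}

write32(16, 0xDEADBEEF);
// read32(16) now returns 0xDEADBEEF
```

That per-word shuffling is cheap when the JIT likes it, but it’s pure overhead for code that mostly lives in locals and the stack.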
Anybody have experience with this?
Thinking out loud here:
- Byte arrays are more compact, I assume. Is the speed hit worth the reduced memory profile?
- I guess I could make the stack and locals into Uint32Arrays. That would be more work. (My existing stack code relies heavily on the array.push() method. Uint32Array is fixed-size and has no push method.)
- Crap, I didn’t array-ize the save-undo routine. I hope the overhead cost isn’t all there! (Quick test) Nope.
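On the second bullet: losing array.push() isn’t fatal; a fixed-size Uint32Array plus a manually maintained stack pointer gives the same interface. A minimal sketch (the names, the size limit, and the overflow checks are my own invention, not Quixe code):

```javascript
// Hypothetical sketch: a push/pop stack over a fixed-size Uint32Array.
const STACK_SIZE = 4096;
const stack = new Uint32Array(STACK_SIZE);
let sp = 0;  // index of the next free slot

function stackPush(val) {
    if (sp >= STACK_SIZE)
        throw new Error('stack overflow');
    stack[sp++] = val >>> 0;  // coerce to uint32, as the typed array does
}

function stackPop() {
    if (sp <= 0)
        throw new Error('stack underflow');
    return stack[--sp];
}

stackPush(7);
stackPush(42);
// stackPop() returns 42, then 7
```

The cost is that every push/pop site has to go through these helpers (or inline the sp arithmetic), which is the “more work” part.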