# Floating-point math in Glulx

We’ve wanted floating-point numbers in Glulx for a while now. I7 can handle fixed-point numbers, if you set up the units and the number of decimal places that you want. But floats are more familiar in the programming world.

Floats are a moderate nuisance for Glulx, because all of its variables and data handling are for 32-bit integers. So we need a way to store a float in an integer, which, conveniently, has been defined for decades now – it’s called IEEE-754. We also need some code to convert from native float values to this standardized form. After letting the idea gather dust, I mean percolate, for the past several years, I finally went Googling and found that code.

Thus, I can finally move forward with the plan, which is to add a passel of floating-point opcodes to the Glulx spec.

numtof – convert an int to a float of equal value (or as close as possible)
ftonum – convert a float to the nearest integer

As you have gathered already, integer values do not match float values. You can’t feed floats to the regular arithmetic opcodes and expect to get meaningful answers. You’ll have to convert back and forth, using ftonum and numtof.
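To make that concrete, here is a small JavaScript sketch of why the integer 1 and the float 1.0 are different 32-bit values. The helper names `floatToBits` and `bitsToFloat` are made up for illustration, and the scratch `DataView` is just a convenient way to look at the bits, not a claim about how any interpreter does it.

```javascript
// Hypothetical helpers: view a number's IEEE-754 single-precision bits as an
// unsigned 32-bit integer, and turn such an integer back into a number.
const scratch = new DataView(new ArrayBuffer(4));

function floatToBits(x) {
  scratch.setFloat32(0, x);          // rounds x to single precision
  return scratch.getUint32(0);
}

function bitsToFloat(bits) {
  scratch.setUint32(0, bits >>> 0);
  return scratch.getFloat32(0);
}

// The float 1.0 is stored as 0x3f800000, not as the integer 1.
console.log(floatToBits(1.0).toString(16));   // 3f800000

// Feeding two encoded floats to integer addition is well-defined but
// meaningless as arithmetic: 1.0 "plus" 1.0 decodes to 2^127, not 2.0.
const bogus = (floatToBits(1.0) + floatToBits(1.0)) >>> 0;
console.log(bitsToFloat(bogus));
```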

(Or allow the language to do it for you. I7’s type system will eventually encompass floats and do all the conversion for you magically.)

Clearly we need a bunch of float-specific opcodes, and here they are:

fadd, fsub, fmul, fdiv, fmod – ordinary arithmetic
ceil, floor – round up or down, to the nearest integral value (but return that as a float; this does not convert to int format)
sqrt, exp, pow, log – old math warhorses
sin, cos, tan, acos, asin, atan, atan2 – trigonometry
jflt, jflte, jfgt, jfgte – jump if less than (etc)
jfeq – jump if equal to. This takes three arguments (X, Y, epsilon) and tests whether X and Y are within epsilon of each other. I suspect this will be more useful than straight-up equality, but you can always pass epsilon=0.0.
jfne – jump if not equal to; same gimmick.
jisnan – jump if the value is Not A Number. NaN is a magical value that results from certain illegal operations, like 0/0.
jisinf – jump if the value is infinity (or negative infinity). Yes, the floating-point spec lets you compute with infinities. 1/0 is infinity, -1/0 is negative infinity. Roll with it.
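The three-argument jfeq test can be sketched in JavaScript on values that have already been decoded. This is only an illustration: the function name is invented, and the handling of NaN and of a negative epsilon are assumptions that the list above does not settle.

```javascript
// Sketch of the jfeq condition: are x and y within epsilon of each other?
// Assumptions (not from the proposal): NaN never compares equal, and the
// sign of epsilon is ignored.
function floatsEqualWithin(x, y, epsilon) {
  if (Number.isNaN(x) || Number.isNaN(y) || Number.isNaN(epsilon)) {
    return false;
  }
  if (x === y) {
    return true;   // also covers equal infinities, where x - y would be NaN
  }
  return Math.abs(x - y) <= Math.abs(epsilon);
}

console.log(floatsEqualWithin(0.1 + 0.2, 0.3, 0.0));    // false: exact equality fails
console.log(floatsEqualWithin(0.1 + 0.2, 0.3, 1e-9));   // true: close enough
```

The second call is the case the epsilon argument is meant for: values that differ only by accumulated rounding error.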

Opcodes that I’m not including:

fneg – it happens to be true that the sign bit of float values is 0x80000000, just like for integers. So the regular neg opcode will work here.
fabs – similarly, you can get the absolute value by squashing the top bit.
sinh, cosh, tanh – does anybody care? Also, Javascript’s math library doesn’t have them.

Opcodes that I’m not sure about (feel free to make an argument for or against them):

fhypot (pythagorean distance) – easy enough to synthesize, but maybe it’s common enough to use
jfz – jump if (exactly) zero. This would be identical to jz except for NaN values and negative zero. Yeah, there’s a negative zero. Anyhow, you can use jfeq for this, but there’s an integer jz so why not this one?
isfinite, isnormal – these exist in the C library so maybe somebody wants them.
log10 – ditto.
streamfloat – print a float value. I don’t much want to do this because I’d have to define the formatting in a stable way (across libc and Javascript), and is that even possible? But leaving it out sucks too. Should there be some format-control arguments?

Other notes:

The low-level math libraries on some computers can return NaN values with extra information encoded in them. IEEE-754 can represent these, but Glulx will not. (Because, first, there’s no C standard for getting at them. And second, Javascript can’t get at them at all.)

Float math will be distinctly slower than int math. This is not because of the low-level operations (which are trivial), but because the interpreter has to keep converting back and forth between IEEE-encoded floats and native floats. A C interpreter can usually do this fast (because most computers use IEEE-encoded floats natively) but Javascript has to go through a whole bit-fiddling process. And of course Javascript is where the speed concerns are.

I have not yet done enough testing to know whether float math will produce bit-identical results between Javascript and C interpreters. I suspect it won’t.

When will I get to this? Not before Quixe is out the door, that’s for sure. The implementation work is not too bad, but there will have to be a cyclopean pile of unit tests before I trust it.

I’ll post a draft Glulx spec update in the next few weeks, I hope, but it won’t be definitive until I have the code all working and tested.

Any comments on any of this?

Essentially, shouldn’t IEEE-encoding only be done on load/save/restore? What about keeping them as native floats at other times? The VM numbers could be either a form of pointer, or the VM could keep a cache of native floats (if you refer to an address which hasn’t been decoded yet it would decode it then; otherwise it would refer to the cache). There could also be @decodefloat addr, @decodefloats min max, @writefloat addr and @writefloats min max, which would be run on load/save/restore. Accessing the original data without writing from the cache would be strongly discouraged, perhaps even with a compiler warning?

I think JS numbers are IEEE-754 Doubles, so hopefully that means they’ll work fine for Single maths. You’ll have to check for out-of-bounds stuff etc.

Presumably people will want to use floats to say “You have \$78.62 in your wallet, broken 2.3 hearts today but are still 100095.3774 lightyears from home.”
You have @ftonum, but how about some opcode to give you the fraction to x decimal places. Then you could go print @ftonum number, “.”, @ftonumfrac number 3; to give you “123.456”.
I think this would be fairly common, but it is sufficiently complex to do with the opcodes as they stand.
(What would you do? Take the mod of 1, multiply by 10/100/1000/10000 etc, then round it?)

Hi Zarf,

That does not sound too hard for the JVM either: floats are 32-bit IEEE-754, and the standard library provides enough means for conversion from and to 32-bit int values.

Interestingly, a while ago I was thinking, if Glulx had floating point support, one could probably write a nice Scheme compiler for it (not that you’d necessarily need it in order to write one)

Wei-ju

One correction to my post: you don’t negate floats with neg, you negate them with xor \$80000000.
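A quick JavaScript sketch of the difference, working on the 32-bit encoded values. The helper names are invented for this example; the absolute-value mask is the "squash the top bit" trick from the original post.

```javascript
// Hypothetical helpers to move between numbers and their 32-bit encodings.
const scratch = new DataView(new ArrayBuffer(4));
function floatToBits(x) { scratch.setFloat32(0, x); return scratch.getUint32(0); }
function bitsToFloat(bits) { scratch.setUint32(0, bits >>> 0); return scratch.getFloat32(0); }

// Float negation: flip the sign bit.
function fnegBits(bits) { return (bits ^ 0x80000000) >>> 0; }
// Float absolute value: clear the sign bit.
function fabsBits(bits) { return (bits & 0x7fffffff) >>> 0; }
// Integer (two's complement) negation, which is what a plain neg does.
function intNegBits(bits) { return (-bits) >>> 0; }

const x = floatToBits(2.5);                   // 0x40200000
console.log(bitsToFloat(fnegBits(x)));        // -2.5
console.log(bitsToFloat(fabsBits(fnegBits(x))));  // 2.5
console.log(bitsToFloat(intNegBits(x)));      // -1.75, not -2.5
```

Two's complement negation changes the exponent and mantissa bits as well as the sign, which is why the last line comes out wrong.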

That would require putting type-checking code on every opcode, including the integer ones. (E.g., @add would have to check and encode any float back to int before adding, because while the result is going to be meaningless, it’s well-defined and deterministic meaninglessness. May sound like a quibble, but bit arithmetic is meaningful on floats, as I said above.) Do you think that would be worth it? My gut says no, and while it’s worth testing someday, it’s on the pain-in-the-ass side.

(Maybe less ugly if combined with the store-all-memory-as-words refactoring?)

Giving up on main memory for the moment: Quixe could track local-variable and stack-position use, and prove that some values are used only as floats. That should be worthwhile, but it’s not currently that smart. I would leave that for future work.

Yeah. It’s the conversion I’m most worried about. My notes are full of lines like

(a & 0x7fffff | 0x800000) * 1.0 / Math.pow(2,23) * Math.pow(2, ((a>>23 & 0xff) - 127))

and who knows whether the rounding will come out the same. (Answer: I will know, after I get the tests written.)
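For reference, that line only covers positive, normal values. A fuller decoder along the same lines might look like the sketch below. The function name is invented, and the zero, subnormal, infinity and NaN branches are the standard IEEE-754 single-precision rules rather than anything taken from the notes. The encoding direction (native to integer) is not shown.

```javascript
// Decode a 32-bit IEEE-754 single, given as an integer, into a JS number.
function decodeFloat(a) {
  a = a >>> 0;                                  // treat input as unsigned
  const sign = (a & 0x80000000) ? -1 : 1;
  const exponent = (a >>> 23) & 0xff;
  const mantissa = a & 0x7fffff;

  if (exponent === 0xff) {
    // All-ones exponent: infinity if the mantissa is zero, otherwise NaN.
    return mantissa ? NaN : sign * Infinity;
  }
  if (exponent === 0) {
    // Zero or subnormal: no implicit leading 1, scale is 2^-126 * 2^-23.
    return sign * mantissa * Math.pow(2, -149);
  }
  // Normal value: same shape as the one-liner, plus the sign.
  return sign * (mantissa | 0x800000) / Math.pow(2, 23) * Math.pow(2, exponent - 127);
}

console.log(decodeFloat(0x3f800000));   // 1
console.log(decodeFloat(0xc0200000));   // -2.5
```

Every step here multiplies or divides by a power of two, so for normal and subnormal inputs the arithmetic itself should not introduce rounding; whether real engines agree bit for bit is exactly what the tests are for.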

fmod 0.001 will do this. Well, it will if I upgrade it to return both the quotient and the remainder, which I think I want to.

Actually ftonum rounds towards the nearest integer… You know, I want both round-towards-zero and round-towards-nearest, so make that ftonumz and ftonumr.
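The two rounding modes might behave roughly like this JavaScript sketch, operating on already-decoded values. The saturation of out-of-range values and the result for NaN are placeholder assumptions of mine; the post does not say what should happen there. Note also that `Math.round` breaks ties toward positive infinity, which may or may not be the tie rule the opcode ends up with.

```javascript
// Clamp to the signed 32-bit range and force an integer result.
// Assumption: out-of-range values saturate and NaN maps to the largest
// positive value. Both are placeholders, not part of the proposal.
function clampToInt32(x) {
  if (Number.isNaN(x)) return 0x7fffffff;
  return Math.min(Math.max(x, -2147483648), 2147483647) | 0;
}

// Round toward zero.
function ftonumz(x) { return clampToInt32(Math.trunc(x)); }

// Round toward the nearest integer (ties go toward +Infinity here).
function ftonumr(x) { return clampToInt32(Math.round(x)); }

console.log(ftonumz(-2.7), ftonumr(-2.7));   // -2 -3
console.log(ftonumz(2.7), ftonumr(2.7));     // 2 3
```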

Are you saying that even though it would be meaningless to use @add on a float it must still use the live data, not stale data? I guess that would be wise for consistency’s sake.

Okay so the cache idea sucks, but what about still storing them outside the main memory map? There would have to be a block of memory for them to be initialised from and saved to, but there would be no pretence that that block of memory is in any way reflective of the current state of the floats, no more than it would be in a program that periodically uses @mcopy. If you wanted the data you would have to specifically request it immediately prior to when you want it.

Or would making floats objects be too expensive?
What about a new memory segment? Would make type checks considerably cheaper.

All of these options would probably mean that bit opcodes would need special float versions too. But in a way, opcodes are cheap.

That would only turn 111.1111 to 111.111 right? How will you turn that into something you can print? Binary fractions aren’t the friendliest things. *1000, convert to int, convert to string, insert the decimal point manually? Ick. Maybe a very simple formatting function would be good.

It’s not meaningless: there’s a famous inverse square root algorithm that depends on treating floats as integers.

This sounds way too complicated for something that’s really just a platform-specific optimization: the slow conversion between IEEE and native floats is a quirk of today’s JavaScript implementations, one that may even be fixed someday by improvements in browser engines or additions to the language. Native Glulx interpreters should not be burdened because of that.

I think it should be dealt with in the interpreter, not the spec. For example, you could use a sparse array to cache floating-point versions of whichever memory locations have been used as floats, trapping accesses to those locations to invalidate the cache, and that would be transparent to the game.
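A rough JavaScript sketch of that kind of interpreter-side cache follows. The class and method names are invented, a `Map` stands in for the sparse array, and `DataView.getFloat32` stands in for whatever (possibly slow) decoding routine the interpreter really uses.

```javascript
// Hypothetical memory wrapper: decoded floats are cached by address, and any
// write that touches those bytes throws the cached value away.
class FloatCachedMemory {
  constructor(size) {
    this.view = new DataView(new ArrayBuffer(size));
    this.cache = new Map();   // address -> decoded native float
    this.decodes = 0;         // counts real decodes, for demonstration only
  }

  // Drop every cached float whose four bytes overlap addr..addr+3.
  invalidate(addr) {
    for (let a = addr - 3; a <= addr + 3; a++) this.cache.delete(a);
  }

  writeWord(addr, value) {
    this.view.setUint32(addr, value >>> 0);
    this.invalidate(addr);
  }

  readWord(addr) {
    return this.view.getUint32(addr);
  }

  readFloat(addr) {
    let f = this.cache.get(addr);
    if (f === undefined) {
      f = this.view.getFloat32(addr);   // stand-in for the expensive decode
      this.decodes++;
      this.cache.set(addr, f);
    }
    return f;
  }
}

const mem = new FloatCachedMemory(16);
mem.writeWord(8, 0x3f800000);
console.log(mem.readFloat(8), mem.readFloat(8), mem.decodes);   // 1 1 1
mem.writeWord(8, 0x40200000);                                   // invalidates
console.log(mem.readFloat(8), mem.decodes);                     // 2.5 2
```

The game never sees the cache: integer reads always go to the underlying bytes, so bit tricks on encoded floats keep working.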

fmod(3.14159, 0.01) => (314.0, 0.00159)

Or, better for printing decimals:

fmod(3.14159, 1.0) => (3.0, 0.14159)
fmod(0.14159, 0.01) => (14.0, 0.00159)

Convert the 3 and the 14 to integers and print. (Add a few lines to deal correctly with negative numbers and 0-padding on the right of the decimal point.)
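Spelled out in JavaScript, the two-step approach could look something like this. `fmodPair` is a hypothetical stand-in for the upgraded fmod that returns both quotient and remainder, and `formatFixed` is an invented name. The sketch truncates rather than rounds, and because steps like 0.01 are not exact in binary, inputs sitting right on a digit boundary can come out one digit low.

```javascript
// Hypothetical two-result fmod: returns [quotient, remainder].
function fmodPair(x, y) {
  const quotient = Math.trunc(x / y);
  return [quotient, x - quotient * y];
}

// Print x with a fixed number of decimal places (truncating, not rounding).
function formatFixed(x, places) {
  const sign = x < 0 ? "-" : "";
  const [whole, frac] = fmodPair(Math.abs(x), 1.0);
  const step = 1 / Math.pow(10, places);          // e.g. 0.01 for two places
  const [digits] = fmodPair(frac, step);
  // Zero-pad so that 0.0625 at three places prints "062", not "62".
  return sign + whole + "." + String(digits).padStart(places, "0");
}

console.log(formatFixed(3.14159, 2));   // 3.14
console.log(formatFixed(-0.5, 3));      // -0.500
```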

I agree with this.

IEEE encoding will be slow everywhere.

I agree now, though, that it should be a terp issue. Speed probably won’t be a big issue for most uses of floats anyway.

Not really. It’s supported in hardware on many CPUs, including x86 and PowerPC.

I did not know that. Cool, I guess.

Unfortunately the CLR doesn’t seem to provide an efficient (i.e., not requiring memory allocation) way of converting between floating point numbers and their bit representations. So I’ll use that crazy conversion formula you posted.

I was thinking you could require locals to be marked as floating point values to allow efficient use while inside functions. But this adds complication to an interpreter that otherwise doesn’t care, and, as you point out, a JIT could infer this from usage.

This inference will need to be smart enough to recognize that certain “bitxor” and “bitand” operations are really code for “fneg,” “fabs,” etc.

Well, there are a couple options:

1. BitConverter.ToSingle() and BitConverter.GetBytes(float). The former converts 4 bytes from an existing byte[] to a float; the latter allocates a new array containing the bytes of the float. Allocating a temporary 4 byte array could still be more efficient than doing a bunch of computations to encode the number.

2. BitConverter.Int64BitsToDouble() and BitConverter.DoubleToInt64Bits(). These convert between long and double, so you’d need to do some bit shifting to extend the float bits to double bits and then cast the converted value from double back to float, but it’s less involved than doing all the encoding yourself and requires no allocations.

Yeah, I’ll have to try it both ways and see. In a micro-benchmark outside the context of glulx, the formula is 40% faster than the allocation.

Maybe Microsoft’s implementation is smarter, but Mono’s Int64BitsToDouble just calls GetBytes and ToDouble, resulting in even more allocation.

Huh! How are you testing it? The allocation itself should be very cheap, it’s the GC where you end up paying, and in practice I’d expect that the GC will usually take place while the terp is waiting for user input anyway.

Indeed, according to Reflector, Microsoft’s implementation is:

```
[SecuritySafeCritical]
public static unsafe double Int64BitsToDouble(long value)
{
    return *(((double*) &value));
}
```
In fact, you could use something similar in your own code to go directly between int and float, although including unsafe code would make your assembly unverifiable. (Alt: you could put the unsafe part in a separate assembly, and replace it with a less direct version for platforms that require verifiable code.)

I’ve started updating the Glulx spec to include this floating-point feature. The new sections are excerpted here:

eblong.com/zarf/tmp/floatspec.txt

Nothing much has changed since the last time I posted about it. You’ll see some incomplete bits. But it’s incomplete by definition until the first implementation is done, anyhow. I haven’t started to work on that yet.

Cool.

In the description of fadd/fsub/fmul/fdiv, it says:

What happens with Inf+Inf (possibly PositiveInf+NegativeInf) and Inf*Inf (ditto)?

+Inf + +Inf = +Inf.
+Inf + -Inf = NaN (I was trying to imply this with the Inf-Inf line… Subtraction is always addition with the opposite sign.)
Multiplications produce infinity with the appropriate sign bit.

I’m going by Javascript’s behavior, when writing the spec. You can do the tests yourself. If I find cases where libc varies, I’ll rethink them.
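For anyone who wants to try it, something like the following can be pasted into a JavaScript console. It uses JavaScript's native double-precision numbers, so it shows the behavior being followed rather than anything about the 32-bit encoding; the `results` table is just a convenience for printing.

```javascript
// Infinity and NaN cases from the discussion, evaluated with native JS numbers.
const results = {
  "+Inf + +Inf": Infinity + Infinity,
  "+Inf + -Inf": Infinity + -Infinity,
  "+Inf - +Inf": Infinity - Infinity,
  "+Inf * +Inf": Infinity * Infinity,
  "+Inf * -Inf": Infinity * -Infinity,
  "1 / 0": 1 / 0,
  "-1 / 0": -1 / 0,
  "0 / 0": 0 / 0,
};

for (const [label, value] of Object.entries(results)) {
  console.log(label, "=", value);
}
```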

Just checking, your proposal is not only compatible with JS floats, but also with C floats, correct? Firefox 4 will have support for typed arrays, so if they could be used that would obviously be wonderful. We’d want to use a DataView, which would provide access to ints of all lengths and 32-bit floats, all referencing one compact ArrayBuffer.

cvs.khronos.org/svn/repos/regis … -spec.html

I’ve nearly finished the C (Glulxe) implementation.

It should be possible to use that spec, and keep main memory (and local variable blocks) as an ArrayBuffer. Won’t know for sure until we try it, of course.
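As a minimal sketch of what that might look like, assuming main memory is a single `ArrayBuffer` with one `DataView` over it (the variable names and addresses are arbitrary). `DataView` reads and writes big-endian unless told otherwise, which matches Glulx's byte order.

```javascript
// Hypothetical main memory: one buffer, viewed as bytes, words or floats.
const memory = new ArrayBuffer(16);
const view = new DataView(memory);

// Store a word as an integer, then read the same four bytes as a float.
view.setUint32(4, 0x40490fdb);
console.log(view.getFloat32(4));              // 3.1415927410125732

// Store a float, then inspect its encoding as a word and as a single byte.
view.setFloat32(8, -2.5);
console.log(view.getUint32(8).toString(16));  // c0200000
console.log(view.getUint8(8).toString(16));   // c0
```

With this layout there is no separate encode or decode step in interpreter code; the conversion happens inside `getFloat32` and `setFloat32`.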

This is now implemented in Glulxe and Quixe. You can see the test suite, or rather be dazzled by its blizzard of opaque numbers and operators:

eblong.com/zarf/glulx/quixe/test/play.html

(Type “allfloat” to run just the float tests.)