Good 16-bit PRNGs

Draconis · January 27, 2025, 5:49am

Currently, Dialog offers no ability to seed or affect the random-number generator. It just delegates to the interpreter’s backend-specific RNG opcodes.

Adding the ability to seed the PRNG wouldn’t be especially hard, and would be good for debugging. But it seems even more valuable to go one step further, and ensure that every backend uses the same PRNG, so that seeds are consistent even between interpreters.

This would require a few specific things from the algorithm:

It can be easily implemented with 16-bit math—not with the very limited Dialog math functions, but with the moderately limited Z-machine ones (which don’t offer 32-bit multiplication, for example, but can do &, |, ^, >>, and <<)
It can return uniformly random numbers over any given range (which will never be larger than 2¹⁴), so it should give good values when requested between 0 and 16, between 0 and 256, and between 0 and 2¹⁴-1
It can function well enough with a 14-bit seed, since 14-bit integers are the only convenient data type to seed them with on the user’s side; if really necessary, it can be seeded with a list of numbers, but I’d prefer to avoid that if possible
It’s good enough to be used indefinitely, for all sorts of purposes, because the only way to make this work on Å-machine is to build it into the VM specification itself—the Å-machine just doesn’t offer enough math opcodes to implement it in VM code (it’s basically exactly equivalent to Dialog in its offerings)
It doesn’t have any obvious patterns in the low bits, since most uses of randomness in Dialog (just like in Inform!) are for (select)...(at random), which means usually using three or four bits of randomness at a time

However:

It does not have to be cryptographically secure, or even especially good—I mainly expect the seeded PRNG to be used for testing, and would specify that a seed of 0 means to fall back on the best randomness source available to the interpreter (probably substantially better than 14 bits!)
It does not have to handle numbers higher than 2¹⁴-1 (and for most practical purposes, nothing higher than 255: the maximum number of branches in a (select)...(at random)), which means if there’s an algorithm that’s great for everything except the bottom two bits, we can just shift those off
It can store more than 14 bits of state, if need be, though having the state be identical to the seed is convenient (and easy to code—16 bits is the size of a Z-machine register)

And while I’m passingly familiar with PRNG algorithms, I don’t know off the top of my head which of them would have these parameters. So I turn to the forum for guidance: are there any PRNGs that seem especially well-suited for this purpose?

andrewj · January 27, 2025, 7:13am

I suggest an xorshift generator, such as in this site:

Only complication is that it can never be initialized to zero, but that is easy enough to avoid.

Draconis · January 27, 2025, 7:27am

Extremely convenient, in fact, if we use seed 0 to mean “use the best randomness available instead of a seeded PRNG”!

Dannii · January 27, 2025, 7:44am

I implemented Xorshift for Z-Machine here: extensions/Dannii Willis/Xorshift.i7x at 9.3 · i7/extensions · GitHub

This just uses the basic 16 bits of state. You can use more state, with longer calculations.

Note that the Z-Machine doesn’t have an XOR opcode, but you can implement it manually:

xorshift_seed = (xorshift_seed | temp) & (~(xorshift_seed & temp));

I don’t know if the (4, 3, 7) triple is still well regarded for 16 bit. There may be better triples. Or it’s probably not too difficult to do a 32 bit Xorshift manually.

fredrik · January 27, 2025, 8:50am

We use this algorithm for Ozmoo, based on shift, xor and addition. When we need 16 bits, we run it twice.

Hanna · January 27, 2025, 6:48pm

There’s already a bunch of suggestions for PRNGs with very small states using only a few simple operations, so let me jump on this point for contrast:

This is not the only important aspect, probably not even in the top three most important. But it’s an interesting point because it gives some reasons to prefer larger internal state and cryptographic algorithms rather than the minimalistic “one word of state and a few instructions to update it” approach. I’ll mostly talk about state size because I think the cases it the strongest there, but I’ll also give some reasons to consider cryptographic algorithms afterwards.

State size: With 16 bits of state, the period is at most 2^16, which is uncomfortably small. Typical Dialog games may not need 64 thousand outputs, or wouldn’t care if output repeats after that many outputs, but some statistical artifacts become apparent much earlier, even for an idealized, “as good as can be” PRNG:

Since there’s 2^16 possible outputs and the period is 2^16, either the PRNG has to skip some values (undesirable) or it produces each possible output exactly once over its period.
Per the Birthday Problem, a truly random source of uniformly distributed 16-bit integers is likely to repeat a value after a few hundred outputs. For example, the table in the Wikipedia article on birthday attacks reports 50% odds of a duplicate after 300 outputs, and 75% odds after 430 outputs. In contrast, under the assumptions described above any PRNG will never repeat an output unless you exhaust its full period.
Only needing to output 14 bits helps a little, but it still means that each possible output occurs four times across the entire period, so I don’t think this fully solves the problem. It just means there’s no handy pre-computed tables telling us exactly when it becomes a problem.

Independent seeds are also influenced by state size / period length. Regardless of how big or small the seed is, many PRNGs have a single long sequence of states and the seed just determines where in this sequence they start. Ideally, multiple runs of the game with different seeds give completely independent RNG behavior, i.e., each run uses a different part of the PRNG’s period without overlap. A 16 bit state is too small to ensure that even at very modest scales. For example, if you have 32 runs that each used 4K outputs from the same 2^16 period, there must be substantial overlap by the pigeonhold principle.

It’s entirely possible that none of the above is ever a problem, but using a larger state size is an easy way to not have to worry about it.

Similarly, using the same basic algorithm as a cryptographically secure PRNG may use (with lots of extra bells and whistles, more rounds, and many optimizations) is an easy way to reduce the risk of going “oh damn this PRNG isn’t good enough after all” months or years down the line. IF doesn’t need security in any sense (there’s not even a coherent threat model), but when you want something “good enough to be used indefinitely for all sorts of purposes” then cryptographers bring two useful things to the table:

The designers of crypto primitives aim for a very high standard, and naturally prefer conservative designs. A PRNG survives a significant amount of cryptanalysis, it’s basically guaranteed to be good enough for any IF purpose, even if it later turned out that sophisticated attackers could break it. In contrast, with non-crypto PRNGs the whole point is flying close to the sun: make it as small and fast as they can without any obvious defects. Unsurprisingly, most of these later turn out to have flaws, sometimes very severe ones, that the original designer did not anticipate.
Cryptographic algorithm generally comes with test vectors (once they escape academic papers and become standardized), used to ensure that implementations are correct and don’t contain any silly mistakes. Non-crypto PRNG proposals virtually never come with test vectors, and this has caused real problems with at least one algorithm. There is no inherent reason why non-crypto designers can’t supply test vectors, but generally they don’t. So we’re left with comparing against whatever implementation is available, which may not even exist (e.g., there’s a reference implementation with 32 bit output but you want 16 bit) or which may have the same bug as your implementation.

I don’t particularly want to convince you to use a cryptographic algorithm, but I was curious if there’s a standardized one that seems reasonable to implement with 16 bit words so I looked and found Ascon. It has an “extended output” (XOF) mode of operation that can work as a PRNG (produces 64 bits at a time but you can chop it up into 4x16). It looks pretty simple to implement and is supposed to perform acceptably even on 8-bit microcontrollers. It only consists of bitwise logical operations (lots of xors, unfortunately for the Z-machine) and occasional rotations of 64-bit bit vectors, no 32-bit or 64-bit addition or multiplication or anything like that. So nothing fundamentally complicated than a 32 bit Xorshift would need, just repeated more often.

Draconis · January 27, 2025, 7:18pm

Aha! You say you don’t particularly want to convince me to use a cryptographic algorithm, but honestly you raise a bunch of really solid points. Keeping a larger state vector around isn’t a huge issue—I can just store it in RAM instead of a register—and since Dialog has dead-code optimization, it doesn’t matter that much how much code goes into the routine (it won’t be compiled if it’s not needed).

Let me see how hard this would be to implement…

Mike_G · January 27, 2025, 7:25pm

In my z-machine interpreters I now use a middle-square Weyl sequence generator.

It uses 8-bytes of state (two 32-bit numbers) and has a guaranteed ~~4^32-1~~ period. It also passes the NIST random bit generation suite of tests.

Edit: Mental goof, it’s 2^32-1.

Here’s my implementation in Rust:


//! Pseudorandom number generator.

/// Constant key used by the random number generator.
///
/// Key requirements:
/// -  1st half - no repeated hex digits
/// -  2nd half - no repeated hex digits
/// -  roughly equal 1 and 0 bits
/// -  odd
const KEY: u32 = 0x_7269_b1a3;

/// Swap the upper and lower halves of the seed to ensure distinct starting points for all possible 16-bit seeds.
fn swap_halves(seed: u32) -> u32 {
    seed >> 16 | seed << 16
}
    
/// Pseudorandom number generator.
///
/// Uses the middle-square Weyl sequence method.
#[derive(Clone, Debug)]
pub(super) struct Random {
    index: u32,
    state: u32,
}   
    
impl Random {
    /// Returns a pseudorandom number from 1-`range`.
    #[allow(clippy::cast_possible_truncation)]
    pub(super) fn generate(&mut self, range: u16) -> u16 {
        let reject_threshold = u16::MAX - (65536 % u32::from(range)) as u16;
        loop {
            //middle-square Weyl sequence
            self.state = self.state.wrapping_mul(self.state);
            self.index = self.index.wrapping_add(KEY);
            self.state = self.state.wrapping_add(self.index);
            self.state = swap_halves(self.state);
            let result = self.state as u16;
            if result <= reject_threshold {
                break result % range + 1;
            }
        }
    }

    /// Returns a new [`Random`] based on a seed value.
    pub(super) fn new(seed: u32) -> Random {
        Random {
            index: swap_halves(seed),
            state: 0,
        }
    }

    /// Resets the generator with a new seed.
    ///
    /// Zcode can only generate seeds 1-32768.
    pub(super) fn reset(&mut self, seed: u32) {
        self.index = swap_halves(seed);
        self.state = 0;
    }
}

Hanna · January 27, 2025, 8:09pm

Another thing to keep in mind, independent of the raw PRNG, is how this kind of selection between N options is done. Apparently the 6502 and JS implementations of the Å-machine use the common approach of taking the remainder (rand() % N in C notation) which introduces some bias even with a perfect RNG. That bias can be bounded and is relatively modest when N is much smaller than the output range of the raw PRNG. But if you’re already changing the Å-machine it may be worth doing properly at the same time.

A basic rejection sampling loop is a bit fiddly to get right but doesn’t need more than 16-bit arithmetic:

Compute the largest multiple cN of N that’s less than or equal to 2^16, so you can map from [0, cN] to [0, N] without bias by taking the remainder.
Draw outputs from the PRNG until you get one in the interval [0, cN], (which is very likely to be on the first try for small N).

Mike’s code also does this though it takes a detour through 32 bit integers for convenience.

Mike_G · January 27, 2025, 9:13pm

The 32-bit detour is strictly to allow for the case where the range is a divisor of 65536. It can be removed, but then you end up rejecting some results that should be usable without bias. I’ve considered looking for a way to do away with the 32-bit subtraction without losing that extra efficiency, but not had the time.

Edit: I realized you might have just meant the 32-bit math used in the algorithm itself. I was specifically thinking of the 32-bit math used in calculating the rejection threshold. It’s existence there bugs me.

Draconis · January 27, 2025, 9:55pm

So far Ascon looks feasible, though 40 bytes of state is going to be a pain to work with in Z-machine assembly. However, I haven’t found a good open-source implementation that uses it as a PRNG, only academic papers describing how it’s possible—do you know of one I could reference? I’m wary of trying to implement this without knowing exactly what I’m doing, given how delicate these things can be.

Hanna · January 27, 2025, 11:18pm

I would suggest just using the standard Ascon-128XOF, applied to the seed as input message:

Pick down some specific encoding of the seed as byte sequence (e.g., if it’s 16 bit, choose either little endian or big endian) and stick to it. Which one doesn’t matter - a secure XOF has to handle any input well.
Consume the output byte stream in order without skipping or duplicating anything, i.e., taking the sequence of output words and turning them back into bytes (using whatever byte order you choose) must give exactly the same byte sequence as the standard function.

This ensures you can test your implementation against the official test vectors and other implementations of the standard, ruling out implementation level mistakes.

I assume part of your question is whether the XOF is safe to use in this particular way for this purpose. Leaving aside that “16 bit seed” and “security” are never seen in the same room together, the definition of an XOF ensures that the output will be a stream of uniformly random bits to an observe who does not know the seed. An extendable output function is a cryptographic hash function with longer output, so it (1) must not “fall apart” on very short inputs or some other designated set of inputs, and (2) must give uniformly random bit strings as output. The latter is sometimes explicitly part of the definition, but it’s also necessary to achieve the more important security goals of hash functions with the minimum number of output bits (which Ascon claims to achieve).

The academic papers and open source projects building CSPRNGs out of the XOF also rely on the raw output being uniformly random, they just add more bells and whistles on top because they have to solve other security concerns too. A CSPRNG needs to prevent an attacker who has figured out the current internal state from recovering earlier outputs of the PRNG, and support regular injection of fresh entropy so the attacker can’t predict future outputs indefinitely. But none of that is relevant in our context, and adding it makes it harder to find test vectors, so I would actively advice against emulating it.

(Another thing to keep in mind when trying to build something on top of an XOF is that, for a given input, shorter outputs are prefixes of longer outputs. But for a PRNG, this is precisely what we want so we can compute the output stream on demand instead of all at once.)

Exemptus · January 27, 2025, 11:31pm

Does Ascon include MAC schemes? Sorry for the digression, but I’m interested for professional reasons (peer-review of related matters). I haven’t seen a description of the permutation being applied for MAC generation in the draft NIST submission, which is unusual for these kinds of things.

Draconis · January 27, 2025, 11:35pm

All right. So if I’m understanding this right…

Initialize the state to [3, 0, 12<<4|12, 0, 8, 0, 0, 0…] (40 bytes total)
Run 12 permutations
State[0] = State[0] ^ High byte of seed
State[1] = State[1] ^ Low byte of seed
Run 12 permutations
Whenever a random value is requested:
- Return State[0] << 8 | State[1]
- Run 12 permutations

?

Hanna · January 27, 2025, 11:46pm

I did in fact mean the computation of the rejection threshold. I don’t think 32 bit is needed for that, but yeah it’s pretty subtle. It’s too late here for me ~~to work out if the range vs. range + 1 difference matters~~ (edit: oh, it’s (result % range) + 1, not result % (range + 1)). If nothing else, you should be able to avoid it by special casing powers of two, which is probably cheaper than two modulo operations when applicable.

Draconis · January 27, 2025, 11:54pm

And then to run those permutations:

With R running from 1 to 12:
- S[16] ^= $F0 - R×$10 + R×$01
- S[0, 1, 2, 3, 4, 5, 6, 7] ^= S[32, 33, 34, 35, 36, 37, 38, 39]
- S[32, 33, 34, 35, 36, 37, 38, 39] ^= S[24, 25, 26, 27, 28, 29, 30, 31]
- S[16, 17, 18, 19, 20, 21, 22, 23] ^= S[8, 9, 10, 11, 12, 13, 14, 15]
- With J running from 0 to 39:
  - T[J] = (S[J] ^ $FF) & S[(J+8)%40]
  - S[J] ^= T[(J+8)%40]
- S[8, 9, 10, 11, 12, 13, 14, 15] ^= S[0, 1, 2, 3, 4, 5, 6, 7]
- S[0, 1, 2, 3, 4, 5, 6, 7] ^= S[32, 33, 34, 35, 36, 37, 38, 39]
- S[24, 25, 26, 27, 28, 29, 30, 31] ^= S[16, 17, 18, 19, 20, 21, 22, 23]
- S[16, 17, 18, 19, 20, 21, 22, 23] ^= $FF

Followed by the rotation step, which I’m trying to find the implementation for…

As much as I like the idea of using a CSPRNG for this, I don’t think this is going to be feasible on the Z-machine. Each of those ^= operations requires reading the appropriate data from memory, using AND and OR and NOT to synthesize an XOR, then writing it back…16 bits at a time.

Draconis · January 28, 2025, 12:02am

That said, using a standard cryptographic hash seems like it would have a lot of benefits. The question is just if there are any XOFs (or cryptographic hashes in general) that are simpler to run on 16-bit hardware than Ascon…

Hanna · January 28, 2025, 12:11am

It sounds like the two main problems for the Z-machine are (1) not having a native XOR, and (2) not having enough local variables to be able to keep the whole state in there across all rounds. The former is probably going to hurt most cryptographic designs and to a lesser extent also many non-crypto PRNGs because it’s such a useful, ubiquitous and cheap operation on physical computers.

I think the latter problem could be alleviated by using some global variables in addition to or instead of locals. You wouldn’t even need to reserve any globals for it, you could just evacuate some globals used for other things to RAM while running the rounds and restore them after that’s done.

jkj_yuio · January 28, 2025, 12:13am

This is what i do on the trs-80. Not sure where this came from, anyone recognise this method. Warning - could be rubbish!

unsigned int rand16()
{
    uint x = seed;
    x ^= x << 7;
    x ^= x >> 9;
    x ^= x << 8;
    seed = x;
    return x;
}

NB: int is 16 bit.

Draconis · January 28, 2025, 12:19am

That’s the classic 7-9-8 xorshift algorithm! It’s much-beloved on retro systems because shifting by 7, 9, and 8 bits is cheaper than most other numbers, but it’s statistically not very good.