Normal Distribution from Tables

maga · June 22, 2014, 10:58pm

A thing that I occasionally need in my less-practical projects: I want to take a table of stuff and choose a row from it with probabilities reasonably close to (half of) a normal distribution, with the first entry most likely and each later entry progressively less likely. And I’d like to be able to tweak the shape of that distribution a little, flattening or stretching the curve so that the results Feel Right.
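For reference, the target shape can be written down directly as per-row weights. The following is a Python sketch, not Inform and not anyone's posted method. It assumes a Gaussian falloff exp(-(row - 1)^2 / (2 sigma^2)), where the hypothetical parameter `sigma` is the knob for flattening or stretching the curve. `random.choices` then picks a row in proportion to those weights.

```python
import math
import random


def half_normal_weights(n_rows, sigma):
    """Relative weight of rows 1..n_rows: the right half of a bell curve
    peaked at row 1. sigma is an assumed tuning knob; larger = flatter."""
    return [math.exp(-((row - 1) ** 2) / (2 * sigma ** 2))
            for row in range(1, n_rows + 1)]


def choose_row(n_rows, sigma, rng=random):
    """Pick a 1-based row number with probability proportional to its weight."""
    weights = half_normal_weights(n_rows, sigma)
    return rng.choices(range(1, n_rows + 1), weights=weights)[0]
```

For example, `choose_row(10, 3.0)` returns a row from 1 to 10. Because `sigma` is a real number, it can be adjusted in arbitrarily small steps.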

Dice are a computationally cheap way of producing approximately normal distributions (the sum of several dice tends toward a bell curve), so I’ve tried something like this:

[code]The dice rolled is a number that varies. The dice rolled is 3.

To decide what number is the distribution random of (tabula - a table name):
let N be the number of rows in tabula;
let Y be N * 2;
let D be the dice rolled - 1;
now Y is Y + D; [Y is the total number of faces to share out among the dice]
let R be 0;
let F be Y / the dice rolled; [faces per die, rounded down]
let YA be Y;
let DA be the dice rolled;
while DA > 0 begin;
now YA is YA - F;
if YA < F begin; [the last die also takes the leftover faces]
let FA be F + YA;
let Z be a random number from 1 to FA;
now R is R + Z;
otherwise;
let Z be a random number from 1 to F;
now R is R + Z;
end if;
now DA is DA - 1;
end while;
now R is R - D;
now R is R - N; [R now lies between 1 - N and N, centred on 1/2]
if R < 1 begin; [fold the low half over: 0 becomes 1, -1 becomes 2, and so on]
now R is 1 - R; [using 0 - R would leave R = 0 as 0, which is not a valid row]
end if;
decide on R.
[/code]
But I suspect that this is pretty inefficient and crude in a lot of ways. The only way to tweak the probability is by changing the number of dice, and, particularly for larger tables, I suspect that will offer a very fat-fingered degree of fine-tuning. (It could be made less blobby by throwing out negative results for R rather than inverting them, but still.) And I’m sure it could be reduced to about eight lines of properly written I6, for that matter. Are there better ways to do this?
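One way to judge how coarse the dice knob is would be to compute the exact row probabilities. The Python sketch below mirrors the Inform phrase step by step: it computes Y and F, has the last die absorb the remainder, and shifts the result by D and N. It then combines the dice exactly using `fractions.Fraction`. It assumes the fold maps R to 1 - R when R < 1, so row 0 never comes up. With a fold of 0 - R, an R of exactly 0 stays 0. The function names are hypothetical.

```python
from fractions import Fraction


def die_sizes(n_rows, dice):
    """Sides on each die, following the phrase's Y / F / FA arithmetic."""
    total = 2 * n_rows + (dice - 1)    # Y
    per_die = total // dice            # F, rounded down
    sizes, remaining = [], total       # remaining plays the role of YA
    for _ in range(dice):
        remaining -= per_die
        if remaining < per_die:        # final die absorbs the leftover faces
            sizes.append(per_die + remaining)
        else:
            sizes.append(per_die)
    return sizes


def row_distribution(n_rows, dice):
    """Exact probability of each row returned by the dice phrase (1 - R fold)."""
    sums = {0: Fraction(1)}
    for sides in die_sizes(n_rows, dice):
        nxt = {}
        for total, p in sums.items():
            for face in range(1, sides + 1):
                nxt[total + face] = nxt.get(total + face, 0) + p / sides
        sums = nxt
    rows = {}
    for total, p in sums.items():
        r = total - (dice - 1) - n_rows   # now R is R - D; now R is R - N
        if r < 1:
            r = 1 - r                     # fold, mapping 0 to row 1
        rows[r] = rows.get(r, 0) + p
    return dict(sorted(rows.items()))
```

Comparing, say, `row_distribution(10, 2)` with `row_distribution(10, 4)` gives a concrete picture of how large each step of the dice knob is.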

Draconis · June 23, 2014, 12:31am

Isn’t the “as decreasingly likely outcomes” text substitution roughly normal? You could look at the I6 code behind it.

zarf · June 23, 2014, 12:53am

Is it?

(The manual says what its distribution is.)

Juhana · June 23, 2014, 8:39am

The probability of getting entry number E with “as decreasingly likely outcomes” is (N-E+1)/(1+2+…+N), i.e. 2(N-E+1)/(N(N+1)), where N is the number of entries.
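To see how far this is from a bell curve, the formula can be tabulated directly. In this small Python sketch (the function name is made up), N = 4 gives 4/10, 3/10, 2/10 and 1/10. Each step down drops by the same amount instead of tailing off, so the distribution is linear.

```python
from fractions import Fraction


def decreasing_probabilities(n):
    """P(entry e) = (n - e + 1) / (1 + 2 + ... + n) for e = 1..n."""
    total = n * (n + 1) // 2
    return [Fraction(n - e + 1, total) for e in range(1, n + 1)]
```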

Roger · June 23, 2014, 2:50pm

Speaking of inefficient and crude, my first inclination when implementing something like this would be to choose rows at random, but repeat individual rows as often as needed to achieve the intended distribution. Maybe an example would help:

[code]Table of 2d6
Result
2
3
3
4
4
4
[and so on…]
[/code]

I’m very hesitant to call this ‘better’ than any other approach, but it is different.
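The same idea can be shown as a Python sketch (the function names are made up). It generates the table by listing every ordered pair of d6 faces, so each total appears as many times as 2d6 produces it, and then picks a row uniformly. The table grows with the total weight, to 36 rows here. That growth is the price of the simplicity.

```python
import random


def table_of_2d6():
    """One row per ordered (die 1, die 2) outcome, so each result repeats
    as often as 2d6 produces it: 2 once, 3 twice, 4 three times, ..."""
    return sorted(a + b for a in range(1, 7) for b in range(1, 7))


def roll_from_table(table, rng=random):
    """A uniform pick of one row; the repetition supplies the weighting."""
    return rng.choice(table)
```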

zahariel · June 23, 2014, 3:51pm

As Juhana states, the “decreasingly likely outcomes” text substitution provides a linear distribution. This may be good enough for your purposes, and it’s relatively easy to reimplement so you can use it outside of text substitutions (not tested):

[code]To decide which number is the decreasingly likely outcome from (n - a number):
   let sum be (n * n + n) / 2; [1 + 2 + ... + n]
   let choice be a random number from 1 to sum;
   repeat with i running from 1 to n begin;
      decrease choice by n - i + 1; [entry i gets weight n - i + 1]
      if choice <= 0, decide on i;
   end repeat;
   decide on n. [not reached, since the weights add up to sum]
[/code]
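The counting logic can be checked without randomness by feeding every possible `choice` through it and tallying where each one lands. The Python sketch below does that, using hypothetical names. With `step_offset=1`, the loop subtracts n - i + 1 per entry, which gives weights N, N-1, …, 1. With `step_offset=0`, it subtracts only n - i. In that case the early entries each come up one value short and the last entry collects the leftovers.

```python
from collections import Counter


def decreasingly_likely_outcome(n, choice, step_offset=1):
    """Map choice (1..n(n+1)/2) to an entry, as the Inform loop does."""
    for i in range(1, n + 1):
        choice -= n - i + step_offset   # entry i claims this many values
        if choice <= 0:
            return i
    return n                            # fallback, as in the phrase


def tally(n, step_offset=1):
    """How many of the possible choices land on each entry 1..n."""
    total = (n * n + n) // 2
    counts = Counter(decreasingly_likely_outcome(n, c, step_offset)
                     for c in range(1, total + 1))
    return [counts[i] for i in range(1, n + 1)]
```

For n = 4, `tally(4)` gives [4, 3, 2, 1], while `tally(4, step_offset=0)` gives [3, 2, 1, 4].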

If this is not sufficient, the easiest way to get a “normal-ish” distribution is to roll a lot of identical dice and add up the results. For this it turns out to be easiest to roll “balanced” dice, which have an odd number of sides (2k + 1), with values from -k to k (again, untested):

[code]To decide which number is fake-normal up to (n - a number) using (k - a number) sided dice:
   let acc be 0;
   let rolls be (n + k - 1) / k; [ceiling of n / k; make sure we can get high enough]
   while acc is 0 begin;
      repeat with i running from 1 to rolls begin;
         increase acc by a random number from (0 - k) to k; [one die, from -k to k]
      end repeat;
      if acc > n, now acc is 0; [discard any trials that got too high]
      if acc < 0 - n, now acc is 0; [or too low]
   end while; [discard any trials that ended up at 0 or out of range]
   if acc > 0, decide on acc;
   decide on 0 - acc.
[/code]

Use of this phrase might be a bit of an art form. Increasing k means fewer, wider dice, which spreads out the results and makes higher numbers more likely, at the cost of a weaker approximation to normality. I think. Have fun testing!
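Because the phrase rerolls rejected trials, its output distribution can be computed exactly: throwing a trial away and rerolling is the same as conditioning on the kept outcomes. Below is a Python sketch of that calculation (the names are hypothetical), which can be used to test the guess about k. For n = 6, k = 1 (six 3-sided dice) returns 1 with probability 3/7. By contrast, k = 6 (a single 13-sided die) makes all six results equally likely, at 1/6 each.

```python
from fractions import Fraction


def fake_normal_distribution(n, k):
    """Exact result probabilities for the fake-normal phrase: sum ceil(n/k)
    dice, each uniform on -k..k; reject 0 and |sum| > n; return |sum|."""
    rolls = (n + k - 1) // k
    sums = {0: Fraction(1)}
    for _ in range(rolls):
        nxt = {}
        for total, p in sums.items():
            for face in range(-k, k + 1):
                nxt[total + face] = nxt.get(total + face, 0) + p / (2 * k + 1)
        sums = nxt
    # Rejection sampling amounts to conditioning on the accepted sums.
    accepted = {s: p for s, p in sums.items() if s != 0 and abs(s) <= n}
    kept = sum(accepted.values())
    result = {}
    for s, p in accepted.items():
        result[abs(s)] = result.get(abs(s), 0) + p / kept
    return dict(sorted(result.items()))
```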