Benchmarking PLUS Contributions wanted for a test suite

Yes something is going wrong with the sampling - 93902 samples seems far too high to me, though it could be possible if your timer resolution is very good, ie. 50µs. I’m only getting 1ms in Windows Gargoyle :frowning:. Can you check what “the minimum timer resolution” is, with an After running the benchmark framework rule? It stores the samples in a list, so with that many samples it’s sure to cause problems.

The framework takes at least 5 samples, and runs each test for at least 5 seconds as well (mainly because that’s what benchmark.js does.) But if too many samples is a problem I can limit that too. I’m sure 100 would be more than fine.

Would you also be interested in seeing the raw number of times a function was run?

If you compile Gargoyle with -DGARGLK_PROFILE_TIME, it uses a high resolution timer for glk_current_time calls. Resolution varies by platform but 50µs is in the ballpark.

Erik has access to a build like that, so possibly he’s using it here.

I’ve uploaded an update. The biggest change is how the iteration multiplier is calculated, which is the number of times a test case needs to be run to reach the timer resolution. It seems to work reliably with my tests. In the event that the test case does ever get a time of 0 then a message is printed and that time is discarded.

Dannii, was the “minimum timer resolution” actually the value you wanted me to test? It always returns “1”, the minimum value, so I assume that the value is not measured in microseconds…?

I’m using the latest public release of Gargoyle, 2011.1.

When the 95-96,000 samples reaches its end and sits for a few minutes as if the interpreter has hung, then finally spits out the results. I assume that during that time it is processing the massive list you mentioned! (Even the smaller value, 25-28,000 samples, has a noticeable delay, but it’s not as bad.) I’m no math whiz, so apologies if this is silly, but would it be possible to process the results iteratively, so that you don’t actually need to store all the values? I’m sure that the mean could be got that way, and probably the variance too…?

I’m not noticing a whole lot of difference between your new update and the original. Here are before and after results:

For the record, the code I’m testing in both cases is:

[spoiler][code]Include Benchmarking by Dannii Willis.

Search list is a list of numbers variable.

First when play begins:
try switching the story transcript on;
repeat with x running from 1 to 99:
add a random number between 1 and 99 to search list;
add 100 at entry 90 in search list;
say “The search list is [number of entries in search list] items long: [search list in brace notation].”;
wait for any key.

Searching a simple list is a test case.
The author is “Erik Temple”.
The description is “Looks for the number 100 in a list of random numbers 100 entries long.”
To run simple-list search (this is running simple search):
if 100 is listed in the search list:
do nothing.
The run phrase is running simple search.

Selecting from a simple list is a test case.
The author is “Erik Temple”.
The description is “Grabs entry 90 from a list of random numbers 100 entries long.”
To run simple-list select (this is running simple selection):
if entry 90 in search list is 100:
do nothing.
The run phrase is running simple selection.

After running the benchmark framework:
say “The minimum timer resolution was: [the minimum timer resolution].”[/code][/spoiler]

I seem to recall Ron mentioning that different systems might return the system clock in different units (was it nanoseconds vs microseconds?) That couldn’t be a factor here, could it? I wouldn’t think so, given that the results look pretty reasonable, but…

I do think it would potentially be useful to see the raw number of times a function was run, and it might also be useful to be able to cap the number (running 90,000 iterations takes a long time!).

FYI, here is a typical result on my system for your example from the extension docs:

It looks like you still have the old version. The new one caps it at 100 samples. It might be possible to iteratively calculate the variance, but I don’t know how. The algorithm I have needs the mean to be calculated first. But with only 100 samples that won’t be a problem any more.

If your timer resolution really is just 1µs then that’s great! And although different systems might be able to get the time to a different resolution, Glk normalises it to microseconds. So if you have a ns timer then it will divide by 1000, and if you only have a ms timer then it will multiply by 1000. On windows Garglk gives you the time in µs, but the value is cached and only updated every ms. The extension measures what resolution you have by running a tight loop that runs until the value changes (and samples that 30 times.) If you’re running on a Mac then potentially you do have access to a real µs timer (Garglk actually accesses a ns timer).

Yep, when I downloaded the file after your update post, GitHub was apparently still feeding out the old version. I just tried again and got the 120118 version. This works very nicely, and the values from 100 samples are indeed equivalent to those from 95,000 samples:

It might be nice to eventually have easy access to change the max # samples run (in case I wanted to raise or lower it), but of course that’s unlikely to be necessary.

With the number of samples run capped at 100, I don’t think it’s necessary any longer, but for future use in some other project perhaps:

This algorithm from en.wikipedia.org/wiki/Algorithms … g_variance seems to be able to calculate the mean as the data are collected, rather than iterating over the full dataset post-collection.

[code]def online_variance(data):
n = 0
mean = 0
M2 = 0

for x in data:
    n = n + 1
    delta = x - mean
    mean = mean + delta/n
    if n > 1:
        M2 = M2 + delta*(x - mean)

variance_n = M2/n
variance = M2/(n - 1)
return (variance, variance_n)[/code]

Anyway, Benchmarking is sweet! (It also happens to be a good illustration of the use of floating point in Glulx Inform, which will come in handy if I ever update Glimmr to make use of them.)

Thanks again for all the work!

Just a couple more thoughts:

First, I’ve edited my local copy of Benchmarking to show the description of each test case:

Before benchmarking a test case (called test case) (this is the showing a test case's info rule): now the current test case is the test case; now the current phase is ""; now the current sample number is 0; say "[line break][The test case][line break][italic type][description of the test case][roman type]";[### Changed ET - 1/18/2012] pause briefly;I’m not sure where that is desirable for all users, but there it is.

Second, for the bank of test cases you’re hoping to put together: Ron Newcomb a while back posted some code for searching through tables that used optimized glulx opcodes (@linearsearch, and maybe one of the others?) That might make for a nice comparison with the standard I7 code.

I’ve just pushed a major update to the extension. It now performs significance tests, so that you’ll be able to see if a new algorithm is actually faster than the old one. It’s not perfect however, and other computer programs may make the results less accurate, so it’s best to run the tests without anything else going on.

It doesn’t show the test author/description yet, but that is still planned.

This is super cool! Thanks for making this. :slight_smile:

::replying so I can find the thread again later::

Also, tres cool, Dannii.

I’ve just pushed an update which gives a menu option to show test descriptions. Which means this extension is now out of beta! I’ll be sending it to the I7 website soon.

Thanks for your comments everyone.

I’m still keen for contributions for a test suite. If you have any slow code please send it my way!

I have a little game called Sugar that is very slow, because it does a bunch of regex stuff on the entire output of the game every turn. Is this useful? Should I PM you the code?

That sounds good, though email would be better thanks.

Dannii, I’ve got an implementation of A* pathfinding laying around. It’s pretty darn slow. Would you be interested in that?

Yes that sounds good too. If you have a map where it was particularly slow that’d be good too.

What kind of path finding is built into I7? I’ve never looked at the details.

Ah, sorry. This isn’t directly related to Inform pathfinding–it finds a path through a grid of cells rather than on a graph of room connections (as the built-in pathfinding does). Don’t know if it would still be useful to you, but here it is:

[spoiler][code]“A-Star”

Release along with an interpreter.

[-----------]
[Multi-window]
Include Flexible Windows by Jon Ingold.

The map-window is a text-grid g-window spawned by the main-window. The position is g-placeabove. The scale method is g-fixed-size. The measurement is 9.

When play begins:
open up the map-window.

Window-drawing rule for the map-window:
move focus to the map-window, clearing the window;
let M be 0;
let count be 0;
repeat with N running through the grid of the location:
increase count by 1;
increase M by 1;
if M > the grid-width of the location:
now M is 1;
say “[line break]”;
if count is the flat-position of the marker:
say " @";
else if count is listed in the least-cost path:
say " X";
otherwise:
select a row with a tile number of N in the Table of Tiles;
say " [key entry]";
return to main screen.

Instead of jumping:
find path to goal in the location;
refresh windows.

[-----------]
[Single-window]
[To display map:
say “[line break][fixed letter spacing]”;
let M be 0;
let count be 0;
repeat with N running through the grid of the location:
increase count by 1;
increase M by 1;
if M > the grid-width of the location:
now M is 1;
say “[line break]”;
if count is the flat-position of the marker:
say " @";
else if count is listed in the least-cost path:
say " X";
otherwise:
select a row with a tile number of N in the Table of Tiles;
say " [key entry]";
say “[variable letter spacing][line break]”

Instead of jumping:
find path to goal in the location;
display map.

After looking:
display map.]

[--------------]

Include (- Array Cost_Map --> 256; -) after “Definitions.i6t”.

A map room is a kind of room.

A map room has a list of numbers called the grid.
A map room has a number called the grid-width.

The goal is a list of numbers variable. The goal is {9, 3}.
The starting point is a list of numbers variable. The starting point is {2, 5}.
The marker is a list of numbers variable. The marker is {2, 5}.

To decide whether (N - a number) is the flat-position of (G - a list of numbers):
let pos be ( (entry 2 of G - 1) * grid-width of location) + (entry 1 of G);
if N is pos, decide yes.

To decide what number is flat-position of (G - a list of numbers):
decide on ( (entry 2 of G - 1) * grid-width of location) + (entry 1 of G);

Walkability is a kind of value. The walkabilities are unwalkable and walkable.

The open list is a list of numbers variable.

The least-cost path is a list of numbers variable.

Table of Visited Nodes
node position closed-node parent F-score G-score H-score
a number truth state number number number number
with 99 blank rows

To evaluate the/-- walkability of the/-- map for (R - a map room):
write index to cost-map for the grid of R;
repeat with pos running from 1 to the number of entries of the grid of R:
select row with tile number of (entry pos of grid of R) in the table of tiles;
add (pass entry) to the cost-map at pos.

To write index to the/-- cost-map for (L - a list of numbers):
(- Cost_Map --> 0 = LIST_OF_TY_GetLength({-pointer-to:L}); -)

To add (W - a walkability) to the/-- cost-map at (pos - a number):
(- Cost_Map --> {pos} = {W} - 1; -)

To find path to goal in (R - a map room):
[create bitmap version of the map to define inaccessible squares]
evaluate the walkability of the map for R;
[initialize temporary variables]
let path completed be false;
let W be grid-width of R;
let G be a number;
[zero out the administrative lists]
let the goal position be the flat-position of the goal;
blank out the whole of the table of visited nodes;
now the open list is {};
[add starting node to the open list and initialize costs]
add flat-position of starting point to the open list;
choose row 1 from the table of visited nodes;
now the node position entry is entry 1 of the open list;
now the parent entry is 0;
now the G-score entry is 0;
now the H-score entry is the heuristic distance from the flat-position of the starting point to the goal position in the map of R;
now the F-score entry is G-score entry + H-score entry;
[now closed-node entry is false;]
[main loop: select neighbor with the best score and follow on.]
while the number of entries of the open list > 0:
[sort the table of visited nodes in F-score order;][this sort function now replaced with priority queue on the open list]
select row with a node position of (entry 1 in the open list) in the table of visited nodes;
let best F be F-score entry;
[choose row with a closed-node of false in the table of visited nodes;]
let current node be a number;
let current node be node position entry;
let G be G-score entry;
[now closed-node entry is true;]
remove the current node from the open list;
if the current node is the goal position:
now path completed is true;
break;
let candidates be the neighbors of the current node in the map of R;
repeat with selected running through candidates:
if selected is not listed in the open list:
choose blank row in the table of visited nodes;
now node position entry is selected;
now parent entry is current node;
if ABS (current node - selected) is 1 or the remainder after dividing ( ABS (current node - selected) ) by W is 0:[short way of determining whether neighbor is orthagonal]
now G-score entry is G + 10;
otherwise:
now G-score entry is G + 14;
now H-score entry is the heuristic distance from selected to the goal position in the map of R;
now F-score entry is G-score entry + H-score entry;
[now closed-node entry is false;]
[let F1 be F-score entry;]
if F-score entry < (best F + 1):
add selected at entry 1 in the open list;
otherwise:
add selected to the open list;
otherwise if there is a node position of selected in the table of visited nodes:
select row with a node position of selected in the table of visited nodes;
let G1 be a number;
if ABS (current node - selected) is 1 or the remainder after dividing ( ABS (current node - selected) ) by W is 0:[short way of determining whether neighbor is orthogonal]
now G1 is G + 10;
otherwise:
now G1 is G + 14;
if G1 < G-score entry:
now parent entry is current node;
now G-score entry is G1;
now H-score entry is the heuristic distance from selected to the goal position in the map of R;
now F-score entry is G-score entry + H-score entry;
[now closed-node entry is false;]
[say “[current node]: added [selected]; F: [F-score entry]; G: [G-score entry]; H: [H-score entry].”;]
if path completed is true:
now the least-cost path is {};
add goal position to least-cost path;
select row with a node position of goal position in the table of visited nodes;
let seek be parent entry;
while seek is not flat-position of starting point:
add seek at entry 1 in least-cost path;
select row with a node position of seek in the table of visited nodes;
now seek is parent entry;
say “Path: [least-cost path].”

To decide what number is ABS (N - a number):
if N < 0, decide on 0 - N;
decide on N.

[To decide which number is the heuristic distance from (S - a number) to (E - a number) in the map of (R - a room):
let L be the coordinates of S in the map of R;
let M be the coordinates of E in the map of R;
let X_dist be entry 1 of L - entry 1 of M;
let Y_dist be entry 2 of L - entry 2 of M;
if X_dist < 0, now X_dist is 0 - X_dist;
if Y_dist < 0, now Y_dist is 0 - Y_dist;
if X_dist > Y_dist:
decide on (14 * Y_dist) + (10 * (X_dist - Y_dist) );
otherwise:
decide on (14 * X_dist) + (10 * (Y_dist - X_dist) ).]

To decide which number is the heuristic distance from (S - a number) to (E - a number) in the map of (R - a room):
let diff be ABS (S - E);
let Y_dist be diff / grid-width of R;
let X_dist be the remainder after dividing diff by grid-width of R;
if X_dist > (grid-width of R / 2), now X_dist is grid-width of R - X_dist;
if X_dist > Y_dist:
decide on (14 * Y_dist) + (10 * (X_dist - Y_dist) );
otherwise:
decide on (14 * X_dist) + (10 * (Y_dist - X_dist) ).

To decide what list of numbers is the coordinates of (N - a number) in the map of (R - a room):
let L be a list of numbers;
let W be the grid-width of R;
If N is 1, now N is 2;
let Y be ( (N - 1) / W ) + 1;
let X be N - (Y * W);
add X to L;
add Y to L;
decide on L.

To decide which list of numbers is the neighbors of (N - a number) in the map of (R - a map room):
let L be a list of numbers;
let candidates be a list of numbers;
let W be the grid-width of R; [width]
let Y_max be the number of entries of the grid of R;
let H be Y_max divided by W; [height]
let X be ( (N / W) * W ) + 1; [first position of current row]
if N < (W + 1), now X is 1;
let X_max be W + (X - 1); [position at end of current row]
unless N + 1 > X_max:
add (N + 1) to L;
if (N + 1) - W > 0, add ( (N + 1) - W ) to L;
if (N + 1) + W < Y_max, add ( (N + 1) + W ) to L;
unless (N - 1) is less than X:
add (N - 1) to L;
if (N - 1) - W > 0, add ( (N - 1) - W ) to L;
if (N - 1) + W < Y_max, add ( (N - 1) + W ) to L;
unless (N - W) < 1, add (N - W) to L;
unless (N + W) > Y_max, add (N + W) to L;
repeat with item running through L:
if square (item) in the cost-map is walkable, add item to candidates;
[select row with tile number of (entry item of grid of R) in the table of tiles;
if the pass entry is walkable, add item to candidates;]
decide on candidates.

To decide whether the/-- square (N - a number) in the cost-map is walkable:
(- Cost_Map --> {N} -)

Include (-
Global seekStartAddress;
Global seekRows;
Global seekVal;
Global seekReturnValue;
-) after “Definitions.i6t”.

To select a/the/-- row with (TC - a word value valued table column) of (W - a word value) in/from (T - table name):
(- {-require-ctvs}
ct_0 = {T};
ct_1 = -1;
seekStartAddress = ({T})–>(TableFindCol({T}, {TC}, true)) + (COL_HSIZE * WORDSIZE);
seekRows = TableRows({T}) + 2;
seekVal = {W};
@linearsearch seekVal WORDSIZE seekStartAddress WORDSIZE seekRows 0 4 ct_1;
if (ct_1 < 1) return RunTimeProblem(RTP_TABLE_NOCORR, {T});
-).

Include (-
Global quickloopSearchFor;
Global quickloopStartAddress;
Global quickloopRowsLeft;
Global quickloopReturnValue;
-) after “Definitions.i6t”.

To repeat through (tab - table name) until (row - number) with (tc - a word value valued table column) as (val - a word value) begin – end:
(- {-require-ctvs} ! {-zero-counter:reviewHistoryCol}
ct_0 = {tab};
!print "rows: ", {row}, "; value: ", {val}, ". ";
ct_1 = -1;
quickloopSearchFor = {val};
quickloopStartAddress = ({tab})–>(TableFindCol({tab}, {tc}, true)) + (COL_HSIZE * WORDSIZE);
quickloopRowsLeft = {row};
quickloopReturnValue = -1;
while (1) {
@linearsearch quickloopSearchFor WORDSIZE quickloopStartAddress WORDSIZE quickloopRowsLeft 0 4 quickloopReturnValue;
quickloopReturnValue++;
if (quickloopReturnValue == 0) break;
ct_1 = ct_1 + quickloopReturnValue; ! set the chosen row for I7
quickloopStartAddress = quickloopStartAddress + (WORDSIZE * quickloopReturnValue); ! set start point for next iteration
quickloopRowsLeft = quickloopRowsLeft - quickloopReturnValue; ! decrease #/rows left to check
!-).

Table of Tiles
tile number pass key
0 unwalkable " "
1 unwalkable “#”
2 walkable “·”

[-----------------]

The kitchen is a map room. The grid-width is 10. The grid is {
0, 0, 0, 1, 2, 1, 0, 0, 0, 0,
1, 1, 1, 1, 2, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 1,
1, 1, 1, 1, 1, 2, 2, 1, 1, 1,
1, 2, 2, 1, 2, 2, 2, 2, 2, 1,
1, 2, 2, 1, 2, 2, 2, 2, 2, 1,
1, 2, 2, 2, 2, 1, 1, 1, 1, 1,
1, 1, 1, 2, 2, 1, 0, 0, 0, 0,
0, 0, 1, 2, 2, 1, 0, 0, 0, 0
}.

[The kitchen is a map room. The grid-width is 9. The grid is {
0, 0, 0, 1, 2, 1, 0, 0, 0,
1, 1, 1, 1, 2, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 1,
1, 1, 1, 1, 1, 2, 2, 1, 1,
1, 2, 2, 1, 2, 2, 2, 2, 1,
1, 2, 2, 1, 2, 2, 2, 2, 1,
1, 2, 2, 2, 2, 1, 1, 1, 1,
1, 1, 1, 2, 2, 1, 0, 0, 0,
0, 0, 1, 2, 2, 1, 0, 0, 0
}.]

Dumping is an action applying to nothing. Understand “dump” as dumping.

Carry out dumping:
debug node position in the Table of Visited Nodes.
[say “Table of Visited Nodes[line break]”;
repeat through the Table of Visited Nodes:
say “[node position entry]; [F-score entry].”]

To debug (TC - a table column) in (T - a table name):
(- TableColumnDebug({T}, {TC}); -)[/code][/spoiler]

I’m looking for slow algorithms that together will provide a somewhat comprehensive performance stress test of our interpreters. So the ideal test would be something you actually wanted to use, but was too slow so you couldn’t. Then we can use the tests to profile our interpreters to find what’s slow, and prove that our interpreter changes actually help. (Of course a lot of the time the solution will be a better algorithm, but this extension will help with that too!)

So your code looks good. I’ll have to strip out all the printing code though.

If anyone hits this topic off search engine, like I did today, there is a newer version of the extension at: raw.githubusercontent.com/i7/ex … arking.i7x
Version 1/130803 of Benchmarking (for Glulx only) by Dannii Willis

Thank you Dannii. As we are moving more into a diverse set of mobile devices, some of which are ultra-conservative on battery, it might make sense to add a BogoMIPS like fast call in this extension so that an Inform 7 story could make choices based on how fast the device interpreter is? For example, on slow Glulx interpreters like Incant for Android - a loading message could be presented.

Extension Version 1/130803 doesn’t currently build on Inform 7 build 6M62. Error:

In Part 2 - The interface unindexed in the extension Benchmarking by Dannii
  Willis:
  >--> The table 'Table of User Styles (continued)' (~/Inform/Extensions/dannii
    willis/benchmarking.i7x, line 506) won't work as a continuation, because it
    contains a column not found in the original 'Table of User Styles' (~/Inform/Extensions/jon
    ingold/flexible windows.i7x, line 614).
    The old table has columns: window, style name, background color, color, first
    line indentation, fixed width, font weight, indentation, italic,
    justification, relative size, reversed.
    The new table has columns: style name, glulx color.

I reworked the Glk color table and got that to compile, but now it has another compile error on Inform 7 build 6M62. Version 1/130803 of Benchmarking (for Glulx only) by Dannii Willis:

In Part 1 - The framework, Section - Real numbers unindexed in the extension
  Benchmarking by Dannii Willis:
  >--> You wrote 'R1 specifies a real number' (~/Inform/Extensions/dannii
    willis/benchmarking.i7x, line 203): but you can only specify ways to write
    new kinds of value, as created with sentences like 'A weight is a kind of
    value.', and not the built-in ones like 'number' or 'time'.

Assistance appreciated.

Inform now has native real-number support. You could change the extension to use this, or (easier) change all occurrences of “real number” to “custom number” or such.