How do you automatically benchmark a development system?

We currently have a whole bunch of automated tests for the Dialog compiler, which ensure that new changes don’t break any old behavior: three complete games that automatically play through a walkthrough, two with the current library and one with an old library, and a bevy of small test cases checking individual behaviors. So far so good; I’m very happy with how this has been working.

I would now like to also keep track of speed—both the speed of compilation, and the speed of playing the resulting games. I’m thinking of writing a small script that uses time(1) to time compilation and execution on Z-machine, Å-machine, and debugger, and save the results to a file. So far, still so good.
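A minimal sketch of such a timing helper, assuming the results accumulate in a CSV file; the `dialogc` invocation in the comment is a placeholder, not the real command line:

```shell
# bench: time a command and append "label,milliseconds" to benchmarks.csv.
# Assumes GNU date (as on typical Linux CI runners) for nanosecond timestamps.
bench() {
    local label=$1; shift
    local start end ms
    start=$(date +%s%N)            # nanoseconds since epoch
    "$@" > /dev/null 2>&1          # run the command under test
    end=$(date +%s%N)
    ms=$(( (end - start) / 1000000 ))
    echo "$label,$ms" >> benchmarks.csv
}

# e.g. bench compile dialogc story.dg stdlib.dg   (hypothetical command line)
bench demo sleep 0.2               # placeholder command so the sketch runs
```

Running the same set of labels on every commit gives you one comparable row per benchmark per run.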

But what I’d really like is to run this script automatically on Github’s servers (so there’s consistent hardware every time, rather than each developer’s personal machine), then commit the resulting file to version control. Looking at changes in that file over time would then show which pull requests are causing significant changes in compilation and execution speed.
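The commit-back half could be an ordinary shell step at the end of the CI job. A sketch, with the repo path, file contents, and bot identity all placeholders (a real Actions job would run against the checkout it already has and finish with `git push`, omitted here):

```shell
# Sketch of the commit step a CI job could run after benchmarking.
# bench_demo_repo stands in for the CI checkout; the CSV line and the
# bot identity are made up for illustration.
set -e
repo=bench_demo_repo
rm -rf "$repo" && git init -q "$repo"
echo "abc1234,compile,950" > "$repo/benchmarks.csv"
git -C "$repo" add benchmarks.csv
git -C "$repo" -c user.name="bench-bot" -c user.email="bot@example.com" \
    commit -q -m "ci: record benchmark results"
git -C "$repo" log --oneline -1
```

One caveat worth knowing: shared CI runners are virtualized, so even with "the same hardware" timings can be noisy; averaging a few runs per commit helps.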

Is this a thing that Github Actions can do? Is this something that’s wise to do at all? Is there a better way to track this sort of thing? How do other compiler devs handle this for their own systems? I imagine the Inform 6 maintainers also worry about this, for example.

I’m afraid that the only I6 compilation benchmark consisted of me hitting the compile button on a large source file while playing a Mouth Music track on the CD player, and counting beats. I can’t remember if the track was “Hoireann O” or “Seinn O!”

…Okay, that was about 1998, when I was using MacOS Classic and didn’t have time(1) available.

I have occasionally run speed tests, but only manually, after specific changes to the compiler. For example, after the big switch to dynamic memory allocation (getting rid of $MAX_PROP_TABLE_SIZE and so on), I wanted to know whether all that realloc() was slowing us down. Turns out things were faster with realloc(), go figure.


Yes, Github Actions can do that. In fact, if you look at my repo you will find that a few commits ago I removed it, because running the tests there took almost 8 seconds as opposed to 1 or 2 on the local machine. You can see the recipes for it in the .husky directory, using cargo for the Rust compilation.

You will also find that it automatically generates a report which is consumed by the website on the frontpage. See: GitHub - urdwyrd/urd: A declarative schema system for interactive worlds


Ozmoo has a benchmark mode, in which it plays through a set of scripted commands for a game. Before the first command, and after the last, it prints the system clock, in jiffies (1/60th s). This works well on C64 and C128, quite possibly on more platforms. I use this quite often to benchmark Ozmoo, PunyInform and I6 lib.

May not be great for fully automated benchmarking.

I don’t think you need to worry too much about the speed.

With Glazer I noticed it was a bit slow on a large file. I compiled it with profiling enabled (the -pg option in GCC), then ran gprof on the resulting data after a run, and found a bottleneck: Node_Add(). After optimizing the node handling, the time dropped from 2.0 seconds to around 0.2 seconds.