Inform 7's (lack of) Speed

Okay, so I did find the template hacks I mentioned earlier. They're now bundled as an extension, attached for your reference, but be warned that the code is in quite a yucky state.

I also ran some quick-and-dirty performance tests on both Glulxe and Git, and found that I don't have as good a @malloc/non-@malloc comparison as I thought yesterday: any performance differences from switching to @malloc are drowned out by other effects.

An example:
[code]There is a room.

Instead of singing:
	let W be some indexed text;
	repeat with X running from one to 80:
		now W is "[W][W]";
	say "[the number of characters in W]."

Foo relates numbers to numbers.

Instead of waving hands:
	repeat with the counter running from zero to 8000:
		now the foo relation relates the counter to the counter;
	showme the number that relates to 1592 by the foo relation.

Instead of jumping:
	let Y be a list of indexed text;
	repeat with X running from one to 8000:
		add "" to Y.[/code]
With a debug build of this source plus the extension, Git ran the command script sing/wave/quit about six times faster than without the extension, and a profiling build of Glulxe ran it roughly ten times faster. (Admittedly, though, neither the sing nor the wave rule is likely to be representative of actual IF.) The profiling output is spoilered below.

Without the extension:
[spoiler][code]Code segment begins at 0x3c
290 called functions found in ./profile-raw
1276 functions found in test-debug were never called
Functions that consumed the most time (excluding children):
RT__ChLDB:
at $03eddb (line 1); called 40991752 times (4294967296 accelerated)
43.076851 sec (4540917808 ops) spent executing
80.325556 sec (17753753712 ops) including child calls
Unsigned__Compare:
at $03e6f6 (line 1); called 46466034 times (4294967296 accelerated)
42.222459 sec (4666695568 ops) spent executing
42.222459 sec (17551597456 ops) including child calls
BlkSize:
at $0310e3 (line 27569); called 3792638 times (4294967296 accelerated)
30.291623 sec (4498572034 ops) spent executing
108.539218 sec (17942947802 ops) including child calls
RT__ChLDW:
at $03ee0c (line 1); called 3451082 times (4294967296 accelerated)
4.148779 sec (4319124870 ops) spent executing
7.317254 sec (17231635414 ops) including child calls
BlkValueWrite:
at $031e0b (line 27972); called 363600 times (4294967296 accelerated)
3.859299 sec (4319845324 ops) spent executing
70.812351 sec (17676125153 ops) including child calls
BlkValueRead:
at $031d12 (line 27946); called 361973 times (4294967296 accelerated)
3.269208 sec (4316540557 ops) spent executing
55.611922 sec (17569324293 ops) including child calls
RT__ChSTB:
at $03ee41 (line 1); called 793328 times (4294967296 accelerated)
1.255341 sec (4302107248 ops) spent executing
2.666186 sec (17199702384 ops) including child calls
RT__ChSTW:
at $03ee8b (line 1); called 218272 times (4294967296 accelerated)
0.390770 sec (4297150016 ops) spent executing
0.785204 sec (17185544256 ops) including child calls
BlkAllocate:
at $0314e0 (line 27686); called 39 times (4294967296 accelerated)
0.302045 sec (4296977667 ops) spent executing
1.964846 sec (17194490934 ops) including child calls
HashCoreCheckResize:
at $03bf70 (line 33297); called 8000 times (4294967296 accelerated)
0.224566 sec (4296280743 ops) spent executing
55.649696 sec (17571155823 ops) including child calls

^D[/code][/spoiler]
With the extension:
[spoiler][code]Code segment begins at 0x3c
285 called functions found in ./profile-raw
1273 functions found in test-debug were never called
Functions that consumed the most time (excluding children):
Unsigned__Compare:
at $03e013 (line 1); called 2349167 times (4294967296 accelerated)
2.177504 sec (4313760632 ops) spent executing
2.177504 sec (120277877624 ops) including child calls
RT__ChLDB:
at $03e6f8 (line 1); called 1452052 times (4294967296 accelerated)
1.681527 sec (4303679608 ops) spent executing
3.032960 sec (120279413016 ops) including child calls
BlkValueWrite:
at $03172a (line 27890); called 363600 times (4294967296 accelerated)
1.114526 sec (4302745734 ops) spent executing
3.697855 sec (120283951311 ops) including child calls
BlkValueRead:
at $0315c4 (line 27832); called 361973 times (4294967296 accelerated)
1.102141 sec (4302688473 ops) spent executing
3.708498 sec (120283816891 ops) including child calls
RT__ChLDW:
at $03e729 (line 1); called 896525 times (4294967296 accelerated)
1.061123 sec (4301242971 ops) spent executing
1.886649 sec (120272532163 ops) including child calls
BlkSize:
at $03115a (line 27628); called 725826 times (4294967296 accelerated)
0.642042 sec (4297870600 ops) spent executing
2.155998 sec (120272149156 ops) including child calls
HashCoreCheckResize:
at $03b88d (line 33225); called 8000 times (4294967296 accelerated)
0.226893 sec (4296280743 ops) spent executing
3.530675 sec (120282662662 ops) including child calls
INDEXED_TEXT_TY_Cast:
at $031edf (line 28337); called 80 times (4294967296 accelerated)
0.183588 sec (4295847646 ops) spent executing
3.825970 sec (120283077175 ops) including child calls
INDEXED_TEXT_TY_Say:
at $03218d (line 28469); called 164 times (4294967296 accelerated)
0.170247 sec (4295868348 ops) spent executing
1.868206 sec (120270349406 ops) including child calls
glk_put_char_uni:
at $000564 (line 1390); called 149902 times (4294967296 accelerated)
0.096039 sec (4295417002 ops) spent executing
0.175487 sec (120259533994 ops) including child calls

^D[/code][/spoiler]
The short story is that the time spent in BlkSize drops considerably. Each call runs about nine times faster (30.29 sec over 3,792,638 calls, roughly 8.0 µs per call, versus 0.64 sec over 725,826 calls, roughly 0.88 µs per call), presumably because I used @shiftl in place of a loop. BlkSize is also called about one fifth as often, likely because I opted to keep data in a single block and relocate it when space ran out, rather than overflowing into multiple blocks. But without more, and more careful, experiments, it's not clear how much of the speed-up is contributed by @malloc.

Still, I can say something about how @malloc scales with the number of outstanding allocations. I was half-wrong yesterday: the terps do slow down as the number of outstanding allocations grows. But when I ran the story above with the script jump/quit, the extension's advantage over the stock code grew steadily as I raised the loop bound in the jump rule. So it seems that the terps' allocators scale better than Inform's own block-allocation code does.
Attachment: Block Value Management via Malloc.txt (16.2 KB)