Inform 7's (lack of) Speed

So, I’m currently working with a partner to bring the second work of modern IF to the Kindle, and we’re having a problem with performance. In itself this is nothing new; I had to hack the accelerated opcodes into King of Shreds and Patches and spend hours optimizing the Java code at the interpreter level to get that game running at acceptable speed. Problem is, even those optimizations just aren’t quite enough for this project. With nothing left to optimize on the interpreter, I’m thrown back to the Inform library.

One thing we’ve noticed is that our “After deciding the scope of the player” rules sometimes fire dozens of times in a single turn. I suspect this to be a symptom rather than a cause. Is the game really looping through every object to determine scope that frequently every time it parses a command? If so, it seems to scream out for optimization – if nothing else, just build a list of applicable objects the first time through, and refer to that for the remainder of the parsing algorithm.
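For what it’s worth, the build-a-list-once idea can be sketched in pseudocode (Python here; the names are illustrative, not Inform internals):

```python
# Illustrative sketch only: compute the scope list the first time it is
# needed during a turn, and reuse it for the rest of the parse.
class ScopeCache:
    def __init__(self, compute_scope):
        self.compute_scope = compute_scope  # stands in for the expensive full scope walk
        self.cached = None

    def scope(self):
        if self.cached is None:
            self.cached = self.compute_scope()
        return self.cached

    def new_turn(self):
        # Invalidate at the start of each turn, since the world may have changed.
        self.cached = None

calls = 0
def full_scope_walk():
    global calls
    calls += 1
    return ["hat", "chest", "table"]

cache = ScopeCache(full_scope_walk)
for _ in range(19):            # e.g. one scope pass per grammar line
    objs = cache.scope()
print(calls)                   # the expensive walk ran only once
```

The complication, of course, is knowing when the cache must be invalidated mid-turn; that is presumably why the library recomputes instead.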

But then the Inform parser is a ridiculously complicated beast, and I’d prefer to hack on it as little as possible. Some of the brighter lights around these parts did quite a bit of Inform 7 profiling in years recently passed. I know this is damnably vague, but I’m just wondering if anyone can, say, point me to what THEY would look to first if they really needed to get some more speed out of Inform 7. Are there any other functions that spring to mind as good candidates for new accelerated opcodes? Any areas of parsing or world-modeling that are known to be bottlenecks?

Again, sorry for the vagueness of all this. Advice or ideas (even if similarly vague) would be greatly appreciated.

Back in the Inform 6 days, if someone’s game was running slowly, the first question to ask was “Are you doing too much work in add_to_scope?”


The parser doesn’t loop through every object in the game when it’s doing that. It only loops through objects in the location, which is pretty efficient unless you’ve added a large class of objects to scope – or are doing an expensive test to decide what to add to scope. (The classic blunder is logic like “add X to scope if Y is/is not in scope”, which triggers recursive scope tests.)
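The recursion hazard, and the usual guard against it, look roughly like this (a Python sketch; nothing here is actual library code):

```python
# Illustrative sketch: a scope rule that itself tests scope can recurse
# forever. A simple re-entrancy guard breaks the cycle.
in_progress = set()

def in_scope(obj, scope_rules):
    if obj in in_progress:
        return False               # guard: refuse to recurse into an in-flight test
    in_progress.add(obj)
    try:
        return scope_rules(obj)
    finally:
        in_progress.discard(obj)

# An "add X to scope if Y is in scope" style rule that points back at itself:
def rules(obj):
    if obj == "X":
        return in_scope("Y", rules)
    if obj == "Y":
        return in_scope("X", rules)  # without the guard, this never terminates
    return False

print(in_scope("X", rules))        # terminates and returns False
```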

All that aside, it’s worth using the profiling features of Glulxe (you’ll probably have to recompile the interpreter with profiling on) to see what’s eating the lion’s share of the time. Just because a function is called “dozens of times” doesn’t mean it’s the long pole. (At the bottom level of the code, I6 property-reading functions are called thousands of times per turn. They’re the ones I concentrated on for acceleration. But a bottom-level approach can only go so far.)

Capmikee just posted an extension called Scope Caching which sounds like it might be useful to you.

In other news, I’m rewriting that parser in Inform 7 so we can see how the magic happens. AFAICT the add_to_scope thing isn’t used in Inform 7, though I’m not finished yet so could be wrong.

I’ve no skills to assist you with your technical challenges; just want to throw in a note of applause for you facing them.

Thanks for the suggestions!

I’m afraid that the scope algorithms don’t seem to be the source of the problem. Implementing Scope Caching yields no appreciable improvement.

So, I took Zarf’s advice and made a profiling build of Glulxe. It turns out the most expensive function is ProcessRulebook. No surprise there, I suppose. But that’s a… daunting one to try to implement as an accelerated function.

I did, however, find a few more modest candidates for acceleration. BlkSize in Flex.i6t seems particularly juicy. It’s a bottom-level function that calls nothing else, and it seems to get called a great deal during the “insert” command, which is particularly slow. I’m going to start by implementing it as an accelerated opcode, and go from there.

It’s worth looking at what functions call those low-level functions. If a particular path to BlkSize is a majority of the usage, you may be able to find a way to do the same thing without using the same low-level facility.

Like I said, accelerated opcodes only go so far. You’re a lot smarter than the compiler is.

Although it can be used to prevent the infinite recursion Zarf described, Scope Caching has no accelerating effect on builtin code. It’s primarily useful to speed up visibility tests that you need to perform in your own source.

e.g.:

Repeat with item running through not marked visible things:

could be considerably faster than:

Repeat with item running through not visible things:

I will follow this with interest, though. I’m using Juhana’s Parchment-Transcript tool to collect testing data for my WIP, and I hate that my testers have to wait so long for Parchment - sometimes even for seemingly innocuous commands.

So, I’ve implemented BlkSize and WhetherProvides as accelerated functions, with some modest improvements. Along the way, though, I’ve learned something I find pretty interesting.

The slowest command by far is “put.” Something like “put hat in chest” could take upwards of 17 seconds on the Kindle (down from almost 30 without acceleration; never say I’m not making progress :wink:). And some investigation has revealed that it’s all in the parsing phase.

“Put” is a really complicated verb for the parser to deal with, because it can mean so many things. “Put hat in chest” parses to the Insert action; “Put hat on table” parses to Put On; “Put on hat” parses to Wear; “Put down hat” parses to Drop; etc. In addition, there are different ways to phrase each variation: “Put on hat” vs. “Put hat on”, etc. There are 19 separate syntax lines for “Put” in the game I’m working with, many of them quite complicated with prepositions and indirect objects. As it happens, “put x in y” (leading to Insert, and one of the most common constructions of “put”) is near the end of that group. The parser apparently loops through the syntax lines until it finds a match, and then quits. Thus the Wear variation, which comes at the beginning of the list, parses almost instantly, while Insert takes forever.
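A toy model of that loop (Python as pseudocode; the patterns and cost units are made up for illustration) shows why a late-matching line pays for every line above it:

```python
import re

# Illustrative model of parse cost: the parser tries grammar lines in
# order, doing one (expensive) scope search per noun token, and stops
# at the first line that matches the whole command.
def matches(command, pattern):
    # Toy matcher: substitute each [noun] with a one-word wildcard.
    regex = re.escape(pattern).replace(re.escape("[noun]"), r"\w+")
    return re.fullmatch(regex, command) is not None

def parse(command, grammar_lines, scope_cost=1):
    cost = 0
    for pattern, action in grammar_lines:
        nouns = pattern.count("[noun]")
        cost += nouns * scope_cost      # each noun triggers a scope search
        if matches(command, pattern):
            return action, cost
    return None, cost

lines = [
    ("put on [noun]", "wearing"),           # early, cheap to reach
    ("put down [noun]", "dropping"),
    ("put [noun] on [noun]", "putting it on"),
    ("put [noun] in [noun]", "inserting"),  # late, so it pays for all the above
]
print(parse("put on hat", lines))       # ('wearing', 1)
print(parse("put hat in chest", lines)) # ('inserting', 6)
```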

My first thought was to ask whether I could optimize the syntax lists, placing the most commonly used lines near the front. I went so far as to declare “the command ‘put’ as something new,” then add back in all the “Understand” lines from the library in what I judged to be a better order. This accomplishes nothing, however. The compiler apparently makes its own decisions about the proper order, placing the simplest syntax lines first and working up to the more complicated. With a command like “put,” this is a recipe for inefficiency in many cases – the complicated constructions already take much longer to parse, and then they’re placed at the end of the collection… A useful addition to future versions of Inform 7 might be some way to mark certain “Understand” lines as commonly used, so that they are tried first.

But given the situation as it is, it seems there are a few options for improving performance. One (kludgy) option is to help the game out by catching problematic inputs at the interpreter level and rewriting them into something the game can process more efficiently. Say, define “place” as a synonym for the Insert version of “put” (only), then substitute it where appropriate in the interpreter. Besides being ugly, this is also problematic in that it requires us to do a substantial amount of parsing in the interpreter – exactly the job the parser in the game is supposed to be doing.
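As a sketch of that kludge (Python; the “place” synonym is hypothetical and would have to be defined in the story file for this to work):

```python
import re

# Hedged sketch of the interpreter-level rewrite: translate known-slow
# phrasings into a cheaper synonym before handing the command to the game.
REWRITES = [
    # "put X in/into Y" -> "place X in Y", which the game could define
    # with a single grammar line that parses quickly.
    (re.compile(r"^put (.+) (in|into) (.+)$"), r"place \1 in \3"),
]

def preprocess(command):
    for pattern, replacement in REWRITES:
        command, n = pattern.subn(replacement, command)
        if n:
            break
    return command

print(preprocess("put hat in chest"))   # place hat in chest
print(preprocess("put on hat"))         # unchanged: put on hat
```

Even this toy version shows the problem: the regex has to re-implement a slice of the game’s own grammar, and it will mis-handle nouns that contain “in” or “into” as separate words.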

Another is to cut down on the number of syntax lines we define for problematic verbs like “put.” Since this also means cutting down on the number of possible inputs we understand from the player, it’s also not ideal.

The last solution is just to fix the damn parser – find some way to make it faster. But man, is it a complicated and opaque piece of code. I think a logical first step is to set up some performance logging in the parser itself, to figure out where it’s spending the majority of its time. Then we could target just those sections for optimization – maybe even with an accelerated function if it seems manageable. I suspect there’s some Inform 7 hook in there causing a massive slowdown, as the Inform 6 parser never had these problems. I just don’t know where or what it might be.

Or maybe we learn something important from Ron’s ongoing reconstruction of the parser. That would be nice…

Once again, I suspect that calculating scope is the problem. I am pretty sure that scope is recalculated for every single grammar line. This is reasonable, because the action in the grammar line (and maybe the direct object too) can affect rules in the Deciding the Scope of Something activity. But it seems like there must be a way to make it more efficient.

You have a number of After Deciding the Scope of Something rules, don’t you? Do you perhaps have some Understand by a relation rules too? I think those affect scope calculations as well.

I do have some “After deciding the scope of” rules. I tried commenting them out, but this makes no real difference in parsing time. Obviously running those rules will lead to slight delays, but they appear to be very slight indeed.

I don’t believe I have any understand by relation rules. There are things like “something preferably held” and “other things.” Nothing really unusual or that deviates a great deal from the standard rules for understanding.

I only have about 40 objects in scope (checked with the “scope” command), which doesn’t seem egregious at all.

That’s correct. However, in a simple game, this is not enormously expensive. It should do one scope search per line, maximum.

(Sidebar: this is actually one scope search per noun before the first preposition. “put on X” can be eliminated if word 2 is not “on”, so no noun is needed. “give X Y” is an expensive case because the boundary between the two noun phrases requires trial-and-error to determine.)
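The “give X Y” case can be modelled like so (a Python sketch; the trial-and-error split is the point, not the details):

```python
# Illustrative sketch of why "give X Y" is expensive: with no preposition
# between the two noun phrases, the parser must try each possible split
# point and run a scope match on both halves until one split works.
def parse_give(words, known_objects):
    attempts = 0
    for split in range(1, len(words)):            # try every boundary
        first = " ".join(words[:split])
        second = " ".join(words[split:])
        attempts += 2                             # one scope match per phrase
        if first in known_objects and second in known_objects:
            return (first, second, attempts)
    return None

objs = {"rusty key", "old man", "key", "man"}
print(parse_give("rusty key old man".split(), objs))
# ('rusty key', 'old man', 4) -- two failed matches, then two successes
```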

A scope search should not be so expensive that it crimps your grammar-line writing style.

Really, you should test this. Build a game using the same “put” grammar, but just one room, one object, and no scope rules or noun-understand rules. Is it similarly slow? Then add 40 objects with simple names, but no new rules. Test speed again. Then maybe start adding long names and noun-understand declarations.

This will give you an idea where the time is going.

Beyond that, profile. You can’t solve this problem blind.

Wow. But remember that if a noun has “on” in its name, both “put” lines match so their relative ordering is important.

If you’re interested in modifying Parser.i6t just for this one work, I might suggest looking at UnpackGrammarLines. A grammar line is stored in a compressed, VM-specific format, so before each line is tested against the command it needs to be “unzipped” to a VM-neutral format. You could modify things such that all grammar lines are unpacked into a big array when play begins. Glulx has the memory to spare.
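The unpack-once idea might look like this in outline (Python as pseudocode; UnpackGrammarLines itself is I6 template code, and these names are illustrative):

```python
# Hedged sketch of the UnpackGrammarLines suggestion: rather than
# decoding a grammar line from its packed, VM-specific format every
# time it is tested, unpack every line once at startup into a big
# table and index into that afterwards.
unpack_calls = 0

def unpack(packed_line):
    global unpack_calls
    unpack_calls += 1            # stands in for the expensive decode
    return packed_line.split("|")

class GrammarTable:
    def __init__(self, packed_lines):
        # Done once when play begins; Glulx has the memory to spare.
        self.unpacked = [unpack(line) for line in packed_lines]

    def line(self, i):
        return self.unpacked[i]  # no per-test decode

table = GrammarTable(["put|noun|in|noun", "put|on|noun"])
for _ in range(100):             # 100 parse attempts
    table.line(0)
print(unpack_calls)              # 2: one decode per line, ever
```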

Relatedly, see this bug for a quick hit.

Finally, there are many array copies in Parser.i6t that could use VM opcodes for the task.

So, I feel like I’ve mostly solved these problems. I just thought I’d report back here in case there is any interest in what I found. Basically, getting into an acceptable window of performance involved four strategies.

  1. Adding a new accelerated opcode for the BlkSize function.

  2. Eliminating calls to WhetherProvides entirely through template hacking. This is NOT recommended for anything but polished, tested story files, as it removes some sanity checking. It is, however, expensive sanity checking, and I judged it acceptable to do away with on the Kindle, which will only run well-tested story files.

  3. Modifying the concealed-possessions routines. The standard parser runs the “deciding the concealed possessions of” activity hundreds or thousands of times in the course of parsing a single command, and it is a huge performance drain. I modified the code to run it only once for each object in scope before parsing each verb, caching the results for later use.

  4. Getting rid of stuff like this in the Inform 7 source: “Understand ‘jail key’ as the jail-cell key when the player’s command matches the text ‘key’, case insensitively.” I’m not sure why the authors were doing things like this in the first place, but it’s VERY expensive for obvious reasons.
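The caching in step 3 amounts to something like this (a Python sketch, not the actual template code; names are illustrative):

```python
# Illustrative sketch of the concealed-possessions fix: run the
# (expensive) activity once per object at the start of the command,
# then answer every later query from the cache.
activity_runs = 0

def deciding_concealed(obj):
    global activity_runs
    activity_runs += 1           # stands in for running the whole activity
    return obj.endswith("(hidden)")

concealed_cache = {}

def begin_command(objects_in_scope):
    concealed_cache.clear()
    for obj in objects_in_scope:
        concealed_cache[obj] = deciding_concealed(obj)

def is_concealed(obj):
    return concealed_cache[obj]  # cache hit instead of re-running the rules

begin_command(["hat", "chest", "dagger (hidden)"])
for _ in range(17):              # one scope pass per grammar line
    for obj in list(concealed_cache):
        is_concealed(obj)
print(activity_runs)             # 3: once per object, not 3 * 17
```

The trade-off is the usual one: if a rule in the activity depends on state that changes mid-parse, the cached answer goes stale.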

With all that, that problematic “put” command of mine has been reduced from almost 30 seconds to under 4. (And just about all other commands run in 2 seconds or less.) I still have to do some cleanup and profiling, and I want to have a look at Ron’s replacement of certain array routines with VM opcodes in his own parser project. But I think the majority of the problem is solved.

Thanks for your advice, everyone!

Thanks for posting this.

I’ll keep BlkSize in mind for acceleration. When I did my original profiling for the Glulx acceleration feature, I didn’t test any games that were heavy on the dynamic data. I will go back and look at that.

WhetherProvides is a good candidate for improvement. I agree that it shouldn’t be removed off the bat; I think that some valid coding patterns rely on it. But a hand-optimized assembly version would be valuable. So would analysis of call points to see if some of them don’t occur in valid code; those could then be ifdef’d out for release builds.

Concealed possessions surprises me. It hasn’t crossed my radar before as a pain point. However, I note that in an out-of-the-box I7 game, the concealed-possession rulebook is empty. This is a case which could really benefit if I7 super-optimized simple rulebooks and activities. (In the meantime, of course, your game can hack it out entirely if you’re not adding any concealed-possession rules.)

By the way, the use of the deciding-concealed-possessions activity is more accurately characterized as “once per scope test per object”. So in a one-object game, it’s only called about a dozen times for a command. (There are always at least thirteen objects in scope: yourself and twelve compass directions.) As the room gets more crowded, the call count goes up.

You’re right that it’s once per scope test per object. However, scope tests are actually done for every noun in every grammar syntax line, over and over again, via ScopeWithin and DoScopeActionAndRecurse. Thus for a command like PUT in this game, which has 17 syntax lines, it’s potentially done many, many times – once for every object (about 40), for every noun (one or two per syntax line, depending on whether an indirect object is taken), and for every syntax line (17). That really starts to add up to some serious performance drain for commands that fall toward the end of the potential syntax group for a complex verb, even if there are no or few rules actually in the concealed possession rulebook. I assume the parser is coded this way to allow for the HELD token.
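As a back-of-envelope check of that multiplication (the 1.5 nouns-per-line figure is a rough average, not a measurement):

```python
# Rough count of concealed-possessions activity runs for one "put"
# command that falls through to the last grammar line.
objects_in_scope = 40
syntax_lines = 17
nouns_per_line = 1.5      # one or two noun phrases per line, averaged
runs = objects_in_scope * syntax_lines * nouns_per_line
print(int(runs))          # ~1020 activity invocations for a single command
```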

This game does have a handful of concealed possession rules, so I can’t just throw it out entirely – and anyway I’m trying to build something that can be dropped easily into many games to “Kindleize” them. I did enough hacking and special casing to last a lifetime with King. Now it’s time to systemize this stuff a bit.

As I’ve modified the parser code, it now works as you describe – just once per object in scope, at the start of the command parsing.

I’d be interested in adding that code to the Scope Caching extension.

Give me a few days to clean it up a bit more, and I’ll get it to you.

Okay, I see what you mean.

I couldn’t make this happen with the standard library grammar for “put” or “get” – in a trivial game, those don’t resolve at the bottom of the grammar list. However, “switch on obj” goes through three tests (for a carried thing), “switch obj off” goes through four, and “switch off obj” goes through five. Noted for future testing.