Slowdown - and regular expressions

severedhand · November 7, 2015, 2:58pm

I added some regular expression searching to my CYOA project and suddenly it slowed way down. In the IDE, 7-8 seconds to process a node of 10 choices, 2 seconds in Gargoyle.

The way it works is:

[rant]Say you have a choice text like this:

“Give the wheat to the rats.”

I’ve set up a mode where you can put a tilde before the character you want to be the triggering key.

So input “Give the ~wheat to the rats” will become the choice “Give the (W)heat to the rats”, if key W is still available at the time.

Failing that, or if no tilde was used, the program seeks to assign the first character of the text if it’s alphanumeric, in this case ‘G’. If that’s already used too, the program assigns a ring-in from a pool of available characters. So there’s a lot of list work as well.
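The fallback order described above can be sketched outside Inform. This is a minimal Python analogue, not the project's actual code: `assign_key`, `used` and `pool` are hypothetical names, and it assumes "typeable" means ASCII 0-9, a-z and A-Z, with keys compared in upper case.

```python
import string

# Assumption: "typeable" means ASCII letters and digits only.
TYPEABLE = set(string.ascii_letters + string.digits)

def assign_key(text, used, pool):
    """Pick a trigger key for one choice text (hypothetical helper).

    used: set of upper-case keys already taken in this node (mutated).
    pool: ordered list of fallback keys.
    """
    candidates = []
    i = text.find("~")
    if i != -1 and i + 1 < len(text) and text[i + 1] in TYPEABLE:
        candidates.append(text[i + 1].upper())  # 1. the tilde-marked character
    if text and text[0] in TYPEABLE:
        candidates.append(text[0].upper())      # 2. the first character
    candidates.extend(pool)                     # 3. ring-ins from the pool
    for key in candidates:
        if key not in used:
            used.add(key)
            return key
    return None  # every candidate is already taken
```

Calling it once per choice with a shared `used` set gives each choice the best key still free at that point.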

Basically I was running the whole reg ex search (to identify the tilde and make sure the character after it was typeable) on each choice text in the node. Then I realised I didn’t need to run the whole search unless the tilde was at least present in that text. So I used Inform’s phrase ‘if the text YADA matches “~”’ (the tilde being the signal character) to screen lines out. If it sees a tilde, only then does it do the reg ex search.

This brought execution time back to reasonable (1 second in the IDE, a fraction of that in Gargoyle.)
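The screening idea (a cheap substring test before the expensive pattern match) looks roughly like this in Python. It is only an illustration of the technique: `tilde_key` and `TILDE_KEY` are hypothetical names, the character set is assumed to be ASCII, and the relative cost of the two steps in Inform may differ from Python's.

```python
import re

# Assumption: the key after the tilde must be an ASCII digit or letter.
TILDE_KEY = re.compile(r"~([0-9A-Za-z])")

def tilde_key(text):
    """Return the tilde-marked character, or None (hypothetical helper)."""
    if "~" not in text:         # cheap substring test screens most lines out
        return None
    m = TILDE_KEY.search(text)  # the regex only runs when a tilde is present
    return m.group(1) if m else None
```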

However!.. that’s 1 second for 1 reg ex match in an example including only 1 tilde. The point of this feature is that the author could put a tilde in every choice if they wanted to, in which case execution time would blow out again. 2 seconds is already pretty unacceptable in Gargoyle for 10 choices, and I expect it would be worse online.

I suppose this means I need to ask around here for regular expression advice, or alternative approaches and ways to save horsepower.

(I should begin by saying I’m no reg e(X)pert, so if you see more efficient expressions of my expressions, please elucidate).

I don’t see a way around using reg ex’s per se, since I need to identify spots in the text, test characters for typeability, and also replace them with brackets and things when printing the option onscreen.

So at the moment, I first screen a prose for a tilde the fast way:

if cyoa-prose matches the text "~":

And if I find one, I subject the line to this:

if cyoa-prose matches the regular expression ".*\~(<\d\l\u>).*":

The goal of this is: find the first tilde, which should be followed by a single digit, lower case letter or upper case letter, and then match to the end of the line.

The reason I keep matching to the end is to prevent multiple matches within the line (only because they lead to multiple replacements). Otherwise, in a case like this:

"Give Buddha the ~yingyang."

if I then replace like so (which I am) -

replace the regular expression "\~[text matching subexpression 1]" in prose with "(Y)"

I would get “Give Buddha the (Y)ing(Y)ang”
when what I want is “Give Buddha the (Y)ingyang.”

Finally, if I didn’t find a tilde in the first place, I check that the first character is typeable by my standards (0-9, a-z or A-Z) with this reg ex check:

if cyoa-prosework matches the regular expression "^(<\d\l\u>)":

This means that at the moment, in this mode, each choice text is subjected to a fast search for a tilde, followed by either (a) the slow search for a tilde and a particular character, or (b) the search for a typeable character at the head of the text (I don’t know whether this one slows things down much).[/rant]
So, thanks much if you wrap your head around all that, and can offer any advice.

-Wade

Draconis · November 7, 2015, 3:42pm

For the tilde part, you can speed it up a bit like this:

replace the regular expression "\~(\w)" in the selected text with "\(\u1\)"

That makes the matching significantly simpler, and only runs through the text once rather than twice. (\w is a “word character”, i.e. anything except spacing or punctuation; \u1 means “subexpression 1 in upper case”.)
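For readers more used to other regex flavours, the single-pass replacement above corresponds roughly to the following Python sketch. `mark_key` is a hypothetical name. Python's `\w` also matches underscores and non-ASCII letters, so it is only an approximation of Inform's `\w`, and `count=1` is an addition here that limits the change to the first marked character.

```python
import re

def mark_key(text):
    """Turn the first '~x' into '(X)' in one pass (hypothetical helper)."""
    return re.sub(
        r"~(\w)",                                   # tilde plus one word character
        lambda m: "(" + m.group(1).upper() + ")",   # captured character, upper-cased
        text,
        count=1,                                    # replace only the first occurrence
    )
```

Because the capture and the upper-casing both happen inside the one substitution, the text is scanned once rather than matched first and replaced afterwards.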

For the first letter part, you can do something similar, or alternately give the regex parser less to work with:

if character number 1 in the selected text matches the regular expression "(\w)", replace character number 1 in the selected text with "([character number 1 in the selected text in upper case])"

I don’t know which ends up being faster. But the first one was really your problem, since you were running through the whole thing multiple times for parsing.
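The first-letter variant, which inspects only one character instead of handing the whole line to the regex engine, can be sketched in Python as follows. `mark_first` is a hypothetical name, and `str.isalnum` accepts non-ASCII letters and digits too, so it is slightly broader than a strict 0-9, a-z, A-Z test.

```python
def mark_first(text):
    """Wrap the first character in brackets if it is alphanumeric (hypothetical helper)."""
    if text and text[0].isalnum():
        # Only the first character is examined; the rest is copied unchanged.
        return "(" + text[0].upper() + ")" + text[1:]
    return text
```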

severedhand · November 8, 2015, 2:01am

Thanks! I applied the above approaches to all the reg ex moments and the time came down to 2 seconds for a 10 choice node (where every choice had a tilde in it) in the IDE. This means that offline it’s basically instant. Online, it’s now about the same as the IDE, 2 seconds. But I notice that there’s already a bit of a delay in the IDE, even with the tilde mode off, which must be due to other churning stuff that I’m doing every turn. I can probably make that stuff more efficient. Overall, we’re back in the realm of acceptability.

-Wade