Chapter 20: Advanced Text
This is a smaller chapter, with only 7 or so examples. (edit: turns out there were more than that!)
Section 20.1 is Changing Texts.
This chapter will deal with stuff like regular expressions, matching snippets, selecting text letter by letter, like so:
if character number 1 in "[time of day]" is "1", ...
Apparently Inform used to have ‘text’ and ‘indexed text’, but since 2012 (before I started coding in Inform) it’s just all ‘text’.
Section 20.2 is Memory Limitations. It mentions that Inform runs on virtual machines (one reason it’s so platform-robust) that are pretty small.
Using too much text can make your memory overflow! In Glulx, the game just grabs more memory. But Z-machine games need extra memory declared, like this:
Use dynamic memory allocation of at least 16384.
Next, there is a maximum text length of around 1000 characters. I swear I actually hit that once, but I don’t remember why; I usually hate having multiple pages of text and try to avoid ever printing more than one screen at once, but I think there was some poem or something that I was printing…Oh, I think I made the xyzzy response in one game cycle between every single XYZZY response recorded by David Welbourn. I didn’t know you could increase the text length so I just split it into smaller strings and cycled through the smaller strings which cycled through even smaller ones. I’m pretty sure I took it out of the finished game, though.
Another limitation is that the Z-Machine can only use ‘ZSCII’ characters, which are mostly ASCII.
Section 20.3 is Characters, words, punctuated words, unpunctuated words, lines, paragraphs.
Here we can select specific characters in a string:
character number 8 in "numberless projects of social reform"
or count the number of characters in a string:
number of characters in "War and Peace"
number of characters in ""
Or check if a text is the empty string:
if the description of the location is empty, ...
We can also select single words out of the text:
word number 3 in "ice-hot, don't you think?"
(this would produce ‘don’t’, since we slice it along spaces, punctuation, etc.)
We can find the number of words in the text:
number of words in "ice-hot, don't you think?" (which gives the number 5)
We can also count punctuation as a word:
punctuated word number 2 in "ice-hot, don't you think?" gives the hyphen -.
The punctuated words here are “ice”, “-”, “hot”, “,”, “don’t”, “you”, “think”, “?”. If two or more punctuation marks are adjacent, they are counted as different words, except for runs of dashes or periods: thus ",," has two punctuated words, but "--" and "..." have only one each. If the index is less than 1 or more than the number of punctuated words in the text, the result is an empty text, “”.
Similarly, you can count the number of punctuated words:
number of punctuated words in "ice-hot, don't you think?"
We can also deal with unpunctuated words:
unpunctuated word number 1 in "ice-hot, don't you think?"
number of unpunctuated words in "ice-hot, don't you think?"
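I guess the difference between plain ‘word’ and ‘unpunctuated word’ is that unpunctuated words are split only at spaces, so ‘ice-hot’ is two normal words but part of a single unpunctuated word (‘ice-hot,’ with the comma still attached).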
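To see the three kinds of word side by side, you can count them all for the same text. This is only a sketch: the Lab room and the variable T are placeholder names, not anything from the manual. Going by the lists above, the first two counts should be 5 and 8; for unpunctuated words I'd expect 4, since only spaces divide them.

```inform7
The Lab is a room.

When play begins:
	let T be "ice-hot, don't you think?";
	say "Words: [number of words in T].";
	say "Punctuated words: [number of punctuated words in T].";
	say "Unpunctuated words: [number of unpunctuated words in T]."
```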
I’ve only used this machinery once, in a puzzle with levers corresponding to individual letters where a letter can’t repeat:
LetterTyping is an action applying to one thing and one topic.
Understand "set [letter-levers] to [text]" as lettertyping when the hideous-contraption is in the location.
The letterlevertext is some text that varies.
Tempchecker is a number that varies. Tempchecker is 0.
Firstchar is some text that varies.
Secondchar is some text that varies.
Carry out lettertyping:
	let tempword be the topic understood;
	if the number of words in the topic understood > 1:
		say "You can only set the levers to one word at a time!";
	otherwise:
		now tempchecker is 0;
		let wordlength be the number of characters in tempword;
		repeat with currentnum running from 1 to wordlength:
			repeat with comparenum running from 1 to wordlength:
				now firstchar is the character number currentnum in tempword;
				now secondchar is the character number comparenum in tempword;
				if currentnum is not comparenum:
					if "[firstchar]" matches the text "[secondchar]":
						increment tempchecker;
		if tempchecker > 0:
			say "You can only set the levers to a word without repeated characters!";
		otherwise:
			say "You set the levers to '[the topic understood]'. [randomsloth]."
You can also select line numbers (which I didn’t know!)
line number (number) in (text) ... text
number of lines in (text) ... number
These refer to explicit line breaks, since we won’t know how big the player’s monitor will be. Similarly:
paragraph number 3 in ...
number of paragraphs in ...
Section 20.4 is Upper and Lower case letters.
It mentions that some letters in some languages don’t have upper case or lower case equivalents or have multiple ones.
We can test for case:
if...is in lower case:
if...is in upper case:
And you can change the text to another case:
...in lower case
...in upper case
...in title case (which only capitalises the first letter of each word)
...in sentence case (only capitalises first letter of each sentence)
I’ve found that Inform really struggles with combining a description and a case change all at once: if you say something like [the description of the book in Upper Case] or [a list of things enclosed by the room in lower case], more than half of the time it just doesn’t compile for me (I haven’t checked these specific examples). So I generally make a temporary text variable like ‘let X be the description of the book’ and then print ‘X in upper case’.
Accents (in Greek, for instance) are preserved when changing case.
Title and sentence casing cannot recognize proper nouns.
All command input is automatically converted to lower case.
There follows a bizarre account of how ÿ exists only in lower case in the Z-machine, which I hope will never be something I need to know in the future (maybe the $500 Inform Jeopardy question).
We now have two of the very few examples in this chapter.
Example 411 (nice) is Capital City:
To say the player's capitalised surroundings:
let the masthead be "[the player's surroundings]" in upper case;
say the masthead.
When play begins:
now the left hand status line is "[the player's capitalised surroundings]".
Hmmm, this actually explains that weird hack I always did! They say:
Not much is needed for this. The only noteworthy point is that it doesn’t work by changing the LHSL to “[the player’s surroundings in upper case]”: it cannot do this because “the player’s surroundings” is not a value. Instead, “[the player’s surroundings]” is a text substitution sometimes printing the name of a room, sometimes printing “Darkness”, and so on. We must therefore load it into a text first, and then apply “…in upper case”.
Nice to know there wasn’t any better way!
Example 412 is Rocket Man, which uses a similar hack:
Instead of going somewhere from the spaceport when the player carries something:
let N be "[is-are the list of things carried by the player] really suitable gear to take to the moon?" in sentence case;
say "[N][paragraph break]".
The Spaceport is a room. North of the Spaceport is the Rocket Launch Pad. The player carries a stuffed bear, a chocolate cookie, and a book.
The description of the book is "It is entitled [italic type]Why Not To Take [sentence cased inventory] To The Moon[roman type]."
To say sentence cased inventory:
let N be "[a list of things carried by the player]" in title case;
say "[N]".
Section 20.5 is Matching and exactly matching
‘Matching’ in Inform means inclusion:
if "[score]" matches the text "3", ...
(this just tests if the digit 3 is anywhere in the text)
if the printed name of the location matches the text "the", ...
just checks if the lower-case string of letters ‘the’ occurs in the text, even inside other words, but it doesn’t match upper-case letters, unless you say:
if the printed name of the location matches the text "the", case insensitively: ...
We can check for inclusion both ways (i.e. the two texts having the same contents) with:
if "[score]" exactly matches the text "[best score]", ...
This is not ‘equality’, since equality means that they are always the same text, while exactly matching just means they are at this moment equal.
You can also count the number of times it matches:
number of times "pell-mell sally" matches the text "ll" = 3
number of times "xyzzy" matches the text "Z" = 0
number of times "xyzzy" matches the text "Z", case insensitively = 2
number of times "aaaaaaaa" matches the text "aaaa" = 2
Hmm, that would have made my code from before nicer with making sure no letters are repeated.
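Here is roughly what that nicer version might look like. It is a sketch rather than tested game code: the phrase name "repeats a letter" is made up for this example, and it relies only on the "number of times ... matches the text ..." phrase shown above. A letter is repeated exactly when it matches the whole word more than once.

```inform7
To decide whether (T - text) repeats a letter:
	repeat with N running from 1 to the number of characters in T:
		let C be character number N in T;
		[each character trivially matches itself once, so look for a second hit]
		if the number of times T matches the text C is greater than 1, decide yes;
	decide no.
```

The lever rule could then test "if tempword repeats a letter:" in place of the nested loops and the tempchecker counter.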
Section 20.6 is Regular expression matching.
Regular expressions are a standard tool for searching text in most programming languages and text editors, and can be super useful (you can also make fractals out of them! You can look up Finite Subdivision Rules to see some).
We can use them for search like so:
if "taramasalata" matches the regular expression "a.*l", ...
(which we can add the words ‘case insensitively’ to if we desire)
or we can say ‘exactly matches’, with or without ‘case insensitively’.
Or ‘the number of times … matches the regular expression …’
Since regular expressions can match a lot of things, it can be useful to figure out what exactly got matched. Right after using a regex, you can say text matching regular expression:
if "taramasalata" matches the regular expression "m.*l":
	say "[text matching regular expression].";
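Bracketed groups can be read back the same way, using "text matching subexpression". The sketch below uses a placeholder Lab room and a pattern chosen for illustration. Tracing it by hand, the whole match should be "aramasalata", group 1 "ramasal" and group 3 "ta", but I haven't compiled it to confirm.

```inform7
The Lab is a room.

When play begins:
	if "taramasalata" matches the regular expression "a(r.*l)(a)(ta)":
		say "Whole match: [text matching regular expression].";
		say "Group 1: [text matching subexpression 1].";
		say "Group 3: [text matching subexpression 3]."
```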
The section then goes on to explain how regular expressions work, which is a complicated subject and one probably better learned somewhere else through the numerous tutorials, only returning to this page to see exactly what notation Inform uses.
Example 413 is useful for beta testing:
After reading a command (this is the ignore beta-comments rule):
	if the player's command matches the regular expression "^\p":
		say "(Noted.)";
		reject the player's command.
This is better than my current stuff, which I never noticed had a typo in it to prevent it working:
Understand "* [text]" as a mistake ("Noted.").
Understand "*#[text]" as a mistake ("Noted.").
(There shouldn’t have been a space after the asterisk, but oh well).
Example 414 is an explanatory essay about how Inform’s regex was chosen. It’s basically a stripped down version of the PCRE regular expression. It omits things like carriage returns, character codes, etc, and specifies which test cases Inform was unable to match well.
Section 20.7 is Making new text with text substitutions.
It mentions two older ways we learned how to write text:
say "The clock reads [time of day].";
To decide what text is (T - text) doubled:
decide on "[T][T]".
and you can use them like this:
let the Gerard Kenny reference be "NewYork" doubled;
Now, the reason all of this is discussed is because when we set something equal to a substitution, Inform quietly decides whether to bring over the internal logic of the substitution or just print it out once and keep it that way forever.
Here’s how it decides:
What's going on here is this: Inform substitutes text immediately if it contains references to a temporary value such as "T", and otherwise only if it needs to access the contents. This is why "[time of day]" isn't substituted until we need to print it out (or, say, access the third character): "time of day" is a value which always exists, not a temporary one.
If we want, we can exclusively use the ‘print out exactly what it says now and keep it that way’ version by saying ‘the substituted form of’:
now the accumulated tally is the substituted form of "[the accumulated tally]X";
You can test if text has been substituted or not:
now the left hand status line is "[time of day]";
if the left hand status line is unsubstituted, say "Yes!";
An amusing in-text example is given:
The player is holding a temporal bomb.
When play begins:
now the left hand status line is "Clock reads: [time of day]".
After dropping the temporal bomb:
now the left hand status line is the substituted form of the left hand status line;
say "Time itself is now broken. Well done."
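The difference is easiest to see with two variables set from the same substitution, one left live and one frozen. This is a minimal sketch with names I've invented (the Lab, "live reading", "frozen reading"). The expectation, following the manual's explanation above, is that the live text keeps up with the clock while the frozen one stays at whatever the time was when play began.

```inform7
The Lab is a room.

The live reading is a text that varies.
The frozen reading is a text that varies.

When play begins:
	now the live reading is "[time of day]";
	now the frozen reading is the substituted form of "[time of day]".

Every turn:
	say "Live: [live reading]. Frozen: [frozen reading]."
```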
The last three examples are now given.
Example 415 is Identity theft:
The player's forename is a text that varies. The player's full name is a text that varies.
When play begins:
now the command prompt is "What is your name? > ".
To decide whether collecting names:
if the command prompt is "What is your name? > ", yes;
no.
After reading a command when collecting names:
	if the number of words in the player's command is greater than 5:
		say "[paragraph break]Who are you, a member of the British royal family? No one has that many names. Let's try this again.";
		reject the player's command;
	now the player's full name is the player's command;
	now the player's forename is word number 1 in the player's command;
	now the command prompt is ">";
	say "Hi, [player's forename]![paragraph break]";
	say "[banner text]";
	move the player to the location;
	reject the player's command.
Instead of looking when collecting names: do nothing.
Rule for printing the banner text when collecting names: do nothing.
Rule for constructing the status line when collecting names: do nothing.
Example 416 is Mirror, mirror:
The player carries a magic mirror. The magic mirror has a text called the mirror vision.
To erase the mirror: now mirror vision of the mirror is "The mirror is polished clean, and has no impression upon it."
To say current room description: try looking.
To expose the mirror:
say "The mirror shines momentarily with a dazzling light.[paragraph break]";
now mirror vision of the mirror is the substituted form of "The hazy image in the mirror preserves a past sight:[line break][current room description]All is distorted and yet living, as though the past and present are coterminous in the mirror."
Understand "hold up [something preferably held]" or "hold [something preferably held] up" as holding aloft. Holding aloft is an action applying to one carried thing. Report holding aloft: say "You hold [the noun] aloft."
Instead of rubbing the mirror: erase the mirror; try examining the mirror. Instead of holding aloft the mirror: expose the mirror.
Example 417 is The Cow Exonerated:
This example doesn’t really seem to have anything to do with the chapter, besides the fact that it mentions we can’t have objects called ‘matches’ since that’s part of Inform’s matching phraseology. But the example has some neat tricks, like this:
Every turn:
	let N be 0; [here we track how many matches are being put out during this turn, so that we don't have to mention each match individually if several go out during the same move]
	repeat with item running through flaming s-matches:
		decrement the duration of the item;
		if the duration of the item is 0:
			now the item is burnt;
			now the item is unlit;
			if the item is visible, increment N;
	if N is 1:
		say "[if the number of visible flaming s-matches is greater than 0]One of the matches [otherwise if the number of burnt visible s-matches is greater than 1]Your last burning match [otherwise]The match [end if]goes out.";
	otherwise if N is greater than 1:
		let enumeration be "[N in words]";
		if N is the number of visible s-matches:
			if N is two, say "Both";
			otherwise say "All [enumeration]";
		otherwise:
			say "[enumeration in title case]";
		say " matches go out[if a visible strikable-match is flaming], leaving [number of visible flaming s-matches in words] still lit[end if]."
Section 20.8 is Replacements
You can edit a text (that you’ve defined via let or so on) with some of the following phrases:
let V be "mope";
replace character number 3 in V with "lecul";
say V;
says “molecule”.
let V be "Does the well run dry?";
replace word number 3 in V with "jogger";
say V;
says “Does the jogger run dry?”.
let V be "Frankly, yes, I agree.";
replace punctuated word number 2 in V with ":";
say V;
says “Frankly: yes, I agree.”.
let V be "Frankly, yes, I agree.";
replace unpunctuated word number 2 in V with "of course";
say V;
says “Frankly, of course I agree.”.
And replace line number ... in ... with... and paragraph number work as well.
You can also replace all instances of a text:
replace the text "a" in V with "z"
If you only want to replace whole words, we do it as so:
replace the word "Bob" in V with "Robert"
Instead of saying ‘word’ we can also say ‘punctuated word’ or ‘regular expression’.
When replacing regular expression (with the ‘case insensitively’ option or not), you can use special characters to print back part of the expression matched.
\0 is the exact text matched, while \1,\2,\3, up to \9 represent the different clusters of symbols matched. Two examples:
replace the regular expression "\d+" in V with "roughly \0"
adds the word “roughly” in front of any run of digits in V, because \0 becomes in turn whichever run of digits matched. And
replace the regular expression "(\w+) (.*)" in V with "\2, \1"
performs the transformation “Frank Booth” to “Booth, Frank”.
Putting an ‘l’ or ‘u’ in between the backslash and the number forces it to be lower case or upper case respectively.
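As a sketch of the case-forcing forms, here is the name-swapping replacement again with \u on the surname and \l on the forename. The Lab room and the sample name are placeholders. Going by the summary table in section 20.9, I'd expect this to print "BOOTH, frank", though I haven't run it.

```inform7
The Lab is a room.

When play begins:
	let V be "Frank Booth";
	[\u2 is group 2 in upper case; \l1 is group 1 in lower case]
	replace the regular expression "(\w+) (.*)" in V with "\u2, \l1";
	say V.
```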
Oh! It looks like I missed a few examples when I browsed earlier. There are a ton here!
Example 418 is Blackout:
Rule for printing the name of a dark room:
let N be "[location]";
replace the regular expression "\w" in N with "*";
say "[N]".
This just censors the name of a room with a matching number of asterisks.
Example 419 is Fido:
A dog is an animal in Back Yard. The dog has some text called the nickname. The nickname of the dog is "nothing". Understand the nickname property as describing the dog.
Rule for printing the name of the dog when the nickname of the dog is not "nothing":
say "[nickname of the dog]".
Naming it with is an action applying to one thing and one topic. Understand "name [something] [text]" as naming it with. Check naming it with: say "You can't name that."
Instead of naming the dog with "nothing":
now the nickname of the dog is "nothing";
now the dog is improper-named;
say "You revoke your choice of dog-name."
Instead of naming the dog with something:
let N be "[the topic understood]";
replace the text "'" in N with "";
now the nickname of the dog is "[N]";
now the dog is proper-named;
say "The dog is now known as [nickname of the dog]."
Example 420 is Igpay Atinlay, which turns the player’s command into Pig Latin:
After reading a command:
let N be "[the player's command]";
replace the regular expression "\b(<aeiou>+)(\w*)" in N with "\1\2ay";
replace the regular expression "\b(<bcdfghjklmnpqrstvwxz>+)(\w*)" in N with "\2\1ay";
say "[N][paragraph break]";
reject the player's command.
Example 421 is Mr. Burns’ repast:
Rule for printing the name of the unknown fish:
if the supposed name of the unknown fish is "", say the printed name of the unknown fish;
otherwise say the supposed name of the unknown fish.
After reading a command:
	if the unknown fish is visible and player's command matches the regular expression "\b\w+fish":
		let N be "[the player's command]";
		replace the regular expression ".*(?=\b\w+fish)" in N with "";
		now N is "[N](?)";
		now the supposed name of the unknown fish is N;
		respond with doubt;
		reject the player's command;
	otherwise if the unknown fish is visible and the player's command includes "[fish variety]":
		now supposed name of the fish is "[fish variety understood](?)";
		respond with doubt;
		reject the player's command.
To respond with doubt:
say "You're not [italic type]sure[roman type] you're seeing any such thing."
This reminds me of a project I abandoned, which was to make ‘The Game with No Pictures’, a play on B.J. Novak’s ‘The Book with No Pictures’, with fun responses to text like this, but I didn’t know how to do any of this at the time. Could be fun to revisit eventually.
Example 422 is Northstar:
After reading a command:
let N be "[the player's command]";
replace the regular expression "\b(ask|tell|order) (.+?) to (.+)" in N with "\2, \3";
change the text of the player's command to N.
Example 423 is Cave-troll:
This example is by John Clemens.
This removes some stuff from a player’s command if it comes after a part that was already understood:
Rule for printing a parser error when the latest parser error is the only understood as far as error and the player's command matches the text "with":
now the last command is the player's command;
now the parser error flag is true;
let n be "[the player's command]";
replace the regular expression ".* with (.*)" in n with "with \1";
say "(ignoring the unnecessary words '[n]')[line break]";
replace the regular expression "with .*" in the last command with "".
Rule for reading a command when the parser error flag is true:
now the parser error flag is false;
change the text of the player's command to the last command.
Finally, section 20.9 summarizes regular expressions in Inform’s house style, which I suppose I could include here:
Summary
Positional restrictions
| Notation | Meaning |
|---|---|
| ^ | Matches (accepting no text) only at the start of the text |
| $ | Matches (accepting no text) only at the end of the text |
| \b | Word boundary: matches at either end of text or between a \w and a \W |
| \B | Matches anywhere where \b does not match |
Backslashed character classes
| Notation | Meaning |
|---|---|
| \char | If char is other than a-z, A-Z, 0-9 or space, matches that literal char |
| \\ | For example, “\\” matches a literal backslash "\" |
| \n | Matches literal line break character |
| \t | Matches literal tab character (but use this only with external files) |
| \d | Matches any single digit |
| \l | Matches any lower case letter (by Unicode 4.0.0 definition) |
| \p | Matches any single punctuation mark: . , ! ? - / " : ; ( ) { } |
| \s | Matches any single spacing character (space, line break, tab) |
| \u | Matches any upper case letter (by Unicode 4.0.0 definition) |
| \w | Matches any single word character (neither \p nor \s) |
| \D | Matches any single non-digit |
| \L | Matches any non-lower-case-letter |
| \P | Matches any single non-punctuation-mark |
| \S | Matches any single non-spacing-character |
| \U | Matches any non-upper-case-letter |
| \W | Matches any single non-word-character (i.e., matches either \p or \s) |
Other character classes
| Notation | Meaning |
|---|---|
| . | Matches any single character |
| <…> | Character range: matches any single character inside |
| <^…> | Negated character range: matches any single character not inside |
Inside a character range
| Notation | Meaning |
|---|---|
| e-h | Any character in the run “e” to “h” inclusive (and so on for other runs) |
| >… | Starting with “>” means that a literal close angle bracket is included |
| \ | Backslash has the same meaning as for backslashed character classes: see above |
Structural
| Notation | Meaning |
|---|---|
| \| | Divides alternatives: "fish\|fowl" matches either |
| (?i) | Always matches: switches to case-insensitive matching from here on |
| (?-i) | Always matches: switches to case-sensitive matching from here on |
Repetitions
| Notation | Meaning |
|---|---|
| …? | Matches “…” either 0 or 1 times, i.e., makes “…” optional |
| …* | Matches “…” 0 or more times: e.g. “\s*” matches an optional run of space |
| …+ | Matches “…” 1 or more times: e.g. “x+” matches any run of "x"s |
| …{6} | Matches “…” exactly 6 times (similarly for other numbers, of course) |
| …{2,5} | Matches “…” between 2 and 5 times |
| …{3,} | Matches “…” 3 or more times |
| …? | “?” after any repetition makes it “lazy”, matching as few repeats as it can |
Numbered subexpressions
| Notation | Meaning |
|---|---|
| (…) | Groups part of the expression together: matches if the interior matches |
| \1 | Matches the contents of the 1st subexpression reading left to right |
| \2 | Matches the contents of the 2nd, and so on up to “\9” (but no further) |
Unnumbered subexpressions
| Notation | Meaning |
|---|---|
| (# …) | Comment: always matches, and the contents are ignored |
| (?= …) | Lookahead: matches if the text ahead matches “…”, but doesn’t consume it |
| (?! …) | Negated lookahead: matches if lookahead fails |
| (?<= …) | Lookbehind: matches if the text behind matches “…”, but doesn’t consume it |
| (?<! …) | Negated lookbehind: matches if lookbehind fails |
| (> …) | Possessive: tries to match “…” and if it succeeds, never backtracks on this |
| (?(1)…) | Conditional: if \1 has matched by now, require that “…” be matched |
| (?(1)…\|…) | Conditional: ditto, but if \1 has not matched, require the second part |
| (?(?=…)…\|…) | Conditional with lookahead as its condition for which to match |
| (?(?<=…)…\|…) | Conditional with lookbehind as its condition for which to match |
IN REPLACEMENT TEXT
| Notation | Meaning |
|---|---|
| \char | If char is other than a-z, A-Z, 0-9 or space, expands to that literal char |
| \\ | In particular, “\\” expands to a literal backslash "\" |
| \n | Expands to a line break character |
| \t | Expands to a tab character (but use this only with external files) |
| \0 | Expands to the full text matched |
| \1 | Expands to whatever the 1st bracketed subexpression matched |
| \2 | Expands to whatever the 2nd matched, and so on up to “\9” (but no further) |
| \l0 | Expands to \0 converted to lower case (and so on for “\l1” to “\l9”) |
| \u0 | Expands to \0 converted to upper case (and so on for “\u1” to “\u9”) |