A Python script that extracts texts from an Inform project

Natrium729 · November 16, 2018, 2:01am

I wrote a Python script that extracts all the text from an Inform 7 project so that it is easy to proofread. It also replaces the bracketed text substitutions as far as possible, so that spell checkers are not troubled by the brackets inside the text.

You can download it here:

You’ll need Python 3. Type the following command in a terminal to run the script:

$ python i7extract.py MyProject.inform

There are a bunch of options, all explained in the README. You can also define custom substitutions with a JSON file.

I hope someone will find it useful!

aschultz · May 26, 2021, 4:52pm

This link looks down. In case it is lost for good, I’d just like to add this VERY rough code that helped me:

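import re

remove_comments = True  # also strip bracketed text ([...]) found inside the strings

with open("story.ni", encoding="utf-8") as file:
  for (line_count, line) in enumerate(file, 1):
    if '"' not in line: continue
    # Splitting on quotes leaves the quoted text at the odd indices.
    quoted_text_ary = line.strip().split('"')[1::2]
    quoted_text = ' | '.join(quoted_text_ary)
    if remove_comments: quoted_text = re.sub(r"\[.*?\]", " ", quoted_text)
    print(line_count, quoted_text)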

Obviously you can look for extension files, too, but these basics served me well for finding typos, etc.
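As a rough sketch of that extension-file idea, the same loop can be wrapped in a function and run over several files, for example story.ni plus any .i7x extensions. The function names below are made up for illustration, and this is not how i7extract works:

```python
import re
import sys
from pathlib import Path

def extract_quoted(lines, strip_brackets=True):
    """Yield (line_number, text) for every line that contains quoted text."""
    for line_number, line in enumerate(lines, 1):
        # The odd-indexed pieces of a split on '"' are the parts inside quotes.
        pieces = line.strip().split('"')[1::2]
        if not pieces:
            continue
        text = " | ".join(pieces)
        if strip_brackets:
            # Replace each [bracketed] run with a space.
            text = re.sub(r"\[.*?\]", " ", text)
        yield line_number, text

def extract_from_files(paths):
    """Run extract_quoted over several files, tagging each hit with its file name."""
    for path in paths:
        path = Path(path)
        with path.open(encoding="utf-8") as file:
            for line_number, text in extract_quoted(file):
                yield path.name, line_number, text

if __name__ == "__main__":
    # Files to scan are given as command-line arguments.
    for name, line_number, text in extract_from_files(sys.argv[1:]):
        print(f"{name}:{line_number}: {text}")
```

Saved under any name, it takes the files to scan as command-line arguments. Like the loop above, it works line by line, so a string that spans several lines will not be handled correctly.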

zarf · May 26, 2021, 5:00pm

Zed · May 26, 2021, 7:02pm

Of related interest, glulx-strings pulls all readable strings out of any of: glulx, zcode (in or out of blorbs) or TADS 2 or 3. (glulx-strings.py is Python 2 that does just glulx/gblorb, but the README is literate CoffeeScript detailing the whole tale of a programmer having fun getting carried away with covering more cases.)

It’s operating on game files, not source, and you end up with lots of cruft and code fragments so it’s not suited for proofreading, but still handy.

Natrium729 · May 26, 2021, 7:47pm

I migrated from Bitbucket to GitLab when Bitbucket announced they’d no longer support Mercurial. I’ve updated the link in the original post. (And thanks zarf for giving the correct link!)

I think I’m the only one to have ever used this script, so I would be happy to know if it can help someone else!

Ben · May 26, 2021, 11:18pm

This is a nice script, thanks very much. I gave it a try and it was interesting to see the output from my project. I was planning to do something like this at some point as I think I’ll need it but now I don’t have to, great.

Although, the main thing I’d love to have would be to be able to segregate text in understand statements from text in say statements. Going through the output for my game feels a bit daunting as it seems so long because of all the understand words; it would be much shorter without. My Python isn’t good and I’m not sure where to start but I’ve cloned the repo at least!
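For what it's worth, here is a minimal sketch of that kind of separation. It is not taken from the i7extract code, and it assumes a crude heuristic: any line that starts with "Understand" is treated as an understand statement and everything else as ordinary text. The name split_by_kind is hypothetical.

```python
import re

# Heuristic (an assumption, not an Inform 7 parser): a line belongs to an
# understand statement if it begins with the word "Understand".
UNDERSTAND_RE = re.compile(r"^\s*understand\b", re.IGNORECASE)

def split_by_kind(lines):
    """Return (understand_texts, other_texts), each a list of (line_number, text)."""
    understand_texts, other_texts = [], []
    for line_number, line in enumerate(lines, 1):
        # The odd-indexed pieces of a split on '"' are the parts inside quotes.
        pieces = line.split('"')[1::2]
        if not pieces:
            continue
        bucket = understand_texts if UNDERSTAND_RE.match(line) else other_texts
        bucket.extend((line_number, piece) for piece in pieces)
    return understand_texts, other_texts

if __name__ == "__main__":
    source = [
        'Understand "photograph [something]" as photographing.',
        'The description of the lamp is "A brass lamp."',
        '\tsay "You take a photo of [the noun].";',
    ]
    understand_texts, other_texts = split_by_kind(source)
    print("Understand:", understand_texts)
    print("Other:", other_texts)
```

Because it only looks at the start of each line, an understand statement that is wrapped over several lines, or one that follows another sentence on the same line, would end up in the wrong list.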

Natrium729 · May 27, 2021, 1:47am

That’s a good suggestion! It won’t be trivial, but it shouldn’t be too difficult either.

I filed it on the repository so that I don’t forget it. And of course, if you have other suggestions or find bugs, please report them!

aschultz · May 27, 2021, 11:00pm

Wow! I wasn’t expecting the original author to stop by. People move on, and so forth.

I think I really enjoyed writing my own text extraction program, because it made me feel competent. But I’m really glad others have attacked it in more detail.

It’s also cool to see other people like @Ben asking for features. Sometimes I wonder if I’m the only person who might ask for a feature, and other times it’s cool to see what people think of that’d make things easier.

@Zed, yes, glulx-strings is great. I’ve used it so often. The author got back to me really quickly after I found a bug in the z-code reading.

For when it still doesn’t quite work, txd.exe and mrifk.exe (utilities from ifarchive.org) tend to fill in a lot of the gaps.