Tool and corpus for analyzing narrative structure in Twine stories

datalexic · April 16, 2019, 5:44am

As part of a personal data science project, and as a fun way to contribute to IF writing tools and to research on IF narrative structure, I’ve developed a tool (twine-graph) that takes HTML Twine stories and produces structured JSON files and automatically laid-out Graphviz visualizations of their link structure.
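For readers curious what this kind of extraction might look like, here is a minimal sketch. It is not twine-graph's actual code. It assumes a Twine 2 style HTML file in which each passage is stored in a `<tw-passagedata name="...">` element, and it only follows plain `[[...]]` links. The names `PassageParser`, `link_target`, `story_graph`, and `to_dot`, and the `passages`/`links` dictionary layout, are made up for illustration.

```python
import re
from html.parser import HTMLParser

# Plain double-bracket links: [[Target]], [[text->Target]], [[Target<-text]], [[text|Target]]
LINK_RE = re.compile(r"\[\[(.*?)\]\]")


class PassageParser(HTMLParser):
    """Collects passage names and text from <tw-passagedata> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turns &gt; back into > in passage text
        self.passages = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "tw-passagedata":
            self._current = dict(attrs).get("name")
            self.passages[self._current] = ""

    def handle_data(self, data):
        if self._current is not None:
            self.passages[self._current] += data

    def handle_endtag(self, tag):
        if tag == "tw-passagedata":
            self._current = None


def link_target(body):
    """Return the destination passage named inside one [[...]] link."""
    if "->" in body:
        return body.rsplit("->", 1)[1].strip()
    if "<-" in body:
        return body.split("<-", 1)[0].strip()
    if "|" in body:
        return body.rsplit("|", 1)[1].strip()
    return body.strip()


def story_graph(html):
    """Build a JSON-friendly dict of passage names and (source, target) links."""
    parser = PassageParser()
    parser.feed(html)
    parser.close()
    links = []
    for name, text in parser.passages.items():
        for body in LINK_RE.findall(text):
            links.append((name, link_target(body)))
    return {"passages": sorted(parser.passages), "links": links}


def to_dot(graph):
    """Render the graph as Graphviz DOT text, one edge per link."""
    def quote(name):
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph story {"]
    lines += [f"  {quote(name)};" for name in graph["passages"]]
    lines += [f"  {quote(src)} -> {quote(dst)};" for src, dst in graph["links"]]
    lines.append("}")
    return "\n".join(lines)
```

The dictionary returned by `story_graph` can be written out with `json.dump`, and the text from `to_dot` can be saved to a `.dot` file and laid out with Graphviz's `dot` command.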

The project (twine-story-clustering) culminated in two things: a research corpus, built with the tool, of Twine stories entered over the years into IFComp and Spring Thing (159 total) in both JSON and PDF formats, and a Jupyter notebook that runs a clustering analysis to visualize which stories are similar according to structural features.
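As a rough illustration of what "similar according to structural features" can mean, here is a small sketch. It is not the notebook's method: the four features, the z-score scaling, and the nearest-neighbor comparison are all assumptions chosen for brevity, and a real analysis would hand the scaled vectors to a proper clustering algorithm. The input format (a dict with a `passages` list and a `links` list of source/target pairs) is also hypothetical.

```python
import math
from statistics import mean, pstdev


def structural_features(graph):
    """Summarize one story graph as a small numeric feature vector."""
    passages = graph["passages"]
    links = graph["links"]
    out_degree = {name: 0 for name in passages}
    for src, _dst in links:
        if src in out_degree:
            out_degree[src] += 1
    n = len(passages)
    dead_ends = sum(1 for degree in out_degree.values() if degree == 0)
    return [
        float(n),                        # number of passages
        float(len(links)),               # number of links
        len(links) / n if n else 0.0,    # mean out-degree
        dead_ends / n if n else 0.0,     # fraction of passages with no outgoing link
    ]


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest_neighbor(vectors):
    """Map each story index to the index of its closest other story."""
    result = {}
    for i, a in enumerate(vectors):
        distances = [(math.dist(a, b), j) for j, b in enumerate(vectors) if j != i]
        result[i] = min(distances)[1]
    return result
```

With feature vectors for a handful of stories, `nearest_neighbor(standardize(rows))` gives a quick sanity check of which stories look structurally alike before any clustering or plotting.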

My hope is for the tool and corpus to be useful to the community in a couple of ways. The tool might be relevant to command-line savvy authors as a way to visualize in-progress or completed story structure through Graphviz’s smart, automatic layout algorithm. The corpus, meanwhile, could be an interesting resource for ongoing research into the use of narrative structure in Twine stories: trends in narrative approaches over time, differences across competitions, and more.

Any feedback is appreciated. Cheers!

AvB · April 16, 2019, 9:32am

This is neat! The visualizations show all the passage links as edges, even those that the Twine editor does not usually draw as such. Not all delayed branching shows up, but that might be asking a bit much.

In the structure clustering I notice that stories published in the same competition often end up close to each other. First thought: this may have to do with the changing Twine syntax itself driving the design.

In the feature clustering, my own two entries end up right next to each other, even though they are considered quite different in the structure analysis. I thought they were far more different in style and theme than in structure, but the algorithm saw through all of the smoke and mirrors and identified me as the author of both. So much for using pseudonyms in our brave new world. Then again, as you noted, other stories do cluster by theme.

datalexic · April 16, 2019, 6:38pm

Thanks for the feedback!

Interesting points about the passage links. I hadn’t realized that this tool shows more links than the editor does; maybe this is due to creating edges for display links that show a different passage’s text inside an existing passage. And yeah, variable-dependent links aren’t currently shown, which leads to some missing edges in stories that use them. It might require a bit of variable tracking to support them.
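To make the "display link" idea concrete, here is a hedged sketch of how embedded passages could be turned into edges. It is a guess at the general approach, not a description of what twine-graph does. It assumes Harlowe's `(display: "Name")` macro and SugarCube's `<<include "Name">>` (or the older `<<display "Name">>`) with a quoted literal passage name, and the function name `display_edges` is made up. Targets given as variables or expressions are not matched, which is the same gap mentioned above.

```python
import re

# Harlowe: (display: "Passage Name")
HARLOWE_DISPLAY = re.compile(r'\(display:\s*"([^"]+)"\s*\)', re.IGNORECASE)
# SugarCube: <<include "Passage Name">> or the older <<display "Passage Name">>
SUGARCUBE_INCLUDE = re.compile(r'<<(?:display|include)\s+"([^"]+)"[^>]*>>')


def display_edges(passage_name, text):
    """Return (source, target, kind) edges for passages embedded in `text`.

    `text` is expected to be passage source with HTML entities already
    unescaped, so that macros appear as literal << and >>.
    """
    targets = HARLOWE_DISPLAY.findall(text) + SUGARCUBE_INCLUDE.findall(text)
    return [(passage_name, target, "display") for target in targets]
```

Tagging these edges with a kind such as `"display"` would let a visualization draw them differently from ordinary links, for example as dashed lines.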

The finding that story structure seems to have evolved a bit over time is very cool, and might be worth exploring further. And it’s interesting that two stories by the same author, but with different subject matter, were clustered together when using text representations. While I removed stop words and punctuation, there might still be a bit of an authorial word-usage fingerprint (perhaps in adjective and adverb usage) that shows up in the representation, in addition to a more semantic, content-related part.
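For anyone wondering how such a fingerprint could survive preprocessing, here is a toy sketch of one possible text representation. It is an assumption for illustration, not the representation used in the notebook: a bag-of-words count with punctuation and a tiny, made-up stop-word list removed, compared by cosine similarity. Adjectives and adverbs are not stop words, so an author's favorite descriptive words still contribute to the vectors.

```python
import math
import re
from collections import Counter

# A deliberately tiny stop-word list for illustration; real lists are much longer.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "on", "is", "was", "it", "you", "i",
})


def bag_of_words(text):
    """Lowercase, drop punctuation and stop words, and count remaining words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two passages about different subjects that both lean on words like "quietly" or "strange" would score above zero here, even though none of their content nouns overlap.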

mathbrush · April 17, 2019, 2:53am

I found this all very interesting. Thank you for creating this tool!