Recent developments in voice-controlled IF

Hey all. I’m an infrequent haunt of these forums, never sticking around long enough to really become familiar, but never losing interest enough to stay away. As a result, I haven’t kept tabs on developing IF technology.

I am curious about the feasibility of an IF game controlled entirely by voice. Perhaps a verbal signal would be given (some equivalent of “Hey Google”), or perhaps a button would be pressed, followed by the prompt for the game.

Possibly the experience would be made more immersive, as the response could be a combination of voice acting and sound effects. This could also allow enjoyment of the game without hands, such as while driving.

The famous ChatGPT Skyrim mod utilizes the Whisper AI to convert speech to text. Has something like this been attempted before, or is this direction for IF being developed right now? What should I look into to find out more about this idea?


There was a pretty long thread about this recently. I’ll try to find it; one sec…


Okay, so this thread kinda had two parallel topics, but the point where AI-powered voice-handling got introduced was around here.


Thanks for this post. However, I don’t think it’s quite what I’m looking for. In standard IF, all allowable inputs are predetermined and handled. This post discusses something more like AI Dungeon, where the game itself is generatively created based on inputs.

I’m looking for a way to control a standard, story-driven game with the actual typing replaced by speaking, and the actual reading replaced by listening. Basically, can there be a truly auditory gameplay experience?

If there are examples of this being done well, I’d love to hear of titles. And if there are tools in existence already to accomplish this, I would like to hear of them.

Also, if I didn’t read the thread thoroughly enough and it does talk about this, I apologize! Thanks for your time.


Are you looking for something more than just playing a normal parser-IF game with a screenreader for output and Talon or Dragon or whatever for input?

Edit: IIRC there is stuff around interactive audio fiction but I don’t think any of that really happens here. I feel like Austin Auclair or someone did a NarraScope talk one year? I’m not turning that up on a quick search, hrmph. Tends to be more commercial stuff and requires a bit of a budget rather than the hobbyist text stuff that’s more common here…


Yeah, I’m pretty sure that this was in the thread. The discussion included both one possibility where the whole thing was LLM, and another possibility where the game was using a standard, author-made game model, and the LLM was just trying to match user input to the closest-available author-implemented command (if memory serves; I jumped into the thread about halfway through).

Then there was a discussion on some possible accessibility drawbacks of voice-controlled IF, and that a keyboard fallback should always be made available to the player, regardless of whether an LLM is involved. This was started about when I jumped in, which is why the link lands partway through.
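
To make the second possibility a bit more concrete, here is a rough sketch of the "match input to the closest author-implemented command, with a keyboard fallback" idea. It is only an illustration: it uses plain fuzzy string matching from Python's standard `difflib` module as a stand-in for an LLM, and the command list and the `match_command` and `resolve_input` names are made up for the example, not taken from any existing IF system.

```python
import difflib

# Hypothetical list of commands the author has implemented (an assumption).
KNOWN_COMMANDS = [
    "examine table",
    "take lamp",
    "open door",
    "go north",
    "go south",
    "inventory",
]


def match_command(transcript, commands=KNOWN_COMMANDS, cutoff=0.6):
    """Return the known command closest to the transcript, or None."""
    # Normalise case and whitespace, since speech-to-text output varies.
    cleaned = " ".join(transcript.lower().split())
    matches = difflib.get_close_matches(cleaned, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def resolve_input(transcript, typed_fallback):
    """Use the voice transcript if it matches; otherwise ask for typed input."""
    command = match_command(transcript)
    if command is None:
        # Keyboard fallback: typed_fallback is any callable returning a string.
        return typed_fallback()
    return command
```

In a real game, `typed_fallback` could simply be `input`, so the keyboard is always available when the voice match fails.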


Feldo organises a yearly sound-based jam, and I remember him mentioning one IF-like game. (I think it’s the 2020 one; most years there haven’t been any entries.)


Several interpreters work with screen readers. I think the general wisdom is to focus on compatibility with screen reader systems rather than doing bespoke audio input/output.

I see. And if the main selling point of the game was the immersion of sound effects and voice acting integrated with voice commands, would it no longer be considered IF even if the mechanics were the same idea? Would the prominence of the audio in the game make it a different genre of game?

I think some of us would be interested in hearing about it if you made something: this forum “promotes a broad definition of Interactive Fiction” – but if you’re looking for other people working on similar things, or even for pointers to who is working on this kind of thing or what has and hasn’t been tried, it’s enough of an outlier that I suspect you’re not going to find much here, that’s all.


Ah, well I don’t think anyone would consider it not to be IF, and we’d love to see more of it. A few IF works have sound effects or music, but voice acting is far beyond most projects’ budgets.


Ah, gotcha. Thanks

Two other past threads here came to mind—here’s one discussing the possibilities and drawbacks:

And one where someone shared an example game:

(Edit: Actually I’m not sure if the example game is voice-controlled or just audio with regular player input, as I haven’t played it.)


This seems like the most useful thing. An AI/LLM could be trained on a game and serve as the “interpreter” between the player and the parser, so it could handle the improvised commands that make sense to new players, such as WALK OVER TO THE TABLE; the AI could probably use its judgment to translate that to EXAMINE TABLE and give the parser response. Or it could realize the player is trying to walk everywhere and suggest, “If you were going to walk a compass direction, which way would it be? You can actually go north, south, or northwest from here.”
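
As a toy version of that interpreter layer, something like the following could sit between the speech transcript and the parser. To be clear, this is a hand-written, rule-based stand-in and not an LLM, and the `interpret` function, the verb list, and the exits/objects arguments are all invented for the sketch.

```python
import re

DIRECTIONS = {
    "north", "south", "east", "west",
    "northeast", "northwest", "southeast", "southwest",
    "up", "down",
}
# Hypothetical set of verbs treated as "the player wants to move".
MOVE_VERBS = {"walk", "go", "run", "head", "move"}


def interpret(utterance, exits, objects):
    """Return ("command", text) for the parser or ("hint", text) for the player."""
    words = re.findall(r"[a-z]+", utterance.lower())
    if not words:
        return ("hint", "I didn't catch that.")
    if words[0] in MOVE_VERBS:
        # A named direction wins: pass it on as a normal GO command.
        for word in words[1:]:
            if word in DIRECTIONS:
                return ("command", "GO " + word.upper())
        # Walking "to" an object in the room is treated as examining it.
        for word in words[1:]:
            if word in objects:
                return ("command", "EXAMINE " + word.upper())
        # Otherwise nudge the player toward compass directions.
        return ("hint", "Which way? You can go " + ", ".join(exits) + " from here.")
    # Anything else is passed through to the parser unchanged.
    return ("command", " ".join(words).upper())
```

With exits of north, south, and northwest and a table in the room, `interpret("Walk over to the table", ...)` gives `("command", "EXAMINE TABLE")`, while `interpret("walk around", ...)` comes back as a hint listing the exits.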

Basically a “conversational” parser, which makes more sense with a voice-bank like Siri instead of voice actors and pre-recorded lines. Maybe there might be something in actors recording for speech-banks that are more conducive to storytelling inflections rather than GPS navigation.

AudioIF doesn’t work properly because you can’t save/load. Try it and comment here.