Augmenting 100% of the text with recorded audio

I am investing myself in augmenting a parser-based IF with recorded audio for 100% of the text for 1) accessibility and 2) richness and immersion.

I’ve been evaluating Inform 6, Inform 7 and Adventuron as candidate systems for making this happen. Best I can tell, I need to override all standard commands and responses and reimplement them from scratch, with code that both displays the text and plays the audio files.

1) Does that conclusion sound correct? If someone has ideas for a less invasive way to implement recorded audio for 100% of the text, feel free to advise.

Inform 7:
2) Assuming the above approach is the best way (or the only way) to go about this, is there a simple way to de-list all standard commands in Inform 7 so that I can commence the laborious (but hopefully rewarding) task of tweaking all of the output rules to play the sounds when printing the text?

Inform 6:
Best I can tell, doing this in Inform 6 is theoretically straightforward: I just need to build it for Glulx or Vorple and update the standard library to play sounds when it prints text.
3) However, if there is a less hackish way to delist/replace/augment all of the output, I’m all ears.

Adventuron:
It turns out that intercepting 100% of Adventuron’s output and augmenting it to play sound files was simple; I have a working prototype here:
http://joshware.com/sandbox/if/adventuron/port-l.html
(This example uses recorded TTS; the actual game will have voice acting, sound effects and music.)

Any thoughts are welcome.

I’ve thought about this before. In Inform 7, there is quite a bit of variation in the standard responses. (Think "[The actor] [pick] up [the noun]" – depending on the number of characters and things, that one response could already make a couple hundred lines to record!) You’d need to make the responses less varied (or reduce the scope of the game to, say, not have other characters) in order to reduce the recording effort.
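To put a rough number on that, here is a back-of-the-envelope sketch. The helper name `countRecordings` and the figures (4 actors, 50 portable things) are made up for illustration; the point is only that whole-line recordings multiply across every substitution slot in a response.

```typescript
// Hypothetical helper: number of distinct whole-line recordings needed for
// one response template, given how many values each substitution can take.
function countRecordings(slotSizes: number[]): number {
  return slotSizes.reduce((total, size) => total * size, 1);
}

// Assumed example game: 4 possible actors and 50 portable things for
// "[The actor] [pick] up [the noun]".
const pickUpLines = countRecordings([4, 50]);
console.log(pickUpLines); // 200
```

Dropping other characters collapses the first slot to 1, which is why narrowing the game's scope cuts the recording effort so sharply.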

(I guess you could also splice recorded snippets together at runtime, but that would likely sound terribly janky.)

As for the implementation, I guess it would be simplest to outright replace the relevant sections of the Standard Rules (using the "Section A - My Possession Actions (in place of Section SR4/2 - Standard actions concerning the actor's possessions in Standard Rules by Graham Nelson)" syntax) and implement your own rules as needed.

Thanks for the reply!

Follow up question:
I observe the I7 handbook says that the standard library should not be directly edited:
https://www.musicwords.net/if/I7Handbook6x9.pdf, p450

…and elsewhere I7 documentation indicates that we can only change individual specific responses when we know what they are called:
http://inform7.com/changes/CI_7_6.html

…so it would appear that, in Inform 7, I would need to name and delist every individual response rule in order to avoid accidentally leaving in commands that are not supported by the audio recordings.

Is there a more elegant and/or conclusively thorough way to do it? If not, it would seem that Inform 6 is the better choice between the two, since its library is directly editable.

You can replace chunks of the standard rules wholesale, rather than one response at a time. (See chapter 27.26 of the manual.)

However, if you’re talking about rewriting every library response to remove variation, you’re removing most of what makes Inform valuable. A parser game is substantially less playable if every failure message reads the same. Also, what’s your strategy for lines like “You also see a brass lantern, an Elvish sword, and a sack (containing garlic)”?

Have you considered mixing pre-recorded audio with text-to-speech technology? People have been using TTS with IF since the technology first existed. It’s not as good as a voice actor, but that might not matter for “utility” lines like object lists and failure messages.
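A minimal sketch of what that mix could look like: look the output text up in a table of recordings and fall back to TTS when there is no match. Everything here (`Narration`, `recordings`, `chooseNarration`, the file path) is hypothetical and not part of Inform, Vorple, or Adventuron.

```typescript
// What the UI layer should do with one chunk of printed text.
type Narration =
  | { kind: "recording"; file: string }
  | { kind: "tts"; text: string };

// Hypothetical table mapping exact output text to a voice-acted file.
const recordings = new Map<string, string>([
  ["You can't go that way.", "audio/cant-go.ogg"],
]);

// Prefer a recording; fall back to speech synthesis for anything that
// was never recorded (object lists, rare failure messages, and so on).
function chooseNarration(text: string, table: Map<string, string>): Narration {
  const key = text.trim();
  const file = table.get(key);
  return file !== undefined
    ? { kind: "recording", file }
    : { kind: "tts", text: key };
}
```

In a browser, the `tts` branch could be handed to the Web Speech API (`speechSynthesis.speak(new SpeechSynthesisUtterance(text))`), while the `recording` branch goes to whatever audio player the interpreter provides.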

As for getting the audio to play at all: this is an interface change, fundamentally. It’s probably better to modify the UI at the interpreter level rather than in Inform code. Vorple might be your best bet.

…Mind you, neural-net speech synthesis is probably going to get there real soon now.

(Not for replacing voice actors, but for replicating a voice actor’s performance into an unlimited number of line variations.)

Thanks, I shall review!

My success with more complicated constructions in another related Adventuron POC has convinced me that it’s just a matter of being sufficiently AR when it comes to details of the content and the form, which is quite natural for me :wink: I’ll probably try I6+Vorple next and see how it works out.

I have, but I’m enamored with implementing it as much as possible in the style of a 1930s radio drama. So in this exploratory phase, I’m trying for pure recorded audio.

Yes, there are some pretty impressive engines out there.

Thanks for the feedback!

Every 1930s radio drama hero needs a robot sidekick.

But Mom! I don’t wanna rip off Penny Arcade :sob:

totally and absolutely AGAINST: I’m deaf.

Best regards from Italy,
dott. Piergiorgio.

That’s why I use the word “augment” - it will still have readable text and rich graphics like any other multimedia IF emphasizing accessibility. But if it’s not for you, it’s not for you. I understand and appreciate your perspective.

Vorple and Adventuron don’t have an API for splicing audio (playing one sound after another); you’ll have to call custom JS functions.

I recommend you also look at INSTEAD, because it has a decent hackable English parser, a multi-channel sound API and good audio format support. The docs have a “playlist” code sample; just replace snd.music_playing with snd.playing.

Thanks! I shall review INSTEAD :slight_smile:

Adventuron Beta has the ability to queue sequential sound files now; I used this feature in the successful POC linked above (http://joshware.com/sandbox/if/adventuron/port-l.html).

As for Vorple, I was thinking that assembling a table of the audio files in sequence and using VorpleStartPlaylist may work, but I haven’t tested it yet.

Perhaps it will turn out that INSTEAD works instead :wink:

Sequence tables will work, but you also need to anticipate how to stop the audio if the player types faster than the narrator can keep up.

I was thinking I’d call VorpleStopSoundEffects() at the beginning of a fresh post-command output and rebuild the table. I did something like that in the Port-L POC and was pleased with the result. That way the player has the option of going faster than the narration if they wish (and/or if they have sound disabled or unavailable).
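Here is a rough sketch of that stop-and-rebuild idea, written against a stand-in player rather than the real Vorple or Adventuron calls. The `ClipPlayer` interface and `NarrationQueue` class are assumptions made for illustration; a real version would wire `play` and `stop` to whatever the chosen system actually exposes.

```typescript
// Stand-in for the real audio backend (assumed shape, not an actual API).
interface ClipPlayer {
  play(file: string, onEnded: () => void): void;
  stop(): void;
}

class NarrationQueue {
  private readonly player: ClipPlayer;
  private pending: string[] = [];
  private playing = false;
  private generation = 0; // bumped every turn so stale callbacks are ignored

  constructor(player: ClipPlayer) {
    this.player = player;
  }

  // Call at the start of each fresh post-command output.
  beginTurn(): void {
    this.generation += 1;
    this.pending = [];
    this.playing = false;
    this.player.stop();
  }

  // Call once per chunk of text printed during the turn.
  enqueue(file: string): void {
    this.pending.push(file);
    if (!this.playing) this.playNext();
  }

  private playNext(): void {
    const next = this.pending.shift();
    if (next === undefined) {
      this.playing = false;
      return;
    }
    this.playing = true;
    const startedIn = this.generation;
    this.player.play(next, () => {
      // Only continue if no new command arrived while this clip played.
      if (startedIn === this.generation) this.playNext();
    });
  }
}
```

The generation counter is there because a clip interrupted by `stop()` may still fire its "ended" callback in some backends; without the check, narration from the previous turn could restart on top of the new one.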