I think well-scripted sound effects would add to an IF story. Maybe. They could also detract from the player's imagination, and imagination and abstract thinking are hallmarks of IF. In the video above, the music is too much and too in-your-face, IMHO.
Totally agree! If the avatar is on screen, it would take me about 7 minutes to respond to questions, but if it's just audio, I think I could reply at a semi-decent cadence within a minute, which wouldn't be too egregious.
Oh yeah, for ambience it would be horrendous!
It's done to mimic a late-2000s podcast; the music is lifted from some of Keith McNally's actual XO podcasts.
I've seen streams that are on a 1-minute delay for safety, or sometimes there's buffering, so chat is behind. That could play into the scenario: the chat notices or hears something the streamer does not and has to warn him, and there are tense minutes before he can read what they're saying, like the video delay in Scream. It also justifies why the audio/video isn't in sync with the player's commands, since the media would need to reach a branch point before it could take any requests/commands into account.
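To make the branch-point idea concrete, here is a minimal sketch, assuming the media is pre-rendered and can only change course at fixed timestamps. The `DelayedStream` class, its method names, and the second-based timestamps are all hypothetical, invented for illustration: commands from chat are queued as they arrive and only take effect when playback reaches a branch point.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DelayedStream:
    """Hypothetical model of a pre-rendered stream that can only branch at set times."""

    branch_points: set  # timestamps (whole seconds) where the media may branch
    pending: deque = field(default_factory=deque)

    def receive(self, command: str) -> None:
        # Chat commands are queued instead of being applied immediately.
        self.pending.append(command)

    def advance(self, t: int) -> list:
        # Between branch points, nothing the chat says can change the media.
        if t not in self.branch_points:
            return []
        # At a branch point, every queued command is handed over at once.
        applied = list(self.pending)
        self.pending.clear()
        return applied


stream = DelayedStream(branch_points={60, 120})
stream.receive("look behind you")
print(stream.advance(30))  # not a branch point yet, so nothing applies
print(stream.advance(60))  # the warning finally lands
```

The gap between `receive` and the next branch point is exactly the "tense minutes" window: the warning exists, but the streamer can't act on it yet.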
Yeah, the combinatorial explosion can be a total pain in the anatomy for narration, especially since I imagine it's hard to splice one-word voice clips into full sentences on the fly without it sounding choppy.
And yeah, some of the stuff being done with AI voice synthesis is impressive and starting to climb the far side of the uncanny valley*, but it would kind of suck if we got to a point where IF required either a fast internet connection or a AAA gaming PC to run. At least, I don't foresee top-quality AI TTS reaching the point of running in real time in CPU mode any time soon.
*I use espeak-ng as my TTS engine of choice, partly because it's the default on most Linux distros, but also because my experience with so-called natural-sounding TTS has mostly been that it works well for utilitarian things like reading filenames or system messages, but once it starts reading anything longer, its inability to intonate and emote properly is actually more unsettling than a TTS voice that is clearly synthetic from the get-go… At least that's my opinion. There are about as many opinions on which TTS engines, and which specific voices/settings for each, are best for which applications as there are blind people who depend on screen readers… I can't really place AI TTS in my hierarchy yet because it's currently too expensive and inconvenient to try at scale on the user's end, but my preference for reading books goes something like: audio dramatization > audiobook > espeak > traditional natural-sounding TTS.
And another advantage of text: you don't have to worry about proper intonation and emoting. Picking up an ax off the ground because you need to chop some firewood is very different from picking up an ax off the ground because you need to fend off a wild animal. A text-only presentation can get away with presenting the line "You pick up the ax." in exactly the same manner regardless of context and let the player's imagination fill in the details, but for audio narration you'd probably want at least a calm and a tense delivery of that line, depending on whether the player is in danger at the time, and the wrong delivery for the situation could break immersion (which is kind of the point of adding the narration).
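The difference can be sketched in a few lines. This is a minimal illustration, assuming the game tracks a simple `in_danger` flag; the line IDs, the `CLIPS` table, and the `.ogg` file paths are all hypothetical. Text mode returns the same string in every context, while audio mode has to pick a recording based on the game state.

```python
from enum import Enum


class Mood(Enum):
    CALM = "calm"
    TENSE = "tense"


# Hypothetical text table: one string per line, used in every context.
TEXT = {"pick_up_ax": "You pick up the ax."}

# Hypothetical clip table: one recording per line *per delivery*.
CLIPS = {
    ("pick_up_ax", Mood.CALM): "audio/pick_up_ax_calm.ogg",
    ("pick_up_ax", Mood.TENSE): "audio/pick_up_ax_tense.ogg",
}


def narrate(line_id: str, in_danger: bool, audio: bool) -> str:
    """Return the text to print, or the path of the clip to play."""
    if not audio:
        # Text-only: the player's imagination supplies the tone.
        return TEXT[line_id]
    # Audio: the wrong choice here is what breaks immersion.
    mood = Mood.TENSE if in_danger else Mood.CALM
    return CLIPS[(line_id, mood)]


print(narrate("pick_up_ax", in_danger=True, audio=False))
print(narrate("pick_up_ax", in_danger=True, audio=True))
```

Even with only two moods, the clip table is already twice the size of the text table, and every additional mood multiplies the recording workload again, which is the same combinatorial problem mentioned earlier.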
As for ASMR videos, my experience is that most of them are so quiet that maxing out the volume barely makes them audible, and doing so puts my screen reader at dangerously loud levels. (For the record, I only ever adjust the master setting in alsamixer: 20% puts my screen reader at a comfortable listening level in a silent room; 30% is comfortable for most YouTube videos in a silent room, or for my screen reader when my portable AC is running its compressor with the fan on high; and 50% makes most YouTube videos comfortable with the AC running but makes my screen reader shout. ASMR videos are sometimes barely audible at 100%, and that's with YouTube's volume slider left at 100% at all times and my screen reader's volume left at its default.) I might be able to enjoy some of the ones meant to simulate a lover whispering in one's ear, but the broken audio balance makes them unlistenable most of the time… And while my portable media player has separate volume settings for media and its TTS, cranking the media volume up to 15/15 still isn't enough for some videos.
And the aforementioned AC is my preferred means of sleeping at night and of drowning out my housemates when they play games, watch television, or listen to music at a volume that can be heard throughout the trailer… On the rare occasions when it's too cold for the AC to run all night, I tell Alexa to play fan noise. (I've tried telling her to play white noise, but that's too harsh on my ears.) Also, I've heard of pink and blue noise before, but green is a new one on me… I'm guessing it's white noise with muted highs and lows and amplified mid frequencies.
It sounds like the ocean, or rivers flowing, maybe a hint of rain.
Adam
I liked the first two Monkey Island games better than many of the sequels. I finally figured out that the comic timing of their text was flawless. Even though the later games had professional voice acting, it was never as snappy and awesome as the text captions over the characters' heads.
This, so much so that I played most of Thimbleweed Park with voices off despite the great voice acting.