Approaches to sound in The Lurking Horror

Mike_G · June 21, 2024, 3:50pm

I’m looking for some feedback.

EDIT: Sorry - For clarity this is an interpreter written as a binary library. I am not writing an Inform library.

I went back and looked at how I handled sound for The Lurking Horror (TLH) in my z-machine interpreter library and am now questioning whether I took the best approach.

The TLH problem:
At two different points near the end of the game, TLH plays two sounds in quick succession, i.e. without an intervening read instruction. In the first case the second play instruction is quickly followed by a stop instruction for the second sound. In the other case it is followed by a stop 0 instruction (which Standard 1.1 defines to mean ‘stop all’ rather than Infocom’s interpretation of ‘stop most recent sound’).

On a slow enough interpreter, i.e. one from the 1980s, you can hear both sounds. On anything faster the instructions happen too quickly, unless the interpreter’s author has taken care to deal with it. Assuming I don’t want to code anything specific to TLH, and instead want to handle arbitrary z-code that does crazy things, I see two viable approaches, each with some minor variations possible.

Approach 1 Synchronous:
Within a turn, when a second sound is played before the first has finished, pause the z-machine until one iteration of the first sound completes, then play the second sound and resume execution. A stop instruction issued for a sound started this turn also results in this wait state until the sound to be stopped has completed at least one iteration.
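A minimal sketch of Approach 1, not taken from any real interpreter. It uses a simulated clock in place of real audio, and every name in it (`SyncSampleChannel`, `durations`, `read_input`) is hypothetical. It also assumes that only sounds started in the current turn are protected from being cut short.

```python
class SyncSampleChannel:
    """Approach 1 sketch: stall the VM until the current sound has played once."""

    def __init__(self, durations):
        self.durations = durations  # hypothetical: sound number -> seconds per iteration
        self.now = 0.0              # simulated clock in seconds
        self.turn = 0               # incremented on every read instruction
        self.current = None         # (sound, start_time, turn_started) or None
        self.log = []               # (time, event, sound)

    def _wait_for_first_iteration(self):
        # Assumption: only sounds started during the current turn are protected.
        if self.current is None or self.current[2] != self.turn:
            return
        sound, start, _ = self.current
        first_end = start + self.durations[sound]
        if self.now < first_end:
            self.now = first_end    # stands in for blocking the z-machine

    def play(self, sound):
        self._wait_for_first_iteration()
        self.current = (sound, self.now, self.turn)
        self.log.append((self.now, "play", sound))

    def stop(self, sound):
        if self.current is None or self.current[0] != sound:
            return                  # nothing matching is playing
        self._wait_for_first_iteration()
        self.log.append((self.now, "stop", sound))
        self.current = None

    def read_input(self):
        self.turn += 1


channel = SyncSampleChannel({1: 2.0, 2: 1.5})
channel.play(1)
channel.play(2)   # stalls until sound 1 has played once
channel.stop(2)   # stalls until sound 2 has played once
print(channel.log)
```

In this run the simulated clock ends at 3.5 seconds before the VM can reach the next read, which is the prompt delay this approach trades for its simplicity.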

Approach 2 Asynchronous:
Within a turn, when a second sound is played before the first has finished, set the first sound to stop after one iteration and queue the second to play when the first has finished. Proceed running normally. Any stop instruction issued for a sound started this turn reduces the iterations to play to a single iteration but does not stop the sound.
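A rough sketch of Approach 2 under the same kind of simulated clock. All names (`AsyncSampleChannel`, `QueueFull`, `max_pending`, `advance`) are hypothetical. Two behaviors here are assumptions rather than anything stated above: a sound left over from an earlier turn is replaced immediately when nothing is queued, and a stop aimed at a queued sound leaves it owed one iteration. Infinite repeats are not modeled.

```python
from collections import deque


class QueueFull(Exception):
    """Raised when more sounds are queued than the fixed queue allows."""


class AsyncSampleChannel:
    def __init__(self, durations, max_pending=1):
        self.durations = durations    # hypothetical: sound number -> seconds per iteration
        self.max_pending = max_pending
        self.now = 0.0                # simulated clock in seconds
        self.turn = 0
        self.active = None            # dict for the sound currently playing
        self.pending = deque()        # dicts for sounds waiting to play
        self.log = []                 # (time, event, sound)

    def _start(self, entry):
        entry["ends"] = self.now + self.durations[entry["sound"]]
        self.active = entry
        self.log.append((self.now, "start", entry["sound"]))

    def _finish_active(self):
        self.log.append((self.now, "end", self.active["sound"]))
        self.active = None
        if self.pending:
            self._start(self.pending.popleft())

    def play(self, sound, iterations=1):
        entry = {"sound": sound, "left": iterations, "turn": self.turn}
        if self.active is None:
            self._start(entry)
        elif self.active["turn"] == self.turn or self.pending:
            if len(self.pending) >= self.max_pending:
                raise QueueFull(sound)
            self.active["left"] = 1   # finish the current iteration, then move on
            self.pending.append(entry)
        else:
            # Assumption: a sound from an earlier turn is replaced at once.
            self._finish_active()
            self._start(entry)

    def stop(self, sound):
        if self.active is not None and self.active["sound"] == sound:
            if self.active["turn"] == self.turn:
                self.active["left"] = 1   # never cut short within the same turn
            else:
                self._finish_active()     # older sound: stop immediately
            return
        for entry in self.pending:
            if entry["sound"] == sound:
                entry["left"] = 1         # assumption: still owed one iteration

    def read_input(self):
        self.turn += 1

    def advance(self, seconds):
        """Move the simulated clock forward, finishing iterations as they end."""
        target = self.now + seconds
        while self.active is not None and self.active["ends"] <= target:
            self.now = self.active["ends"]
            self.active["left"] -= 1
            if self.active["left"] > 0:
                self.active["ends"] = self.now + self.durations[self.active["sound"]]
            else:
                self._finish_active()
        self.now = target


channel = AsyncSampleChannel({1: 2.0, 2: 1.5})
channel.play(1, iterations=3)
channel.play(2)        # queued; sound 1 is trimmed to one iteration
channel.stop(2)        # sound 2 will still play once
channel.read_input()   # the VM reaches the prompt without waiting
channel.advance(5.0)
print(channel.log)
```

With `max_pending=1` a third play in the same turn raises `QueueFull`, which matches the fixed-queue trade-off; a larger value would accept longer bursts at the cost of more bookkeeping.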

Pros of Approach 1:
Plays at least one iteration of any sound regardless of machine speed (On a slow machine more iterations may happen). Play and Stop instruction behavior is straightforward and can handle more than two sounds played in quick succession (not needed for TLH).

Cons of Approach 1:
Will almost certainly cause a noticeable delay in appearance and responsiveness of the input prompt when this happens as the machine waits for the sound to complete.

Pros of Approach 2:
Plays at least one iteration of any sound regardless of machine speed (On a slow machine more iterations may happen). The Z-machine does not pause and input appears as quickly and responsively as normal.

Cons of Approach 2:
Without a larger or dynamic queue size, handling more than two sounds played in quick succession (not needed for TLH) would likely need to be considered an error, leading to a discrepancy in behavior between fast and slow machines. Play and Stop instruction logic is slightly more complex, including needing special handling for Stop instructions that happen after the next input but refer to the first or second sound from the previous turn, e.g. a stop command for the queued sound comes in after the next turn starts, but the sound hasn’t actually begun playing yet.

This may all seem silly since there are no z-code files that do anything beyond what TLH does, but I’d like to know my code can deal with it anyway.

fredrik · June 21, 2024, 5:05pm

Ozmoo recognizes TLH and uses a strategy that works for this specific game only. IIRC it knows that specific sound effect numbers must be treated in a special way.

(I think we got this from the Frotz source?)

Mike_G · June 21, 2024, 5:19pm

I definitely want the code to be TLH agnostic. Specifically, I’d like the code to do ‘the right thing’ (for some definition of right) if an author plays a number of sounds in quick succession. Chopping them down to an almost inaudible ‘click’ is probably not ‘the right thing’, so I’d like to extrapolate from TLH to something more general. Clearly this is all beyond the scope of the Z-machine standard as written and largely just an exercise in my stubbornness.

HanonO · June 21, 2024, 7:52pm

I’m likely missing the point, but if you know the sounds will be played in succession, can you just combine both sounds into one continuous sound file? Even if you have a “creak” and a “knock” sample, then a different sample that is “knock-creak” in sequence… Instead of relying on the software to do it, let it play a pre-recorded single file of both sounds?

(My humblest apologies if I’ve interjected in a situation I don’t understand completely. This was the workaround in the original Final Girl where I had two sounds that played separately and then together: I just had three sound clips, A, B, and A+B, instead of attempting to play both sounds simultaneously, or buffering and queueing them.)

Or is it a matter of just generally buffering random sounds that might occur close together and interrupt each other? I’ve never played TLH with sound nor got to the end so I may not understand how it’s supposed to work.

Draconis · June 21, 2024, 8:14pm

The problem is that the authors of TLH didn’t do that, and it’s too late to change it now.

Adam_S · June 21, 2024, 8:28pm

What version of the Z-machine are you using, Mike, and are you using Blorb or putting the sound file in the same directory as the game file?

Adam

Mike_G · June 21, 2024, 9:22pm

It’s this.

Mike_G · June 21, 2024, 9:23pm

To be fair it wouldn’t have been possible given size constraints, since other sounds were cut from the game entirely for the same reason.

Mike_G · June 21, 2024, 9:26pm

This would apply to all versions 3 or greater and would apply to both separate files and Blorb (the library doesn’t even have a concept of files).

Adam_S · June 21, 2024, 9:31pm

I’d like to have ambient soundscapes playing in the background of a game that I’m working on. So if I can sit an OGG file next to the Z5 game file in the directory and achieve this then hail Mary!

Adam

Mike_G · June 21, 2024, 9:36pm

You should definitely be able to do that.

fredrik · June 21, 2024, 9:43pm

In version 5 the author can handle the playing of two consecutive sounds perfectly well with the functionality that is specified in the standard, and an author would indeed be wise to do it this way, for maximum compatibility with different interpreters. Sherlock does this, and that’s why we don’t need special kludges for Sherlock.

Mike_G · June 21, 2024, 10:00pm

I imagine you are referring to sound interrupts triggering subsequent sounds. Yes that works and is the method a story author should use.

What I am interested in is rather different. What I am trying to do is fully define what my interpreter (and only mine) would do if it encounters sounds in rapid succession a la The Lurking Horror, but in an unknown story. I have some specific goals in mind for this interpreter and ‘silently do something the author probably didn’t intend’ is the polar opposite of what I want. So is ‘crash’ or ‘corrupt the sound state of my interpreter’. So ignoring the situation isn’t an option, even if it is astronomically unlikely to ever occur. If anyone is interested I’ll explain my goals in greater detail, but I was trying to avoid a long description of my project.

I’m looking for feedback on the two approaches I’ve identified, or any unique approach I haven’t thought of.

Draconis · June 21, 2024, 11:10pm

Personally, I think this is a fine way to handle it. It’s close to the original behavior, and avoids adding too much complexity for a case that probably won’t come up outside this.

jkj_yuio · June 21, 2024, 11:41pm

So do these sound requests have channels? Because otherwise there is missing information whether to replace the existing sound or overlap it.

With a channel ID;

whenever a sound is played on a channel already playing it should micro-xfade to the new sound.
When a sound is played on a blank channel, it mixes with any existing sounds.

All stops and starts ought to have implied micro-fade out/in to eliminate pops and clicks.

A micro fade is something around 100ms or less.
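As a rough illustration of the micro-xfade idea, here is a sketch that assumes mono float samples held in plain Python lists and a linear ramp. The function name `crossfade` and its parameters are hypothetical, and a real mixer would more likely work on fixed-size audio buffers.

```python
def crossfade(old_tail, new_sound, sample_rate, fade_ms=100):
    """Linearly blend the outgoing sound into the start of the new one.

    old_tail holds the samples the old sound would have played next.
    """
    fade_len = min(len(old_tail), len(new_sound),
                   int(sample_rate * fade_ms / 1000))
    mixed = []
    for i in range(fade_len):
        t = (i + 1) / (fade_len + 1)   # ramps from just above 0 to just below 1
        mixed.append(old_tail[i] * (1.0 - t) + new_sound[i] * t)
    # After the fade window only the new sound remains.
    return mixed + list(new_sound[fade_len:])


# A constant tone fading into silence over 4 samples at a toy 1 kHz rate.
print(crossfade([1.0] * 10, [0.0] * 10, sample_rate=1000, fade_ms=4))
```

Stopping a sound with a fade-out is the same call with silence as the new sound.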

Mike_G · June 21, 2024, 11:53pm

The z-machine only has two channels. One for samples and one for music. The music channel was added post-Infocom and so was not a part of any of their games.

I was restricting my example to only the sample channel because the standard specifically states that music always instantly interrupts other music, while music and samples don’t interact with each other. This avoids ambiguity, but actually recreates for music the conditions that led to TLH’s issues with samples - namely, the behavior is observably different when the interpreter is orders of magnitude slower.

Draconis · June 22, 2024, 12:03am

And also, as far as I know, has never been used? I vaguely remember it being deprecated.

FLACRabbit · June 22, 2024, 12:07am

As a musician and audio programming enthusiast, I disagree with jkj yuio’s proposed solution. A “micro fade” (especially one as long as 100ms, which is a full tenth of a second) is not the best way to eliminate “pops and clicks,” because for many sounds - such as an explosion - the first 20ms or so, called the “transient,” determines the perceived character of the entire sound. Furthermore, for seamless looping, it’s desirable for a new sound to start immediately and the old one to end as quickly.

If pops and clicks are really a problem, a better solution would be using a DC offset remover to ensure that the first sample of the sound is at zero amplitude. The DC offset could be smoothly interpolated to cancel out the last sample’s value, ensuring that the sound starts and ends at zero.

At any rate, I doubt that this subproblem is significant enough to warrant any interpreter-enforced solution at all. It’s best handled by the author processing the sound to eliminate undesirable discontinuities.

I apologize for being slightly off-topic here.

jkj_yuio · June 22, 2024, 12:25am

You’re right of course, but it’s a bit different for games;

The long 100ms fade is used both for changing one sound to another on the same channel and for stopping a sound. It doesn’t sound very good to abut two completely different sounds within 20ms. Also, a 20ms sound stop is too abrupt.

What you’re doing is trying to approximate an edit, where you’d xfade from one sound to another and fade out when a sound ends, bearing in mind that it might be the game stopping the sound rather than the sound track ending normally.

The “pops” are not DC offset. If you’re playing sound A on channel 1, then you request sound B on channel 1, you have a discontinuity unless you fade them.

Mike_G · June 22, 2024, 1:09am

It’s all good. I’ve always viewed forum threads as conversations and tangents aren’t necessarily bad. I admit I have trouble staying on topic, just like I do with in-person conversations.

What I’m looking at here is strictly in the realm of abstract z-machine behavior, and the concrete details of how the sounds are mixed, faded or whatever are not really relevant.

To maybe clarify things a bit:
The naive (and almost universal) approach used by interpreters playing samples in the z-machine is that only one can ever be active at a time and when another is played, the existing one (if it is still playing) is immediately stopped. However the game TLH played a sound followed immediately by another and the authors expected both sounds to be heard because the Commodore Amiga interpreter was slow enough that this was indeed the case. A faster machine starts the second sound before the first can even play and thus doesn’t achieve the authors’ intent.
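To make the speed dependence concrete, here is a small simulation of the naive approach. It is only a sketch: the class and function names, the sound lengths, and the 3 second gap standing in for a slow interpreter are all made up for illustration.

```python
class NaiveSampleChannel:
    """One sample at a time; a new play cuts the old sound immediately."""

    def __init__(self, durations):
        self.durations = durations  # hypothetical: sound number -> seconds
        self.current = None         # (sound, start_time) or None
        self.heard = {}             # sound -> seconds that were actually audible

    def _cut(self, now):
        if self.current is not None:
            sound, start = self.current
            self.heard[sound] = min(now - start, self.durations[sound])
            self.current = None

    def play(self, sound, now):
        self._cut(now)
        self.current = (sound, now)

    def finish(self, now):
        self._cut(now)


def run_tlh_like_sequence(seconds_between_instructions):
    """Play sound 1, then sound 2, with the given gap between the two plays."""
    channel = NaiveSampleChannel({1: 2.0, 2: 1.5})
    channel.play(1, now=0.0)
    channel.play(2, now=seconds_between_instructions)
    channel.finish(now=seconds_between_instructions + 10.0)
    return channel.heard


print(run_tlh_like_sequence(0.000001))  # fast interpreter: sound 1 is barely audible
print(run_tlh_like_sequence(3.0))       # slow interpreter: both sounds play in full
```

The same two play instructions give different audible results purely because of the gap between them, which is the behavior both approaches are meant to remove.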

What I want is for my interpreter to play both sounds, not just for TLH but in the more general case. Imagine if tomorrow a hundred new games popped into existence, all repeating TLH’s ‘mistake’ of rapid-firing sounds.

This is largely academic since:
A) There are no other games that do this.
B) It’s unlikely there ever will be.
C) The requirements of the z-machine standard are fully met if you simply cut the first sound off.

But - the standard falls short of saying a game that duplicates what TLH does is an error or a violation of the standard, so supporting a more complicated solution is not invalid. In fact there’s nothing preventing a game from firing off a hundred sound samples back-to-back, and the majority of interpreters would likely just cut off all but the last sound. This is fully compliant with the standard, but I’d rather not do that. My two approaches boil down to either having the machine stop (no input cursor or other interaction) until at least one iteration of each sound plays, or queuing up the sound(s). In the second approach, as a secondary concern, I don’t want to let the machine allocate memory for an arbitrarily long queue of sounds, so playing more than two would probably be an error.