Glk sound API plans

In theory there should not be too much variation between Glk libraries. The Glk specification says, with regard to the glk_schannel_set_volume() call:

The question then becomes what exactly is meant by a phrase such as “half volume”. I would take what I hope is a common sense position on this: if a game plays a sound at half volume and I have my amplifier volume dial set to (say) 8, then it should have the same perceived loudness as when the game plays a sound at full volume and I have my amplifier volume dial set to 4.

When talking about perceived loudness, the usual rule of thumb (see for example http://www.sengpielaudio.com/calculator-levelchange.htm) is that a -10 decibel change corresponds to a halving of perceived volume. This is what Windows Glk does: it uses Windows’ DirectSound interface to play sounds, and maps the Glk volume argument onto a DirectSound level accordingly: 0x10000 becomes 0 dB, 0x8000 becomes -10 dB, 0x4000 becomes -20 dB, and so on. I don’t know what the other Glk libraries do, but I would hope it is something similar.

Gargoyle does a linear conversion of the volume parameter into the SDL_mixer level, which ranges from 0-128.

0x10000 / 512 = 128  (max)
0x8000  / 512 = 64   (half)
0x4000  / 512 = 32   (quarter)

Based on your numbers it seems like the results should be similar, but reports from the field suggest otherwise.

What I don’t know is whether SDL_mixer amplifies the volume, and if so how to calculate the zero point of the unadjusted volume. Maybe 128 is not linked to the sound file at all but represents the highest volume the system can output? Then two sounds with different volumes would have the same apparent loudness if both were played at 0x10000.

I guess I need to do some testing.

It does not use the system’s mixer. It alters the sound by itself, before the final mixing step. MIX_MAX_VOLUME (which is defined as 128) means “don’t touch the sound” (in other words, a 0 dB attenuation).

Good to know, thanks NC!

It looks like I can get volume changes broadly similar to WinGlk’s by using a logarithmic approach instead of a linear one. I’ll get that fix merged in for the next release.

That would be good, but in doing this I suspect you’ll need to look into what SDL_mixer does with its volume argument. Looking at the documentation for SDL I couldn’t see anything that explained what it does: is it a linear scaling of the waveform data by the volume argument?

Yes, you are right. Both. :wink:

I will publish a beta version of Damusix for I7 once I have finished my internship (in two more weeks). After that, I expect it will take me two to three weeks to have a working version ready and tested (given the recent changes in I7, and the release of the upcoming Damusix 4, which will be more user-friendly for English speakers).

I need this time because there have been many changes in my life this quarter: my internship, moving to a new home, switching to a new operating system (I’m now using Linux :mrgreen:), etc.

Thank you for your support. :smiley:

PS: As for the “abuse” I promised, I will try to publish it in one more week. :smiley: I have not forgotten it.

Sorry, I let this thread slip away from me…

In this case, I want to make features available efficiently. The fact that they can be approximated in VM code at the cost of rapid timer-looping is not a reason to skimp on the Glk API level. If the result is that a future version of Damusix provides the same features, but produces a cleaner compiled game, that’s still a win.

Now, Ben’s earlier point is a concern:

That’s a significant cost and I don’t want to impose it without good reason.

Let me ask this: do you already have a callback that takes buffers of sound output from SDL_mixer and passes them to the OSX sound library? If so, it might be better (and produce a better result) to scale each buffer as it goes by. That way, the volume during a change would change smoothly rather than being stairstepped by any timer at all.

The code to scale a buffer by a linearly-changing factor is very simple – I’ve written it a couple of times for other projects. (It would be harder if there were multiple volume changes going on at the same time, but I’ve explicitly ruled that out.)
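For concreteness, here’s a minimal sketch of such a ramp, assuming 16-bit signed samples and a single volume change per buffer (both assumptions for illustration, not Glk or SDL requirements):

```c
/* Sketch: scale a buffer of 16-bit signed samples by a factor that
 * ramps linearly from start_vol to end_vol across the buffer.
 * Volumes are Glk-style fractions of 0x10000. Assumes one channel
 * and a single volume change at a time, as discussed above. */
void ramp_buffer(short *samples, int count,
                 unsigned int start_vol, unsigned int end_vol)
{
    for (int i = 0; i < count; i++) {
        /* interpolate the volume for this sample position */
        double t = (count > 1) ? (double)i / (count - 1) : 0.0;
        double vol = start_vol + t * ((double)end_vol - (double)start_vol);
        samples[i] = (short)(samples[i] * (vol / 0x10000));
    }
}
```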

This is basically equivalent to multiplying the raw waveform data by V/0x10000, right?

This is where, as far as my limited understanding goes, things get difficult: I have a nasty suspicion that the above, while seeming obvious at first glance, is not in fact correct.

When talking about sound volume, we could be talking about one of three things: the power in the sound wave, the amplitude of the sound wave (which is the definition you’re using above), or the perceived loudness. None of the three are equivalent. Most sound libraries’ documentation dodges this issue by not specifying what is meant by volume. The Microsoft DirectSound SetVolume documentation is as bad as any other here: there’s much talk of decibels, but none of what precisely the volume argument applies to.

If this sounds like I’m generating problems without offering solutions: I am. Sorry about that.

I have something similar: a callback that queues up a new sound buffer for SDL_sound to play (via SDL_mixer, which masks the OS interface).

The trouble with using the SDL_sound callback to handle volume change is that the buffers are somewhat large - 128 KB - to minimize the processing / decoding overhead. This code path is used for Ogg / MP3 at the moment but could be extended to the other sound types.

The 128K size is somewhat arbitrary; I had Eliuk run a lot of tests using different buffer sizes and that’s the one he liked best. I think it works out to be 1-2 seconds of audio per pass, which is a bit coarse for the proposed API. I could tune it to a lower amount at the cost of performance and audio quality.

On further reflection, perhaps it would be best to cut through the proverbial knot and have the Glk spec define volume as meaning the above. This puts more work on interpreter authors, as they need to figure out how their sound library handles volume and, if it doesn’t match the above, scale appropriately; but making work for interpreter authors is nearly always better than making work for game authors.

This might screw up some games that rely on whatever the author’s current favourite interpreter does, but that’s always going to be the case if we pick a fixed definition.

That sounds about right: on Windows I use a 2 second buffer that a thread writes sound data into. Experiments showed that anything less than that caused stuttering on slow machines under load. It also means I’d be reluctant to implement a Glk call by directly modifying the buffer, given the time delay involved.

Two seconds? Yeah, that’s too long to use as a quantum for volume change control. If it were 0.5 second, I’d consider going with it.

I will do that if everyone thinks they can cope.

Hm. Is that possible with the MOD library?

As for the basic question: I still have to choose between dropping this volume-change proposal and asking Ben to rewrite a bunch of timer code. Is there any other factor to throw in here? Is the timer code going to have to change for other reasons, say if I add Javascript callbacks or (perhaps) video-playing calls to the spec?

It might need to change if it breaks in Lion, and there’s a fair chance it will.

Right now there’s kind of a horrible kludge going on where the interpreters run as NSTasks and put themselves to sleep via [NSThread sleepUntilDate: …], either until the next Glk timer tick (if one is active) or forever. The trick is that the launcher sends the blocked process a POSIX signal when an event for it arrives, and that signal breaks it out of the blocked state. This arrangement can sometimes delay hardware sleep from kicking in, which sucks.

I could improve it quite a lot if I could figure out how to give an NSTask its own run loop to listen on for timer events, but I pounded my head against that particular wall for a long time with no luck. I know there must be a way to use CFRunLoop for this purpose. I just need to pick my way through the Chromium source until I figure out how Google made it work.

The libmodplug code has GetCurrentPos() and SetCurrentPos(), which return and take an integer specifying, in an arbitrary sense, where the decoder has got to. The problem I can see with this is that, while sound decoders have some notion of their current position, it’s usually arbitrary: what if the value were stored in a saved game file, then loaded in a different interpreter using a different MOD decoding library? It would be better if the position were expressed in terms of, say, “I’m 34.5 seconds into the song”, but I don’t know of any sound decoder that offers that.

The game could standardize this, though, if there were also an API call to get the length - e.g., divide the position by the length before saving, then on reloading, scale that fraction back up to the position range used by the new terp. (Or maybe the Glk API could even go so far as to use a percentage rather than an arbitrary tracking number or a value in seconds?)
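As a sketch of that idea (both helper names are hypothetical, not real Glk or libmodplug calls):

```c
/* Hypothetical sketch of saving a sound position portably: store it as
 * a fraction of the total length, then scale back up on reload against
 * whatever length/position scale the new interpreter's decoder reports.
 * Neither function corresponds to an actual Glk API call. */
double position_to_fraction(long pos, long length)
{
    return (double)pos / (double)length;
}

long fraction_to_position(double fraction, long new_length)
{
    return (long)(fraction * (double)new_length + 0.5);
}
```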

Let me add a quick note to explain why I suggested this functionality:

Currently, there are essentially two uses for sound in a Glulx IF game: background music and momentary effects (footsteps, a siren, etc.). The ability to pause and resume playback (i.e., the ability to control file tracking) would allow sound to take on other roles, such as exposition.

Moreover, if we can give the player control over the playback of at least some sounds, we might be able to foster a different kind of relationship with sound in-game. For example, when background music has been discussed on the forums, there have always been a number of people who step in to say that they would generally prefer to simply shut it off. (I tend to be one of these people, and almost always do shut the music off in Flash games or anything else I might be playing.)

If, instead of playing background music through the course of the game, we offer the player a music widget* (whether in a graphics window or a set of Unicode “buttons” in the status bar) where he can control what he listens to and when, there is at least a possibility that we would get more player buy-in. The cost would be that we couldn’t use those tracks for dramatic purposes, but players might be more interested in using them for general mood- and place-setting purposes.

In such a model, sounds could also be used as achievements or to mark progress through the game – “A new track has been added to your player!”. Such tracks need not be music, either. They could contain spoken text, for example, to serve as exposition/backstory.

Anyway, that’s the background to the request. Thanks for giving it some thought.

–Erik

  • I suppose that such a widget is possible without the ability to pause and restart, but it would seem pretty underpowered - we’re surrounded by music players everywhere these days, and all of them offer pause/restart. (For spoken-word tracks, the ability to pause/restart is likely even more important.) Also note that the same API functions needed to provide pause and restart would also allow for author-created seek functionality.

I haven’t used libmodplug, but it seems strange that the position would be an arbitrary value. Are you sure about that? The sound libraries that I have used (BASS, FMOD and DUMB) are consistent in their handling of MOD songs (and related formats). The value for position isn’t directly linked to time, because that’s not how MOD files are constructed. Orders (the terminology may vary - orders may also be called patterns) make more sense than seconds for setting a starting point in a MOD file.

Pause and resume is a different issue, and usually most easily handled by special-case functions.

Well, yes, you’re right: the value returned can be related to a position in the file. I’m just a little uncomfortable with assuming that, for any given sound format, all players will have some way to set and get a position in that sound, and that a particular value will have the same meaning for all players of a particular format.

On reflection, if we did any of this, I think I’d prefer a way to just pause and resume a Glk sound channel.

If pause/resume doesn’t require specifying the current position, that sounds fine. Knowing nothing about sound libraries, I naively assumed that it would require that.

Note, though, that a simple pause/resume Glk call couldn’t emulate a seek function. (Less important to me personally, but it might open up possibilities.)

–Erik

Pause and resume (of a channel) sounds like it will be easier for everybody to implement.

I agree. I haven’t done much with runloops, but this is indeed what they’re for.

For a single-threaded interpreter, the model would be to do work until the glk_select() call, at which point you call CFRunLoopRun (on the thread’s run loop). The loop will run on its own, executing the callbacks for all the timers and event sources you set up, until one of them needs to cause a Glk event, at which point you stash the event data somewhere and call CFRunLoopStop. The interpreter then returns from CFRunLoopRun, looks at the stashed event data, and continues doing VM work.
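The CFRunLoop calls are macOS-specific, but the control flow can be caricatured in portable C: block inside the select call, let a callback stash the event data and set a stop flag (standing in for CFRunLoopStop), then return the stashed event. All names below are invented for illustration:

```c
#include <string.h>

/* Portable caricature of the glk_select() flow described above. In the
 * real thing, CFRunLoopRun() blocks and a timer callback calls
 * CFRunLoopStop(); here a stop flag and a hand-rolled loop stand in
 * for both. All names are invented for illustration. */

typedef struct { int type; int val; } fake_event_t;

static fake_event_t stashed_event; /* event data stashed by the callback */
static int loop_stopped = 0;       /* stands in for CFRunLoopStop() */

/* A timer callback: stash the event and stop the loop. */
static void timer_fired(void)
{
    stashed_event.type = 1; /* say, evtype_Timer */
    stashed_event.val = 42;
    loop_stopped = 1;
}

/* The interpreter's select: run the "loop" until a callback stops it,
 * then hand back the stashed event and resume VM work. */
static void fake_glk_select(fake_event_t *ev)
{
    loop_stopped = 0;
    while (!loop_stopped)
        timer_fired(); /* CFRunLoopRun() would block here instead */
    memcpy(ev, &stashed_event, sizeof *ev);
}
```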

…If the problem is passing data into and out of the NSTask, then I think you create a couple of NSPipes and use them for stdin and stdout. Inside the NSTask, stdin can be hooked up as a data source for your run loop.