Request for help: JSON protocol for Glk sound

Dannii · December 19, 2023, 3:00am

So unfortunately Parchment and Quixe don’t support the Glk sound functions. It’s great that Bisquixe now does, but it would be better of course to have sound out of the box in Parchment/Quixe.

There’s a part of this which someone else could help with even if they’re not familiar with the Quixe source code or even if they don’t know Javascript! That’s because the ideal Quixe solution for sound will do so via the GlkOte JSON protocol. (This also allows RemGlk to have sound support, or any other protocol user.) So if you’re familiar with the Glk API, and if you know JSON, then you might be able to help think of a good way of encoding the sound instructions into a JSON protocol. This shouldn’t be too hard, I just haven’t set aside time to do it myself.

For reference, the Glk sound API is documented here: Glk: A Portable Interface Standard for IF

And the GlkOte protocol is documented here: GlkOte: a Javascript library for IF interfaces
Or perhaps more usefully, as a TypeScript interface: https://github.com/curiousdannii/asyncglk/blob/master/src/common/protocol.ts

Note that the protocol will need to be two-directional: sound instructions will be sent out (as part of StateUpdate), and sound events will be received back (as a new event type).

mathbrush · December 19, 2023, 4:56am

This is something I don’t know how to do well (which is why I’ve hoped that eventually much of the stuff I’ve done with Bisquixe will get ‘fancified’ by professional programmers), but I can say what I had to have to make things compliant with glk.

Things glk expected to have:
-Sound channels
-Sounds
-Sound notifications
-Volume notification

For the channels:
-it expected them to have a ‘rock’.
-I’m not sure what disprocks do, so I don’t know if they’re expected
-It expected that a channel would have a volume
-It expected that volume could be changed over a given length of time
-It expected that channels could store a number and that when volume completed changing that number could be sent as a notification.
-It expected to be able to iterate over channels in the normal way glk objects are iterated
-It expected that channels could be played or stopped
-It expected that multiple channels could be played at once, with a single notification number assigned to all the sounds.

For the sounds:
-It expected that sounds would have ids and be associated to channels. (I gave channels a property called currentsnd but the connection could go the other way).
-It expected that you could set the sounds to a certain number of repeats (and that -1 repeats would loop forever)
-It expected that you could store an integer number, and send that number on a notification once the song finished its last repeat.

For the notifications:
-It expected something with four fields, where field 0 had the event type sound notify or volume notify, field 1 had the associated window (which is null for sound stuff), field 2 had either the sound id (for sound notifications) or 0 (for volume notifications), and field 3 had the stored notification for the given event.

I do not know if that is useful (and I just learned what JSON actually is about 30 minutes ago; I wondered why I kept editing attributes of a pair of braces) so feel free to disregard if it’s not useful!

Hanna · December 19, 2023, 2:18pm

I started thinking about the JSON representation and almost immediately needed a better understanding of how exactly both sides would use the information being exchanged (which affects what needs to be exchanged, and how exactly). For this, @mathbrush’s list (together with the Glk spec and skimming some source code) was helpful. I think it turns out that many things you mention don’t have to be in the JSON, though. For instance, the fields of the different events can be represented more naturally in JSON than in C or Inform, and quite a bit of state is controlled by one side of the protocol and does not need to be explicitly sent through the protocol.

I’ll start by thinking out loud about how to fit the sound API and events in the state update-shaped box and corresponding notifications. A more concrete sketch of a JSON schema follows (it’s intended to be complete, but may include embarrassing typos and can surely still be improved).

Design musings

Sound channels being created and destroyed can be modeled after the existing windows array in state updates, though it should probably be a new top-level field schannels:

In each update, the terp just lists all sound channels that currently exist, and they’re closed implicitly by no longer being listed in the next update.
Like windows, this works because each object is identified by a numeric ID that is never reused (this distinguishes it from a rock) and is specific to the JSON protocol, i.e., not exposed via the Glk API proper.
Also in analogy with windows, an empty array means all previously existing sound channels were closed, while omitting the schannels field entirely means the set of sound channels is the same as in the last update.
While windows have numerous attributes, sound channels just have their ID and little (if anything) else. The terp-side Glk implementation will maintain rocks and disprocks for the sake of the terp and the program it’s running, and it probably wouldn’t hurt to include the rock in the JSON, but I don’t think the display layer will have any use for it (unlike windows, where rocks translate into a WindowRock_123 class attribute that can be used for CSS shenanigans).
Volume is conceptually a property of the sound channel, but should not be included in the state updates: it’s not always under the control of the terp side of the protocol (set_volume_ext puts the display layer in charge of changing it asynchronously) and the game can’t even check what the current volume is.

Sound instructions fit with the existing content updates array (I’m unsure whether it’s a good idea to intermingle them with window updates in the same array, but it should work):

Roughly, each Glk call that starts or stops or modifies playback or volume on some sound channel translates to a JSON object describing that operation, with attributes for all the parameters, e.g. { "special": "play", "snd": 1234 }.
For consistency with window updates, the updates array groups together all operations affecting the same channel, like this: { "schan_id": 123, "play": [update1, update2, ...] }
For the glk_schannel_*_ext sound APIs, the basic (non-ext) version is supposed to be like the _ext variant with default values for the extra parameters. This suggests the JSON protocol may represent, for instance, both play and play_ext operations as the same kind of update, with optional repeat and notify fields only used by the extended API. I’ve chosen this approach in my sketch below because it gives a shorter schema, but I’m not 100% sure whether this makes things more complicated for implementations that don’t have the ext versions (are those still a thing in 2023?).
Sound loading hints conceptually belong with these updates, but don’t target a specific sound channel, so it seems more consistent to put these in a new top-level field of the state update message.
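To make the "one update kind for play and play_ext" idea concrete, here is a minimal sketch of how the terp side might build the operation, leaving out any field that equals its default. The helper names play and playExt are hypothetical (not from GlkApi, RemGlk or AsyncGlk); the field names follow the pseudo-TypeScript sketch in this post.

```typescript
// Hypothetical helpers: build one JSON operation for both
// glk_schannel_play and glk_schannel_play_ext.
interface PlayOperation {
    special: 'play',
    snd: number,
    repeats?: number, // default: 1 (-1 loops forever)
    notify?: number,  // default: 0 (no notification)
}

// Mirrors glk_schannel_play_ext(chan, snd, repeats, notify); the channel ID
// is implied by the enclosing channel update, so it is not a field here.
function playExt(snd: number, repeats: number, notify: number): PlayOperation {
    const op: PlayOperation = { special: 'play', snd };
    // Emit the extra fields only when they differ from the defaults
    if (repeats !== 1) {
        op.repeats = repeats;
    }
    if (notify !== 0) {
        op.notify = notify;
    }
    return op;
}

// glk_schannel_play(chan, snd) behaves like play_ext(chan, snd, 1, 0)
function play(snd: number): PlayOperation {
    return playExt(snd, 1, 0);
}

console.log(JSON.stringify(play(1234)));           // {"special":"play","snd":1234}
console.log(JSON.stringify(playExt(1234, -1, 5))); // {"special":"play","snd":1234,"repeats":-1,"notify":5}
```

With this approach an implementation without the _ext functions never produces the optional fields, so the display layer sees exactly the same JSON either way.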

On notifications:

Background: input and timer events are spelled out in state updates, separate from window updates. GlkApi/RemGlk/etc. tracks which of these are requested and not yet canceled at any given point in time and the display layer obliges by creating and cancelling its UI/timer event handlers accordingly. This works alright because these events have very simple state, controlled directly by Glk APIs that request and cancel these events.
In contrast, I think sound notifications should be in the hands of the display layer: some sound instructions include parameters that imply a sound or volume notification should be sent when some asynchronous process in the display layer finishes, and the display layer will track that and send those notifications if and when the time is right. There can be many potential sound notifications in flight at any time; they are not explicitly cancelled by the program, but may simply never be delivered for a multitude of reasons, including playback errors that only the display layer can notice.
This means there are no further additions to the state update message. In the other direction, the two new event types are relatively obvious to encode.
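A minimal sketch of the display-layer bookkeeping this implies, assuming each channel only owes a notification for the sound it is currently playing. The class and method names are hypothetical; the event shape follows the 'sound_notify' type in the sketch below. It relies on the Glk spec's rule that stopping (or replacing) a sound generates no notification.

```typescript
// Hypothetical display-layer tracker: which notification, if any, is owed
// for the sound currently playing on each channel.
interface PendingSound {
    snd: number,
    notify: number, // 0 means "no notification requested"
}

type SoundNotification = { type: 'sound_notify', snd: number, notify: number };

class ChannelNotifications {
    private pending = new Map<number, PendingSound>(); // keyed by channel ID

    play(chanId: number, snd: number, notify: number): void {
        // A new sound replaces whatever was owed for the previous one
        this.pending.set(chanId, { snd, notify });
    }

    stop(chanId: number): void {
        // Stopped sounds never produce a notification
        this.pending.delete(chanId);
    }

    playbackFailed(chanId: number): void {
        // Only the display layer can see this; the notification is dropped
        this.pending.delete(chanId);
    }

    // Called when the audio backend reports that the last repeat finished
    ended(chanId: number): SoundNotification | null {
        const p = this.pending.get(chanId);
        this.pending.delete(chanId);
        if (!p || p.notify === 0) {
            return null;
        }
        return { type: 'sound_notify', snd: p.snd, notify: p.notify };
    }
}
```

Nothing here needs to be mirrored on the terp side: the terp just receives whichever events eventually come out of ended().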

Pseudo-TypeScript Sketch

// Existing type alias gains new variants
export type Event = ... | SoundNotifyEvent | VolumeNotifyEvent

export interface SoundNotifyEvent extends EventBase {
    type: 'sound_notify',
    // resource ID which finished playing
    snd: number,
    // non-zero value given when the notification was requested
    notify: number,
}

export interface VolumeNotifyEvent extends EventBase {
    type: 'volume_notify',
    // non-zero value given when the notification was requested
    notify: number,
}

export interface StateUpdate {
    // ...
    // Fields added to existing type:
    sound_channels?: SoundChannelUpdate[],
    sound_load_hints?: SoundLoadHint[],
}

// Existing type alias gains new variant for sound channel updates
export type ContentUpdate = ... | SoundChannelContentUpdate

export interface SoundChannelUpdate {
    // no type field needed (Glk only has one kind of schan)
    id: number,
    // initial volume (from schannel_create_ext) is encoded as first update
}

export interface SoundChannelContentUpdate {
    // not just plain 'id' to help distinguish it from window updates
    schan_id: number,
    play: SoundChannelOperation[],
}

export type SoundChannelOperation = PlayOperation | StopOperation |
    PauseOperation | UnpauseOperation | SetVolumeOperation

// one glk_schannel_play_multi call maps to many of these
export interface PlayOperation {
    special: 'play',
    // schannel ID implied by parent
    snd: number,
    // default: 1
    repeats?: number,
    // default: 0
    notify?: number,
}

export interface StopOperation {
    special: 'stop',
    // schannel ID implied by parent
}

export interface PauseOperation {
    special: 'pause',
    // schannel ID implied by parent
}

export interface UnpauseOperation {
    special: 'unpause',
    // schannel ID implied by parent
}

export interface SetVolumeOperation {
    volume: number,
    // default: 0
    duration?: number,
    // default: 0
    notify?: number,
}

// Direct translation of the C API - inelegant and possibly optional
export interface SoundLoadHint {
    snd: number,
    // Hint: load the sound resource if non-zero, or unload if zero
    flag: number,
}
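As noted in the sketch, one glk_schannel_play_multi call maps to many PlayOperations. A possible translation, assuming the display layer starts all play operations from the same state update together; the function name playMulti is hypothetical, and this sketch simply expects one sound per channel.

```typescript
// Local copies of the shapes from the sketch above
interface PlayOperation { special: 'play', snd: number, repeats?: number, notify?: number }
interface SoundChannelContentUpdate { schan_id: number, play: PlayOperation[] }

// Hypothetical decomposition of glk_schannel_play_multi(chans, snds, notify)
function playMulti(chanIds: number[], snds: number[], notify: number): SoundChannelContentUpdate[] {
    if (chanIds.length !== snds.length) {
        throw new Error('this sketch expects one sound per channel');
    }
    return chanIds.map((schan_id, i) => {
        const op: PlayOperation = { special: 'play', snd: snds[i] };
        if (notify !== 0) {
            // Every sound carries the same notify value
            op.notify = notify;
        }
        return { schan_id, play: [op] };
    });
}
```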

Hanna · December 19, 2023, 2:25pm

All of the above is assuming that sound updates are coupled to the normal state update lifecycle, as Dannii suggested (if I understood correctly). That is,

The terp/game runs and calls a bunch of Glk APIs, including sound APIs, until it blocks on glk_select. (I’ll ignore the few other blocking calls for simplicity.)
An update covering everything that happened since the last glk_select, including all the queued-up sound stuff, is sent to the display layer.
The display layer carries out those instructions, including playing sound. Then it and the terp stop running until input or another external event arrives (now including sound/volume notifications), which the display layer presents as an event to be returned from glk_select.

This is the least invasive extension to the protocol, and it’s probably good enough for the vast majority of cases where the game runs only for a brief moment between each user input. But it’s subtly different from how the classic Glk API actually works: a game could start playing a sound while it’s busy doing more work for several seconds. If the Glk implementation supports it, then the player hears the sound while they’re waiting, and sound/volume notifications can be delivered to the game whenever it calls glk_select_poll.

There is an equivalent issue for timer events, but GlkApi and RemGlk make it work by keeping track of the timer themselves in addition to what the display layer does. Thus, they can report timer events in their glk_select_poll without sending any messages to the other side. But a sound can’t actually start playing, and the volume can’t actually change, until these commands are relayed to the display layer.

I don’t know whether this is a realistic concern, and whether it would even work in the single-threaded world of the browser/node.js event loop. If the terp runs on the main thread, then I believe events from the web sound APIs probably wouldn’t be delivered to the display layer (which may prevent Glk sound/volume notifications from working), but maybe starting/stopping playback can work. However, this would require that sound stuff doesn’t go through regular state updates, but through new kinds of messages:

Terp sends “please do this sound operation” and gets back an acknowledgement or error from the display layer. (NB: in this model, glk_schannel_play_multi would need to be a single atomic update, not decomposed into several “play this sound on this channel” steps.)
Terp polls the display layer for sound/volume notification events (and timer events, I guess, while we’re at it) and immediately gets back such an event if any is pending, but doesn’t block if no event is pending.

This seems like a more invasive change to the protocol, but feasible in principle. However, it would need further thought, since the current state update message couples the “I’m blocked waiting for events” together with the changes to windows and such:

Some changes and updates would have to be included in the new messages, e.g., newly created sound channels must be announced for operations on them to make any sense, and changes to the timer event should also be reflected.
In principle the same idea applies to text and graphics output – these, too, could be displayed before the next blocking glk_select. The Glk spec for glk_select_poll mentions this possibility for text buffer output.
But these non-blocking updates probably shouldn’t include everything that’s currently part of the state update. For instance, I don’t think glk_select_poll should set up or cancel pending input events.

Dannii · December 20, 2023, 12:18am

This is a great start, thanks @Hanna!

So one of the principles behind the GlkOte protocol is that the Glk API implementation might not even be on the same computer as the GlkOte client! (Indeed, that’s why the initial implementation was called RemGlk + GlkOte - it was designed to be remote.) This means some compromises have to be made for how the Glk API is implemented. I had forgotten that the sound system had one of them.

GlkOte already has one use of a pseudo-RPC for glk_fileref_create_by_prompt. If the stylehint measuring functions ever get implemented they’ll need to be RPCs as well. So it would be possible to use an RPC for glk_schannel_play so that we can return an error.

But I don’t think it would be worth it. Unlike, for example, the file functions, errors wouldn’t be common here. If the sound file is corrupt it’s likely to not play in other interpreters too. The Glk implementation needs access to the blorb and not just the storyfile, so I’d say it would be enough to check if the blorb has the requested sound resource and if it can play that type of resource (i.e., whether it supports MOD files or not), and if it does glk_schannel_play can return 1.
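The check described above could be as small as the following sketch. It assumes a hypothetical soundIndex map (sound resource number to Blorb chunk type) built when the blorb is loaded; the chunk type IDs 'OGGV', 'FORM' (AIFF) and 'MOD ' come from the Blorb spec, but which formats count as supported is purely an assumption here.

```typescript
// Assumption: this display layer can decode Ogg Vorbis and AIFF, but not MOD
const SUPPORTED_SOUND_TYPES = new Set<string>(['OGGV', 'FORM']);

// soundIndex: hypothetical map from sound resource number to chunk type
function canPlay(soundIndex: Map<number, string>, snd: number): 0 | 1 {
    const chunkType = soundIndex.get(snd);
    if (chunkType === undefined) {
        return 0; // the blorb has no such sound resource
    }
    return SUPPORTED_SOUND_TYPES.has(chunkType) ? 1 : 0;
}
```

Because the answer only depends on data the terp already has, glk_schannel_play can return immediately without a round trip to the display layer.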

On the delays if you start playing a sound and then try to do many seconds of further work, I don’t think that’s something we should worry about here. Delays are inevitable with GlkOte. (In practice they’re minimal as most uses are single machine, but if it’s actually going over a network then everything will be delayed.) And glk_select_poll is kind of soft-deprecated too:

The GlkOte implementation also needs a blorb, so I think glk_sound_load_hint can just be a NOP, and it doesn’t need to be represented in the protocol.

Dannii · December 20, 2023, 12:37am

Now for my thoughts on your protocol sketch. In general I think it’s good! Just a couple small things I think should be changed.

TypeScript doesn’t have true sum types/tagged unions, but when there’s one shared property it can work pretty well. Even though it could work as you designed it, I think it would be better for SetVolumeOperation to also include a type property. (I assume you were basing this off the graphic window operations, but I’d probably just call it a type or op rather than special.)
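For illustration, here is how a shared op property lets TypeScript narrow each operation in a switch, with a never check that flags any operation kind a handler forgets. The interface names and the describe function are hypothetical; the volume fields follow Hanna's sketch.

```typescript
interface PlayOp { op: 'play', snd: number, repeats?: number, notify?: number }
interface StopOp { op: 'stop' }
interface PauseOp { op: 'pause' }
interface UnpauseOp { op: 'unpause' }
interface VolumeOp { op: 'volume', volume: number, duration?: number, notify?: number }

type SoundChannelOperation = PlayOp | StopOp | PauseOp | UnpauseOp | VolumeOp;

function describe(o: SoundChannelOperation): string {
    switch (o.op) {
        case 'play':
            // o is narrowed to PlayOp here, so o.snd is accessible
            return `play ${o.snd} x${o.repeats ?? 1}`;
        case 'stop':
        case 'pause':
        case 'unpause':
            return o.op;
        case 'volume':
            return `volume ${o.volume} over ${o.duration ?? 0}`;
        default: {
            // Compile error here if a new op is added but not handled above
            const unreachable: never = o;
            throw new Error(`unknown op ${JSON.stringify(unreachable)}`);
        }
    }
}
```

Without an op on the volume operation, the switch could not distinguish it by a single property and the exhaustiveness check would not work.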

Likewise for SoundChannelContentUpdate, while it might work with a schan_id, I think this makes the protocol more complex than it should be. It also ties together parts of the Glk system that don’t need to be connected. My GlkOte implementation gives the whole content array to the windows system. If the sound updates were part of it then the windows system would need to know about and have references to the sound system.

I think it would be simplest just to put the sound ops into SoundChannelUpdate:

export interface SoundChannelUpdate {
    id: number,
    ops?: SoundChannelOperation[],
}

So we’d use the parts of the protocol as follows:

No sound functions were called: just leave out sound_channels.
Sound channels were created or destroyed: send SoundChannelUpdate[] for all currently existing sound channels.
There were sound operations: send SoundChannelUpdate[] for all currently existing sound channels, and if any have operations then send ops.
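The three cases above can be sketched as a single terp-side helper. Everything here is hypothetical: the function name, the idea of keeping a map of queued operations per open channel, and the setChanged flag.

```typescript
interface SoundChannelOperation { op: string }
interface SoundChannelUpdate { id: number, ops?: SoundChannelOperation[] }

// channels: every open channel ID -> operations queued since the last update.
// setChanged: true if any channel was created or destroyed since then.
function buildSoundChannels(
    channels: Map<number, SoundChannelOperation[]>,
    setChanged: boolean,
): SoundChannelUpdate[] | undefined {
    const anyOps = Array.from(channels.values()).some(ops => ops.length > 0);
    if (!setChanged && !anyOps) {
        return undefined; // no sound functions called: omit sound_channels
    }
    // Otherwise list every existing channel, attaching ops only where present.
    // An empty array therefore means every channel has been destroyed.
    return Array.from(channels.entries()).map(([id, ops]): SoundChannelUpdate =>
        ops.length > 0 ? { id, ops } : { id });
}
```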

Otherwise I think your protocol looks good. (I might rename some properties to shorter names, but that’s not important at this stage.) But again, I haven’t had the time to think deeply about the Glk sound API, so there might be something else I’ve missed. So feedback from others would be great too!

Hanna · December 20, 2023, 10:50am

Right, compromises is the keyword. For better or worse, the protocol is as it is and already makes similar simplifications in other areas. I’d like to explore an alternative that is more eager to have frequent back-and-forth, at the expense of supporting the “two different computers” use case less well, but it’ll only ever matter for edge cases and maybe a little UX polish for the occasional game doing unusual things. That’s quite a different project from adding sound support to the existing protocol and the libraries using it.

It can most definitely be left out of the protocol, or accepted and silently ignored by the display layer. But I’m not quite sure what you mean w.r.t. access to the blorb. I know that GlkOte, or rather gi_blorb, can take the raw .blorb file and pull out all the resources to keep them in client-side JS objects. But it’s also possible to only supply the metadata and fetch the actual resources via URLs when they’re accessed. This format is not very widely used today (in part because there’s less tooling for producing it) but it has the advantage of not breaking on large blorbs. If audio files are loaded over the network, then a hint for pre-loading them can definitely make sense, and in fact HTML <audio> elements support this. So the Glk library could do something sensible with the hints – assuming any games actually give such hints, which I just don’t know.

Dannii · December 20, 2023, 12:14pm

I forgot another significant reason to avoid RPCs where possible: it is costly, to both performance and file size of the interpreter, to convert the synchronous C API of Glk to an asynchronous implementation. Emglken does this through Emscripten’s Asyncify mode, and while it works well enough, if further functions must be Asyncified, then that will have a cost. Likewise there is a cost when a WASM function calls out into JS code (even sync code). The ideal is to call one WASM function and have it do all of its processing staying in WASM mode, and then return. If we have to bear these costs to do something then we’ll just do it, I’m not saying we can’t do RPCs where needed. But it would be best to avoid them if possible.

I also don’t know if any games use glk_sound_load_hint! That would be a good research project for someone.

Zarf’s GlkOte doesn’t use typed arrays, so it would have more difficulties with large blorbs than my AsyncGlk. Apparently each number in an array takes 9.7 bytes in Chrome, so basically multiply by 10 to get how much RAM is needed in GlkOte compared to AsyncGlk. I haven’t actually tried a 100MB+ blorb in AsyncGlk, but I wouldn’t anticipate issues.
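To show what the typed-array approach looks like in practice, here is a sketch that reads the sound entries of a Blorb resource index straight out of a Uint8Array through a DataView, so the file is never copied into a plain JS array. The layout ("FORM" length "IFRS", then an "RIdx" chunk holding a count and (usage, number, start) triples) follows the Blorb spec; the function names are hypothetical and error handling is minimal.

```typescript
function fourcc(view: DataView, offset: number): string {
    let s = '';
    for (let i = 0; i < 4; i++) {
        s += String.fromCharCode(view.getUint8(offset + i));
    }
    return s;
}

// Returns sound resource number -> chunk type (e.g. "OGGV", "FORM", "MOD ")
function soundIndex(blorb: Uint8Array): Map<number, string> {
    const view = new DataView(blorb.buffer, blorb.byteOffset, blorb.byteLength);
    if (fourcc(view, 0) !== 'FORM' || fourcc(view, 8) !== 'IFRS') {
        throw new Error('not a Blorb file');
    }
    // The resource index must be the first chunk after the FORM header
    if (fourcc(view, 12) !== 'RIdx') {
        throw new Error('missing resource index');
    }
    const count = view.getUint32(20); // DataView reads big-endian by default
    const sounds = new Map<number, string>();
    for (let i = 0; i < count; i++) {
        const entry = 24 + i * 12; // each entry: usage, number, start
        if (fourcc(view, entry) === 'Snd ') {
            const number = view.getUint32(entry + 4);
            const start = view.getUint32(entry + 8);
            // start points at the resource chunk header, which begins with its type
            sounds.set(number, fourcc(view, start));
        }
    }
    return sounds;
}
```

The sound data itself can later be handed to the audio API as a subarray of the same buffer, again without copying it into a plain array.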

The infomap (JSON) format doesn’t currently support sounds, but we could add to it if we decided it was better than just using blorbs (though I don’t think it is because it would be quite bad for resource chunks). But supposing we did, the JS audio API would allow us to stream the sound files, so it’s not like it would need to download the whole file before playing. But HTTP can be slow, so preloading could be slightly helpful, particularly with short timely sound effects rather than music. But as I’ve said, we should just use blorbs.

Hanna · December 20, 2023, 7:37pm

For Glulx interpreters, it “just” takes some elbow grease to teach them how to efficiently suspend themselves at every @glk opcode. However, I’m sure we can all imagine far more pleasant and rewarding activities than refactoring the 1.5k LOC glkop.c (twice, if we want Git and Glulxe) to enable this. More importantly, from a quick glance at the TADS runner, I can see the issue there – so many Glk calls all over the place! Let’s hope that new WebAssembly proposals (JS promise integration and stack switching) will eventually enable a better and more efficient solution.

I just tried a blorb that’s around 500 MB on iplayif.com. The good news is that it works on my machine. The bad news is that it needs about 2 GiB of memory, which would be a problem for many other machines, especially mobile ones. I’m not eager to bring the blorb to my phone to try it out, but in the past I’ve had the whole browser crash due to less challenging tabs.

Firefox dev tools show me that there’s a single ArrayBuffer whose size matches the blorb (so far so good!), but also ca. 36k ordinary arrays with a total footprint of ca. 1.5 GB. I can’t immediately figure out where those come from (gathering stack traces for all allocations makes the loading process grind to a halt). In any case, this exercise has not quite convinced me that everyone should just use blorbs.

jkj_yuio · December 20, 2023, 8:55pm

Not suggesting anything specific for Glk, but here’s what I do in Strand:

Strand works by sending JSON messages between the game and the UI, so there are similarities here. For the Web, it’s all WASM.

This is the soundobj JSON object:

SoundObj

name: "filepath"

duration: int. Play int times. A value of 0 means stop playing any existing sounds (or ones on this channel). A value of -1 means play in a continuous loop.

channel: int. Optional. Specifies the audio channel; defaults to 0.

volume: int. Optional. Specifies the audio volume level as a percentage.

preload: true. Optional. Caches audio before use.

“filepath” is sometimes a URL.

e.g. { "soundobj": { "name": "audio/dogbark.ogg", "duration": 1, "channel": 0, "volume": 100 }}
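For comparison with the Glk proposal, the SoundObj fields can be written down as a TypeScript type. This is only a reading of the description above, not Strand's actual code; the interface names are hypothetical.

```typescript
interface SoundObj {
    name: string,      // file path, sometimes a URL
    duration: number,  // play count; 0 stops existing sounds, -1 loops forever
    channel?: number,  // optional, defaults to 0
    volume?: number,   // optional, percentage
    preload?: boolean, // optional, cache audio before use
}

interface SoundMessage {
    soundobj: SoundObj,
}

// The example message above, parsed as JSON
const bark = JSON.parse(
    '{ "soundobj": { "name": "audio/dogbark.ogg", "duration": 1, "channel": 0, "volume": 100 } }',
) as SoundMessage;

const channel = bark.soundobj.channel ?? 0;
console.log(bark.soundobj.name, channel, bark.soundobj.volume); // audio/dogbark.ogg 0 100
```

One visible difference: Strand identifies sounds by file path or URL, whereas the Glk proposal uses Blorb resource numbers.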

Messages are all async and the UI deals with the audio async.

Things I discovered:

You need to stream your audio fetches for web (music files can be large).
Beware of latency when using short sounds.

Dannii · December 20, 2023, 11:07pm

Is there an actual published game that’s a 500MB blorb, or is that a test file? Honestly that’s such an extreme blorb that I’m not too concerned if it gets out of memory errors. And note that the Parchment proxy has a file size limit of only 100MB.

mathbrush · December 20, 2023, 11:09pm

I recently was making releases of two different games, each of which had 60 sounds, each including some longer ‘music’ segments as well as sfx.

One was 10 MB, and another 9 MB.

Dannii · December 20, 2023, 11:11pm

Oh, blorbs of that size won’t be a problem, that’s pretty common now. Counterfeit Monkey is 11MB and Kerkerkruip is 18MB and both run great in Parchment.

mathbrush · December 20, 2023, 11:13pm

Right! I was supporting your position that we most likely needn’t worry about 500 MB files.

Dannii · December 20, 2023, 11:19pm

I see the biggest files in the Glulx folder in the IF Archive are:

Renegade_Brainwave_Sound.gblorb: 54 MB
FlexibleSurvival.gblorb: 252 MB
DiaperQuest.gblorb: 398 MB
Archaeological_Fiction.zip: 724 MB

So wow, yes there are some huge blorbs! But you can’t play the >100MB ones in Parchment (by URL I mean. It would be possible to try loading them from your computer, but I haven’t tried.) Renegade_Brainwave_Sound.gblorb works fine, and the memory snapshot is only 86MB.

Hanna · December 21, 2023, 9:34am

Yeah, there are a couple of such huge games. I went looking for them a few months ago specifically to use as stress tests. I also found two more that aren’t in the IF Archive by searching the forum archives (both AIF, so I won’t link them), and some of the games on IF Archive have newer versions elsewhere that are even bigger. From my arguably twisted POV, such huge games pose interesting technical challenges for Glk libraries and also interpreters (e.g., .ulx files many dozens of MB large). I think it would be nice if even the largest games could “just work” in the browser (aside from the proxy’s size limit) or in a Lectrote-style app, and I don’t see a fundamental reason why it wouldn’t be possible. Dannii’s use of typed arrays in AsyncGlk is already a big step in that direction! Unlike Lectrote for instance, Parchment can evidently load a 500 MB blorb, and it’s quite reasonable to consider the remaining issues and inefficiencies very low priority. I don’t want to further derail this topic, so let’s await more input on the topic of Glk sound via JSON.

zarf · December 21, 2023, 3:56pm

I want to get there for GlkOte/Quixe. It’s just the usual run of everything else going on too.

Dannii · October 19, 2024, 6:11am

Here’s the protocol I think I’ve settled on.

First output.

export interface StateUpdate {
    ...
    /** Sound channels (new channels, or new operations) */
    schannels?: SoundChannelUpdate[],
}

export interface SoundChannelUpdate {
    /** Sound channel ID */
    id: number,
    /** Sound channel operations */
    ops?: SoundChannelOperation[],
}

export type SoundChannelOperation = PauseOperation | PlayOperation | SetVolumeOperation | StopOperation | UnpauseOperation

export interface PauseOperation {
    op: 'pause',
}

export interface PlayOperation {
    op: 'play',
    /** Notification value */
    notify?: number,
    /** Number of repeats (default: 1) */
    repeats?: number,
    /** Sound resource ID (from a Blorb) */
    snd: number,
}

export interface SetVolumeOperation {
    op: 'volume',
    /** Duration in milliseconds */
    dur?: number,
    /** Notification value */
    notify?: number,
    /** The volume as a number between 0 and 1 */
    vol: number,
}

export interface StopOperation {
    op: 'stop',
}

export interface UnpauseOperation {
    op: 'unpause',
}
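As a sketch of how the terp side might fill in SetVolumeOperation: Glk volumes are integers where 0x10000 means full volume and 0 silence, while the protocol wants a number from 0 to 1, and glk_schannel_set_volume is the _ext call with duration 0 and notify 0. The function names are hypothetical, and both the linear mapping and clamping anything louder than full volume are assumptions.

```typescript
interface SetVolumeOperation { op: 'volume', dur?: number, notify?: number, vol: number }

const GLK_FULL_VOLUME = 0x10000;

// glk_schannel_set_volume_ext(chan, vol, duration, notify); duration in ms
function setVolumeExt(glkVol: number, durationMs: number, notify: number): SetVolumeOperation {
    // Assumption: linear mapping, clamped to the protocol's 0..1 range
    const vol = Math.min(glkVol, GLK_FULL_VOLUME) / GLK_FULL_VOLUME;
    const op: SetVolumeOperation = { op: 'volume', vol };
    if (durationMs > 0) {
        op.dur = durationMs;
    }
    if (notify !== 0) {
        op.notify = notify;
    }
    return op;
}

// glk_schannel_set_volume(chan, vol) is set_volume_ext with duration 0, notify 0
function setVolume(glkVol: number): SetVolumeOperation {
    return setVolumeExt(glkVol, 0, 0);
}
```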

This is largely the same as @Hanna proposed above, just with a few props renamed. The main difference is as I wrote above - channel and operation updates are all done together. If you have many channels but only play a sound on one, it will be a little bit verbose as you’ll need to send updates for all those channels, but each channel’s update only needs to include its ID.

~~Also, play operations may need to send a URL rather than a Blorb resource ID. I’m not sure about this - it wouldn’t work with the sound event below.~~ Rather than sending a URL in the case of non-Blorb resources, it may be better to use the garglk_add_resource_from_file extension.

Events will be updated with:

export type Event = ... | SoundEvent | VolumeEvent

export interface SoundEvent extends EventBase {
    /** Event code */
    type: 'sound',
    /** Sound resource ID which finished playing */
    snd: number,
    /** Notification value */
    notify: number,
}

export interface VolumeEvent extends EventBase {
    /** Event code */
    type: 'volume',
    /** Notification value */
    notify: number,
}
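On the terp side, these events map back onto the four Glk event_t fields mathbrush described earlier (type, window, val1, val2). A minimal sketch with local copies of the event shapes (EventBase fields omitted); toGlkEvent is a hypothetical name, and the evtype values are those defined in glk.h.

```typescript
// Glk event type constants, as defined in glk.h
const evtype_SoundNotify = 7;
const evtype_VolumeNotify = 9;

interface SoundEvent { type: 'sound', snd: number, notify: number }
interface VolumeEvent { type: 'volume', notify: number }

// The four event_t fields: type, win, val1, val2.
// The window is always null for sound events.
type GlkEventFields = [number, null, number, number];

function toGlkEvent(ev: SoundEvent | VolumeEvent): GlkEventFields {
    if (ev.type === 'sound') {
        // val1 = sound resource ID, val2 = notification value
        return [evtype_SoundNotify, null, ev.snd, ev.notify];
    }
    // val1 is always 0 for volume notifications
    return [evtype_VolumeNotify, null, 0, ev.notify];
}
```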

Dannii · October 19, 2024, 6:46am

I’ve just realised that in order to support the gestalt_Sound2 gestalt (without which several of the functions won’t be accessible), I would have to add MOD support to GlkOte. I was thinking of leaving it out, because it isn’t natively supported by the Web Audio API, and also because I don’t think it was actually used much in practice? But I guess I could return ~~1~~ 0 from glk_schannel_play if the sound is a MOD file?

Hanna · October 19, 2024, 8:18am

It’s possible to advertise support for the other APIs with the older, more fine-grained gestalts. At least angstsmurf/soundtest on GitHub (“Test sound capabilities of Glulx interpreters”) seems to handle this carefully. It seems likely that many games just check Sound2, and not reporting that gestalt will break more of them than implying MOD support would. But tweaking reported gestalts is easy, so you could do the technically correct thing first and then see how games handle it in practice.
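A sketch of the "technically correct" option: report the fine-grained sound gestalts individually and only report gestalt_Sound2 (which also implies gestalt_SoundMusic, i.e. MOD support) when MOD playback actually exists. The selector values are from glk.h; soundGestalt and the supportsMod flag are hypothetical.

```typescript
const gestalt_Sound = 8;
const gestalt_SoundVolume = 9;
const gestalt_SoundNotify = 10;
const gestalt_SoundMusic = 13;
const gestalt_Sound2 = 21;

function soundGestalt(sel: number, supportsMod: boolean): number {
    switch (sel) {
        case gestalt_Sound:
        case gestalt_SoundVolume:
        case gestalt_SoundNotify:
            return 1;
        case gestalt_SoundMusic:
            return supportsMod ? 1 : 0;
        case gestalt_Sound2:
            // Sound2 implies all the above, including SoundMusic
            return supportsMod ? 1 : 0;
        default:
            return 0;
    }
}
```

Flipping the Sound2 case to always return 1 would be the pragmatic variant, if testing shows games only check that gestalt.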

By the way, wouldn’t returning 1 from glk_schannel_play imply that the MOD was played successfully? Returning 0 seems more accurate, in case anyone checks the return value. The type of sound resource should be available on the interpreter side at least for blorb resources, right?