Plans for IF-Octane and I Am Prey 1.0

This is what I do: audio is fetched in streaming mode and decoded in chunks. I start playing once I have a small “ahead” buffer. A long piece, like music, will still be downloading while it plays.
I try to feed the output 50 ms ahead of running dry, i.e. one frame at 20 fps.
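The “ahead” buffer described above boils down to a small calculation. A minimal sketch, assuming float PCM queued by sample count (the function names are mine, not from the engine):

```javascript
// How many samples must be queued to stay `aheadSeconds` ahead of playback.
// At 20 fps, one frame is 50 ms, matching the figure in the post above.
function aheadSamples(sampleRate, aheadSeconds) {
  return Math.ceil(sampleRate * aheadSeconds);
}

// Whether the output is about to run dry and needs another decoded chunk.
function needsRefill(queuedSamples, sampleRate, aheadSeconds = 0.05) {
  return queuedSamples < aheadSamples(sampleRate, aheadSeconds);
}

console.log(aheadSamples(44100, 0.05)); // 2205 samples ≈ one 20 fps frame
```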

1 Like

To be clear: You got this working through the Web Audio API on a browser client?

EDIT: According to this, it looks like I will need to figure out some way of splitting a longer *.ogg file into smaller files, which are decoded and played one after another, because the API’s decoder needs a whole file in one go.

(Using *.ogg as my target audio format for reasons.)

EDIT 2: Wow, a full song takes like 250 MB of RAM after decoding it. The people who keep telling me (and telling me) that engineers don’t need to worry about RAM anymore are absolutely full of it.
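That figure is easy to sanity-check: `decodeAudioData` expands everything to 32-bit float PCM, so the decoded size is duration × sample rate × channels × 4 bytes, no matter how small the compressed .ogg was. A quick sketch (the track length and rates below are illustrative, not measurements from the engine):

```javascript
// Decoded PCM size in bytes: Web Audio stores samples as Float32 (4 bytes each).
function decodedBytes(seconds, sampleRate, channels) {
  return seconds * sampleRate * channels * 4;
}

const mb = (bytes) => (bytes / (1024 * 1024)).toFixed(1);

// A 4-minute stereo track at 48 kHz:
console.log(mb(decodedBytes(240, 48000, 2))); // ≈ 87.9 MB
```

By that math, 250 MB corresponds to roughly an 11-minute stereo track at 48 kHz, or to extra working copies held alive during decoding.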

EDIT 3: Okay, so this engine does not run as-is; the idea is that there is a build process which generates the final HTML file. One option I have is to mark certain audio files as “long” beforehand, split them up with ffmpeg or something in the build script, and then have the engine recognize all the chunk files as segments of a larger audio track.

I’m reaaaally tempted to see if I can play MIDI files through the Web Audio API, inspired by what @pinkunz suggested. Maybe use noise oscillators and processing to create live environment ambience.

(If only it weren’t so difficult to get timidity to run in a web browser. Also… the sizes of soundfonts aren’t ideal… Time to run some tests, I guess.)

3 Likes

Okay, I started a new branch, and have modified my build process to allow for an “embed” directory and “streams” directory.

Anything in the “streams” directory will be cut up into 3-second “chunks” by ffmpeg. The manifest file will register all of these as chunks for a single virtual audio file.
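For illustration, the chunking step can be driven by ffmpeg’s segment muxer, after which the manifest is just an ordered list of chunk files. A sketch of the manifest side, assuming a fixed chunk length (the file-naming scheme and manifest shape here are my invention, not the actual build script):

```javascript
// Build step (run outside this script), e.g.:
//   ffmpeg -i streams/song.ogg -f segment -segment_time 3 -c:a libvorbis chunks/song_%03d.ogg
// Note: `-c copy` would only split at Ogg page boundaries; re-encoding gives
// chunk lengths much closer to the requested 3 seconds.

// Register the resulting chunks as one virtual audio track in the manifest.
function makeManifestEntry(name, totalSeconds, chunkSeconds = 3) {
  const count = Math.ceil(totalSeconds / chunkSeconds);
  return {
    name,
    chunkSeconds,
    chunks: Array.from({ length: count }, (_, i) =>
      `chunks/${name}_${String(i).padStart(3, "0")}.ogg`),
  };
}

console.log(makeManifestEntry("song", 10).chunks);
// [ 'chunks/song_000.ogg', 'chunks/song_001.ogg',
//   'chunks/song_002.ogg', 'chunks/song_003.ogg' ]
```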

At this point, it loads correctly. Now I just gotta hook this into the current engine audio manager.

2 Likes

You got this working through the Web Audio API on a browser client?

Yes, except I’m using WASM, although the audio uses WebAudio via JS. This is provided for me by Sokol audio; look for the EMSC/WebAudio bits in sokol_audio.h.

I use ogg too, and I don’t need to decode the whole file at once because I decode it myself with the vorbis lib. I use the “pushdata” API and feed it chunks of ogg that I receive from net streaming, then wait for it to push out decoded samples. Those I put into buffers for playout.

Yeah, 250 MB is a lot. I see this when I play a music track. I don’t reclaim memory immediately; instead I keep a cache, because often a sound is played again shortly after. Things like ambiences can also be long, and they’re looped, so they cannot be thrown out until they are stopped. Even then, the player will probably walk back into the location with the ambience a few seconds later! However, with ambience sounds, I compress them more :slight_smile:

2 Likes

This sounds like it’s right up my alley, and with the focus on screen reader support, I’d be more than willing to test the accessibility once it’s remotely close to being completed. Best of luck!

4 Likes

Oh, this is an amazing idea, actually. Right now, I have everything using the same quality compression amount, but I suppose nothing stops me from choosing varying levels of compression, based on how much high-end and prominence the sound will have in the game…!

I just got done taking WASM out of my project. I’m gonna explore a few solutions before considering adding it back in, lol. I can program in C/C++, but my forte is Java and C#. I’m already well outside my comfort zone using JavaScript, and C#’s Blazor seems to be a bit unstable still, from what I’ve been hearing, lol.

Excellent!! :star_struck: Thank you! I will absolutely let you know! I’ve been teaching myself to use screen readers and testing on a lot of browsers, devices, and platforms, and also reading a lot of accessibility blogs and standards, but getting this to you will be some actual proper field testing! I really appreciate it!

2 Likes

Well, in my case, compressing some sounds more than others doesn’t save memory in the end, as it all still has to be turned into PCM. But it does make downloads smaller, and it means you have a longer runway before the next block is needed.

As for languages, mine is all C++, but any modern language would do just as well. I have a requirement to statically compile to a native binary as well as to WASM. I’ll also build an Android version at some point.

Making it work well on the web has been the hardest because you are at the mercy of several browsers. I have found that some browsers don’t like giving out too much VRAM (I’m looking at you Edge!) and that mobile bandwidth is paltry in comparison to desktop - despite what modern phones claim!

2 Likes

Okay, it’s really goofy, and there’s a chance it will fail more extreme tests, but I swapped ffmpeg for mp3splt and had it split at points anywhere between 1 and 30 seconds apart.

This means that a song will have (on average) 10 MB loaded at a time. Memory snapshots tell me my game averages 28 MB with music playing and no ambient sounds.

One of the split points in one of the songs has a single, tiny clipping click, which is annoying…

I’m gonna lie down and ponder the results. I’m really impressed by how well it works as-is, but I gotta keep those clicks from happening.

Considering making my own splitter, for extra control.
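One hedged alternative to a full custom splitter: hide boundary clicks by applying a few-millisecond fade across each cut (or by splitting at zero crossings). A minimal sketch operating on raw Float32 samples, not tied to any particular splitter:

```javascript
// Fade the tail of one chunk out and the head of the next in, over fadeSamples,
// so an amplitude discontinuity at the cut point cannot produce an audible click.
function fadeBoundary(tail, head, fadeSamples) {
  const n = Math.min(fadeSamples, tail.length, head.length);
  const denom = Math.max(1, n - 1);
  for (let i = 0; i < n; i++) {
    const g = i / denom;                  // 0 → 1 linear ramp
    tail[tail.length - n + i] *= 1 - g;   // fade out to silence
    head[i] *= g;                         // fade in from silence
  }
}

const a = new Float32Array([1, 1, 1, 1]);
const b = new Float32Array([1, 1, 1, 1]);
fadeBoundary(a, b, 2);
console.log([...a], [...b]); // [ 1, 1, 1, 0 ] [ 0, 1, 1, 1 ]
```

At 44.1 kHz, a fade of around 100–200 samples (a few milliseconds) is typically inaudible while still removing the discontinuity.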

2 Likes

I don’t have any experience with the Web Audio API, but let me ask: how sure are you that you need to go down this rabbit hole? The Stack Overflow question you linked discusses huge delays of up to 30 s when using HTML audio elements, but that was in the context of live streaming, so I suspect the browsers delay playback until enough data has arrived over the network, to reduce the risk of stuttering. If this is correct, or if anything else has changed in browsers since 2015, then that question’s conclusions may not apply to your use case. If you could just use a MediaElementAudioSourceNode and leave the streaming decoding to the browser, that should be much simpler while keeping memory consumption low, right?

3 Likes

I’ve gone down some wild rabbit holes; I might surprise you…! :joy:

Honestly, making a custom slicer utility wouldn’t even be that much of a descent, compared to other stuff I’ve done.

The Web Audio API has the necessary precision to play these chunks immediately after each other. The clicking mostly comes from slicing artifacts.
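For reference, that precision comes from scheduling each AudioBufferSourceNode against the context clock rather than starting chunks “when the last one ends” from JS timers. A hedged sketch: the browser-only part is an untested outline, and the start-time math is the actual trick:

```javascript
// Compute the context-clock start time of each chunk so playback is gapless:
// each chunk starts exactly when the previous one ends.
function chunkStartTimes(t0, durations) {
  const starts = [];
  let t = t0;
  for (const d of durations) {
    starts.push(t);
    t += d;
  }
  return starts;
}

// Browser-only outline (assumes decoded AudioBuffers are already in hand):
function playGapless(ctx, buffers) {
  const t0 = ctx.currentTime + 0.1; // small lead time before the first chunk
  const starts = chunkStartTimes(t0, buffers.map((b) => b.duration));
  buffers.forEach((buf, i) => {
    const src = ctx.createBufferSource();
    src.buffer = buf;
    src.connect(ctx.destination);
    src.start(starts[i]); // sample-accurate; no timing gap between chunks
  });
}

console.log(chunkStartTimes(0, [3, 3, 2.5])); // [ 0, 3, 6 ]
```

Sample-accurate scheduling removes timing gaps, but, as noted above, clicks baked into the chunk boundaries by the slicer itself still have to be fixed at slicing time.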

I’d rather not have audio elements on the DOM where keyboard navigation could find them, and screen reader functionality has been proving extremely fragile when it comes to fine-tuning how exposed a DOM element is.

However, I’m not rejecting this idea, because it’s way better than my other backup ideas, lol.

I would just need to:

  1. Put the audio element somewhere that keyboard navigation couldn’t find it.
  2. Also put it where the screen reader can’t find it.
  3. Make sure the screen reader can still use the page normally.
  4. Make sure that playing and stopping the audio element doesn’t mess with the screen reader cursor.
  5. Figure out if/how seamless looping would be done.
2 Likes

Okay, it looks like hiding audio elements behind visibility: hidden safely solves everything, and there is a loop property.
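For what it’s worth, a sketch of the element setup being described. Note that `visibility: hidden` is the valid CSS value (there is no `visibility: none`); the exact attribute set below is my reading of the requirements list above, not the engine’s actual code:

```javascript
// Attribute/style set for an <audio> element that streams and loops,
// stays out of keyboard tab order, and is hidden from screen readers.
function hiddenAudioConfig(src) {
  return {
    src,
    loop: true,              // seamless looping via the native loop property
    tabIndex: -1,            // keep keyboard navigation from landing on it
    "aria-hidden": "true",   // hide it from the accessibility tree
    style: "visibility: hidden;",
  };
}

// Browser-only outline: apply the config to a real element.
function attachHiddenAudio(doc, src) {
  const el = doc.createElement("audio");
  const cfg = hiddenAudioConfig(src);
  el.src = cfg.src;
  el.loop = cfg.loop;
  el.tabIndex = cfg.tabIndex;
  el.setAttribute("aria-hidden", cfg["aria-hidden"]);
  el.style.visibility = "hidden";
  doc.body.appendChild(el);
  return el;
}
```

An `<audio>` element without the `controls` attribute is not keyboard-focusable to begin with, so `tabIndex` and `aria-hidden` here are belt-and-suspenders for the screen reader concerns in the list above.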

Looks like I got another branch to try in my repo, lol

4 Likes

@Hanna, you’re a genius! :star_struck:

It works perfectly! Memory footprint is around 24 MB now! Thank you for chiming in! :grin:

There are a few bugs, but they’re entirely self-inflicted (they’re found in my music and ambience transition system); the upside here is your solution has made parts of my engine finally functional enough that these bugs have now wandered into the searchlight!

Time for me to get fixing!! :saluting_face: :grin:

6 Likes

Bugs are fixed! :grin: Audio manager still needs some work, but it’s about 65% done now!

6 Likes