AudioSource::SortAudio()

Tyr

New Member
I have a few questions about the AudioSource::SortAudio() method, which I came across while trying to fix out-of-sync issues in the VideoSourcePlugin. The way I understand it, the method is meant to correct the problems introduced by draining the buffers of auxiliary audio sources, in particular the bad timestamps that draining can produce.
To explain what the problem in the plugin was, I'll quote what I wrote on the plugin's pull request:

Basically, VLC pushes a lot of audio at the start, let's say 2 seconds. In addition, OBS pulls all of the buffered audio from auxiliary sources like our (null) device at once. Because of this, those first two seconds of audio all get the same timestamp: we use GetAudioTime() for our timestamps, and its value doesn't change during the burst. The pts values VLC delivers are also pretty unreliable, so they don't help at all.

OBS has two methods for dealing with this problem, but both of them fail.

The first one is AudioSource::SortAudio(), which is called after OBS burst-pulls the audio buffer. This method ensures that the segments' timestamps are spaced 10ms apart, but it assumes that the last timestamp pulled is correct and adjusts the timestamps before it accordingly. This is obviously wrong in our case; the first timestamp would be the correct one. It also means that the first segments are shifted into the past and are therefore discarded.

The second method is part of AudioSource::QueryAudio2, where the timestamps are smoothed, meaning they are actually ignored and replaced with the last timestamp + 10. The original timestamp is only used if the difference is greater than 70ms. Since VLC pushes far more than 70ms of audio and those segments all carry the same timestamp, this smoothing is not enough.

The way I fixed it, the plugin now does the smoothing itself: it sets new timestamp = last timestamp + 10, and only uses GetAudioTime() again if that value is greater than the last timestamp + 10.
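To make that concrete, here is a rough sketch of the three behaviours described above. This is not the actual OBS1 source; all names are made up, and only the 10ms segment size and the 70ms threshold come from the description:

```cpp
#include <cstdint>
#include <vector>

// One queued audio segment, assumed to cover 10ms (illustrative).
struct AudioSegment {
    uint64_t timestamp; // in milliseconds
    // ... sample data omitted ...
};

// Sketch of what SortAudio() does per the description above: trust the
// LAST timestamp and respace everything before it 10ms apart, working
// backwards. After a burst pull where every segment carries the same
// "now" timestamp, the earliest segments end up in the past and get
// discarded. (The real code's exact condition may differ.)
void SortAudioSketch(std::vector<AudioSegment> &segments)
{
    for (size_t i = segments.size(); i-- > 1;) {
        uint64_t expected = segments[i].timestamp - 10;
        if (segments[i - 1].timestamp > expected)
            segments[i - 1].timestamp = expected;
    }
}

// Sketch of the QueryAudio2 smoothing: ignore the incoming timestamp
// and use lastTimestamp + 10, unless it deviates by more than 70ms.
// A 2-second burst of identical timestamps blows through that
// threshold almost immediately.
uint64_t SmoothTimestampSketch(uint64_t incoming, uint64_t lastTimestamp)
{
    uint64_t expected = lastTimestamp + 10;
    uint64_t diff = incoming > expected ? incoming - expected
                                        : expected - incoming;
    return (diff > 70) ? incoming : expected;
}

// Sketch of the plugin-side fix: generate timestamps ourselves as
// lastTimestamp + 10, and only resynchronize to GetAudioTime() when it
// has moved ahead of our generated clock.
uint64_t PluginTimestampSketch(uint64_t audioTimeNow, uint64_t lastTimestamp)
{
    uint64_t next = lastTimestamp + 10;
    return (audioTimeNow > next) ? audioTimeNow : next;
}
```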

So the first question I have is: why does the method assume that the last timestamp pulled is correct? Since the OBS audio loop pulls all of the available audio data instantly, a plugin that relies on time information from OBS will have a correct timestamp on the first segment, with all of the following segments stamped the same. Were there any plugins/sources where the last timestamp pulled was more reliable than the first one?

The second question is: why is the entire available audio buffer pulled at once? In the case of the desktop source, the buffer is only pulled until the specified scene buffering time is reached. Were there problems with auxiliary audio sources if they weren't drained entirely?
 

Lain

Forum Admin
Forum Moderator
Developer
This is actually a surprising post; rare is the person who willingly ventures into the code base of OBS1 and survives unscathed.

To sum it up, OBS1's audio subsystem is pretty... abysmal, because of design flaws made while originally writing it. It was my first time venturing into the deep, dark, mysterious world of audio/video recording/playback, and holy moley is it filled with a million exceptions, gotchas, and issues you could never believe, especially from devices. I wouldn't advise anyone get into this genre of coding unless they're ready for massive headaches. You can see from how unreliable those timestamps are what sort of stuff you have to deal with, and I never knew any of that when creating that audio subsystem. I was used to gaming subsystems, where everything just works, but here, here you have to compensate for *countless* issues.

I remember I felt I had a pretty good design at first. However, because I was new to it, I made the *one* big amateur mistake in audio/video coding while designing it: assuming that timing would be reliable for any given input, audio or video. Upon discovering that timing could be junk from any given input, I was confused and perplexed. I was angry at device manufacturers. Angry at Microsoft. Microphones, capture devices, and drivers could give you broken timestamps, or timestamps earlier than the last timestamp, or they could expect 5 seconds of buffering and send their data back in time after being processed, or they could repeat the last 10 values over and over, and a whole bunch of other strange things that astounded me. To be honest, I'm still angry at all of them: that there is no real concept of standards in hardware, or that they can completely ignore standards. Any given device may give you junk timing data at any given point. It's abysmal to work with, and I wouldn't advise this kind of programming for anyone with a weak stomach.

So I changed it to rely on a different time base, then discovered other bad devices that broke with that. Changed it again to account for another thing, and again for this, and again for that, until it turned into a giant monstrosity of duct-taped garbage that I am embarrassed to even admit I wrote. Right now, the reason you have to use GetAudioTime() is that all audio time in OBS1's subsystem is based upon desktop capture audio timing, which is always reliable (it was the one reliable thing I could always count on in almost any case). So when that is used, it ensures that if you push data, it will play at the timing at which it was pushed.

This worked; however, there was yet another exception: audio burst. Audio bursts can happen from devices, and they basically screw up the timing and cause all data to go out of sync. The only way to fix it without completely rewriting the audio subsystem (which is done in obs-studio) was to query the audio each audio tick for each device until the device buffer was empty. Again, everything is based upon desktop audio time. Desktop audio timing is always reliable and on time, and not subject to any of these issues. It's always in sync with captures such as game capture, window capture, and so on as well.

So, when the desktop audio buffer has been emptied, OBS empties all other buffers for all other audio sources as well, to ensure that all audio sources are "in sync" and not actually behind in buffering due to burst. However, doing this means that GetAudioTime() will begin to repeat the last values, so when it's finished emptying all the audio source buffers, it's necessary to sort all the audio packets backwards to "shift" the data based upon the burst compensation size. After that, it remembers that burst compensation size and will account for it with newer packets, automatically increasing the audio smoothing threshold based upon that size.
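A rough sketch of that mechanism (not the actual code; all names are made up, and each segment is assumed to cover 10ms as above):

```cpp
#include <cstdint>
#include <deque>
#include <vector>

// Illustrative stand-ins; none of these are the real OBS1 types.
struct AudioSegment {
    uint64_t timestamp; // milliseconds
};

struct AuxSource {
    std::deque<AudioSegment> deviceBuffer; // what the device has queued
    std::vector<AudioSegment> pulled;      // what OBS has pulled this tick
    uint64_t burstCompensation = 0;        // remembered burst size
};

// Sketch of the burst handling described above: drain the auxiliary
// source completely, then shift the pulled data backwards in time by
// the size of the burst, and widen the smoothing threshold for future
// packets by that size.
void DrainAndCompensateSketch(AuxSource &src)
{
    // 1. Burst-pull: empty the device buffer in one go.
    while (!src.deviceBuffer.empty()) {
        src.pulled.push_back(src.deviceBuffer.front());
        src.deviceBuffer.pop_front();
    }

    if (src.pulled.size() < 2)
        return;

    // 2. Each segment is 10ms, so the burst spans this much time, all
    //    stamped with (roughly) the same repeated GetAudioTime() value.
    uint64_t burstSize = (src.pulled.size() - 1) * 10;

    // 3. "Sort backwards": respace the segments 10ms apart, ending at
    //    the last (trusted) timestamp, i.e. shift earlier data into
    //    the past by up to burstSize.
    for (size_t i = src.pulled.size(); i-- > 1;)
        src.pulled[i - 1].timestamp = src.pulled[i].timestamp - 10;

    // 4. Remember the burst so the smoothing threshold grows and newer
    //    packets are not treated as timing errors.
    src.burstCompensation = burstSize;
}
```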

-----------------------------

Whew, that took a while to explain. The reason it took so long is that OBS1's audio subsystem has too much duct tape and too many band-aids for what was a flawed design based upon one single amateur assumption. It's amazing I actually managed to hack it to work as well as it does, considering the layers of duct tape. I would have just rewritten it, but at this point it's not worth the time to rewrite OBS1's audio subsystem, because I'm already rewriting the entire application: obs-studio. I already have a new audio subsystem with a far better design that accounts for all the timing issues right off the bat.

In obs-studio, you no longer have to use some silly function to "get audio time". You can now just directly pass, for example, VLC's returned timestamps together with your audio segments at any point in time, and it will automatically calculate all the timing data based upon those timestamps and place the audio in the right position in a circular buffer. If you pass a combination of audio/video data, that data will now *always* be synced. It will automatically account for non-monotonic timestamps and all the timing bugs/quirks that I know of, and it will handle them the way they're supposed to be handled. It will even detect system timing in case you want to use that for your timestamps. It accounts for every possible scenario, and it's simple and straightforward. I feel so much better about it this time around.
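For illustration, pushing audio in obs-studio looks roughly like this. obs_source_output_audio() and struct obs_source_audio are the libobs API as it eventually shipped; the helper function and the parameter values here are just an example:

```cpp
#include <obs.h>

// The source hands libobs its own timestamps (e.g. VLC pts converted
// to nanoseconds) along with the samples, and libobs positions the
// data in its circular buffer based on those timestamps.
static void push_audio(obs_source_t *source, const uint8_t *samples,
                       uint32_t frames, uint64_t timestamp_ns)
{
    struct obs_source_audio audio = {};
    audio.data[0] = samples;        // packed stereo float samples
    audio.frames = frames;
    audio.speakers = SPEAKERS_STEREO;
    audio.format = AUDIO_FORMAT_FLOAT;
    audio.samples_per_sec = 48000;  // example rate
    audio.timestamp = timestamp_ns; // caller-supplied, e.g. VLC pts

    obs_source_output_audio(source, &audio);
}
```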

-----------------------------

So, to answer your first question: it's assumed accurate because GetAudioTime() is always accurate at the time it's used. It returns the value of "what's the exact audio time right now".

Second question: to compensate for audio burst, which caused audio drift.
 

Tyr

New Member
Thanks for the detailed answer. The new audio system in obs-studio does sound nice. But regarding your answer to question 1: it's true that GetAudioTime() is accurate at the time it's used, but in the case of VLC, where a *lot* of audio is prebuffered, it's called something like 2 seconds before the last segment should be played, and that last segment gets the timestamp of *now*. That means all the other segments are shifted into the past and discarded. So in that case it doesn't work. Maybe it's different with capture devices/sources and it has to be this way, but yeah, it gave me kind of a headache ;).
 