Question / Help Audio and video (multiple tracks) have all different lengths

glabifrons

New Member
I apologize in advance if I'm weak on some of the terminology. I'm a greybeard with tech, but a newb with AV recording/editing. I'm also not sure that this is a Linux issue, but I don't see a general forum for bug/issue discussion.

TL;DR: My most recent recordings have multiple audio tracks, and when I extract the audio tracks using either mkvextract or ffmpeg, they all have different lengths from the video and from each other. One example is ~40 seconds off over an hour. How can I extract the alternate audio tracks and keep them in sync?

Short description:
I just started doing some game recordings with separate tracks for the game and my microphone so I could do some post-recording rebalancing, and I ran into a problem. All of the videos (mkv) contain tracks of differing lengths, or at least that's how they end up after extraction (maybe some metadata external to the track is being missed during extraction?).
I've googled more than a little trying to solve this before coming here. One suggestion I saw in a thread with a person with a similar problem was to remux. I tried that, but the mp4 then had tracks with different lengths, so same problem, different container. Another suggestion was to use the Use Device Timestamps option in the input device configuration, but this version (current/stable) of the software does not appear to have that option.

My OBS Studio configuration:
Audio track 1 is game audio and mic already mixed.
Audio track 2 is game audio only.
Audio track 3 is microphone audio only (Plantronics headset).
Format is mkv.
Encoder is set to "(use stream encoder)".
Audio bitrates are all left stock at 160.
Sample rate is 44.1 kHz.

Details:
I'm using OpenShot for editing as it was the easiest I've tried, but unfortunately, it doesn't appear to support anything other than the primary audio track. If you have suggestions for video editing software which runs on Linux that may solve this issue, I'm open to them.
I extracted the second and third audio tracks using mkvextract's "tracks" option, which seemed extremely straightforward. I tried extracting the tracks to wav and aac, with and without the "-f" option, all with the same results.
When imported into OpenShot, one video (for example) was 00:10:03:01 long, while the second audio track was 00:09:14:03 and the third was 00:10:03:14 long (this is as close as I can tell with the granularity of OpenShot's GUI).
If I align the ends of the recordings, the beginning is way off. If I audibly align the recordings (aligning a sound with a graphic in the video) at the beginning (yes, it's off by a second or so), the audio at the end is horribly off. It appears some stretching and/or length-compression is needed, but I'm not sure why (or how, for that matter).
Another recording contained video which was 1:12:40:23 long, while audio track 2 was 1:12:04:01 long and audio track 3 was 1:12:44:08 long, a spread of ~40 seconds over an hour.
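For anyone who wants to check the numbers, the drift in the longer recording can be worked out with a few lines of shell. This is only a sketch: `tc_to_seconds` is a made-up helper name, and it assumes the HH:MM:SS:FF timecode layout that OpenShot displays, ignoring the frame field.

```shell
# Convert an HH:MM:SS:FF timecode to whole seconds.
# The frame field (FF) is ignored, so results are only accurate to about a second.
tc_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Lengths reported for the long recording
video=$(tc_to_seconds 1:12:40:23)
game=$(tc_to_seconds 1:12:04:01)
voice=$(tc_to_seconds 1:12:44:08)

echo "game audio vs video:  $((game - video)) s"
echo "voice audio vs video: $((voice - video)) s"
echo "spread between audio tracks: $((voice - game)) s"
```

So the game track comes out about 36 seconds shorter than the video and the voice track about 4 seconds longer, which is where the ~40 second spread comes from.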
As mentioned above, I also used the "Remux Recordings" option to convert an mkv file to mp4, then used ffmpeg's "copy" audio codec to extract the tracks with the same results as the above.

Questions:
While writing up the questions below, it occurred to me to check the alternate tracks using VLC, and it turns out they stay in sync when played from the original recording. There are also random audio drops, which I had already noticed in earlier (non-multi-track) recordings I've done and posted.
Am I extracting the audio incorrectly? What do you recommend?
Is there metadata in the mkv which keeps the audio in sync? There must be, else VLC wouldn't be able to play it back in sync.
I'm mainly looking for a way to extract the audio while keeping it in sync, but am also interested in eliminating the audio drops I'm experiencing.

Software and hardware:
My OBS Studio is the current stable version (20.1.0).
My OS is Ubuntu MATE 16.04 LTS (fully up to date).
My rig is an i7-4790K with 32GB of DDR3/2400 RAM. There are only 5 CPUs made with faster thread performance, and the difference between this and the fastest (which is 4 generations newer) is less than 8%, so CPU performance should not be a problem.
My card is an EVGA GTX 950 FTW edition (~20% faster clocks than spec GTX 950), so not a monster, but it hasn't reached peak load during recordings either, at least not whenever I've checked.
Audio is via Realtek ALC1150 in the GIGABYTE GA-Z97X-UD3H motherboard (I've had no audio issues at all with this outside of OBS Studio recordings).
Storage is via NFSv4 on a Solaris 11.3 server using ZFS with SSD ZIL & L2ARC (separate SSDs for write and read caches) backed by striped and mirrored 4TB 7200RPM drives so storage speed should not be an issue. Besides, the video is far higher bandwidth and is unaffected.


Examples of audio drops in single-track recordings (possibly related to the above) can be heard in game recordings on https://www.youtube.com/channel/UC8VoVhoSa3s8m47yxdOztRA.
A specific example would be at 1:35 and again at 2:05, 2:13, 2:20, 2:22, 2:33, 2:34, 2:36, 2:56, 2:58, etc. in https://www.youtube.com/watch?v=GaOIvkC11Gs.
Note that the video stalls are actually from the game and not from OBS; that's exactly how the game plays. I've seen a lot of people with far higher-end rigs than mine hit the exact same video stalling issue (they play on Windows and use different recording software). The game is in Early Access and under heavy development.


Edit: Changed "all of my" to "my most recent" in the TL;DR.
 

c3r1c3

Member
Double check that your sample rate in the extracted audio tracks matches the sample rate of the main/in sync/recorded tracks.

Also I would check in with the FFmpeg people, seeing as you're using that to extract the tracks. They might know of an option to 'lengthen' the tracks to match their original 'length'. Additionally (just a suggestion in general), please post the FFmpeg command that you're using for the extraction.

Lastly OBS supports Pulse, ALSA and JACK audio sources. You haven't posted a log so I don't know which type you're using for your sources, but please consider trying a different source type for your devices to see if the audio drop outs/out-of-sync still happens.
 

glabifrons

New Member
I forgot to attach a log. This one is from the one with a ~40 second difference, mentioned above.
 

Attachments

  • 2017-10-28 17-48-30.txt
    8.5 KB

glabifrons

New Member
Thanks for the incredibly quick reply!
I was grabbing the log to post already before I saw your response... posted.

The sample rates should be untouched, as I'm not re-encoding them, only extracting the streams.
Code:
$ mkvinfo 2017-10-28_17-57-52.mkv | egrep -i 'name: track|sampl'
|  + Name: Track1
|   + Sampling frequency: 44100
|  + Name: Track2
|   + Sampling frequency: 44100
|  + Name: Track3
|   + Sampling frequency: 44100
$ file 2017-10-28_17-57-52*aac
2017-10-28_17-57-52-game.aac:  MPEG ADTS, AAC, v4 LC, 44.1 kHz, stereo
2017-10-28_17-57-52-voice.aac: MPEG ADTS, AAC, v4 LC, 44.1 kHz, stereo

The ffmpeg lines I used were:
Code:
ffmpeg -i *.mp4 -map 0:2 -acodec: copy -vn test.aac
ffmpeg -i *.mp4 -map 0:3 -acodec: copy -vn test1.aac

I also used mkvextract with the following script I wrote:
Code:
#!/bin/bash
#
# Extract second and third audio tracks from OBS recording to split out
# game audio and voice recording for remixing.
#
# Usage: pass one or more .mkv recordings as arguments.
#
suffix1='-game.aac'
suffix2='-voice.aac'
#
while [ $# -gt 0 ];
do
    # Strip the ".mkv" extension to get the base name
    filebase="${1%.mkv}"
    # Quote everything so file names containing spaces survive
    mkvextract -f "$filebase.mkv" tracks "2:$filebase$suffix1" "3:$filebase$suffix2"
    shift
done

I tried both with and without the "-f" option, and I tried aac and wav files, all with the same results.
 

c3r1c3

Member
I see that the sample rates for your devices don't match (both desktop sources are 44.1kHz and your mic is 48kHz). Please set all of them to the same as OBS, or at least the same as each other. I would start with that. If that doesn't clear it up then...

I see you're using Pulse. Try using ALSA for at least the mic, if not also the desktop audio source(s).

Edit: Removed because it was an inaccurate statement.
 

glabifrons

New Member
The Plantronics kernel module doesn't appear to have any options. I'm not sure how to change the sample rate.

Also, the mic audio is actually closer to being in sync than the game audio. It's the game audio that is horribly out of sync once extracted (using either method/container format mentioned).

I don't see how to select the sound-server used in OBS Studio. I only have 1 option in its Settings for a mic. Do I need to hand-edit a config file under ~/.config/obs-studio/?

I added the following to ~/.pulse/daemon.conf then killed pulseaudio:
realtime-scheduling=yes
default-sample-rate=44100
I'll have to do some test recordings and see how it goes.
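A possible way to confirm which rates PulseAudio ends up using after a change like this is to look at the sample spec column of `pactl list short sources`. The sketch below assumes that output is tab-separated with the sample spec (for example `s16le 2ch 44100Hz`) in the fourth column; `list_source_rates` is a hypothetical helper name, not part of PulseAudio.

```shell
# Read `pactl list short sources` output on stdin and print "<rate> <source name>".
# Assumes tab-separated columns: index, name, driver, sample spec, state.
list_source_rates() {
    awk -F'\t' '{
        # Pull the digits in front of "Hz" out of the sample spec column
        if (match($4, /[0-9]+Hz/)) {
            rate = substr($4, RSTART, RLENGTH - 2)
            print rate, $2
        }
    }'
}

# Intended use (needs a running PulseAudio):
#   pactl list short sources | list_source_rates
```

If the mic and the desktop monitor sources print different rates here, they are still mismatched.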
Unfortunately, I still have the existing recordings I have to deal with... I'll have to see if I can find some editing software that can deal with multiple tracks.
 

c3r1c3

Member
For software: https://obsproject.com/forum/resources/post-production-tools-you-can-use.234/ (I *think* Lightworks will do it for you, and DaVinci should...)

To add the other source types (ALSA and JACK) it's the same as adding any other source, except you choose that instead of the one called "(PulseAudio)". If JACK isn't showing up that means you don't have the correct JACK stuff installed on your system. ALSA should always show up.

As to the sound going out of sync, like I said earlier you'll want to check in with the FFmpeg people. They know a lot more about using that to stretch out audio than I do.
 

glabifrons

New Member
I don't think simply stretching the audio will work, as that will align the beginning and end, but anywhere in between could be all over the place, depending on the distribution of the audio drop-outs that are most likely causing the length mismatches.
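One way to test the drop-out theory might be to dump the packet timestamps of an audio track from the original mkv and look for holes between consecutive packets. The following is a rough sketch, not a verified diagnosis: `find_gaps` is a made-up helper, the 0.05 second threshold is arbitrary, and it assumes ffprobe reports both a timestamp and a duration for each packet.

```shell
# Read "pts_time,duration_time" lines on stdin and report holes in the timeline.
# $1 = smallest gap (in seconds) worth reporting; defaults to 0.05.
find_gaps() {
    awk -F, -v min="${1:-0.05}" '
        NR > 1 {
            # A gap is the time between the end of the previous packet
            # and the start of this one
            gap = $1 - (prev_pts + prev_dur)
            if (gap > min) {
                printf "gap of %.3f s before t=%.3f s\n", gap, $1
                total += gap
            }
        }
        { prev_pts = $1; prev_dur = $2 }
        END { printf "total missing: %.3f s\n", total }
    '
}

# Intended use on the second audio stream (a:1) of a recording:
#   ffprobe -v error -select_streams a:1 \
#       -show_entries packet=pts_time,duration_time -of csv=p=0 recording.mkv | find_gaps
```

If the reported total is close to the amount by which the extracted track is short, that would support the idea that the gaps are simply being closed up during extraction.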

I tried several different video editors (Cinelerra, Avidemux (I thought for certain this would work), Shotcut, LiVES (couldn't even load the recording), Pitivi...), but couldn't find a way to access the alternate audio tracks using them (I'm really surprised this isn't something common).

I figured out a clumsy way of getting around it. Rather than simply extracting the different audio channels, I'm copying the audio channel I want while re-encoding the video to an absurdly small 16:10 resolution (to save space, since the collection I'm editing is already 5GB before splitting the audio out). This way I have the audio forced to be in sync with the (unwatchably small) video. I checked, and the timing perfectly matches up across the original and both of the resulting files.

This is what I'm using to rip out audio tracks 2 & 3 into separate files, in case anyone else needs to do something similar:
Code:
ffmpeg -i 2017-10-14_16-18-53.mkv -map 0:0 -map 0:2 -c:a copy -c:v mpeg4 -vf scale=w=16:h=10 testtrack2.mkv
ffmpeg -i 2017-10-14_16-18-53.mkv -map 0:0 -map 0:3 -c:a copy -c:v mpeg4 -vf scale=w=16:h=10 testtrack3.mkv

The 0:0 specifies the video track, and the 0:2 and 0:3 specify the alternate audio tracks (0:1 is the default which everything already picks up on). The "-c:a copy" tells ffmpeg not to re-encode the audio track, but to simply copy it into the destination container file. The rest tells it to re-encode using the mpeg4 codec while scaling down to 16x10 resolution. Most others will probably need 16:9 here, if they're doing normal HD or 4K video. Mine happens to be 1920x1200, so is a 16:10 aspect ratio (yeah, I could've gone down to 8x5, but this brings the files down to 13MB from 215MB for the first test, so I'm happy).
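For more than a couple of files, the same two commands could be generated by a small helper. This is just a sketch built from the options described above; `tiny_video_cmd` is a hypothetical name, it only prints the command rather than running it, and it does not quote file names, so names with spaces would need extra care.

```shell
# Print (not run) the ffmpeg command that pairs one audio track with a tiny video.
# $1 = input file, $2 = ffmpeg stream index of the audio track, $3 = output file
tiny_video_cmd() {
    printf 'ffmpeg -i %s -map 0:0 -map 0:%s -c:a copy -c:v mpeg4 -vf scale=w=16:h=10 %s\n' \
        "$1" "$2" "$3"
}

# One command per alternate audio track
for track in 2 3; do
    tiny_video_cmd 2017-10-14_16-18-53.mkv "$track" "testtrack$track.mkv"
done
```

Printing first makes it easy to eyeball the commands; piping the output to `sh` would then run them.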

The tracks can be identified using mkvmerge:
Code:
$ mkvmerge --identify 2017-10-14_16-18-53.mkv
File '2017-10-14_16-18-53.mkv': container: Matroska
Track ID 0: video (MPEG-4p10/AVC/h.264)
Track ID 1: audio (AAC)
Track ID 2: audio (AAC)
Track ID 3: audio (AAC)
Global tags: 1 entry
Tags for track ID 0: 1 entry
Tags for track ID 1: 1 entry
Tags for track ID 2: 1 entry
Tags for track ID 3: 1 entry
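If the track numbers need to be picked up by a script, the audio IDs can be filtered out of that listing. A minimal sketch, assuming the `Track ID N: audio (...)` line format shown above; `audio_track_ids` is a made-up helper name. In this file the mkvmerge IDs happen to line up with the 0:N indices used in the ffmpeg commands, which may not hold for every recording.

```shell
# Read `mkvmerge --identify` output on stdin and print the ID of each audio track.
audio_track_ids() {
    sed -n 's/^Track ID \([0-9][0-9]*\): audio.*/\1/p'
}

# Intended use:
#   mkvmerge --identify 2017-10-14_16-18-53.mkv | audio_track_ids
```

For the listing above this prints 1, 2 and 3, one per line.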