My first thought was a sample rate mismatch between 48kHz and 44.1kHz, with something not resampling correctly. I've seen a few of those on this forum.
My second thought is that you have two copies of the same source, with a slight difference in delay between them, and both are getting mixed into the final soundtrack.
There's no download button on that page, but I have a downloader app that works anyway. It's questionable how much that entire chain (compression on your end, uploading to a site that recompresses it to work like a TV, then downloading that and making a file out of it again) mangles the audio to the point that the only thing left is a similar set of frequencies, but that's what we have to work with. If you have the original audio to upload to a file share (not a video share), it might be better.
The file that I ended up with has an AAC soundtrack at 48kHz, according to my video editor, and Audacity says that the short isolated consonant sounds are repeated about 54ms apart. That's about 2592 samples.
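I measured that spacing by eye in Audacity. If you'd rather check your original file numerically, here is a rough sketch of one way to do it: mix a signal with a delayed copy of itself, then look for the lag where the result best matches itself (a plain autocorrelation peak). Everything in it is made up for illustration: the noise burst stands in for a consonant, the function names are mine, and the 8kHz rate is only there to keep pure Python fast.

```python
import random

def mix_with_delayed_copy(signal, delay_samples, gain=1.0):
    """Return signal plus a copy of itself shifted later by delay_samples."""
    out = list(signal) + [0.0] * delay_samples
    for i, s in enumerate(signal):
        out[i + delay_samples] += gain * s
    return out

def estimate_repeat_lag(samples, min_lag, max_lag):
    """Return the lag (in samples) with the largest autocorrelation score."""
    n = len(samples)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[i] * samples[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

RATE_HZ = 8000                        # assumed low rate, for speed only
true_delay = round(0.054 * RATE_HZ)   # 54ms expressed in samples at RATE_HZ

rng = random.Random(0)                # fixed seed so the run is repeatable
burst = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # stand-in "consonant"

mixed = mix_with_delayed_copy(burst, true_delay)
lag = estimate_repeat_lag(mixed, min_lag=20, max_lag=600)
print(f"estimated repeat: {lag} samples = {lag / RATE_HZ * 1000:.1f}ms")
```

On a real recording you would load a short slice around one isolated consonant instead of the synthetic burst, and keep min_lag above zero so the trivial match at lag 0 doesn't win.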
At either 48kHz or 44.1kHz, I can't quite line up that 54ms with a common buffer size - 2048 samples comes to between 40ms and 50ms (about 42.7ms at 48kHz, 46.4ms at 44.1kHz). Doing it acoustically would require about 60 feet (18.5m) of separation between multiple mics, which I don't think you have in that studio.
If both were happening - distance between multiple mics, with the farther one taking two trips through the buffer for some reason - that would shorten the required distance to about 10 feet (3m) between mics.
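For anyone who wants to check that arithmetic, here is a small sketch. The speed of sound (343 m/s, roughly room temperature) is my assumption, as are the function names, and I'm treating "two trips through the buffer" as one extra 2048-sample buffer of delay on the farther mic.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumed: dry air at about 20 degrees C
METERS_PER_FOOT = 0.3048

def delay_to_samples(delay_ms, rate_hz):
    """Convert a delay in milliseconds to a sample count."""
    return delay_ms / 1000.0 * rate_hz

def buffer_ms(buffer_samples, rate_hz):
    """Duration of one buffer in milliseconds."""
    return buffer_samples / rate_hz * 1000.0

def delay_to_distance_m(delay_ms):
    """Extra path length sound covers in delay_ms."""
    return delay_ms / 1000.0 * SPEED_OF_SOUND_M_S

DELAY_MS = 54.0
BUFFER_SAMPLES = 2048

print(f"{DELAY_MS}ms at 48kHz = {delay_to_samples(DELAY_MS, 48000):.0f} samples")

full_m = delay_to_distance_m(DELAY_MS)
print(f"all acoustic: {full_m:.1f}m ({full_m / METERS_PER_FOOT:.0f}ft)")

for rate_hz in (48000, 44100):
    buf = buffer_ms(BUFFER_SAMPLES, rate_hz)
    # Whatever the buffer doesn't account for has to come from distance.
    rest_m = delay_to_distance_m(DELAY_MS - buf)
    print(f"{rate_hz}Hz: buffer {buf:.1f}ms, "
          f"remaining distance {rest_m:.1f}m ({rest_m / METERS_PER_FOOT:.1f}ft)")
```

The remaining distance depends on which sample rate the buffer is actually running at, so treat the 10 feet as a ballpark rather than a measurement.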
The repeat sounds similar enough, though, that I don't think it's multiple mics. Even two of the same mic in different places would pick up different reverb, and I don't hear that.
Of course, there's also the possibility that you have a Sync Offset, and that plus another un-delayed copy either produces the effect all on its own, or combines with a buffer somewhere to do it.
I don't know that any of those is the correct answer, but they should give you some things to check. Screenshots of your end would be nice too, not just of that window, but all of your audio settings. If you have some processing outside of OBS, include that too.