I am using the virtual audio cable (VAC?) from:
VB-Audio Virtual Cable and App's
vb-audio.com
It sets the default rate at 48.0kHz.
Is there a different one I should try?
On a Windoze rig (I much prefer Ardour on Linux, but whatever), I have their "cadillac" version - VoiceMeeter Potato - and it does just fine. In addition to 5 channels of its own that can connect directly to physical inputs and 5 more for physical outputs, it also has 3 loopbacks like what you have. Route anything to anywhere, control volume, and a (very) basic set of effects that aren't very useful to me.
Anyway, I would think that the loopbacks in there are exactly the same as yours, so that part is *probably* okay.
The audio inputs are coming from a soundboard into the PC using one of these:
It's incredibly easy to abuse those things without realizing it. The input is designed for a mono mic, that needs a lot of amplification. When you plug a stereo line-level input into it, what would normally be the left channel is taken to be a WAY too hot mic, and so it clips easily, and the right channel connects to a weak power supply that is supposed to go *out* to power the mic. Much less confusing to use an obviously-stereo thing that directly says that it's a line input, like a Behringer UCA202, for example. Search Amazon for it, or wherever you like to buy stuff.
But since you're somehow capable of producing a clean recording anyway, that's not the immediate problem either.
The original file from zoom is an .m4a which I loaded into audacity to trim the 1 hr recording down to 15 s for the upload.
That is probably lossy-compressed in a different way than MP3. Probably AAC. So part of the original waveform was already thrown away during the AAC encoding, Audacity has to reconstruct the audio from less than the original information, and then the export to MP3 throws some of *that* away too. It still *sounds* like the original, but that's all it has left. If you export from Audacity as WAV, then it'll keep everything that Audacity had at least, but there's still the initial lossy encoding before it even got to Audacity.
But since the direct recording from Zoom sounds okay, that's probably not the problem either. It only makes it difficult to see what the problem really was.
I think I understand your analysis that the VAC is sampling at a different rate than the USB device, so we are getting some aliasing (do I have this right?).
*Somewhere* in the chain - don't know where - something is getting a different sample rate than it expects, and instead of actually resampling, it just inserts silence where each buffer falls short. That *somewhere* could be anywhere.
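To put rough numbers on that kind of gap, here is a minimal sketch. It assumes a 10 ms buffer and a 44.1k source feeding a 48k sink with no resampling; both figures are picked for illustration, and `pad_shortfall` is a made-up name, not anything from VAC or Zoom. I don't know what buffer size your chain actually uses.

```python
def pad_shortfall(source_rate, sink_rate, buffer_ms=10):
    """Samples delivered and zeros padded per buffer when a sink running at
    sink_rate is fed, without resampling, by a source running at source_rate."""
    wanted = sink_rate * buffer_ms // 1000       # what the sink expects per buffer
    delivered = source_rate * buffer_ms // 1000  # what the source produces in that time
    missing = max(0, wanted - delivered)         # filled with silence
    return delivered, missing

real, zeros = pad_shortfall(44100, 48000)
print(real, zeros)  # 441 39
```

Under those assumptions, every 10 ms buffer would end with 39 samples of silence, which is a little under a millisecond of dropout repeating 100 times per second.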
Aliasing refers to the consequence of not sufficiently filtering out everything above the Nyquist frequency, which is half of the sample rate. At 48kHz, for example, Nyquist is 24kHz. Anything above Nyquist gets "mirrored" back down and mixes with the stuff that really is down there. In fact, the entire spectrum from Nyquist up to infinity gets "accordion folded" into the range between 0 and Nyquist. (something that is *at* the sample rate becomes 0Hz or a DC offset, and it counts back up again from there, etc.) THAT is aliasing, and the way to avoid it is to lowpass the signal below Nyquist before sampling it.
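The "accordion fold" is easy to work out by hand, but a small sketch may help. This assumes an ideal sampler with no filtering at all in front of it; `alias_frequency` is just an illustrative name.

```python
def alias_frequency(f_hz, sample_rate_hz):
    """Frequency at which an unfiltered tone at f_hz shows up after sampling."""
    f = f_hz % sample_rate_hz      # multiples of the sample rate land on 0 Hz
    nyquist = sample_rate_hz / 2
    # Below Nyquist it stays put; above, it mirrors back down.
    return f if f <= nyquist else sample_rate_hz - f

print(alias_frequency(10_000, 48_000))  # 10000 (below Nyquist, unchanged)
print(alias_frequency(30_000, 48_000))  # 18000 (mirrored around 24 kHz)
print(alias_frequency(48_000, 48_000))  # 0 (DC)
print(alias_frequency(50_000, 48_000))  # 2000 (counting back up from 0)
```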
In a conceptually simple converter, you would have an analog filter (resistors, capacitors, opamps), followed by a 48kHz converter chip. But a high-order "brickwall" analog filter (so as to keep 20kHz unchanged while completely stopping 24kHz) is both stupidly expensive and way too sensitive to the microweather around it. So the way it's actually done instead, is to sample in the low-to-mid MHz range, which allows a "jellybean" analog filter to do just fine, and then digitally "brickwall" it inside the converter chip before taking only the samples that it actually needs to produce 48kHz at the output. (can you tell I'm an audio engineer?)
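For a very rough feel of why the oversampled approach gets away with a cheap analog filter, here is a back-of-envelope sketch. The assumptions are all mine: a target of 100 dB of rejection, a 128x oversampling ratio (6.144 MHz), and the textbook asymptote of about 6.02 dB per octave per filter order. Real filter design is more involved, so treat the result as order-of-magnitude only.

```python
import math

def rough_filter_order(passband_hz, stopband_hz, attenuation_db):
    """Rough lowpass order needed to drop by attenuation_db between
    passband_hz and stopband_hz, using ~6.02 dB/octave per order."""
    octaves = math.log2(stopband_hz / passband_hz)
    return math.ceil(attenuation_db / (6.02 * octaves))

# Direct 48 kHz sampling: pass 20 kHz, stop by 24 kHz.
direct = rough_filter_order(20_000, 24_000, 100)

# Hypothetical 128x oversampling: the analog filter only has to stop
# what would fold back into the audio band, i.e. near 6.144 MHz - 24 kHz.
oversampled_rate = 128 * 48_000
oversampled = rough_filter_order(20_000, oversampled_rate - 24_000, 100)

print(direct, oversampled)  # 64 3
```

Dozens of analog filter stages versus a handful of parts is the whole argument for doing the "brickwall" digitally.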
From your note: Zoom is recording at 32kHz (?), so should I change the sampling rate for the VAC to 32kHz?
*Something* produced the 32kHz recording that you ended up sending. (Nyquist = 16kHz, which is below the top of "perfect" human hearing at 20kHz) It could be that Zoom did that, or (less likely) it could be that you had Audacity set to do that. I can only see the final product from here.
As for changing everything to 32kHz, that *might* solve it. It's entirely possible that your input card can sample at that rate, and that's what happens when you connect it directly to Zoom. But I would see instead if I could get Zoom up to 44.1 or 48 and match everything to that. Keep Nyquist entirely above audible.
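The arithmetic behind "keep Nyquist entirely above audible" can be sketched like this, taking 20 kHz as the top of hearing as above. The function name is only for illustration.

```python
AUDIBLE_TOP_HZ = 20_000  # top of "perfect" human hearing

def nyquist_margin(sample_rate_hz):
    """Return (Nyquist frequency, headroom above the top of hearing) in Hz."""
    nyquist = sample_rate_hz / 2
    return nyquist, nyquist - AUDIBLE_TOP_HZ

for rate in (32_000, 44_100, 48_000):
    nyquist, margin = nyquist_margin(rate)
    print(f"{rate} Hz: Nyquist {nyquist:.0f} Hz, margin {margin:+.0f} Hz")
# 32000 Hz: Nyquist 16000 Hz, margin -4000 Hz
# 44100 Hz: Nyquist 22050 Hz, margin +2050 Hz
# 48000 Hz: Nyquist 24000 Hz, margin +4000 Hz
```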
After more digging in the control panel: looks like the default input sampling rate for VAC is 48kHz but the default output sampling rate is 44.1kHz.
The default rate for the USB input device is 48kHz.
So next step is to make all three match at 48kHz.
Back to church. Stay tuned.
Yep. 48k is the preferred "professional" rate. 44.1 comes from CDs: it was inherited from the video-tape-based PCM recorders used for early digital mastering, and it also fits about 9% more audio on the same size disc than 48k would. But the margin between its Nyquist and the top end of audible is pretty small.
Multiples of both also exist, and audiophools are all over them, but there's really no audible benefit. In fact, looking at the converters again, most of them actually change modes depending on which multiple you're using (1x, 2x, etc.), so that the initial low-to-mid MHz rate stays the same. So the analog lowpass must also stay the same. The different mode for a higher output rate uses a different internal digital "brickwall" because it can afford to pass more high-frequency content, but it does that by reducing the rolloff rate while keeping the same cutoff right at the top end of audible. So you're not really getting much more "detail" anyway, even if you *could* somehow hear that high. Some yes, but not much.
For one of my projects, I *am* looking at 96k instead of 48k, not because it sounds any different, but because the less aggressive "brickwall" has fewer samples of latency in addition to having less time per sample. This is for a live application where the mic and the speaker are close together, so it'll form a comb filter if I'm not careful, when it mixes acoustically with the original sound. So the higher rate for me in that project is all about timing and nothing whatsoever to do with quality.
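As a sketch of why the timing matters: when a sound mixes with an equal-level delayed copy of itself, the notches of the resulting comb filter sit at odd multiples of 1/(2 x delay). The latency figures below (40 samples at 48k, 20 samples at 96k) are made-up placeholders, not measurements from any real converter, and this ignores the acoustic path and any buffering.

```python
def converter_latency_s(latency_samples, sample_rate_hz):
    """Convert a latency in samples to seconds."""
    return latency_samples / sample_rate_hz

def comb_notches_hz(delay_s, count=3):
    """First few notch frequencies when a signal mixes with an
    equal-level copy of itself delayed by delay_s seconds."""
    return [(2 * k + 1) / (2 * delay_s) for k in range(count)]

# Hypothetical figures: fewer samples of latency AND less time per sample at 96k.
for samples, rate in ((40, 48_000), (20, 96_000)):
    delay = converter_latency_s(samples, rate)
    notches = [round(f) for f in comb_notches_hz(delay)]
    print(f"{rate} Hz: {delay * 1000:.3f} ms delay, notches near {notches} Hz")
```

With those placeholder numbers, the shorter delay pushes the first notch up from around 600 Hz to around 2400 Hz and spreads the notches further apart.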
But if Zoom is involved, then the distance is probably also enough, considering the speed of sound, that the latency involved with 48k is still negligible.