I'm a software/computer engineer and audio engineer/YouTuber, and here is my technical opinion from years of experience with OBS. 44.1 kHz is a nightmare to work with; stick with 48 kHz at 160 kbps. Everything in Windows defaults to 48 kHz, and there is a good reason for it.
Sure, you can't hear the extra high-frequency content of 48 kHz, but it's very important that you upsample 44.1 kHz audio sources to 48 kHz properly. If you don't, you will get some aliasing in the signal, which is kind of like the moiré patterns you get when a scene has more fine detail than a camera sensor can resolve. The converter has edge transitions from one sample to the next that will cause glitches in the audio when sampled at the same frequency. When you're sampling an analog source for 48 kHz, you want to sample it at 96 kHz; in layman's terms, that gives you two samples that get averaged out. The problem is that 96 kHz will start to tax your CPU because of interrupts, and 48 kHz is good enough to eliminate audible aliasing.
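Here's a quick sketch of what aliasing looks like in practice (Python with NumPy; the tone frequency is just illustrative):

```python
import numpy as np

fs = 48000                       # sample rate
t = np.arange(fs) / fs           # one second of timestamps

# A 30 kHz tone is above the 24 kHz Nyquist limit for 48 kHz sampling,
# so it "folds" and shows up as a 48 - 30 = 18 kHz tone instead.
x = np.sin(2 * np.pi * 30000 * t)

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), 1 / fs)
print(freqs[np.argmax(spectrum)])  # ~18000.0, not 30000
```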
Another relevant hack I'd like to add: with 16-bit audio you can get a maximum of 96 dB of dynamic range before the signal gets noisy, which translates to 6 dB per bit. If you want more gain out of your mics, you should use exactly 6 dB of digital gain after the compression stage, because it has the effect of throwing out one of the bits, or bit-shifting the audio sample by one bit. If you're using 24 bits you go down to 23 bits; if you're using 16 bits you go down to 15 bits. It really helps with dynamic mics; you can avoid buying a Cloudlifter. You should be using 24-bit audio everywhere you can.
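To make the bit-shift claim concrete, here's a minimal sketch (the sample value is arbitrary):

```python
# A 16-bit signed sample, somewhere around a third of full scale.
sample = 0x2BCD                 # 11213

# "6 dB" of digital gain is (nearly) a factor of 2, which on an
# integer sample is just a left shift by one bit...
doubled = sample << 1           # 22426, same bit pattern shifted up

# ...so one bit of headroom is spent, and the 16-bit word now
# effectively carries 15 bits of the original signal.
assert doubled == sample * 2
print(hex(sample), hex(doubled))
```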
thx for this clarification, that's helpful. There are situations where it would be useful to capture at higher sampling rates, but those are rare.
This hack... did I get it right, if what you are saying is akin to this: by recording audio with a digital gain of 6 dB, we can actually save a louder signal while keeping one bit to spare - because - it happens to be exactly twice the amplitude. And computers like that - the data is simply "shifted" one bit and hence is no more complex to encode/decode than some other arbitrary value would be, so the signal can be one bit louder - without clipping - naturally allowing us to capture more data, which proves especially useful with less sensitive mics? If so, that's genius :) thanks again
The problem with *exactly* 6 dB is that it's *not* exact. By definition, 10x the amplitude is a difference of 20 dB, no matter where you are on the scale, so the formula is 20*log10(2), which is 6.02059991328... dB. Exactly 6 dB would be a factor of 1.99526231497..., which is not a simple bit shift. Not even an exact factor of 2 would be a simple bit shift in a general-purpose gain block: they don't analyze their inputs in real time to choose the best way to do each one; they just use the same method for everything. Much simpler logic that way.
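You can check those numbers yourself:

```python
import math

print(20 * math.log10(2))   # 6.020599913279624 dB for an exact factor of 2
print(10 ** (6 / 20))       # 1.9952623149688795, the factor for exactly 6 dB
```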
If you're wondering why it's 20 and not 10 or even 1, that comes from the original Bel scale (named for Alexander Graham Bell) being used for *power*. A difference of 1 Bel is a factor of 10 in power. The Bel turned out to be too large a unit for common use, though, so we use deciBels (dB) instead - thus the factor of 10 out front when measuring power. The factor of 20 when measuring signal amplitude comes from power following the *square* of amplitude; it's not linear. Pull the square (power of 2) out of the logarithm and it becomes an additional factor of 2 instead, because that's how logarithms work. Combine that with the original factor of 10 because we've chosen a smaller unit, and you get a total factor of 20.
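Spelled out as one chain, with P for power and A for amplitude:

dB = 10*log10(P2/P1) = 10*log10((A2/A1)^2) = 2 * 10*log10(A2/A1) = 20*log10(A2/A1)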
All of that said though, YOU CAN'T TELL THE DIFFERENCE ANYWAY. We don't listen to samples and their values; we listen to frequencies and their amplitudes. Our ears are not simple microphones. They're a series of acoustic bandpass filters, each followed by its own peak-detecting nerve - in other words, a "biological RTA". Because of this, phase means nothing until two copies of the same signal are delayed or phase-shifted enough to interfere with each other, and we only hear *that* because the interference messes with the *amplitude* of each frequency.
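A sketch of that phase-blindness (the tone choices are arbitrary):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs

# Same two harmonics at the same amplitudes; only the phase
# of the second one differs between the two signals.
a = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t)
b = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 880 * t + np.pi / 2)

# The waveforms are clearly different...
print(np.max(np.abs(a - b)))                 # ~0.71
# ...but the magnitude spectra - what the ear detects - are identical.
print(np.allclose(np.abs(np.fft.rfft(a)),
                  np.abs(np.fft.rfft(b)), atol=1e-6))  # True
```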
---
For why to sample at 48kHz or 44.1kHz instead of *exactly* Nyquist, which would be 40kHz for a 20kHz audio band: no filter is a perfect "brick wall". You need to give it some space on either side of the set frequency to keep it from rippling (too much) in the passband, and to get the stopband down enough that you won't notice any aliasing. More wiggle room allows a gentler, cheaper, faster filter, though most modern converters use the same digital filter for both rates and just give it a different clock, which results in a slightly different cutoff frequency for each. It's fine for both, though. It's when you get into significantly different rates, like 96kHz, that it actually switches to a gentler filter with the same (or similar) cutoff frequency.
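For a feel for the numbers, here's a rough sketch using SciPy's Kaiser-window estimate of filter length; the 96 dB stopband and 20 kHz passband edge are my own assumptions, just to have concrete targets:

```python
from scipy.signal import kaiserord

def taps_needed(fs, pass_edge, stop_edge, atten_db=96):
    # Transition width, normalized so that 1.0 = Nyquist (fs/2).
    width = (stop_edge - pass_edge) / (fs / 2)
    numtaps, _beta = kaiserord(atten_db, width)
    return numtaps

# 44.1 kHz leaves only 20 kHz -> 22.05 kHz to get the stopband down...
print(taps_needed(44100, 20000, 22050))  # ~133 taps
# ...while 48 kHz gives 20 kHz -> 24 kHz of room: a much shorter filter.
print(taps_needed(48000, 20000, 24000))  # ~75 taps
```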
The reason to use the higher rate and gentler filter is not really sound quality - 16-bit/44.1kHz is already indistinguishable from analog - but more technical things, like lower latency through the entire system. Not only do you have half the time between samples, but the gentler filters also have fewer samples of latency! So you gain more than what you'd think at first glance, though we're still only talking about a handbreadth or so at the speed of sound in air.
I have a project that *does* need that low latency, so I've already done that math. I find that it's often useful to think of an audio time delay as a distance like that, so you can compare it to the distances that you're already used to. If you can tolerate a speaker being 1 foot (0.3 meter) farther away, then you can also tolerate an additional ~1ms of latency without moving the speaker.
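If you want to do that conversion yourself, it's one line (speed of sound ~343 m/s in room-temperature air):

```python
SPEED_OF_SOUND = 343.0  # m/s in air at ~20 °C

def latency_as_distance_m(samples: int, fs: int) -> float:
    """How far sound travels in air during `samples` at rate `fs`."""
    return samples / fs * SPEED_OF_SOUND

# A 64-sample buffer at 48 kHz "moves the speaker" about 0.46 m (~1.5 ft):
print(latency_as_distance_m(64, 48000))
```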
---
Modern converters actually sample the signal with amazingly few bits at a much higher frequency than what they output (delta-sigma modulation). The low-resolution, mid-MHz raw signal allows the analog anti-aliasing filter to be dirt cheap, with a rolloff that goes on for miles before it gets low enough to effectively disappear. Then the digital filter that I mentioned earlier both anti-aliases for the actual output rate and fills in the lower bits. Effectively an average already, before it even leaves the converter chip.
Aliasing is when a frequency too high to capture accurately "wraps around" and gets encoded as a lower frequency instead. The only way to prevent that is to get rid of those high frequencies *before* the sampling. So the analog anti-aliasing lowpass filter is set for just-above-audible, but has a cheap and gentle rolloff, because it has all the way to that mid-MHz raw sampling rate, not the final output rate, to finally get down to where it effectively disappears. (Actually almost twice that, because it's okay for it to alias as long as the digital one still gets rid of it.) Then the digital filter inside the chip cuts off even more high frequencies so that the final output is *only* what we can hear. THEN it picks out samples, post-digital-lowpass, from that mid-MHz stream to send out of the chip, and just throws away the rest, because they really are completely redundant at that point.
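The wrap-around itself is easy to compute; for a real signal, the aliased frequency is just the distance to the nearest multiple of the sample rate:

```python
def aliased_frequency(f: float, fs: float) -> float:
    """Frequency that a tone at f appears at when sampled at fs."""
    return abs(f - round(f / fs) * fs)

print(aliased_frequency(30000, 48000))  # 18000.0: a 30 kHz tone lands at 18 kHz
print(aliased_frequency(20000, 48000))  # 20000.0: below Nyquist, unchanged
```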
---
The reason to use 24 bits for any serious work is that it *is* technically possible, though difficult and not audible, to make an analog circuit better than 16-bit equivalent in terms of its signal-to-noise ratio, and because computers like whole numbers of bytes (24 bits is exactly 3). So we use 24-bit converters both to make the computers happy and to guarantee that the digital world is in fact better than analog could ever hope to be. The bottom few bits of those converters are entirely analog noise.
Then, to leave no excuse at all, we upscale the processing from 24 bits (with the bottom few bits of that already drenched in analog circuit noise) to 32 bits, 40 bits, 48 bits, 64 bits - whatever is necessary to ensure that the cumulative roundoff error through all the processing does not affect the least significant bit of a 24-bit output converter. It may be entirely noise down there, but that noise is still mathematically perfect at that resolution, because the worst-case roundoff error might be in the 28th bit or something like that.
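Here's a toy illustration of that cumulative roundoff; the 100 gain stages and the noise signal are made up, just to have something to measure:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-0.5, 0.5, 48000)      # stand-in for one second of audio
gains = rng.uniform(0.9, 1.1, 100)     # 100 arbitrary processing stages

def to_24bit(v):
    return np.round(v * 2**23) / 2**23

# Requantize to 24 bits after *every* stage: roundoff error accumulates.
q = to_24bit(x)
for g in gains:
    q = to_24bit(q * g)

# Keep full float64 precision throughout, round only once at the end.
f = x.copy()
for g in gains:
    f = f * g
f = to_24bit(f)

# Worst-case difference, measured in 24-bit LSBs: several LSBs of
# error from nothing but repeated rounding at the output resolution.
print(np.max(np.abs(q - f)) * 2**23)
```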
OBS, to my knowledge, uses 32-bit floating point internally, and a lot of DAWs use 64-bit floating point, because they have no idea how complicated your workflow is going to be and 64 bits is a convenient, large, well-supported floating-point format. (Called "double" in C and C++, compared to the 32-bit "float" in the same languages.)
Floating-point is interesting because it has a constant signal/noise ratio regardless of scale. It's 1.x * 2^y, where only x and y are actually stored, in a fixed-size field for each, plus a sign bit. So for a different scale, you still have the same resolution in x, and y is simply a different number.
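You can peek at those fields directly; for a 32-bit float the split is 1 sign bit, 8 exponent bits, and 23 stored mantissa bits:

```python
import struct

def float32_fields(v: float):
    bits = struct.unpack('>I', struct.pack('>f', v))[0]
    sign = bits >> 31
    exponent = ((bits >> 23) & 0xFF) - 127    # remove the bias
    mantissa = 1 + (bits & 0x7FFFFF) / 2**23  # the implied leading 1.x
    return sign, exponent, mantissa

print(float32_fields(0.5))     # (0, -1, 1.0):  +1.0 * 2^-1
print(float32_fields(-3.0))    # (1, 1, 1.5):   -1.5 * 2^1
print(float32_fields(0.5e-3))  # same 1.x resolution, just a different exponent
```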
Typically, floating-point 1.0 is considered to be "full scale" when converting to and from the "integer" formats. I put "integer" in quotes here because this conversion method makes them entirely fractional. They're 0.x, where only x is stored. Or, to assign values to each bit from most-significant to least-significant: -1, +1/2, +1/4, +1/8, +1/16, etc. Adding bits gives you finer fractions here, not bigger numbers, but they still behave like integers as long as you (as a programmer) keep track of where the binary point ends up when they're processed by true-integer hardware.
Now, because floating-point can go much larger than 1.0, and because it has constant S/N regardless of scale, it's vastly more forgiving of terrible gain structure than analog is. If you ever try to go beyond full scale in analog, at any point in the chain, it'll clip there and sound bad until you bring the level down at *that* point in the chain. If you turn it down later, it'll still be clipped, just softer. Floating-point digital allows a gross excess of 1.0 without even noticing; as long as you get it back under 1.0 by the time it converts back to "integer", you're fine. And similarly for noise performance at low levels: analog circuitry has a constant noise level regardless of signal level, so if you run too quiet at some intermediate point and turn it up later, you amplify that intermediate noise. Floating-point digital doesn't do that, but "integer" does.
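A sketch of that forgiveness (the 4x "mistake" gain is arbitrary):

```python
import numpy as np

t = np.arange(48000) / 48000
x = 0.9 * np.sin(2 * np.pi * 440 * t)   # healthy level, just under full scale

hot = x * 4.0                            # oops: way past full scale mid-chain

# Float path: nothing clipped, so turning it down later fully recovers.
f = hot * 0.25
print(np.max(np.abs(f - x)))             # 0.0: bit-for-bit recovery

# "Integer" path: the overload is baked in at the hot stage...
i16 = np.clip(np.round(hot * 32767), -32768, 32767).astype(np.int16)
# ...so turning it down later just gives you a quieter clipped wave.
i = (i16 / 32767.0) * 0.25
print(np.max(np.abs(i - x)))             # ~0.65: the flattened tops never come back
```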
For simple, fixed-function processing, like the software-controlled on-chip volume control and maybe even EQ in a PC sound card, it might be okay to stick with "integers", but for the vast majority of general-purpose stuff, it's probably floating-point.
---
Anyway, my point in all of this is to show how much better even basic digital is than typical analog, and that an exact setting - of *any* kind, really - for some sort of optimization is completely pointless. It doesn't optimize anything, and any theoretical non-idealities are so far buried in the analog circuit noise that you won't notice anyway, if they even get that far.