Still quite baffled how this is only going to work on human voices though.
"Noise" is simply an *unwanted* signal. "Unwanted" must be defined, and there's no basis in physics for what that definition should be. That's a big part of what makes it so hard to remove; physics can't help you with it.
You're probably thinking of "noise" as an electronic hiss or a ground-loop buzz. That's usually what humans mean by it, but physics has no such concept. And "hiss" is especially hard to define because it's literally random.
It's much easier to define what you want to *allow* than to define what you want to stop.
- Active noise-cancelling headphones, as an overall device, are designed to stop "anything that comes in from outside", and they have microphones to tell them what that is. But they *actually* work by *allowing* "only what comes in from outside", in such an exact way that it can be polarity-reversed and played on the inside so that it cancels what gets through the passive-reduction earmuffs.
- Noise suppression processors (that *family* of processors, not just a specific one) are conceptually simpler in that they take the definition of what to allow directly and just send that out. And for the vast majority of them, that definition is a technical/mathematical approximation of "a human voice".
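
If the polarity-reversal part sounds like magic, here's a minimal sketch of it in plain Python. The function names and the 100 Hz test hum are made up for illustration, and the first case assumes the leak through the earcup is an *exact* copy of what the outside microphone heard, which real headphones can only approximate.

```python
import math

def anti_noise(outside):
    """Polarity-reverse the signal picked up by the outside microphone."""
    return [-x for x in outside]

def at_the_ear(leak, anti):
    """Sound pressures add: what leaks through plus what the driver plays."""
    return [a + b for a, b in zip(leak, anti)]

# Hypothetical test signal: a 100 Hz hum sampled at 8 kHz.
hum = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(800)]

# Idealised case: the leak is an exact copy of what the microphone heard.
residual = at_the_ear(hum, anti_noise(hum))
peak = max(abs(x) for x in residual)

# Assumed mismatch: the earcup passes only 80% of the hum, but the
# anti-noise is still played at full level, so the cancellation overshoots.
leak_quieter = [0.8 * x for x in hum]
residual_mismatch = at_the_ear(leak_quieter, anti_noise(hum))
peak_mismatch = max(abs(x) for x in residual_mismatch)

print("exact copy:", peak)
print("level mismatch:", peak_mismatch)
```

The exact copy cancels completely, while the mismatched one leaves a residue. That's why the "allow only what comes in from outside" part has to be so precise.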
---
Your end goal is to remove voices and keep the ambient sounds. So you use a processor that already exists but does the opposite - removes the ambient sounds as "noise" and keeps the voices - and then you feed that into a logical inverter, also called a NOT gate.
Electronic NOT gates take an input signal and feed the power supply to the output or not, following the opposite of what the input is doing. Either way, the original input stops there. A side-chained compressor can also be thought of as a logical inverter or NOT gate, because it turns down the main input signal to the output, following the opposite of what the detector sees on the side-chain input. And the side-chain signal stops there. A "normal compressor" simply has the main input and side-chain tied together internally. (*)
By using the side-chain separately, you make the detector look at a separate control signal instead of the one that's going to get turned down. That control signal is handled as audio, and it could even *be* audio if you ran it to some speakers, but it's used here as a control signal. No difference in analog circuitry - it's all voltages and currents there. And no difference in digital processing either - it's all streams of numbers there. Some systems try to reduce confusion by labeling things differently and not allowing certain connections, but that's just an arbitrary limitation.
Anyway, you run the raw mic both to the compressor's main input and to the noise suppressor's input (see my diagrams again). The noise suppressor's output then feeds the same compressor's side-chain. Now, when the noise suppressor does not detect a voice, its output is quiet, and so the compressor stays open, which allows the raw sound through. When the noise suppressor does detect a voice, its output is loud (relatively speaking), and so the compressor clamps down, which makes that channel quiet.
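
Here's a rough sketch of that routing in plain Python, in case seeing the numbers helps. Everything in it is an assumption for illustration: the envelope follower, the threshold and ratio values, and especially `suppressor_out`, which just fakes a perfect noise suppressor by handing back the voice on its own. A real suppressor and a real compressor will both be messier.

```python
import math

def envelope(signal, attack=0.5, release=0.01):
    """One-pole peak follower: rises quickly, falls slowly."""
    level, out = 0.0, []
    for x in signal:
        target = abs(x)
        coeff = attack if target > level else release
        level += coeff * (target - level)
        out.append(level)
    return out

def sidechain_compress(main, sidechain, threshold=0.05, ratio=20.0):
    """Turn `main` down according to the level seen on `sidechain`."""
    out = []
    for x, env in zip(main, envelope(sidechain)):
        if env <= threshold:
            gain = 1.0  # detector is quiet: compressor stays open
        else:
            # Above threshold, the detected level is squeezed by `ratio`
            # (on a dB scale); the same gain is applied to the main input.
            gain = threshold * (env / threshold) ** (1.0 / ratio) / env
        out.append(x * gain)
    return out

# Hypothetical signals at 8 kHz: steady ambience, and a "voice" that
# only appears in the second half.
N = 2000
ambient = [0.1 * math.sin(2 * math.pi * 50 * n / 8000) for n in range(N)]
voice = [0.0] * 1000 + [0.5 * math.sin(2 * math.pi * 300 * n / 8000)
                        for n in range(1000)]
raw_mic = [a + v for a, v in zip(ambient, voice)]

suppressor_out = voice  # stand-in for a real noise suppressor's output
ducked = sidechain_compress(raw_mic, suppressor_out)

print("no voice, peak out:", max(abs(x) for x in ducked[:1000]))
print("voice, peak in:    ", max(abs(x) for x in raw_mic[1500:]))
print("voice, peak out:   ", max(abs(x) for x in ducked[1500:]))
```

While the fake suppressor is silent, the raw mic passes through untouched. Once it reports a voice, the whole channel (voice *and* ambience) gets pulled down, which is the NOT-gate behavior.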
Then to avoid the entire output dropping out, you do the same thing independently for several mics that are spread out enough that you're not going to lose *all* of them from the same conversation. One noise suppressor and one compressor dedicated to each. Then mix the outputs of those compressors together and either take that as the final output, or lightly compress the mix normally (no explicit side-chain, or side-chain connected to the main input) so that the overall level doesn't fluctuate as the individual ones drop out and come back. The more mics you have (mic *locations*, more accurately), each with its own dedicated processing, the less the level will fluctuate and the less bus compression you'll need, and the more effective you'll be at ignoring conversations anyway.
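
To put a rough number on "the more mics, the less it fluctuates", here's a back-of-envelope sketch. It assumes every mic contributes equal, uncorrelated ambient power to the mix and that a clamped channel is turned down to a gain of 0.1 (about -20 dB). Both are simplifications I picked for illustration, and `mix_drop_db` is a made-up helper.

```python
import math

def mix_drop_db(num_mics, num_ducked, duck_gain=0.1):
    """Level change of the mix, in dB, when some channels are ducked.

    Assumes every mic contributes equal, uncorrelated ambient power.
    """
    open_power = num_mics - num_ducked
    ducked_power = num_ducked * duck_gain ** 2
    return 10 * math.log10((open_power + ducked_power) / num_mics)

# How far the mix drops when one conversation takes out one mic.
for n in (2, 4, 8, 16):
    print(n, "mics:", round(mix_drop_db(n, 1), 2), "dB")
```

Under those assumptions, losing one mic out of two costs roughly 3 dB, while losing one out of eight costs well under 1 dB, so there's much less for the bus compressor to smooth over.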
---
(*) Most analog compressors have a "side-chain insert" jack, which feeds a copy of the main input back out to some other processing (which saves you an explicit splitter), and returns the result of that to the detector alone. This allows you to modify how the compressor behaves without affecting the signal that it passes. An EQ boost in the side-chain, for example, can make it extra sensitive to that range of frequencies, without actually changing the sound except for the compressor's eagerness to drop out when that range becomes active.