Looking for audio plugin to remove voices

AaronD

Active Member
Do you have a manufacturer and part number for this Noise Suppressor?
Any of them will do. Pick one.

I happen to like this one, to use in a DAW:
But again, no *particular* one is required. They all do the same job. Being tied to a product instead of a concept is a sign that you really don't know what you're doing.
 

jebba

Member
I'm trying to build out the blocks in your diagram. I'm just testing with a short recording for now. Once I understand it, I'll see if I can do the same in OBS Studio.

I got the noise-repellent plugin built and was checking it out in Audacity. So that's one block.

noise-repellent-1.png



In your diagram, it says "Inside the compressor" there is a block "Detector with Settings". I'm not sure what that means. Which detector is that?

Thanks,

-Jeff
 

AaronD

Active Member
I'm trying to build out the blocks in your diagram. I'm just testing with a short recording for now. Once I understand it, I'll see if I can do the same in OBS Studio.

I got the noise-repellent plugin built and was checking it out in Audacity. So that's one block.
You might want to do it in a DAW instead. Audacity is not real-time, which makes the code easier, but also makes it harder to correlate settings to sounds. If you use a DAW to play back the recording, effectively live, through the chain that you're building, then the settings take effect immediately, while it's playing.

In your diagram, it says "Inside the compressor" there is a block "Detector with Settings". I'm not sure what that means. Which detector is that?
"Inside the compressor" is literally *inside* the compressor. Every compressor has that. If you want to *build* a compressor, or any other dynamic processor, then you need one explicitly (see the app notes for analog circuitry, just below my diagrams in the same post), but if you just grab one that already exists, it's already there.
 
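To make the "detector with settings" idea concrete, here is a minimal Python sketch of what sits inside a generic compressor: a level detector (an envelope follower whose settings are the attack and release times) followed by a gain computer (threshold and ratio). The function names and default values are illustrative assumptions, not taken from any particular plugin.

```python
import math

def envelope_follower(samples, sample_rate, attack_ms=5.0, release_ms=100.0):
    """Detector: follow the signal level, rising fast (attack) and falling slowly (release)."""
    attack = math.exp(-1000.0 / (sample_rate * attack_ms))
    release = math.exp(-1000.0 / (sample_rate * release_ms))
    level = 0.0
    envelope = []
    for x in samples:
        rectified = abs(x)
        coeff = attack if rectified > level else release
        level = coeff * level + (1.0 - coeff) * rectified
        envelope.append(level)
    return envelope

def gain_for_level(level, threshold=0.1, ratio=4.0):
    """Gain computer: unity gain below the threshold, reduced gain above it."""
    if level <= threshold:
        return 1.0
    # Above the threshold, the output level (in dB) rises only 1/ratio as fast as the input.
    return (level / threshold) ** (1.0 / ratio - 1.0)

def compress(samples, sample_rate):
    """A 'normal' compressor: the detector listens to the same signal that gets turned down."""
    envelope = envelope_follower(samples, sample_rate)
    return [x * gain_for_level(level) for x, level in zip(samples, envelope)]
```

With these assumed defaults, a steady signal below 0.1 passes unchanged, while a steady signal at 0.8 ends up at roughly a fifth of its original level once the detector has settled.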

jebba

Member
Ok, I got the plugin built and working for Ardour.

I used this fork/branch as it had updates for the latest Ardour that aren't in the main branch yet (pending pull request):


noise-repellent-ardour-1.png


I'll see what I can figure out with it there.

I use Ardour as the mixer in the pipeline to OBS already. Still quite baffled how this is only going to work on human voices though. I'll keep plowing through.

Thanks,

-Jeff
 

AaronD

Active Member
Still quite baffled how this is only going to work on human voices though.
"Noise" is simply an *unwanted* signal. "Unwanted" must be defined, and there's no basis in physics for what that definition should be. That's a big part of what makes it so hard to remove; physics can't help you with it.

You're probably thinking of "noise" as an electronic hiss or a ground-loop buzz. That's usually true for humans, but physics has no idea. And it's especially hard to define "hiss" because it's literally random.

It's much easier to define what you want to *allow*, than to define what you want to stop.
  • Active noise-cancelling headphones, as an overall device, are designed to stop "anything that comes in from outside", and they have microphones to tell them what that is. But they *actually* work by *allowing* "only what comes in from outside", in such an exact way that it can be polarity-reversed and played on the inside so that it cancels what gets through the passive-reduction earmuffs.
  • Noise suppression processors (that *family* of processors, not just a specific one), are conceptually simpler in that they take the definition of what to allow directly and just send that out. And for the vast majority of them, that definition is a technical/mathematical approximation to "a human voice".

---

Your end goal is to remove voices and keep the ambient sounds. So you use a processor that already exists but does the opposite - removes the ambient sounds as "noise" and keeps the voices - and then you feed that into a logical inverter. Also called a NOT gate.

Electronic NOT gates take an input signal and feed the power supply to the output or not, following the opposite of what the input is doing. Either way, the original input stops there. A side-chained compressor can also be thought of as a logical inverter or NOT gate, because it turns down the main input signal to the output, following the opposite of what the detector sees on the side-chain input. And the side-chain signal stops there. A "normal compressor" simply has the main input and side-chain tied together internally. (*)

By using the side-chain separately, you make the detector look at a separate control signal instead of the one that's going to get turned down. That control signal is handled as audio, and it could even *be* audio if you ran it to some speakers, but it's used here as a control signal. No difference in analog circuitry - it's all voltages and currents there. And no difference in digital processing either - it's all streams of numbers there. Some systems try to reduce confusion by labeling things differently and not allowing certain connections, but that's just arbitrarily limiting.

Anyway, you run the raw mic both to the compressor's main input and to the noise suppressor's input (see my diagrams again). The noise suppressor's output then feeds the same compressor's side-chain. Now, when the noise suppressor does not detect a voice, its output is quiet, and so the compressor stays open, which allows the raw sound through. When the noise suppressor does detect a voice, its output is loud (relatively speaking), and so the compressor clamps down, which makes that channel quiet.
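
The routing in the paragraph above can be sketched in Python as follows. This is only a rough illustration under stated assumptions: `voice_only` is a synthetic stand-in for the noise suppressor's output (no real suppressor is run here), and the function names, threshold, ratio, and time constants are all hypothetical choices.

```python
import math

def detect_level(control, sample_rate, attack_ms=5.0, release_ms=200.0):
    """Detector: envelope of the control (side-chain) signal."""
    attack = math.exp(-1000.0 / (sample_rate * attack_ms))
    release = math.exp(-1000.0 / (sample_rate * release_ms))
    level = 0.0
    envelope = []
    for x in control:
        rectified = abs(x)
        coeff = attack if rectified > level else release
        level = coeff * level + (1.0 - coeff) * rectified
        envelope.append(level)
    return envelope

def sidechain_duck(main, sidechain, sample_rate, threshold=0.05, ratio=20.0):
    """Turn down `main` whenever the detector sees level on `sidechain`."""
    out = []
    for x, level in zip(main, detect_level(sidechain, sample_rate)):
        gain = 1.0 if level <= threshold else (level / threshold) ** (1.0 / ratio - 1.0)
        out.append(x * gain)
    return out

# Synthetic one-second test: ambient tone throughout, "voice" only in the second half.
sample_rate = 8000
ambient = [0.1 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]
voice_only = [0.0 if n < sample_rate // 2
              else 0.5 * math.sin(2 * math.pi * 150 * n / sample_rate)
              for n in range(sample_rate)]  # assumed stand-in for the suppressor output
raw_mic = [a + v for a, v in zip(ambient, voice_only)]

# Raw mic goes to the main input, suppressor output goes to the side-chain.
ducked = sidechain_duck(raw_mic, voice_only, sample_rate)
```

In the first half the side-chain is silent, so `ducked` equals `raw_mic`. In the second half the detector sees the voice and the whole channel is turned down, ambient included, which is why several mics are needed.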

Then to avoid the entire output dropping out, you do the same thing independently for several mics that are spread out enough that you're not going to lose *all* of them from the same conversation. One noise suppressor and one compressor dedicated to each. Then mix the outputs of those compressors together and either take that as the final output, or lightly compress the mix normally (no explicit side-chain, or side-chain connected to the main input) so that the overall level doesn't fluctuate as the individual ones drop out and come back. The more mics you have (mic *locations*, more accurately), each with its own dedicated processing, the less the level will fluctuate and the less bus compression you'll need, and the more effective you'll be at ignoring conversations anyway.
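
As a rough worked example of why more mic locations mean less level fluctuation, the Python sketch below estimates how far the mixed ambient level drops when some channels are ducked. It assumes every mic contributes equal-level, uncorrelated ambient sound (so powers add) and that a ducked channel is fully silent. Both are simplifications, and the function name is hypothetical.

```python
import math

def ambient_drop_db(mic_count, ducked_count=1):
    """Level change of the mix, in dB, when `ducked_count` of `mic_count` channels go silent.

    Assumes equal-level, uncorrelated ambient sound on every channel (powers add)."""
    if not 0 <= ducked_count < mic_count:
        raise ValueError("need at least one open mic")
    return 10.0 * math.log10((mic_count - ducked_count) / mic_count)

for mics in (2, 4, 8):
    print(f"{mics} mics, 1 ducked: {ambient_drop_db(mics):.2f} dB")
```

Under those assumptions, losing one mic out of two costs about 3 dB, one out of four about 1.25 dB, and one out of eight under 0.6 dB, so the bus compressor has progressively less to correct.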

---

(*) Most analog compressors have a "side-chain insert" jack, which feeds a copy of the main input back out to some other processing (which saves you an explicit splitter), and returns the result of that to the detector alone. This allows you to modify how the compressor behaves without affecting the signal that it passes. An EQ boost in the side-chain, for example, can make it extra sensitive to that range of frequencies, without actually changing the sound except for the compressor's eagerness to drop out when that range becomes active.
 

jebba

Member
Ok, thanks for the clarification. This is quite different from doing it with a NN VAD.

I have two microphone channels running into Ardour, each with its own mono bus. For testing, I have a radio playing talk radio, which the microphones pick up, so OBS and I are hearing both nature and human voices. I created two additional busses, one for each channel, each running the "noise-repellent" plugin. These carry more or less "clean" voice, with no background. I then created two more busses with compressors. I am using the Dyson Compressor, though I'm not sure which is best. I can see how to add a side-chain, where I feed it the output of the clean voice bus. I also can feed it the main "unclean" feed, but I'm a bit perplexed how the sidechannel and main channel plug into the compressor.

The master out of Ardour then goes to OBS where it streams on the LAN.
 

AaronD

Active Member
I'm a bit perplexed how the sidechannel and main channel plug into the compressor.
The main input works like a normal plugin, and then you do this for the side-chain:

Right-click the plugin, then "Pin Connections...":
1695155772042.png

1695155839270.png

This particular plugin actually defines 4 inputs, and that's all that the host knows. It could just as easily be quadraphonic. The first N channels connect to the channel strip by default, where N is the channel count of that strip. In the case of this side-chainable stereo compressor, the first two are treated internally as the main input, and the last two as the side-chain.

What Ardour calls "Sidechain" is simply a way to make signals available here, to route to the otherwise unused inputs of the plugin. In this case, it really is a side-chain, but nothing says it has to be.
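
The pin logic described above can be modeled as a small lookup table. The Python sketch below is purely a conceptual model, not Ardour's API: the function names and the `("strip", n)` / `("sidechain", n)` tuples are made up for illustration.

```python
def default_connections(plugin_input_count, strip_channel_count):
    """Host default: the first N plugin inputs take the N channels of the strip."""
    return {pin: (("strip", pin) if pin < strip_channel_count else None)
            for pin in range(plugin_input_count)}

def connect_sidechain(connections, sidechain_pins, sidechain_channel_count):
    """Manually route side-chain channels to otherwise unused plugin inputs."""
    routed = dict(connections)
    for index, pin in enumerate(sidechain_pins):
        if routed[pin] is not None:
            raise ValueError(f"plugin input {pin} is already connected")
        # A mono side-chain source is repeated across all side-chain pins.
        routed[pin] = ("sidechain", index % sidechain_channel_count)
    return routed

# Stereo strip into a 4-input plugin, then a mono side-chain on inputs 2 and 3.
pins = default_connections(plugin_input_count=4, strip_channel_count=2)
pins = connect_sidechain(pins, sidechain_pins=(2, 3), sidechain_channel_count=1)
```

The host only knows there are four inputs. Treating the last two as a side-chain is the plugin's own interpretation, which is why the routing has to be done by hand.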
 

khaver

Member
There are VST denoiser plugins that can analyze the noise (in your case, the nature sounds) and remove these sounds but let vocals through, but some also have a switch to let you only hear the noise being removed. You would use this switch all the time so the voices would be removed. Look for Blue Labs Denoiser.
 

AaronD

Active Member
There are VST denoiser plugins that can analyze the noise (in your case, the nature sounds) and remove these sounds but let vocals through, but some also have a switch to let you only hear the noise being removed. You would use this switch all the time so the voices would be removed. Look for Blue Labs Denoiser.
Back to the (surprisingly hard) problem of defining what "noise" is, the noise being removed often has a significant amount of the wanted signal too. It's not supposed to, but nothing is perfect. For the ones that I tried, it was easily enough to still be intelligible. Otherwise yes, that *would* be a viable option.
 

jebba

Member
I haven't completed the compressor setup yet, but I did make some progress on identification, piecing together some bits from BirdNet. It just has birds (for now), but other models could be trained in the future.

This isn't connected to OBS at all yet, but it does work on .wav files. I'd like to get a streaming version running.

 