Activating layers with my microphone

vxcheg

New Member
Good day, everyone

I have several GIF animations (layers) that make up a streaming avatar. I animate it myself and have prepared GIF animations for many words. I need my microphone to activate a specific layer (a GIF animation of the avatar talking) whenever I say something, and only while I speak. I think the idea is clear, and I'm sure there is a solution (inside OBS), but I can't find it.

Right now I have a script that shows only one layer in the folder at a time, for convenient switching between animations with one button (I have it on the numeric keypad).

upd: In addition, I would like the plugin (or script) to detect my voice specifically and not pick up background noise.
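For anyone with a similar setup: a script like the one described can be written with OBS's built-in Python scripting (obspython). Below is a minimal sketch that registers one hotkey and cycles which source in a scene is visible. The scene name "wonderplay" is an assumption taken from the screenshot, and the hotkey label is made up; this is an illustration, not the OP's actual script.

```python
# Minimal OBS Python script sketch: one hotkey cycles which source
# in the given scene is visible, hiding all the others.
import obspython as obs

GROUP_SCENE = "wonderplay"  # assumed scene name; change to your own
hotkey_id = obs.OBS_INVALID_HOTKEY_ID
current = 0

def cycle(pressed):
    global current
    if not pressed:
        return
    src = obs.obs_get_source_by_name(GROUP_SCENE)
    if src is None:
        return
    scene = obs.obs_scene_from_source(src)
    items = obs.obs_scene_enum_items(scene)
    if items:
        current = (current + 1) % len(items)
        for i, item in enumerate(items):
            # Show only the item at the current index.
            obs.obs_sceneitem_set_visible(item, i == current)
        obs.sceneitem_list_release(items)
    obs.obs_source_release(src)

def script_load(settings):
    global hotkey_id
    # Bind a key to this under Settings -> Hotkeys.
    # (This minimal version does not persist the binding across restarts.)
    hotkey_id = obs.obs_hotkey_register_frontend(
        "cycle_avatar_layer", "Cycle avatar layer", cycle)

def script_description():
    return "Cycle visibility of one source at a time in a scene."
```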
 

Attachments

  • 2023-06-17 16-43-43 OBS 28.1.2 (64-bit, windows) - Профиль vx - Сцены wonderplay OBS.jpg (10.6 KB)

AaronD

Active Member
You want to analyze the audio signal and (correctly) choose one of those 6 layers to show, based on the audio signal alone? The only thing I know of that's that good at analyzing audio is speech recognition, and it's heavily optimized for speech-to-text; it can't do anything else. That probably took a lot of research and hard-coding for that one application (and more recently, AI training, which also can't be reused), all of which would have to be repeated for a different application.

If anyone else knows of a general-purpose version of that concept, I'd be interested too!

---

Originally, before seeing your list of layers, I thought you had animated a bunch of words and wanted your avatar to lip-sync based on that. That *might* be in the realm of possibility, if you could animate enough words to cover the vast majority of what you're likely to say, and then give it the output of a (good!) speech-recognition program. You'd have to delay everything though, to match the time it takes for the speech-recognition to figure it out. It'd probably be recognizable, but still rough unless you could animate and detect phrases and sentences instead, which is even *more* animation work and more live delay.

Or, you could use the direct lip-sync and other tracking that some VR rigs have now, and render the entire (rigged) model in real-time with those inputs.
 

vxcheg

New Member
Thanks!

The described options sound complicated, and it seems I wasn't clear enough. Synchronization with words is not needed. I'd like to do it at a simpler level, like this:

When the microphone is active (above a lower noise-cutoff threshold), it automatically shows the layer I need (a looped animation of the talking mouth).
When the microphone is not active (for example, below -20 dB on the volume meter), that layer is hidden.
The layer should be shown only while I am speaking.
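In other words, the desired behavior is a noise gate driving a layer's visibility, ideally with two thresholds so it doesn't flicker around the boundary. Here is a minimal sketch of that logic in Python; the -20 dB figure is from above, and the -25 dB release threshold is an assumed value:

```python
# Gate logic sketch: show the "talking" layer while the mic level stays above
# an open threshold, hide it once the level falls below a lower close threshold.
OPEN_DB = -20.0   # level that reveals the talking layer
CLOSE_DB = -25.0  # lower level that hides it again (the gap prevents flicker)

class TalkGate:
    def __init__(self):
        self.talking = False

    def update(self, level_db: float) -> bool:
        """Feed the current mic level in dB; returns whether to show the layer."""
        if self.talking and level_db < CLOSE_DB:
            self.talking = False
        elif not self.talking and level_db > OPEN_DB:
            self.talking = True
        return self.talking
```

All the automation has to do is feed this a stream of level readings and apply the returned state to the layer's visibility.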
 

AaronD

Active Member
Ah! Okay. So you really only need two states, possibly with some hysteresis between them to avoid it flipping back and forth with a constant audio level. The Advanced Scene Switcher plugin can do that!
[Screenshots: Advanced Scene Switcher macro configuration]

Of course, you can tweak that however you need. I'd consider the "only on change" checkbox to be important; that keeps it from running constantly while the condition remains true.
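If you'd rather script the same behavior outside OBS instead of using the plugin's GUI, a rough equivalent is possible with the sounddevice package for mic levels and the obsws-python package talking to obs-websocket (enabled under Tools -> WebSocket Server Settings in OBS 28+). This is only a sketch under those assumptions; the scene/source names and the password are placeholders:

```python
# Standalone sketch: read the mic with sounddevice, compute the level in dBFS,
# and toggle the talking-mouth source over obs-websocket via obsws-python.
import math
import numpy as np
import sounddevice as sd
import obsws_python as obsws

SCENE = "wonderplay"              # assumed scene name
SOURCE = "talking_mouth_gif"      # assumed source (layer) name
OPEN_DB, CLOSE_DB = -20.0, -25.0  # hysteresis thresholds

cl = obsws.ReqClient(host="localhost", port=4455, password="your_password")
item_id = cl.get_scene_item_id(SCENE, SOURCE).scene_item_id

talking = False

def callback(indata, frames, time, status):
    global talking
    rms = float(np.sqrt(np.mean(indata ** 2)))
    level_db = 20 * math.log10(rms) if rms > 0 else -120.0
    # Only send a request when the state actually changes
    # (same idea as the plugin's "only on change" checkbox).
    if talking and level_db < CLOSE_DB:
        talking = False
        cl.set_scene_item_enabled(SCENE, item_id, False)
    elif not talking and level_db > OPEN_DB:
        talking = True
        cl.set_scene_item_enabled(SCENE, item_id, True)

with sd.InputStream(channels=1, samplerate=48000, callback=callback):
    print("Gating... press Ctrl+C to stop")
    sd.sleep(10**9)  # run until interrupted
```

For a quick sketch it's fine to fire the websocket request from the audio callback; a more careful version would hand the state change off to a separate thread or queue.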
 

vxcheg

New Member
This is wonderful! I managed to set it up the way I wanted. Thank you very much for the prompt advice.
 