LocalVocal: Local Live Captions & Translation On-the-Go

LocalVocal: Local Live Captions & Translation On-the-Go v0.2.3

royshilkrot

Member
royshilkrot submitted a new resource:

LocalVocal - Live stream AI assistant - Real-time, local transcription of speech to captions - no GPU, no cloud costs, no network, no downtime!

The LocalVocal live-streaming AI assistant plugin lets you transcribe speech to text locally on your machine, in real time, and perform various language-processing functions on the text using AI / LLMs (Large Language Models). ✅ No GPU required, ✅ no cloud costs, ✅ no network and ✅ no downtime! Privacy first - all data stays on your machine.

Current Features:
  • Transcribe audio to text in real time in 100 languages
  • Display captions on screen using text sources
Roadmap...

 

LaughterOnWater

New Member
Tried it. Crashed OBS. Also, Norton wouldn't allow part of the install:

Norton:
Category: Data Protector
  • Date & Time: 8/18/2023 8:40:21 AM
  • Risk: High
  • Activity: Data Protector blocked a suspicious action by obs-localvocal-0.0.1-windows-x64-Installer.tmp
  • Status: Action Blocked
  • Recommended Action: No Action Required
  • Program Path: C:\Users\Chris\AppData\Local\Temp\is-F3AET.tmp\obs-localvocal-0.0.1-windows-x64-Installer.tmp
  • Program Name: obs-localvocal-0.0.1-windows-x64-Installer.tmp
  • Action Observed: Suspicious process attempted to open a file protected by Data Protector
  • Target: C:\ProgramData\Microsoft\Windows\Start Menu\Programs\obs-localvocal\obs-localvocal on the Web.url

I've added the crash report to GitHub: https://github.com/royshil/obs-localvocal/issues/2#issuecomment-1683881228
 

Alisizz

New Member
Hi @royshilkrot

I am a newbie OBS developer, and I noticed that in another post you mentioned:
I'm also thinking about real-time auto-translation to other languages utilizing Speech-to-text -> Translation -> Text-to-speech

How can I achieve the following process? (It's quite similar to what you are planning to do.)

I am livestreaming and need an automatic audio reply to the text that my audience types in the chatbox. I have already collected the real-time text, but how can I get the rest of the process done? Thanks so much for your help!

Real-time text (collected from the livestream page and stored on my server) -> AI model with a prompt (like GPT, via an API I already built on my server) to handle the real-time text and generate answers -> generated answers converted to speech -> speech broadcast via OBS to the audience
 

royshilkrot

Member
Hi @royshilkrot

I am a newbie OBS developer, and I noticed that in another post you mentioned:
I'm also thinking about real-time auto-translation to other languages utilizing Speech-to-text -> Translation -> Text-to-speech

How can I achieve the following process? (It's quite similar to what you are planning to do.)

I am livestreaming and need an automatic audio reply to the text that my audience types in the chatbox. I have already collected the real-time text, but how can I get the rest of the process done? Thanks so much for your help!

Real-time text (collected from the livestream page and stored on my server) -> AI model with a prompt (like GPT, via an API I already built on my server) to handle the real-time text and generate answers -> generated answers converted to speech -> speech broadcast via OBS to the audience
Hi
This may be difficult to achieve inside OBS itself. But text-to-speech models exist, like Bark. Right now they require a strong GPU. I was doing research to find a small model that can run in OBS, but I haven't found one yet. It's a matter of time before it happens, though. We need to be patient.
However, if you're comfortable with coding, you can make a local Python server that runs e.g. the Bark model. It will need to generate an audio stream which you can pick up in OBS, for example over RTMP... It's not a super easy task, but it's possible. In Python you could use the GStreamer API to build the stream and push data to it from the speech-to-text engine.
That's what I'm currently thinking.
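For anyone wanting to experiment with that idea, here is a rough, untested sketch of what such a local TTS bridge might look like. It assumes the suno-ai Bark Python package, ffmpeg on the PATH (used here instead of GStreamer for brevity), and a local RTMP ingest (e.g. nginx-rtmp) at rtmp://localhost/live/tts that an OBS Media Source can then open; the chat-answer text is assumed to come from your existing server-side GPT API. All names and URLs below are illustrative, not part of the plugin:

    # Hypothetical sketch: chat answers -> Bark TTS -> raw PCM piped to ffmpeg -> RTMP.
    import subprocess
    import numpy as np
    from bark import SAMPLE_RATE, generate_audio, preload_models

    RTMP_URL = "rtmp://localhost/live/tts"  # assumption: a local RTMP ingest exists

    def start_ffmpeg(url: str) -> subprocess.Popen:
        # Read 16-bit mono PCM from stdin and push an audio-only FLV stream to RTMP.
        return subprocess.Popen(
            ["ffmpeg", "-re",
             "-f", "s16le", "-ar", str(SAMPLE_RATE), "-ac", "1", "-i", "pipe:0",
             "-c:a", "aac", "-f", "flv", url],
            stdin=subprocess.PIPE,
        )

    def speak(text: str, ffmpeg: subprocess.Popen) -> None:
        # Bark returns float32 samples in [-1, 1] at SAMPLE_RATE (24 kHz).
        audio = generate_audio(text)
        pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
        ffmpeg.stdin.write(pcm.tobytes())
        ffmpeg.stdin.flush()

    if __name__ == "__main__":
        preload_models()  # downloads/loads the Bark checkpoints on first run
        ff = start_ffmpeg(RTMP_URL)
        # In a real setup this loop would be fed by the chat-message collector /
        # the GPT API that generates the answers.
        for reply in ["Hello chat!", "Thanks for the question."]:
            speak(reply, ff)
        ff.stdin.close()
        ff.wait()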
 

Alisizz

New Member
Hi
This may be difficult to achieve inside OBS itself. But text-to-speech models exist, like Bark. Right now they require a strong GPU. I was doing research to find a small model that can run in OBS, but I haven't found one yet. It's a matter of time before it happens, though. We need to be patient.
However, if you're comfortable with coding, you can make a local Python server that runs e.g. the Bark model. It will need to generate an audio stream which you can pick up in OBS, for example over RTMP... It's not a super easy task, but it's possible. In Python you could use the GStreamer API to build the stream and push data to it from the speech-to-text engine.
That's what I'm currently thinking.
Many thanks for your reply. I'm going to try building this. Thanks again!
 

appa561

New Member
Installation was pretty straightforward. Tiny seems to miss or incorrectly identify words easily. Base does much better. I'm curious how much impact each jump in Whisper model has on the system. The real reason I am hoping to use this plugin is for translation. When selecting models other than (Eng), the download fails.

Mostly, my use case would be my English speech to another language CC... I have an international audience in my Twitch stream. The Spanish speakers are the ones that struggle the most with the spoken word, so most of the time the CC would be in Spanish. I can foresee a need to do other languages, depending on who is in the majority.
 

BenAndo

Member
Installation was pretty straightforward. Tiny seems to miss or incorrectly identify words easily. Base does much better. I'm curious how much impact each jump in Whisper model has on the system. The real reason I am hoping to use this plugin is for translation. When selecting models other than (Eng), the download fails.

Mostly, my use case would be my English speech to another language CC... I have an international audience in my Twitch stream. The Spanish speakers are the ones that struggle the most with the spoken word, so most of the time the CC would be in Spanish. I can foresee a need to do other languages, depending on who is in the majority.
I had the same issue adding in other models. See this GitHub issue for instructions on how to manually add them: https://github.com/royshil/obs-localvocal/issues/5
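As a rough illustration of the manual route that issue describes (fetching a multilingual ggml Whisper model yourself and dropping it into the plugin's models folder), here is a small, hypothetical Python snippet. The destination folder is whatever the issue specifies for your OS; this snippet only handles the download:

    # Hypothetical sketch: download a multilingual ggml Whisper model from the
    # whisper.cpp model repository on Hugging Face. The "-en" variants are
    # English-only, so translation needs one of the plain multilingual files.
    import urllib.request

    model = "ggml-small.bin"  # or ggml-base.bin / ggml-tiny.bin
    url = f"https://huggingface.co/ggerganov/whisper.cpp/resolve/main/{model}"

    urllib.request.urlretrieve(url, model)
    # Move the downloaded file into the obs-localvocal models folder described
    # in the GitHub issue above (the exact path depends on your OS/install).
    print(f"Downloaded {model}")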
 

BenAndo

Member
Is it possible for this to display words in real-time? It seems to wait for a sentence's worth of words before displaying them. I've tried all the settings I can think of but nothing seems to speed it up. As such, it's always a good 3-8 seconds behind what was said.
Perhaps when GPU support is added it'll be closer to real-time?
 

royshilkrot

Member
Is it possible for this to display words in real-time? It seems to wait for a sentence's worth of words before displaying them. I've tried all the settings I can think of but nothing seems to speed it up. As such, it's always a good 3-8 seconds behind what was said.
Perhaps when GPU support is added it'll be closer to real-time?
Thanks for using the plugin!
I'll look at shorter time buffers and perhaps make it parametric so you have control. The minimum is 1 second, though; that's a whisper.cpp thing. Please open an issue for it so we can keep track.
 

Destroy666

Member
I'm curious how much impact each jump in Whisper model has on the system.

While tiny -> base is still OK, base -> small can be tough. Small takes around 10 seconds of heavier CPU usage on fairly modern CPUs. There's also an option for GPU now, which worked better (but not perfectly) for me when testing Whisper directly; I'll have to test it with the plugin.
 