LocalVocal: Local Live Captions & Translation On-the-Go

LocalVocal: Local Live Captions & Translation On-the-Go v0.3.3

Supported Bit Versions
  1. 64-bit
Source Code URL
https://github.com/occ-ai/obs-localvocal
Minimum OBS Studio Version
29.0.0
Supported Platforms
  1. Windows
  2. Mac OS X
  3. Linux
The LocalVocal plugin lets you transcribe and translate speech into text locally on your machine in real time. ✅ No GPU required*, ✅ no cloud costs, ✅ no network and ✅ minimal lag! Privacy first: all data stays on your machine.
(* GPU acceleration on NVIDIA (CUDA) and AMD GPUs is supported!)

If this plugin has been valuable to you, consider adding a ⭐ to the GH repo, rating it here on OBS, subscribing to my YouTube channel, and supporting my work on GitHub, Patreon or OpenCollective. Check out the Home for Open Source Content Creators AI.
Need help setting up? Contact live support: https://discord.gg/J5RgzMmPqM


Do more with LocalVocal:

The plugin adds an Audio Filter. Apply it to a speech source (mic, video) to get a transcription, then send the captions to a Text Source to show them in the scene.

Current Features:
  • Transcribe audio to text in real time in 100 languages
  • Display captions on screen using text sources
  • Send captions to a .txt or .srt file (to be read by external sources or used for video playback), with or without the aggregation option
  • Captions synced with OBS recording timestamps
  • Send captions on an RTMP stream to e.g. YouTube, Twitch
  • Bring your own Whisper model (any GGML)
  • Translate captions in real time to major languages (both Whisper built-in translation as well as NMT models with CTranslate2)
  • CUDA, OpenCL, Apple Arm64, AVX & SSE acceleration support
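The .srt output mentioned above can be read by external tools. Below is a minimal sketch of parsing such a file in Python, assuming the plugin writes standard SubRip blocks (index line, `start --> end` line, then caption text). The `Cue` class and `parse_srt` function are hypothetical names used for illustration and are not part of the plugin.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int      # sequence number of the caption
    start_ms: int   # start time in milliseconds
    end_ms: int     # end time in milliseconds
    text: str       # caption text (may span several lines)


_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_ms(stamp: str) -> int:
    """Convert an 'HH:MM:SS,mmm' timestamp to milliseconds."""
    m = _TIME.fullmatch(stamp.strip())
    if m is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    h, mi, s, ms = (int(g) for g in m.groups())
    return ((h * 60 + mi) * 60 + s) * 1000 + ms


def parse_srt(content: str) -> list[Cue]:
    """Parse SubRip text into a list of cues, skipping malformed blocks."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        if len(lines) < 2 or not lines[0].strip().isdigit() or "-->" not in lines[1]:
            continue  # e.g. a block that is still being written
        start, end = lines[1].split("-->")
        cues.append(Cue(int(lines[0]), _to_ms(start), _to_ms(end), "\n".join(lines[2:])))
    return cues
```

A script could call `parse_srt(open(path, encoding="utf-8").read())` periodically to pick up new captions; the file path is whatever was configured in the filter's output settings.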
Roadmap:
  • More robust built-in translation options
  • Additional output options: .vtt, .ssa, .sub, etc.
  • Speaker diarization (detecting speakers in a multi-person audio stream)
Internally, the plugin runs a neural network (OpenAI Whisper) locally to predict speech in real time and provide captions.

It uses the Whisper.cpp project from ggerganov to run the Whisper network very efficiently on CPUs and GPUs. For translation it uses CTranslate2 and the M2M100 model.
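Whisper models take 16 kHz mono audio as input, so audio from a live source has to be converted before inference. The following is only an illustrative sketch of that preparation step, assuming a 48 kHz stereo source given as two lists of float samples. It is not the plugin's actual resampling code: `to_mono_16k` is a hypothetical helper, and averaging groups of three samples is a crude stand-in for a proper low-pass filter and resampler.

```python
def to_mono_16k(left, right):
    """Downmix 48 kHz stereo float samples to 16 kHz mono.

    Assumption: both channels are equal-length sequences of floats
    sampled at 48 kHz. Trailing samples that do not fill a group of
    three are dropped.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")

    # Downmix: average the two channels sample by sample.
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]

    # Decimate by 3 (48 kHz -> 16 kHz), averaging each group of three
    # samples as a very rough anti-aliasing step.
    usable = len(mono) - len(mono) % 3
    return [sum(mono[i:i + 3]) / 3.0 for i in range(0, usable, 3)]
```

In a real-time setting, the resulting samples would be accumulated into a buffer and handed to the model in short segments; how the plugin sizes those segments is not described here.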

If you use this plugin - let us know! We would love to feature your work/vids and showcase your success.

Check out our other plugins:
  • Background Removal removes the background from a webcam without a green screen.
  • Detect detects and tracks more than 80 types of objects in real time inside OBS.
  • URL/API Source fetches live data from an API and displays it in OBS.
If you are a broadcasting company or service looking to integrate local AI technology into your pipelines - reach out to inquire about our enterprise services.
Author
royshilkrot
Downloads
17,399
Views
59,137
First release
Last update
Rating
4.43 star(s) 7 ratings


Latest updates

  1. v0.3.3 Partial real-time Transcripts! new OBS, many bugfixes

    In this release: New simplified and streamlined filter UI, and properties refactoring File...
  2. v0.3.2 Improvements all around! Caption presentation, logs and bugfixes

    Lots of things going on in this busy release! Adding filter & replace option Improving...
  3. v0.3.1 - more models, fix timestamps

    In this release: Adding more whisper model options Only allowing English selection for English...

Latest reviews

I've been meaning to set up closed captions on my stream for ages, but never knew how to do it. It only took me about 5 minutes with LocalVocal. The default model is efficient enough to have no negative effect on my stream, while giving great closed captions. Thanks for making this, Roy!
It's a good plugin. It runs an AI model in the background that processes the audio from the microphone in real time and generates transcriptions to a label in OBS or to a file.

I'd like to have both options at the same time, but I guess that they are working on it :)
Impossible to work
i7 4790 K and 32 RAM
royshilkrot
I'm sorry this isn't working for you right away. Please reach out at https://discord.gg/CJHr5zHXD3 and I will help you set it up.
Does exactly what it says on the tin! An amazing tool.
This is brilliant. The fully-local implementation of speech-to-text already works very well.

I can't wait to see what transpires as this matures.
Easy to install and set up. Exactly what I needed.
This will be huge once it gets a bunch of optimizations, whether on the plugin's or Whisper's side.

You can use it for standard subtitle-related stuff, but since it can output to text files unlike other similar plugins, it can be also used with e.g. Advanced Scene Switcher as something that fuels it with voice commands.

For now, on mediocre modern CPUs, it works well with the tiny model (except that it has trouble recognizing certain words and phrases) and the base model (better at recognizing, but CPU load and response time suffer a bit more). For me personally it works best with the CUDA version, so if your GPU has more headroom or is better than your CPU, I recommend compiling for that. Bigger models are not too usable.