As so often, the correct answer is: it depends.
Your description leaves out critical details:
- Are you and the other user both live? Are you both in the same room/location and connected to the same PC, or somewhere else entirely?
- etc.
But let me guess: the source video you plan to translate comes from another user in another location, right? If so, you can't do that with OBS alone. You have to get the remote user's video feed into OBS first, then stream that feed together with your audio overlay.
There are multiple 3rd-party options for remote video. One was obs-ninja (since renamed VDO.Ninja). NDI v5 now has a remote option. And there are others. Which one fits depends on how much control you have over the remote user's setup, their technical sophistication, and more, including how professional you want this to sound and look. A remote video feed travels over the Internet, which does NOT guarantee audio/video jitter or latency, so your results may (will) vary session by session.
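To put a number on what "jitter" means here, this is a minimal sketch, not anything built into OBS or the tools above. It assumes you somehow have frame arrival timestamps in milliseconds; the function name `jitter_ms` and the sample values are made up for illustration.

```python
from statistics import mean

def jitter_ms(arrival_ms, interval_ms):
    """Mean absolute deviation of inter-arrival gaps from the nominal frame interval."""
    # Gap between each pair of consecutive arrivals
    gaps = [b - a for a, b in zip(arrival_ms, arrival_ms[1:])]
    # How far each gap strays from the interval the sender intended
    return mean(abs(g - interval_ms) for g in gaps)

# Hypothetical arrival times (ms) for frames sent every 33 ms
steady = [0, 33, 66, 99, 132]   # e.g. a local capture device
bursty = [0, 20, 80, 95, 150]   # e.g. a feed crossing the Internet

print(jitter_ms(steady, 33))
print(jitter_ms(bursty, 33))
```

The higher that number, the more buffering the receiving side needs to keep playback smooth, and that buffering adds delay to the feed you are trying to translate live.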