Bug Report OBS sometimes freezes when reloading script with Lua filter

wondible

New Member
I've got a weird case. I've had occasional freezes while trying to develop a script. I tried to push through it for a while because script reload is not a common user operation, but it has gotten pretty annoying. I've tried to reduce it to a minimal example, but have not had great success. It seems to depend on the existence of a video_tick function as well as the peculiar contents of the script_properties function; most changes beyond this point make the freeze less likely.

Load the attached script, add the Minimal Crash filter to a video source (I was using text_gdiplus), and then reload the script until OBS freezes. It often takes around 20 reloads in dedicated testing, and can take many more.

Since OBS freezes and I have to kill it, there is no crash log.
 

Built OBS and found deadlock in debugger. I compiled with OBS master 238df3da3a507571d248aef9932a7f45c234aa33

It is a conflict between the OBS sources_mutex and the individual script mutex, which occurs when the source has a video tick and a properties function that enumerates sources. (At least, that is one cause; I have not tried to exhaustively search for others, and I did not isolate the property source enum when I was trying to get a smaller reproduction.)

Graphics Thread:

tick_sources takes the sources_mutex, then when it hits a script-defined source (a filter in this case) it tries to take the script mutex. It is stuck at an infinite mutex lock in obs_lua_source_video_tick.

Main Thread:

obs_lua_script_get_properties takes the script lock, and then the script's properties function calls obs_enum_sources, which attempts to take the sources_mutex.
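
The two stacks above describe a lock-order inversion: each thread holds the lock the other one wants. Here is a minimal Python sketch of that pattern, not OBS code. The lock names are borrowed from the report, while the function name, the barriers, and the timeout are assumptions added so the demo reports the stall instead of hanging forever as the real infinite mutex lock does.

```python
import threading


def demonstrate_lock_inversion(timeout=0.2):
    """Two threads take the same two locks in opposite order (illustrative only)."""
    # Stand-ins for the two mutexes named in the report.
    sources_mutex = threading.Lock()
    script_mutex = threading.Lock()
    # Ensures both threads hold their first lock before either asks for the second.
    both_hold_first = threading.Barrier(2)
    # Keeps each thread holding its first lock until the other has finished trying.
    both_tried_second = threading.Barrier(2)
    got_second = {}

    def worker(name, first, second):
        with first:
            both_hold_first.wait()
            # A real mutex lock would block here forever; the timeout is
            # an assumption so the sketch can return a result.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            both_tried_second.wait()
            if acquired:
                second.release()

    # "graphics" mimics tick_sources: sources_mutex first, then the script mutex.
    graphics = threading.Thread(
        target=worker, args=("graphics", sources_mutex, script_mutex))
    # "main" mimics the properties call: script mutex first, then sources_mutex.
    main = threading.Thread(
        target=worker, args=("main", script_mutex, sources_mutex))
    graphics.start()
    main.start()
    graphics.join()
    main.join()
    return got_second


if __name__ == "__main__":
    print(demonstrate_lock_inversion())
```

Neither thread gets its second lock in this sketch, which is the same situation the debugger showed, except that OBS has no timeout to break out of it.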

Now to see if I can find a way to work around this in my script.
 
Found another one. Although I have removed video_tick from my script, OBS still seems to hit some more mutexes in the process of looking. I don't fully understand this one, but I am reporting where things are stuck. I have switched back to the source for 21.1.0 because of audio source issues in master.

Graphics Thread:

Stuck on lua source definition_mutex in obs_lua_source_video_tick. Holding sources_mutex, but don't see a conflict there.

Main Thread:

Stuck on context->mutex in obs_context_data_remove. Also holding the lua source definition_mutex, script mutex, lua_source_def_mutex.

I see that graphics is stuck on the definition_mutex held by main, but I'm not clear why the main thread is stuck on the context mutex.
 
Found another, when loading OBS with a script that has a frontend callback which examines scenes/items on OBS_FRONTEND_EVENT_SCENE_CHANGED.

Main Thread:

Attempting to get video_lock from obs_scene_find_source, holding script callback from frontend_event_callback

Graphics Thread:

Attempting to get script lock from obs_lua_source_get_width, holding video_mutex
 
Found another similar case during context initialization, and noticed that the context mutex is the OBS sources_mutex, which explains the deadlock in both cases. The graphics thread is iterating sources and hits the definition mutex to see if the source might have a script tick method, while the main thread is defining a source and tries to lock the sources mutex while managing the context.
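
Once the held and wanted locks are written down per thread, the cycle can be checked mechanically. This is a rough sketch of a wait-for check in Python. The function name and the dictionary layout are made up for illustration, and the sample data is just the second report above with context->mutex written as sources_mutex.

```python
def find_deadlock_cycle(threads):
    """Return the thread names that form a wait-for cycle, or None.

    threads maps a thread name to {"holds": set of lock names,
    "waits_for": lock name or None}. This layout is hypothetical.
    """
    # Map each held lock to the thread holding it.
    owner = {}
    for name, state in threads.items():
        for lock in state["holds"]:
            owner[lock] = name

    def blocker(name):
        # The thread holding the lock this thread waits for, if any.
        return owner.get(threads[name]["waits_for"])

    for start in threads:
        path = []
        current = start
        # Follow "is blocked by" links until they end or repeat.
        while current is not None and current not in path:
            path.append(current)
            current = blocker(current)
        if current is not None:
            return path[path.index(current):]
    return None


# Data taken from the second report; context->mutex is sources_mutex.
observed = {
    "graphics": {
        "holds": {"sources_mutex"},
        "waits_for": "definition_mutex",
    },
    "main": {
        "holds": {"definition_mutex", "script_mutex", "lua_source_def_mutex"},
        "waits_for": "sources_mutex",
    },
}

if __name__ == "__main__":
    print(find_deadlock_cycle(observed))
```

The sketch assumes each thread waits on at most one lock at a time, which matches a thread blocked in a mutex lock call.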
 
I think I found the context init/destroy problem. I was creating text sources for use in my custom source, and it sometimes causes problems in the source create/destroy events, where they would otherwise appear to belong. So for now I'm creating all text sources as-needed in render, and letting them leak afterwards instead of trying to clean up in destroy - it's better than a deadlock.

Otherwise I'm splitting my script into parts to try to minimize the chances that the same script lock will be involved, and in particular to separate the video_tick enumeration applied to custom sources from things happening outside of custom sources.
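
The create-in-render workaround could look roughly like the Python sketch below. It is not the actual script. The class, its method names, and the create_source factory are all hypothetical, and caching the created source per key is an assumption about what "as-needed" means here.

```python
class LazyTextSources:
    """Creates helper sources on first use instead of in the create event."""

    def __init__(self, create_source):
        # create_source is a hypothetical factory standing in for whatever
        # call the script uses to make a text source.
        self._create_source = create_source
        self._sources = {}

    def get(self, key):
        # Meant to be called from the render path: create on first use.
        source = self._sources.get(key)
        if source is None:
            source = self._create_source(key)
            self._sources[key] = source
        return source

    def on_destroy(self):
        # Deliberately releases nothing, mirroring the "let them leak"
        # trade-off described above.
        pass


if __name__ == "__main__":
    created = []

    def fake_factory(key):
        # Records each creation so repeated lookups can be checked.
        created.append(key)
        return "text source for " + key

    cache = LazyTextSources(fake_factory)
    cache.get("title")
    cache.get("title")
    print(created)
```

The point of the design is only that no source creation or release happens inside the create/destroy events, where the deadlock was observed.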
 
Thank you for outlining your issue and the steps you took to try to resolve it. It's unfortunate that the clean-up is either badly timed or mismanaging memory, leading to deadlocks. It seems like you're trying to do the right thing by cleaning up, but that has raised the likelihood of a freeze instead. Have the changes you implemented proven successful? What have been the adverse effects of leaving the memory unreleased, if any?
 