Desperately trying everything to fix these Missed Frames when switching to 4k sources

dcmouser

Member
I have been trying to solve this issue sporadically over many months, often spending hours messing with the OBS source code trying to identify at least the proximal location of this problem, with no luck. I've tried on different installs of OBS, with a clean set of only two scenes.

I initially thought the problem had to do with the capture card, but I have now tried a different 4k capture card and the problem persists, so something more fundamental is going wrong.

Here's the issue: When switching to a scene with a 4k source, after a few minutes of being away from that scene, there is a hiccup where it experiences about 10-20 missed frames (this is a 30fps stream), and dips into high frame rendering times. After this hiccup it is happy on the scene and is rendering frames in the 1ms range and holds steady there.

So it is rendering the 4k sources easily, except that when it has been off the source for a few minutes and switches back to it, it chokes for half a second or so. (Note that you can see the choke/hiccup regardless of what transition you use, including a cut.)

It's particularly frustrating since the PC is having such an easy time rendering the scenes, it's not being taxed at all -- i could throw four 4k sources on the scene with no problem.

It's also completely reliably reproducible. To further add to the mystery, remember that if i transition between these scenes and sources with less than a minute or two delay, no hiccup at all.. It only happens if i've been off the scene for more than a minute or so.. WEIRD.

Like I said, I have spent many hours in the source code trying to figure out where the problem is happening, looking into asynchronous video source caching, etc., but I cannot seem to figure out even WHERE the problem is happening. If I could figure out where, maybe I could have some luck solving the problem.

I have tried every configuration I can think of regarding camera outputs, framerates, encoders, canvas sizes, and I've run logs through the OBS log analyzers, and cannot figure out what is going on. The sources are not set to deactivate when not showing.

If anyone can help, I'm desperate to figure out this problem in an otherwise joyous OBS experience, and I'm happy to experiment with OBS source code if I can figure out some clue as to the nature of the problem.

Thank you in advance.

ps. Here is an earlier thread where I have been trying to figure out the problem thinking it might be specific to the BlackMagic decklink capture plugin: https://obsproject.com/forum/thread...agic-4k-decklink-plugin-capture-source.162687
 

dcmouser

Member
Attaching log file.
In case I was not clear -- this is happening even with no streaming or recording.
 

Attachments

  • 2023-03-27 21-45-20.txt
    16.4 KB

dcmouser

Member
I was asked to clarify what I meant by a "4k source": I mean a 4k video capture source (from a camera, a lumix gh5s). I have tried changing a wide variety of settings from the camera, from 24.98 to 60fps, with no change in the phenomena.

Mainly I use a blackmagic quad capture pci card and the blackmagic source type, but the problem is also exactly the same with a 4k elgato camlink 4k usb3 based capture device.

I should also say, the problem happens whether the canvas size is 1920x1080 or 4k.
 

dcmouser

Member
I've recorded a video of the problem so you might get a better idea for what it looks like in action:
 

dcmouser

Member
Another clue:
The problem behavior is EXACTLY the same if i simply toggle on and off the visibility of the 4k capture source in the scene:
If i toggle it off for 5-10 seconds then toggle it back on, no problem.. If I leave it off for a minute or so then toggle it back on, the hiccup occurs.

Note that the source is *not* set to "deactivate when not showing".
 

dcmouser

Member
So I finally tracked down the nature of this bizarre problem and coded a workaround.. I will post a long write up when I catch my breath.. Pretty bizarre problem with an even more bizarre solution..
 

dcmouser

Member
OBS 4k Capture Stutter Story

This document explains a problem I first encountered with OBS as long as a year ago, and have been hunting for an explanation for, and solution to, since then, and describes its final resolution.

The problem:

I am running OBS on a beefy AMD Ryzen Threadripper 3960X 24-Core Processor 3.80 GHz with 128gb, with a high end 3090 graphics card, with ssds, etc.

My OBS setup has no problem recording and streaming multiple 4k capture sources (with a blackmagic quad hdmi, or camlink 4k usb capture dongle), with normal frame render times of 1-2 ms and 1% cpu, and can do so for hours with no missed or dropped frames.

EXCEPT… Except that if I switch away from a scene with a 4k capture source for more than a minute or so and then go back to it, there is a stutter/hang of about 300ms, which causes a bunch (10-20) of missed frames over the course of a second or so. It’s distracting when it happens, though OBS quickly resumes normal operation afterward. For my streaming, I frequently switch between some 4k sources (top-down camera), so this problem can happen many times per session.

You can see two threads I posted to the OBS forum about this problem in the past:

And a video demonstrating the problem here:

The clues:

Note that this stutter/hang happens even when not recording or streaming, is 100% repeatable, and does not happen if I only switch away from the 4k capture source scene for 30 seconds or so. It is unaffected by any settings on the capture sources (color format, etc.).

At first I thought that the problem was related to the blackmagic 4k capture plugin/drivers, but then confirmed that the problem occurred even when using a completely different 4k capture device.

Then I thought the problem was deep inside the functions for caching asynchronous video source frames.

Locating the source of the problem:

Eventually after many sessions I traced the final location of the 300ms hiccup to a series of function calls from obs-source.c upload_raw_frame(), which calls graphics.c gs_texture_set_image()..

My expectation was that at the end of my hunt I was going to find some kind of caching code in OBS that was letting go of a big 4k texture after some period of disuse..

The source of the problem is unexpected:

Instead what I found was that the 300ms time stall hiccup was eventually traced to the bare bones std library memcpy call in graphics.c inside gs_texture_set_image().

That is, the 300ms black hole was actually being caused by the copying of a large block (16-32mb) of memory between two locations in main memory. This was quite shocking to me -- it would not seem like even in worst case scenario there should be such a delay on copying, even if the pages of memory need to be brought into cache.
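To put numbers on the block size: a 3840x2160 frame is about 16.6 MB at 2 bytes per pixel and about 33.2 MB at 4 bytes per pixel, which matches the range above. The pixel sizes are my assumption for illustration, and the helper names below are hypothetical, not OBS functions. This is only a rough sketch of how one might time a single copy in isolation. It uses the standard C11 timespec_get() wall clock to stay portable; inside OBS a monotonic timer such as libobs's os_gettime_ns() would probably be the better choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Bytes in one uncompressed frame: width * height * bytes per pixel. */
static size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Current time in nanoseconds (C11 wall clock; assumption: good enough
 * for a quick measurement, a monotonic clock is safer in real code). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Copy `size` bytes from src to dst and return how long the copy took. */
static uint64_t time_memcpy_ns(void *dst, const void *src, size_t size)
{
    uint64_t start = now_ns();
    memcpy(dst, src, size);
    return now_ns() - start;
}
```

Calling time_memcpy_ns() on two buffers of frame_bytes(3840, 2160, 4) bytes, once right away and once after the buffers have sat untouched for a few minutes, would be one way to check whether the stall shows up outside OBS.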

So at the end of the trail we have what appears to be an irreconcilable hardware problem. It may very well be that a different architecture (intel?) would not exhibit this behavior, but after some experimentation with some alternative memcpy implementations it seemed clear to me that this problem would not have an easy solution.

Searching for a solution:

Now, I should say that one possible solution to this problem might be to improve core OBS code so that the texture memory locations don’t move around in memory as much as they do. I don’t really understand the core OBS rendering functions well enough to figure out if such a thing is happening or fixable, but it would seem that that might be the ideal solution, which would mean that even switching between scenes would not move the target canvas texture memory out of system memory cache. But I don’t know the OBS source code well enough to try such a thing.

At this point in the story it seemed to me I had come to the end of the line -- with no way to improve an essentially atomic 300ms library function copying the giant block of texture data, which was unavoidably going to result in missed frames and a stutter.

A workaround idea:

Then I had an idea for a workaround kludge. If I could not reduce the time required to move the large block of texture memory, could I at least eliminate the hiccup hanging delay experienced by OBS? Perhaps I could do so by detecting when the occasional frame texture data was taking too long to move, and then aborting the rendering of that frame. The loss of an occasional video capture frame while transitioning between scenes might even be undetectable.

The workaround in action:

So instead of an “atomic” memcpy on 15-30mb of data which was occasionally hanging for 300ms, I replaced it with a function in graphics.c that loops and performs small memcpy chunks, and TIMES the memcpy operations. As soon as a small chunked copy takes more than a few milliseconds, the loop aborts the memcpy procedure, as it knows it is dealing with a system hardware memory caching problem.
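A minimal sketch of the chunked, timed copy idea, for readers who want to see its shape. This is not the actual patch: the function name, the chunk size, and the time budget are all hypothetical, and the timer is the portable C11 timespec_get() rather than whatever OBS uses internally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Current time in nanoseconds (C11 wall clock; a monotonic clock
 * would be safer in real code because wall time can jump). */
static uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical replacement for one big memcpy: copy in chunks and give up
 * as soon as a single chunk takes longer than max_chunk_ns.
 * Returns true if all `size` bytes were copied, false if it aborted early
 * (dst is then only partially written and must not be displayed). */
static bool chunked_timed_memcpy(void *dst, const void *src, size_t size,
                                 size_t chunk_size, uint64_t max_chunk_ns)
{
    assert(chunk_size > 0);

    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t offset = 0;

    while (offset < size) {
        size_t n = size - offset;
        if (n > chunk_size)
            n = chunk_size;

        uint64_t start = now_ns();
        memcpy(d + offset, s + offset, n);
        uint64_t elapsed = now_ns() - start;
        offset += n;

        /* A slow final chunk still leaves a complete copy, so only
         * abort when there is work left to skip. */
        if (elapsed > max_chunk_ns && offset < size)
            return false;
    }
    return true;
}
```

With, say, 1 MB chunks and a budget of a few milliseconds per chunk, the worst case is one slow chunk before the copy bails out, instead of the whole 300ms stall. Both values are guesses and would need tuning on the affected machine.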

This leaves the target texture in an unfinished state, so the next patch that is required is in obs-video.c for the video rendering function to realize that one or more sources failed to render. In such a case, what I would ideally like to do is have OBS simply REUSE the last valid frame generated. While I found no trivial way to do this, I added code to store a simple texture buffer of the last valid frame, so that on each frame rendering it either SAVES a rendered frame into this last-good-frame buffer, or REPLACES a failed rendered frame with the last good one.
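The save-or-replace logic can be sketched on plain memory buffers. This is an illustration only: the real change works on a GPU texture inside obs-video.c, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical CPU-side stand-in for the last-good-frame texture. */
struct last_good_frame {
    uint8_t *data; /* caller-owned buffer of `size` bytes */
    size_t size;
    bool valid;    /* false until the first good frame has been saved */
};

/* Call once per rendered frame.
 * - Render succeeded: SAVE the frame into the cache.
 * - Render failed:    REPLACE the frame with the cached copy.
 * Returns true if `frame` holds usable pixels afterward. */
static bool resolve_frame(struct last_good_frame *cache, uint8_t *frame,
                          size_t size, bool render_failed)
{
    assert(cache != NULL && cache->data != NULL && frame != NULL);
    assert(cache->size == size);

    if (!render_failed) {
        memcpy(cache->data, frame, size);
        cache->valid = true;
        return true;
    }
    if (cache->valid) {
        memcpy(frame, cache->data, size);
        return true;
    }
    /* Failed before any good frame existed: nothing to fall back on. */
    return false;
}
```

The extra copy on every good frame is the cost of this approach, which is where the small per-frame overhead comes from.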

The result is complete seamless elimination of all stuttering and missed frames -- it is not noticeable at all.

Generically detecting aborted source rendering for advanced plugins:

In addition to detecting aborted source frame rendering for display, I used the same detection of aborted internal source rendering in my ObsAutoZoom plugin which internally renders multiple 4k scenes to internal buffers and scans them for data. This plugin now also checks for aborted source renders and skips scanning and processing frames in those cases.

As a general feature, the new modifications allow any source to report that it temporarily FAILED to render itself and that the last good output frame of OBS should be reused. Note that this should only be used in emergencies (in most cases it would be preferable for the source itself to cache its last good output and reuse it if needed).
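Roughly, the reporting mechanism amounts to a shared flag with a few accessors. The sketch below uses hypothetical names and a C11 atomic, on the assumption that the flag may be set on the render thread and read from plugin code elsewhere; the actual exported functions are not shown here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical global "a source failed to render this frame" flag.
 * Static storage is zero-initialized, so it starts out false. */
static atomic_bool failed_render;

/* Called by a source (or the copy routine) when it gives up on a frame. */
static void mark_render_failed(void)
{
    atomic_store(&failed_render, true);
}

/* Lets a plugin peek at the flag without clearing it. */
static bool render_failed(void)
{
    return atomic_load(&failed_render);
}

/* Reads and clears the flag in one step, e.g. once per output frame. */
static bool consume_render_failed(void)
{
    return atomic_exchange(&failed_render, false);
}
```

A plugin that scans rendered frames would check render_failed() after rendering and skip processing when it returns true, while the main render loop would call consume_render_failed() once per frame to decide whether to fall back to the last good frame.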

As a side effect, this stuttering fix also eliminates the need for a very kludgey workaround that I originally put in place in my ObsAutoZoom plugin, which essentially kept all source cameras awake and in cache by touching them on every cycle. I had initially used a similarly processor-intensive fix to avoid the rendering stutter: a Source Dock plugin that kept the 4k sources rendering on every frame even when not visible.

Summary of changes to obs core code:

Actual changes to OBS core code:
  • graphics.c: Uses new chunked and timed memcpy replacement, sets global failedRender flag. (Note that for minimal performance impact this replacement memcpy is ONLY ever used in the one function in graphics.c and only when moving huge blocks of 4k image textures.)
  • obs-video.c: Checks for global failedRender flag set by a render call in graphics.c; updates and uses a global last-good-frame texture for failed renders.
  • obs.c: Exports functions to access failedRender global flag for advanced plugins that need to detect when source renders fail; declares global failedRender flags.
  • Because the new code does add one additional output texture buffer copy every frame, and because the new memcpy function runs in a loop with elapsed time checks, there is a very minor impact on the rendering time for every frame -- this increased workload is negligible on my setup.

NOTE: I know that this fix is not appropriate for incorporating into official OBS release. I mentioned earlier that there may be a clear way to fix this problem for others by changing the way that OBS allocates memory for target rendering of textures for display and output so that the memory does not leave system memory cache. Or it may be that this is an issue only experienced by AMD or that some other bios/system setting can eliminate it.
 