GPU crash mid-recording, could not restart

AaronD

Active Member
Where's the log file analyzer? I've been there before, but I can't find it now. I've restarted and closed OBS again since this problem, so even "Upload previous log" doesn't catch the relevant one, but I did find it in the config folder. As important as it is, can a direct link be added to the navigation bar?

Anyway:


I have a rig that uses two copies of OBS for video and a DAW for audio:
  • OBS Master feeds a remote meeting. It's pretty much the same as a live stream, except that the output is entirely through the Virtual Camera and the audio Monitor. No actual stream or recording. No mics either, as that's handled in the DAW.
  • OBS Slave window-captures the meeting to show on a local display, and records it. The reason for doing that, instead of showing the meeting directly, is to allow a featured video from the Master to temporarily replace the meeting window. So there's a scene for each of those, with automation to follow the Master. The recording audio comes finished from the DAW; no other audio.
  • DAW handles the room mics, grabs whatever additional soundtrack the Master may have, handles the meeting send and return, and (separately) feeds the room speakers and the recording.
About 45 minutes into a 90-minute meeting, OBS Slave appeared to have a GPU crash:
10:17:20.751: [NVIDIA NVENC H.264 (FFmpeg) encoder: 'advanced_video_stream'] Encoding queue duration surpassed 5 seconds, terminating encoder
10:17:20.751: Error encoding with encoder 'advanced_video_stream'
10:17:20.800: [ffmpeg muxer: 'adv_file_output'] Output of file '/media/aaron/DATAPART1/2023-04-22_Sat_09-31-00.mkv' stopped
10:17:20.800: Output 'adv_file_output': stopping
10:17:20.800: Output 'adv_file_output': Total frames output: 83221
10:17:20.800: Output 'adv_file_output': Total drawn frames: 83258 (83407 attempted)
10:17:20.800: Output 'adv_file_output': Number of lagged frames due to rendering lag/stalls: 149 (0.2%)
10:17:20.803: ==== Recording Stop ================================================
10:17:21.135: libfdk_aac encoder destroyed
10:17:21.135: libfdk_aac encoder destroyed
10:17:21.135: libfdk_aac encoder destroyed
10:17:21.135: libfdk_aac encoder destroyed
It also had a popup message with a copy of that first line. I clicked OK in the popup, and Start Recording again, and it never got past "Starting Recording..." The stats window never showed any dataflow, and no additional file was created. The file that was there afterwards only covers from the start until the crash.
The fullscreen projector kept running through all of this, with the only issue being about 1 additional second of latency compared to what it was to start the meeting.

The Master log has nothing interesting in it, as far as I can tell, but I attached it anyway. The Slave is the one that crashed.

My specs, according to the log:
08:16:34.522: User enabled --multi flag and is now running multiple instances of OBS.
08:16:34.522: Command Line Arguments: --disable-updater --multi --studio-mode --profile Meeting_Slave --collection Meeting_Slave
08:16:34.522: Using EGL/X11
08:16:34.522: CPU Name: Intel(R) Core(TM) i7-4940MX CPU @ 3.10GHz
08:16:34.523: CPU Speed: 3180.823MHz
08:16:34.523: Physical Cores: 4, Logical Cores: 8
08:16:34.523: Physical Memory: 32009MB Total, 27364MB Free
08:16:34.523: Kernel Version: Linux 5.15.0-70-lowlatency
08:16:34.523: Distribution: "Ubuntu" "22.04"
08:16:34.523: Session Type: x11
08:16:34.523: Window System: X11.0, Vendor: The X.Org Foundation, Version: 1.21.1
08:16:34.524: Qt Version: 6.2.4 (runtime), 6.2.4 (compiled)
08:16:34.525: Portable mode: false
08:16:34.647: OBS 29.0.2 (linux)
08:16:34.647: ---------------------------------
08:16:34.705: ---------------------------------
08:16:34.705: audio settings reset:
08:16:34.705: samples per sec: 48000
08:16:34.705: speakers: 2
08:16:34.705: max buffering: 960 milliseconds
08:16:34.705: buffering type: dynamically increasing
08:16:34.715: ---------------------------------
08:16:34.715: Initializing OpenGL...
08:16:34.763: Loading up OpenGL on adapter NVIDIA Corporation Quadro K5100M/PCIe/SSE2
08:16:34.763: OpenGL loaded successfully, version 3.3.0 NVIDIA 470.182.03, shading language 3.30 NVIDIA via Cg compiler
08:16:34.791: ---------------------------------
08:16:34.791: video settings reset:
08:16:34.791: base resolution: 1920x1080
08:16:34.791: output resolution: 1920x1080
08:16:34.791: downscale filter: Lanczos
08:16:34.791: fps: 30/1
08:16:34.791: format: NV12
08:16:34.791: YUV mode: Rec. 709/Partial
08:16:34.791: NV12 texture support not available
08:16:34.791: P010 texture support not available
08:16:34.791: Audio monitoring device:
08:16:34.791: name: Monitor of JACK sink (PA_out_Record)
08:16:34.791: id: PA_out_Record.monitor
08:16:34.791: ---------------------------------

Any pointers why it crashed, and how to avoid it for future meetings?
 

Attachments

  • Master.txt
    18.8 KB · Views: 9
  • Slave.txt
    27.5 KB · Views: 12

AaronD

Active Member
Where's the log file analyzer? I've been there before, but I can't find it now. I've restarted and closed OBS again since this problem, so even "Upload previous log" doesn't catch the relevant one, but I did find it in the config folder. As important as it is, can a direct link be added to the navigation bar?
Thanks @Lawrence_SoCal for having it in your signature!
Now if only it were easy to find without that...

Anyway:


I set it up at home and let it run for a few hours. Ran perfectly. I was hoping to catch it with a verbose log, but no. It just worked.

So I ran the Slave log above through the Analyzer, and it said:

No critical issues

No warnings

0.2% Rendering Lag

Your GPU is maxed out and OBS can't render scenes fast enough. Running a game without vertical sync or a frame rate limiter will frequently cause performance issues with OBS because your GPU will be maxed out. OBS requires a little GPU to render your scene.

Enable Vsync or set a reasonable frame rate limit that your GPU can handle without hitting 100% usage.

If that's not enough you may also need to turn down some of the video quality options in the game. If you are experiencing issues in general while using OBS, your GPU may be overloaded for the settings you are trying to use.

Please check our guide for ideas why this may be happening, and steps you can take to correct it: GPU Overload Issues.
I figured 0.2% was just a handful of frames when it was still doing its startup initialization. Personally, I wouldn't start counting frames until the init was done, but I see this a lot and don't worry about it. Should I?

The GPU Overload Issues page also has some interesting points:
  • Check for Other Programs Using the GPU
    • I have two copies of OBS, and a DAW that also uses the GPU. I didn't think it was *that* intensive though!
  • Build Simpler Scenes
    • Yes in general, but I'm already about as simple as it gets: one video source per scene, some live and some files, plus some global audio.
  • The remaining points were about manually optimizing a game (which this isn't) or Windows specifically (hmm, I wonder why?)

Having just done the math now, 0.2% is a full second, on average, every 8 minutes or so. That *does* seem significant. It'd be nice, then, if the log would show *when* the dropped frames actually are. Maybe verbose does that?
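For reference, here is that arithmetic as a small Python sketch, using the frame counts and the 30/1 fps from the Slave log above. It only works with totals, so it assumes nothing about *when* the lagged frames happened; the variable names are made up for illustration.

```python
# Figures copied from the Slave log above; FPS is the logged "fps: 30/1".
FPS = 30
attempted = 83407   # "Total drawn frames: 83258 (83407 attempted)"
lagged = 149        # "Number of lagged frames due to rendering lag/stalls"

lag_fraction = lagged / attempted            # share of frames that lagged
total_lag_seconds = lagged / FPS             # lagged frames expressed as time
seconds_per_lost_second = 1 / lag_fraction   # run time per one second of lag

print(f"lag: {lag_fraction:.2%}")
print(f"total: {total_lag_seconds:.1f} s over {attempted / FPS / 60:.1f} min")
print(f"one second lost every {seconds_per_lost_second / 60:.1f} min on average")
```

With the exact counts this comes out to roughly 5 seconds of lag in total over about 46 minutes, or one second every 9 minutes or so. The rounded 0.2% figure gives 500 seconds, which is where "every 8 minutes" comes from.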

The one verbose log that I have shows 1 dropped frame over 5 hours. Searching it for "frame" returns nothing interesting, and I'm *not* looking through 60MiB manually for something that *might* be it!

Encoder start error

An encoder failed to start. This could result in a bitrate stuck at 0 or OBS stuck on "Stopping Recording". Depending on your encoder, try updating your drivers. If you're using QSV, make sure your iGPU is enabled. If that still doesn't help, try switching to a different encoder in Settings -> Output.
Yes, it did. Good job! Can you tell me why??? Maybe if I knew that it was going to do this and had the verbose log to start with...?

That brings up an interesting question of software diagnostics: "Can/should it detect that something has gone wrong and force verbose on, even retrospectively (timestamps possibly out of order, but at least they're *there*!), so that intermittent non-reproducible problems have a better chance at being understood?"

I guess in the meantime I can *always* use the verbose option, just in case... except that the Analyzer doesn't like a 60MiB text file! I can get it down to 46KiB though, just by removing the repetitive "everything is normal" messages (that version is attached here), and then the Analyzer says it's all good... except for this: "Your OBS version identifies itself as 'which', which cannot be parsed as a valid OBS version number." What!?!
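One way to do that kind of trimming automatically is sketched below in Python. This is not necessarily how the attached file was produced; it is a guess at a generic approach. It assumes every log line starts with a timestamp like `10:17:20.751: `, treats messages that differ only in their numbers as "the same", and uses an arbitrary threshold of 20 repeats. The function names and the threshold are hypothetical.

```python
import re
from collections import Counter

# OBS log lines start with a timestamp such as "10:17:20.751: ".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}\.\d{3}: ")

def message_key(line):
    """Strip the timestamp and blank out numbers so near-identical lines match."""
    body = TIMESTAMP.sub("", line.rstrip("\n"))
    return re.sub(r"\d+", "#", body)

def condense(lines, max_repeats=20):
    """Keep rare messages whole; keep only the first copy of very repetitive ones."""
    lines = list(lines)
    counts = Counter(message_key(line) for line in lines)
    seen = set()
    out = []
    for line in lines:
        key = message_key(line)
        if counts[key] <= max_repeats:
            out.append(line)
        elif key not in seen:
            seen.add(key)
            note = f"   [repeated {counts[key]} times, rest omitted]\n"
            out.append(line.rstrip("\n") + note)
    return out

def condense_file(src, dst, max_repeats=20):
    """Read a log from src and write the condensed version to dst."""
    with open(src, errors="replace") as f:
        kept = condense(f, max_repeats)
    with open(dst, "w") as f:
        f.writelines(kept)
```

Calling `condense_file("verbose.txt", "verbose_short.txt")` (both file names are placeholders) would write the shortened copy. Since the first copy of each repetitive message is kept with a count, it is still possible to see what was removed. A caveat: the added "[repeated ...]" notes are not something OBS writes, so the Analyzer may or may not tolerate them.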

The driver question immediately came to mind before seeing this report, but I'm already on the latest supported driver for this 2015-era GPU, which I installed manually because the graphical driver manager doesn't have it. (It does have one that was ancient even then.) Other encoders are FFmpeg VAAPI H.264, which always throws an error immediately, and x264, which I really want to stay away from if at all possible!

Third-Party Plugins (3)

You have the following third-party plugins installed:
  • advanced-scene-switcher
  • source-copy
  • source-defaults
Yep. All of these are here on purpose, but I can see how this report would be useful for people who do anything and everything they happen to think of and don't remember any of it. ("How did I get that? *I* never installed it!" Uhh...yes you did.)



I think that's about all I can get out of the logs, at least until it fails with a verbose one. (if it ever does)

One more possibility might be thermal throttling??? I didn't think to look at the temps, but:
  • The fans on this laptop draw air from the bottom and push it out the back through some good-looking heatpipes and fins. (I've had the cover off a few times)
  • It was sitting directly on a solid plastic shelf when it failed, compared to a slightly elevated dock on an open plastic grid for the 5-hour test at home.
I guess it could have taken about 45 minutes to reach the throttling temperature, at which point it could no longer keep up with the encoding load. But if that's what happened, then what we see here is a pretty lousy way to report it! It would also explain the inability to restart immediately, but not (as an outside observer anyway) the indefinite hang on "Starting Recording..." Surely it didn't stay hot for 45 more minutes doing (almost) nothing!
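To test the throttling theory next time, something like the following Python sketch could log the GPU temperature alongside the meeting. It assumes `nvidia-smi` is installed with the driver and that this GPU reports the `temperature.gpu`, `utilization.gpu` and `clocks.sm` query fields; older mobile cards may return "[N/A]" or "[Not Supported]" for some of them, so the values are kept as raw strings. The log file name and the 10-second interval are arbitrary choices.

```python
import subprocess
import time
from datetime import datetime

# Fields requested from nvidia-smi; availability depends on the GPU and driver.
QUERY = "temperature.gpu,utilization.gpu,clocks.sm"

def parse_sample(csv_line):
    """Split one 'csv,noheader,nounits' line into a dict of raw strings."""
    values = [v.strip() for v in csv_line.strip().split(",")]
    return dict(zip(QUERY.split(","), values))

def read_gpu():
    """Ask nvidia-smi for one sample from the first GPU it lists."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_sample(result.stdout.splitlines()[0])

def log_gpu(path="gpu_temps.log", interval_s=10):
    """Append a timestamped sample every interval_s seconds until interrupted."""
    with open(path, "a") as f:
        while True:
            stamp = datetime.now().strftime("%H:%M:%S")
            f.write(f"{stamp} {read_gpu()}\n")
            f.flush()  # keep the file current in case the machine locks up
            time.sleep(interval_s)
```

Running `log_gpu()` in a terminal before the meeting and stopping it with Ctrl+C afterwards would leave timestamps in the same HH:MM:SS form as the OBS log, so a temperature climb or a clock drop could be lined up against the 10:17:20 encoder failure. It says nothing about CPU temperature, which would need a separate tool.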
 

Attachments

  • Slave.txt
    46.5 KB · Views: 10