VAAPI and AMF encoding running at only ~8 fps with 35% GPU usage?

dgatwood

Member
I'm currently running a current-generation 16-core Mac Pro for OBS with a Radeon Pro 580X GPU.

I gave up running macOS because the NDI framework currently has a bug where decoding even a single instance of 4K video stutters unusably, even though if I grab the H.264 stream from the camera using the embedded SDK, I can play 19 copies of the stream simultaneously in separate VLC instances. I have no idea whether this is a NewTek bug or a macOS bug, but I've seen identical performance on two different machines with wildly different specs (a factor of 4 difference in CPU speed, completely different H.264 decoding hardware, etc.), so I'm guessing the former. It kind of looks like they're literally doing all the decoding on a couple of cores and running them at 4 GHz continuously while leaving the rest of the chip idle. I filed a support request, but am forced to give up on macOS for now.

In search of an alternative that might work, I installed Linux onto an external (USB) SSD. Stream decoding is smooth as silk in Linux, though it uses an order of magnitude more CPU than macOS does (1.5% while not recording in macOS versus 11.4% or so in Linux), presumably because it's all being done on the CPU instead of being offloaded onto the T2 chip. But it is usable, without the random frame drops, and there's still loads of CPU to spare, so that part is an improvement.

The problem I'm encountering is with encoding. MPEG-2 seemed to work fine at up to 80 Mbps (I didn't try higher). Most of the ProRes settings also work, but holy disk space hog, Batman. H.264 and HEVC are... somewhat more problematic.

On macOS, I was using H.264 in high profile, CBR, with a two-second keyframe interval, at the medium preset, with an 8 Mbps bitrate (and debating whether to crank that bitrate up). I'd like to match that in Linux if I can. Unfortunately, I had to move down four full preset steps to "superfast" just to keep it from dropping frames during crossfades. I'm not sure if x264 was offloading the work onto the T2 chip or what (and if it was, I have no idea whether its "medium" was really comparable to a pure software "medium"), but the difference in performance is huge.
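For anyone who wants to compare notes, the rough ffmpeg equivalent of those settings would be something like the sketch below. Assumptions: input.mp4 is a placeholder clip, the source is 60 fps so -g 120 gives the two-second keyframe interval, and maxrate/bufsize only approximate CBR.

    ffmpeg -i input.mp4 -c:v libx264 -profile:v high -preset medium \
        -b:v 8M -maxrate 8M -bufsize 16M -g 120 -an out.mp4

Timing that against -preset superfast on the same clip shows how large the preset gap really is on a given CPU.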

So I was thinking I might get an NVIDIA GPU and do GPU-based encoding, but before I went to the trouble of paying an extra $200 above retail for a GPU from a scalper, I figured I'd see how well AMD hardware encoding worked. It did not work well.

I first tried VAAPI. The frame rate *dropped* from about 25 fps to about 8 fps. Now bear in mind that the GPU should be doing nothing other than rendering the OBS previews on-screen and whatever color conversion it might be doing to show content on my monitor. It should basically be idle. All hardware acceleration on NDI decoding was turned off to ensure that the maximum amount of GPU horsepower was available for encoding.
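In case it's useful to anyone trying to reproduce this: vainfo (from the libva-utils package) should list VAEntrypointEncSlice next to the H.264 profiles if the driver exposes the encoder at all. The renderD128 path is just the usual default and may differ per system:

    vainfo --display drm --device /dev/dri/renderD128 | grep -i h264

If no EncSlice entries show up, the problem is a missing encoder rather than a slow one.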

Then I tried installing the official AMD drivers and using amf with a custom build of ffmpeg and OBS. Same speed. The GPU is only running at about 35% capacity according to radeontop, and nothing approaches 100% except for the shader clock speed. So the GPU should not be overloaded, yet encoding is still utterly unusable.

This chipset should be able to handle H.264 encoding in real time at 60 fps or more. I have no idea how to debug the amf performance to figure out where the bottlenecks might be. (I haven't done anything in the Linux kernel in 20 years, and that was before those parts of the kernel even existed.) But this concerns me greatly.

I don't *think* the problem is the USB drive — doing a dd from /dev/random to a file clocks in at 157 MB/second, which is more than two orders of magnitude above the 8 Mbps data rate in question, so unless OBS is round-tripping through the disk or something nuts, that shouldn't be the cause of the encoder lag.
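One caveat on that test: /dev/random can itself be a bottleneck on older kernels, and without bypassing the page cache dd mostly measures RAM. A sketch of a more honest sequential write test:

    dd if=/dev/zero of=ddtest bs=1M count=2048 oflag=direct

Either way, even a slow USB drive should absorb 8 Mbps without breaking a sweat.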

So I'm kind of running out of ideas to explain the poor performance with hardware encoding enabled.

My concern is this: if the AMD hardware acceleration is showing such a poor frame rate despite the GPU being mostly idle, I'm wondering if I'll burn several hundred dollars on an NVIDIA card and still end up with 10 fps. Is that a realistic possibility, or am I just being too paranoid here?

Are there any thread scheduler tuning changes I should make (Ubuntu 20.04) that might improve performance — specifically, anything that might help the fact that (assuming the CPU percentages in top are semi-truthful) OBS in Linux is starting to drop frames while using only about 13 or 14 cores out of 16 (or 32 if you count hyperthreading)? (I tried cpufreq-set, and can squeeze a little more out of it, but not enough to matter, so maybe this part isn't worth chasing.)
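For reference, this is the cpupower equivalent of that cpufreq-set experiment, assuming the linux-tools package is installed; it pins the governor to performance on all cores:

    sudo cpupower frequency-set -g performance

As noted above, it bought a little headroom, but not enough to matter.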

There's nothing that looks particularly interesting in the log files.

dgatwood

Member
Just to add a point here, the performance issue isn't with OBS. It's ffmpeg. I get appallingly bad encode speeds on the command line, too. But I figure I'm more likely to get useful ideas here than in an ffmpeg forum, if there even is one. :)
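For anyone who wants to poke at this without an NDI source, a synthetic test along these lines should isolate the encoder from NDI, the disk, and OBS entirely (testsrc2 stands in for the camera feed, and the render node path is an assumption):

    ffmpeg -benchmark -vaapi_device /dev/dri/renderD128 \
        -f lavfi -i testsrc2=size=1920x1080:rate=60 \
        -vf format=nv12,hwupload -c:v h264_vaapi -b:v 8M \
        -t 30 -f null -

The speed= figure ffmpeg prints is the number to watch; anything under 1x means the encoder can't keep up with real time.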

Tuna

Member
I experienced something similar, though not quite as bad. I tried a 1080p60 recording but was only able to get about 30 fps. With my stupid GStreamer plugin it worked much better.

As seen here: https://www.youtube.com/watch?v=IQyJHrkFKKk

I never got around to investigating it, but yeah, it seems like the default FFMPEG/VAAPI encoder bottlenecks somewhere.
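If you want to sanity-check the comparison outside OBS, a rough equivalent with gst-launch (assuming the gstreamer1.0-vaapi package is installed; 1800 buffers is 30 seconds at 60 fps, and vaapih264enc takes its bitrate in kbit/s):

    gst-launch-1.0 videotestsrc num-buffers=1800 ! \
        video/x-raw,format=NV12,width=1920,height=1080,framerate=60/1 ! \
        vaapih264enc bitrate=8000 ! fakesink sync=false

If that finishes in well under 30 seconds while the ffmpeg path crawls, the bottleneck is in the FFMPEG/VAAPI plumbing rather than the hardware.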

EDIT:
But note that in the case of an NVIDIA GPU you will not be using VAAPI but NVENC. I guess that is the more commonly used hardware and API, and from what I've heard it behaves quite nicely.

P.S. The official AMD drivers are known to have caused issues in the past. Unless you have a specific reason to do so, I would suggest just using the regular drivers that ship with the Linux kernel.
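A quick way to confirm which driver is actually bound after switching back:

    lspci -k | grep -EA3 'VGA|Display'

"Kernel driver in use: amdgpu" is what you want to see for the in-kernel driver.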

dgatwood

Member
But note that in the case of an NVIDIA GPU you will not be using VAAPI but NVENC. I guess that is the more commonly used hardware and API, and from what I've heard it behaves quite nicely.

Thanks. I wasn't sure if that went through similar code paths or not. Sounds like not.


P.S. The official AMD drivers are known to have caused issues in the past. Unless you have a specific reason to do so, I would suggest just using the regular drivers that ship with the Linux kernel.

That's what I did originally, but VAAPI was horribly slow, so I figured I'd give the amf encoder a shot, which AFAIK requires the non-free drivers. But since it was just as slow as VAAPI, that obviously wasn't a useful approach, so I'll roll them back. :D