NVENC HEVC VBR settings

DayGeckoArt

Member
I've been trying to figure out optimal NVENC HEVC settings for screen recording so I've done some experiments over the past few days with the settings in this doc that Koala posted in the RGB thread https://gist.github.com/asus4/f5aef0f3f46fde198436da12f0332013

I've made some surprising discoveries and I have questions!

I'm testing with 4K drone video. My settings are 4K 30fps I444 with a bitrate of 40000 Kbps, but the encoder ignores that number. Keyframe interval is 60. I have a Quadro T400 with the latest Nvenc chip.
Generally, nvenc hevc encoding uses about half of the CPU power of my 4 core E5-1620V2 Xeon. I figured out the usage by stopping the recording and just letting the video play: CPU usage dropped from 95% to about half that. Also, encoder settings have a big impact on CPU usage, despite the popular belief that Nvenc is handled entirely by the GPU.

From this forum I've learned the basic settings, Rate Control Variable Bitrate and a Constant Quality number:
rc=vbr cq=##

## is a number between 0 and 51. 0 is Automatic and the other numbers are a scale where 1 is highest quality and 51 is lowest quality. From experimenting with recording the same 4K video clip, I've figured out that 1-30 basically produce the same results. You get 20 megabits pretty much all the time, and if the frame is totally motionless, the bitrate drops. The still frame bitrate is what the cq value affects, but even then, the difference is like 5Mbps vs 8Mbps. So cq=1 to cq=30 is basically CBR that will reduce the bitrate if there really is nothing to record.

Around cq=33 is where the bitrate starts to vary more. It goes between about 12Mbps and 16Mbps, occasionally jumping to 18Mbps. The quality is barely distinguishable, but there's higher CPU load, so with my old 4 core I get some dropped frames.

At cq=35 it's 6Mbps to 9Mbps! A massive drop. But there are even more dropped frames. It still looks pretty good but is visually distinguishable from the higher quality settings.

At cq=40 you get 4-6Mbps which is starting to look pretty bad but would be OK for some content.

I also did a bit of testing with rc=vbr_hq and the cq= scale is totally different. At cq=1 you get 40Mbps and at cq=25 you get 20Mbps, but without much variation. At cq=30 there's more variation, between 13Mbps and 20Mbps. Overall rc=vbr_hq uses more CPU power and I get dropped frames just like with the higher vbr cq values.
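To put the bitrates above into storage terms, here is a rough Python sketch that converts an average video bitrate into file size. It assumes decimal units (1 Mbps = 1,000,000 bits per second) and ignores audio and container overhead; the function name is made up for illustration.

```python
def recording_size_gb(avg_mbps: float, minutes: float) -> float:
    """Approximate video file size in GB for an average bitrate in Mbps."""
    bits = avg_mbps * 1_000_000 * minutes * 60
    return bits / 8 / 1_000_000_000


if __name__ == "__main__":
    # A few of the average bitrates mentioned in the tests above
    for mbps in (40, 20, 9, 6):
        print(f"{mbps:>3} Mbps -> {recording_size_gb(mbps, 60):.2f} GB per hour")
```

Because the encoder lowers the bitrate on still frames, real files would likely come out somewhat smaller than this estimate.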

So are these weird scales intentional? Are you supposed to get almost constant bitrates until you reach a certain threshold? Is there another setting besides rc=vbr and rc=vbr_hq that will allow more bitrate variation but without dropping frames? Is it worth trying rc=constqp?
 

koala

Active Member
Nvenc is "zero CPU" (actually: zero copy) only if you use color format NV12. Only with this color format does the encoder pull the raw image data directly from the GPU frame buffer, so the raw data never leaves GPU memory. With all other color formats, OBS needs to copy the frame buffer into CPU memory and copy it again to the input buffer of the encoder. That is a lot of data for 4k material.

I cannot tell you anything about your perceived difference between vbr and vbr_hq. I did research before posting to that other thread, and that posting contains my current knowledge. My main discovery was the difference between rc=vbr cq=(value) and rc=constqp qp=(value), which was unknown to me before (back when I recommended constqp, which I no longer recommend). I also didn't consider any rc modes that are marked as deprecated in current ffmpeg binary builds. The _hq variants of cbr and vbr are not deprecated, but I haven't explored them yet.

To fully understand hevc_nvenc, you need to google and research yourself, and additionally make many test recordings with different settings and observe different behavior with quality and resulting bitrate at vbr.

For test recordings, I recommend not using OBS. Instead, create a lossless test recording (with OBS) and use it as the master for your tests. For the actual testing, get the latest ffmpeg binary and use the lossless recording as the input file. This way you can vary the parameters more easily than with OBS. It also lets you use ffmpeg's psnr filter to measure how much the encoded video differs from the original. This is far better than visually comparing two videos.
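A minimal Python sketch of such a test run, assuming the lossless master is called master.mkv and the outputs go into an existing tests folder (both names are hypothetical). It only builds and prints the ffmpeg command lines unless dry_run is switched off. The options -c:v hevc_nvenc, -rc, -cq, -qp, -qmin and -qmax are real ffmpeg options; the maximum bitrate (what OBS takes in its "bitrate" field) is deliberately not set here and may need to be added.

```python
import subprocess
from pathlib import Path


def build_encode_cmd(master: str, out_file: str, rc: str, **opts) -> list[str]:
    """Build an ffmpeg command that re-encodes `master` with hevc_nvenc."""
    cmd = ["ffmpeg", "-y", "-i", master, "-an", "-c:v", "hevc_nvenc", "-rc", rc]
    for key, value in opts.items():
        # Keyword names cannot contain '-', so rc_lookahead becomes -rc-lookahead
        cmd += [f"-{key.replace('_', '-')}", str(value)]
    return cmd + [out_file]


def cq_sweep(master: str, cq_values, out_dir: str = "tests") -> list[list[str]]:
    """One rc=vbr command per cq value; the output name records the setting."""
    return [
        build_encode_cmd(master, str(Path(out_dir) / f"vbr_cq{cq}.mkv"), "vbr", cq=cq)
        for cq in cq_values
    ]


def run_all(cmds, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for cmd in cmds:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(cq_sweep("master.mkv", [20, 25, 30, 35]))
```

Other modes can be tried the same way, for example build_encode_cmd("master.mkv", "tests/constqp24.mkv", "constqp", qp=24).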
 

DayGeckoArt

Member
Another thing I tried was rc-lookahead=30 and 60 and 120. I think it’s supposed to give the encoder a multi frame buffer to optimize encoding, but it dramatically increased Nvenc and CPU usage and dropped a lot of frames.
 

koala

Active Member
My main drive for using hevc_nvenc is recording (not streaming) with my new monitor, which has a horizontal resolution of 5120 - not supported by h264_nvenc any more. I simply use the default color space and color format settings of OBS, then hevc_nvenc with rc=vbr and qp=24. That's resulting in quite small files given the resolution and indistinguishable quality. I don't currently intend to explore beyond these settings. It gets 5.5% CPU usage for OBS, that's sufficiently low.
 

DayGeckoArt

Member
Thanks, I will look into PSNR. I have used batch files to run FFMPEG to change framerates of timelapses so batch may be a good way to automate processing for testing.
I had no idea Nvenc was only zero CPU for NV12… that explains why the Nvidia replay feature is so efficient
 

DayGeckoArt

Member
Oh I thought rc=vbr required cq=# but it also works with qp=# ?
 

koala

Active Member
vbr can be used without cq (and qp). Without cq, it will use some internal ffmpeg default, maybe even some kind of variable default. But the bitrate will not exceed the upper limit (which is what you enter in the "bitrate" input field) in either case.

What will happen if you use both cq and qp, or if you use qp with a rate control other than constqp, is a mystery to me. I'm sure my computer will not explode by using this, but that's the only thing I'm sure about.
 

DayGeckoArt

Member
I found this thread that discussed qmin and qmax https://github.com/HandBrake/HandBrake/issues/2231
The guy says that qmin and qmax constrain cq, so he set them to the same value. I tried it myself and I think this is the key!

I tried my drone video where the drone is sometimes still and sometimes moving.
rc=vbr cq=34 just makes a constant 20Mbps video and when paused drops to 6Mbps just like my previous tests.

rc=vbr cq=34 qmin=34 qmax=34 does exactly what we want! I saw 23Mbps in sections with more sky and 75Mbps with the camera pointed at more detailed features on the ground

I then tried rc=vbr cq=35 qmin=35 qmax=35 and it scaled really nicely down to 20-70Mbps. I think this is the setting I'll use for my recording
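For repeating these tests with the ffmpeg command line, the OBS option string has to be turned into ffmpeg arguments. A small sketch, assuming each key=value pair maps one-to-one onto an ffmpeg option of the same name. That holds for rc, cq, qp, qmin and qmax, but it is an assumption for anything else, and the function name is made up.

```python
def obs_opts_to_ffmpeg_args(opts: str) -> list[str]:
    """Turn 'rc=vbr cq=34 qmin=34' into ['-rc', 'vbr', '-cq', '34', ...]."""
    args = []
    for pair in opts.split():
        key, sep, value = pair.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {pair!r}")
        args += [f"-{key}", value]
    return args


if __name__ == "__main__":
    print(obs_opts_to_ffmpeg_args("rc=vbr cq=35 qmin=35 qmax=35"))
```

The resulting list can be placed after -c:v hevc_nvenc in an ffmpeg command.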
 

Attachments: 2021-12-30-21-55-18.jpg, 2021-12-30-21-55-49.jpg

DayGeckoArt

Member
I did some more VBR CQ QMIN QMAX testing using the lower bitrate video I had used before (about 20Mbps 4k).

The raw drone video I tested last night was over 100Mbps and with rc=vbr cq=35 qmin=35 qmax=35 got 20-70Mbps.
With the 20Mbps video I got 5-7Mbps!!!

So apparently with these settings the encoder is even smarter than I thought and adapts significantly to the source material.

Great, but the 5-7Mbps is also pretty low quality. The raw drone video had much more detail to start with, so it looked good encoded down to 20-70Mbps, but the 20Mbps encoded down to 5-7Mbps has noticeable artifacting

Then I tried rc=vbr cq=30 qmin=30 qmax=30 on the lower bitrate source and got 8-21 Mbps... Getting closer to indistinguishable quality but not quite there.

Then rc=vbr cq=25 qmin=25 qmax=25 produces 14-26 Mbps, with a slightly visible quality reduction.

Then rc=vbr cq=20 qmin=20 qmax=20 produces 22-67 Mbps. To me this is indistinguishable.

So for recording already compressed video, I think 20-25 vbr/cq/qmin/qmax is the sweet spot
 

DayGeckoArt

Member
rc=vbr cq=1 qmin=20 qmax=20 does the same thing as rc=vbr cq=20 qmin=20 qmax=20

The Qmin and Qmax values seem to override the cq. BUT if you leave out cq they have no effect, it's just like if you only put rc=vbr
 

DayGeckoArt

Member
I've gone a bit further. I tried different qmin and qmax... rc=vbr cq=20 qmin=20 qmax=25 on the compressed video and this yields a range of 15 to 40Mbps

I tried rc=vbr cq=20 qmin=20 qmax=25 on the 100Mbps drone video and got 100 to 160Mbps. Somehow it adapts really well to whatever you give it

Next was rc=vbr cq=20 qmin=20 qmax=35 on the 100Mbps drone video which surprisingly gave the same result as setting all to 35, about 20-70Mbps
 

DayGeckoArt

Member
rc=vbr cq=20 qmin=20 qmax=35 on the lower quality 20Mbps drone video was also surprising... 17 to 22Mbps. Very different from setting all to 35 and also different from setting all to 22. I can't make sense of what these numbers do
 

DayGeckoArt

Member
Now I get it: if you use qmin and qmax values that the encoder doesn't like, it ignores them. qmin=20 qmax=35 has the same effect as leaving them blank: you get a solid 17-22Mbps, which is basically CBR. qmin=20 qmax=25 shows variation of 16-27Mbps, with some spikes to 45Mbps when scenes change suddenly.
 

DayGeckoArt

Member
Bananaman in that Github thread says that CONSTQP is better at any given bitrate so I decided to try that one. The scale is somewhat similar. Quality DOES seem to be better for a given bitrate but I'll have to do more testing. For now I've figured out that the closest equivalent to rc=vbr cq=1 qmin=20 qmax=25 is rc=constqp qp=24 or 25

I'm inclined to use rc=constqp qp=## because it's easier to change settings and it seems to be better or equivalent. Plus it's documented, unlike qmin/qmax
1641177039272.png
 

koala

Active Member
I'm not sure what you're aiming at, but if you want more than just a listing of qp and corresponding file size, use the psnr filter to compare the raw video with each of the encoded videos and sort the resulting video list by psnr. This will get you an immediate hint which parameter combination will produce the best quality/size ratio.
PSNR is a measure of the difference between two videos. The higher the PSNR, the smaller the difference, and thus the better the quality. You cannot compare video quality by just watching them.
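For reference, PSNR (peak signal-to-noise ratio) is computed from the mean squared error between the two pictures, so a smaller error gives a higher value in dB. A minimal Python sketch, assuming 8-bit samples with a peak value of 255; the function name is only illustrative.

```python
import math


def psnr_db(mse: float, max_value: int = 255) -> float:
    """PSNR in dB from the mean squared error; max_value is the peak sample value."""
    if mse == 0:
        return math.inf  # identical pictures: no error at all
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    for mse in (1, 10, 100):
        print(f"MSE {mse:>3} -> {psnr_db(mse):.2f} dB")
```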
 

DayGeckoArt

Member
I researched it and found there's a better way to compare, created by Netflix, called VMAF. It differs in that it's supposed to match perception better. But either PSNR or VMAF is a different process because the files would have to match up, i.e. a simple re-encode using the FFMPEG command line, which is different from recording the screen. But that's the next step I want to try, just to directly compare constqp and vbr for a given bitrate.

PSNR/VMAF doesn't make sense for what I've done so far because I've just been trying to figure out what these settings do and what makes sense for screen recording to be visually indistinguishable
 

koala

Active Member
As far as I remember, I recommended that you make a reference video and use this reference as input for your encoding tests. So you have "the original" and the encoded version. That's the only thing that is required to use the psnr filter (or whatever other similarity filter is available from ffmpeg).
Create comparison:
ffmpeg -i "reference.avi" -i encoded.mp4 -filter_complex "psnr" -f null - 2> report.log
Then grep the last line from the log, that will look like this:
[Parsed_psnr_0 @ 0000024dec41b280] PSNR y:33.358496 u:39.906199 v:39.881322 average:34.662145 min:32.927402 max:37.659071

Use the "average" value for sorting. You need this, if you really want to identify what is "visually indistinguishable". You cannot rely on your eyes, you can only rely on comparison algorithms.
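Pulling the "average" value out of many logs by hand gets tedious, so here is a small Python sketch that parses the summary line shown above and sorts the results. It assumes the line format is exactly as in that example; the function names are made up for illustration.

```python
import re

# Matches the summary line the psnr filter writes at the end of the log
PSNR_LINE = re.compile(r"\[Parsed_psnr_\d+ @ [0-9a-fA-Fx]+\] PSNR (.*)")


def parse_psnr_summary(log_text: str) -> dict[str, float]:
    """Return the fields of the last PSNR summary line in an ffmpeg log."""
    matches = PSNR_LINE.findall(log_text)
    if not matches:
        raise ValueError("no PSNR summary line found")
    fields = {}
    for item in matches[-1].split():
        key, _, value = item.partition(":")
        fields[key] = float(value)
    return fields


def rank_by_average(reports: dict[str, str]) -> list[tuple[str, float]]:
    """Sort {name: log text} by average PSNR, closest to the master first."""
    scored = [(name, parse_psnr_summary(text)["average"]) for name, text in reports.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = ("[Parsed_psnr_0 @ 0000024dec41b280] PSNR y:33.358496 u:39.906199 "
              "v:39.881322 average:34.662145 min:32.927402 max:37.659071")
    print(parse_psnr_summary(sample)["average"])
```

In practice each report.log would be read from disk and passed in under the name of the encoder settings that produced it.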

You can also use vmaf of course, but this is awfully slow.
 