OK, so that's the post-processing angle. If you're willing to attempt to hit 120fps and actually run that editing chain, that's your own choice.
The reason you're encountering freezes/stutters in your recording is what's called "encoding lag". OBS is a realtime encoder, which means all of the work for each frame has to complete within the time budget set by the output framerate.
For 60fps, that gives a ~16.7ms window. For 120fps, it shrinks to a ~8.3ms window.
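Just to make the arithmetic explicit, the budget is simply 1000ms divided by the framerate (a quick illustrative sketch, nothing OBS-specific):
Code:
#include <stdio.h>

/* Frame-time budget: every per-frame step (compositing, conversion,
 * hand-off to the encoder) has to finish inside 1000 ms / fps. */
static double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

int main(void)
{
    printf("60 fps  -> %.2f ms per frame\n", frame_budget_ms(60.0));  /* ~16.67 ms */
    printf("120 fps -> %.2f ms per frame\n", frame_budget_ms(120.0)); /* ~8.33 ms  */
    return 0;
}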
There are several steps in this chain. You can see the overall report of this process at the tail end of the logs:
Code:
01:40:45.587: obs_graphics_thread(8.33333 ms): min=0.084 ms, median=0.442 ms, max=666.15 ms, 99th percentile=2.29 ms, 99.7533% below 8.333 ms
01:40:45.587: ┣tick_sources: min=0.001 ms, median=0.019 ms, max=666.026 ms, 99th percentile=0.093 ms
01:40:45.587: ┣output_frame: min=0.066 ms, median=0.258 ms, max=144.565 ms, 99th percentile=1.509 ms
01:40:45.587: ┃ ┗gs_context(video->graphics): min=0.066 ms, median=0.257 ms, max=144.565 ms, 99th percentile=1.508 ms
01:40:45.587: ┃ ┣render_video: min=0.003 ms, median=0.219 ms, max=36.306 ms, 99th percentile=0.416 ms
01:40:45.587: ┃ ┃ ┣render_main_texture: min=0.002 ms, median=0.03 ms, max=6.624 ms, 99th percentile=0.101 ms
01:40:45.588: ┃ ┃ ┣render_convert_texture: min=0.01 ms, median=0.014 ms, max=0.189 ms, 99th percentile=0.028 ms, 0.654032 calls per parent call
01:40:45.588: ┃ ┃ ┗output_gpu_encoders: min=0 ms, median=0.033 ms, max=0.415 ms, 99th percentile=0.101 ms, 0.654032 calls per parent call
01:40:45.588: ┃ ┗gs_flush: min=0.015 ms, median=0.03 ms, max=144.541 ms, 99th percentile=1.415 ms
01:40:45.588: ┗render_displays: min=0 ms, median=0.15 ms, max=42.848 ms, 99th percentile=0.699 ms
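For reference, the 8.33333 ms in the obs_graphics_thread line is exactly that 120fps budget, and "99.7533% below 8.333 ms" means roughly 0.25% of frames blew it. Assuming the profiler's percentages are taken over per-frame samples, a quick back-of-the-envelope check looks like this:
Code:
#include <stdio.h>

int main(void)
{
    /* Numbers copied from the obs_graphics_thread line above. */
    double budget_ms  = 8.333;    /* per-frame budget at 120fps      */
    double pct_within = 99.7533;  /* "% below 8.333 ms" from the log */
    double fps        = 120.0;

    double pct_over        = 100.0 - pct_within;           /* ~0.25% of frames late */
    double late_per_minute = fps * 60.0 * (pct_over / 100.0);

    printf("frames over the %.3f ms budget: %.4f%%\n", budget_ms, pct_over);
    printf("at %g fps that's roughly %.0f late frames per minute\n",
           fps, late_per_minute);
    return 0;
}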
Specifically, we're looking at the "render_video" subsection.
First there's a rendering pass that grabs the frame from each source and composites them all together. A small conversion step then prepares the composited frame for the encoder.
The encoder takes this composited image and runs it through the NVenc encoding algorithm.
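In heavily simplified form (a sketch of the order of operations described above, not OBS's actual code), each frame goes through something like this:
Code:
#include <stdio.h>

/* Placeholder stages standing in for the real work. */
static void composite_sources(void)   { /* grab and composite every source       */ }
static void convert_for_encoder(void) { /* convert the composited frame's format */ }
static void submit_to_encoder(void)   { /* hand the result to NVenc              */ }

static void render_one_frame(void)
{
    composite_sources();
    convert_for_encoder();
    submit_to_encoder();
    /* All of this has to land inside the frame budget (~8.33 ms at 120fps). */
}

int main(void)
{
    render_one_frame();
    puts("one frame processed");
    return 0;
}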
If the encoder doesn't receive the next frame within the frame-timing window after the previous one, that frame is not encoded and is counted as a "frame lost due to rendering lag".
If the encoder is still busy compressing the previous frame when the next one arrives from the rendering chain (i.e. its compression takes longer than the timing window), that frame is also not encoded, but is counted as a "frame lost due to encoding lag".
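Put another way (again, just a sketch of the logic as described, not OBS's actual bookkeeping), the two failure modes can be thought of like this:
Code:
#include <stdio.h>

/* Illustrative only: classify a frame against the budget. Times in ms. */
static long frames_lost_rendering = 0;
static long frames_lost_encoding  = 0;

static void account_for_frame(double render_time_ms,
                              double encode_busy_ms,
                              double budget_ms)
{
    if (render_time_ms > budget_ms) {
        /* The composited frame arrived too late after the previous one:
         * "frame lost due to rendering lag". */
        frames_lost_rendering++;
    } else if (encode_busy_ms > budget_ms) {
        /* The encoder was still chewing on the previous frame when this
         * one arrived: "frame lost due to encoding lag". */
        frames_lost_encoding++;
    }
    /* Otherwise the frame made it through inside the budget. */
}

int main(void)
{
    double budget_120fps = 1000.0 / 120.0;        /* ~8.33 ms          */

    account_for_frame(3.0, 12.0, budget_120fps);  /* encoder too slow  */
    account_for_frame(10.0, 4.0, budget_120fps);  /* renderer too slow */
    account_for_frame(2.0,  5.0, budget_120fps);  /* fine              */

    printf("lost to rendering lag: %ld\n", frames_lost_rendering);
    printf("lost to encoding lag:  %ld\n", frames_lost_encoding);
    return 0;
}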
In your specific case, your encoder is not able to keep up with the rate at which it is expected to encode frames. This doesn't put any extra performance hit on your GPU -- the encoder is separate silicon from the rest of your GPU (which is why NVenc encoding has such a small overall performance impact). And even if it were on the same silicon, it wouldn't work any harder just because the rate it's being asked to run at exceeds its maximum output -- it has a limit, and cannot perform past that point.
One thing you are asking your encoder to do is use "Psychovisual Tuning". This is an extra feature beyond what the core NVenc silicon can do on its own, and it requires CUDA to run the extra processing. That means extra communication between the NVenc encoder and your CUDA cores (which may or may not also be in use by your game), and it adds extra processing time to the encoding pipeline, making the frame timing harder to hit.
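If you're driving OBS through libobs rather than the UI, Psychovisual Tuning is just another encoder setting you can turn off -- a minimal sketch below, where "psycho_aq" is my assumption for the settings key used by the NVenc encoder (in the UI it's simply the checkbox):
Code:
#include <obs.h>

/* Sketch only: "psycho_aq" is an assumed key name for the NVenc encoder's
 * Psychovisual Tuning setting -- verify against your OBS version. */
static void disable_psycho_visual_tuning(obs_encoder_t *nvenc_encoder)
{
    obs_data_t *settings = obs_data_create();
    obs_data_set_bool(settings, "psycho_aq", false);
    obs_encoder_update(nvenc_encoder, settings);
    obs_data_release(settings);
}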
tl;dr Basically, it's a matter of being able to hit a strict time target. If the chain can't complete from start to finish within that time, you lose a frame. If it's bad enough, you lose several, resulting in stuttering or, even worse, large chunks of video that simply don't exist because frames couldn't get from start to finish fast enough for the recording framerate.