Can you explain this one to me? Are you saying your primary GPU has reduced performance due to its slot dropping to 8x because of the second card?
Yes. If you have one GPU, it is driven at x16 pci-express bus speed. But if you insert a second one, both cards are driven at x8 speed. Consumer CPUs aren't designed to drive 2 cards simultaneously at x16 speed. They split the data transfer into 2x8, one x8 for each separate card. If you have 3 GPU cards, you get 1x8 and 2x4. (The Intel iGPU doesn't count for this, by the way.)
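To make the split concrete, here is a minimal sketch that maps the number of cards to the lane widths described above and to an approximate bandwidth per card. The per-lane figure assumes PCIe 3.0 (roughly 0.985 GB/s per lane per direction); that generation is my assumption, and the names `LANE_SPLIT` and `link_bandwidth` are just for illustration.

```python
# Lane split as described above for a consumer CPU with 16 GPU lanes.
LANE_SPLIT = {1: [16], 2: [8, 8], 3: [8, 4, 4]}

# Assumption: PCIe 3.0, roughly 0.985 GB/s per lane per direction.
GB_PER_S_PER_LANE = 0.985


def link_bandwidth(gpu_count):
    """Return the approximate bandwidth in GB/s for each card's link."""
    return [round(lanes * GB_PER_S_PER_LANE, 2) for lanes in LANE_SPLIT[gpu_count]]


for count in (1, 2, 3):
    print(count, "GPU(s):", LANE_SPLIT[count], "lanes ->", link_bandwidth(count), "GB/s")
```

On other PCIe generations the per-lane number changes, but the halving from x16 to x8 stays the same.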
On consumer hardware, the main purpose of a GPU is to create, compose and output frames for some graphics application. The graphics assets required for this are loaded by the CPU from HD into CPU memory through the sata interface, then uploaded through the pci-express to GPU memory, then processed by the GPU to fill the frame buffer with the final image, then output to a monitor through hdmi or whatever connector.
The pci-express bus is used for uploading graphics assets into GPU memory, and this stresses the pci-e bus only moderately, because once uploaded, the graphics assets are cached in GPU memory and rarely transferred again for the lifetime of one game map/loading screen.
In this scenario, pci-e speed of x16 or x8 isn't important, because the slower transfer time for the graphics assets is either hidden behind game loading screens or not seen at all. Because of this, SLI can work (SLI divides the maximum bandwidth of 1x16 of consumer CPUs into 2x8 for 2 GPU cards).
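A rough back-of-the-envelope sketch of why the one-time upload barely matters. Both the 4 GB asset size and the PCIe 3.0 per-lane figure are assumptions I picked for illustration, and the calculation ignores everything except raw link bandwidth.

```python
# Assumption: PCIe 3.0, roughly 0.985 GB/s per lane per direction.
GB_PER_S_PER_LANE = 0.985


def upload_seconds(asset_gb, lanes):
    """Idealized time to push asset_gb gigabytes over a link with the given lane count."""
    return asset_gb / (lanes * GB_PER_S_PER_LANE)


ASSET_GB = 4.0  # hypothetical amount of assets uploaded for one map

for lanes in (16, 8):
    print(f"x{lanes}: {upload_seconds(ASSET_GB, lanes):.2f} s")
```

With these numbers the upload takes about a quarter of a second at x16 and about half a second at x8, a one-time cost that disappears inside a loading screen.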
Now there is OBS, which does frame processing after the frame is rendered by the GPU. Its best feature is to take several video sources, filter, rescale, crop and finally composite them, and output the composited video to a stream.
This compositing takes place on the GPU. OBS creates a hidden frame buffer and merges all sources into it. This is done on the GPU because the GPU is extremely efficient at this, in contrast to the CPU. Because of this efficiency, even medium-end laptops can be used for streaming.
The best use of resources is if the frame compositing is taking place on the same GPU where the largest frame source is. Usually, this is the game you're game-capturing. OBS just takes the frame buffer of the game and can use it directly as source for the video. No copying of data required.
If you want to offload video compositing to a second GPU, OBS has to copy the captured frame data from one GPU's memory to the other GPU's memory. This is a download and an upload for every frame, continuously. This puts huge stress on the pci-express bus, and suddenly the reduced bandwidth (1x8) becomes relevant and may constitute a bottleneck.
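To put a rough number on the continuous copy, here is a minimal sketch. It assumes uncompressed frames at 4 bytes per pixel and a 1080p60 capture; both are my assumptions, and the real capture path may use a different format. It only counts raw bytes moved and does not model latency or synchronization between the two GPUs.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame in bytes (4 bytes per pixel assumed)."""
    return width * height * bytes_per_pixel


def copy_traffic_mb_per_s(width, height, fps, crossings=2):
    """Sustained bus traffic in MB/s when every frame crosses the bus `crossings` times.

    crossings=2 models the download from one GPU plus the upload to the other.
    """
    return frame_bytes(width, height) * fps * crossings / 1e6


print(frame_bytes(1920, 1080), "bytes per frame")
print(copy_traffic_mb_per_s(1920, 1080, 60), "MB/s for 1080p60")
```

Unlike the asset upload, this traffic never stops while the stream is running, and it scales with resolution and frame rate.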
Every hardware scenario where the original frames of the game you capture must be transferred through the pci-express bus to a GPU for OBS to composite the stream puts more load on the pci-express bus than envisioned by the hardware designers. This includes capture cards, because although capturing is free (you read it invisibly from hdmi), the data must still be uploaded to the GPU to enable OBS to process it further. Its pci-express bandwidth requirement is half that of the dual GPU scenario, where compositing takes place on a different GPU than frame production. But the ideal scenario is game capture in a single GPU system: the game data doesn't have to be transferred at all, because it is created where it can be used directly.
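The three scenarios can be compared by counting how often each captured frame has to cross the pci-express bus. This is a simplified sketch: the scenario names are made up for illustration, and the frame size and frame rate in the usage lines are assumed values for an uncompressed 1080p60 source.

```python
# Bus crossings per captured frame, following the description above.
CROSSINGS = {
    "single_gpu_game_capture": 0,  # frame is composited where it was rendered
    "capture_card": 1,             # one upload to the compositing GPU
    "dual_gpu": 2,                 # download from one GPU, upload to the other
}


def bus_traffic_mb_per_s(scenario, frame_mb, fps):
    """Sustained pci-express traffic in MB/s caused by moving captured frames."""
    return CROSSINGS[scenario] * frame_mb * fps


FRAME_MB = 8.2944  # assumed: 1920 x 1080 pixels at 4 bytes per pixel
FPS = 60           # assumed frame rate

for name in CROSSINGS:
    print(name, bus_traffic_mb_per_s(name, FRAME_MB, FPS), "MB/s")
```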
I wasn't talking about GPU load, I was talking about pci-express bus load, which is unfortunately not displayed by any task manager. But it's still there, and if you do things not envisioned by the hardware designers of consumer PCs (downloading stuff from GPU frame buffers), you might hit the limits of a resource deemed unlimited for regular use.