You can't run a line from a headphone jack to a microphone input. The two operate at different signal levels, so you'll get low or no audio.
There's no connection between how you arrange or duplicate your window capture sources (to change side-by-side (SBS) to some other layout) and the audio device(s) you select for capture. Window and Display captures don't include audio.
As long as your machine is powerful enough, you can duplicate a source and crop the copy differently if your arrangement needs it. Your audio will not be affected, since audio and video are separate sources. Audio and video may desynchronize, though, if the duplicate makes the scene too heavy for your machine to render without dropping frames.
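If it helps to see the crop arithmetic, here is a rough sketch of how you might work out the crop values for two copies of one side-by-side capture so that each copy shows a single half. The `Crop` type and `sbs_crops` function are hypothetical names made up for this illustration and are not part of OBS; the sketch assumes you type the resulting pixel values into each copy's crop fields by hand and that the capture is split evenly down the middle.

```python
from typing import NamedTuple


class Crop(NamedTuple):
    """Pixels to trim from each edge of a source (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int


def sbs_crops(width: int) -> tuple[Crop, Crop]:
    """Return crops for the left-half copy and the right-half copy.

    Assumes the side-by-side frame is split at width // 2.
    """
    half = width // 2
    # Left copy keeps columns [0, half): trim everything to the right of it.
    left_view = Crop(left=0, top=0, right=width - half, bottom=0)
    # Right copy keeps columns [half, width): trim everything to the left of it.
    right_view = Crop(left=half, top=0, right=0, bottom=0)
    return left_view, right_view


if __name__ == "__main__":
    left_view, right_view = sbs_crops(1920)
    print("left copy:", left_view)
    print("right copy:", right_view)
```

Once each copy is cropped to one half, you can move and scale the two copies independently to build whatever layout you want. None of this touches the audio device you selected.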