This is a very old thread, to be sure, but for anyone who finds their way here, the best way to solve this issue, in my opinion, is to set OBS to record a 3840x1080 canvas. You can place your webcam as a full-sized source in one half and still get a full 1080p video of whatever you are recording (a Let's Play, for example) in the other half. You can then open this admittedly large file in a video editor and cut the video in half. This lets you move your webcam footage around the other half at will during editing, so that it does not obstruct anything important. Once you have cut the two pieces apart, resize your webcam footage as desired, reposition it where you want it over your Let's Play footage, and crop out the empty space, leaving you with a single 1080p video. For editing purposes, though, you essentially have two video tracks, and because both come from the same recording they are perfectly synchronized from the start. Even if you trim the video (whose single video track links the two halves), you don't have to worry about re-syncing them afterward.
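If you would rather script the split than do it by hand in an editor, the same idea can be sketched with ffmpeg's split, crop, scale and overlay filters. This is a swapped-in technique rather than the editor workflow, and it assumes the game occupies the left half of the 3840x1080 canvas and the webcam the right half; the file names, the 480-pixel webcam width and the overlay position are made-up example values. Unlike an editor, it keeps the webcam at one fixed position for the whole video.

```python
from __future__ import annotations

import shlex

# Assumed layout (hypothetical): game on the left half of a 3840x1080
# canvas, webcam filling the right half.
HALF_W, H = 1920, 1080


def build_filter(cam_width: int, cam_x: int, cam_y: int) -> str:
    """Build an ffmpeg filter graph that crops both halves, shrinks the
    webcam half and overlays it onto the game half."""
    return (
        "[0:v]split=2[g][c];"
        f"[g]crop={HALF_W}:{H}:0:0[game];"       # left half: game footage
        f"[c]crop={HALF_W}:{H}:{HALF_W}:0,"       # right half: webcam
        f"scale={cam_width}:-2[cam];"             # resize, keep aspect, even height
        f"[game][cam]overlay=x={cam_x}:y={cam_y}[out]"
    )


def build_command(src: str, dst: str, cam_width: int, cam_x: int, cam_y: int) -> list[str]:
    """Assemble the full ffmpeg argument list for one fixed webcam position."""
    return [
        "ffmpeg", "-i", src,
        "-filter_complex", build_filter(cam_width, cam_x, cam_y),
        "-map", "[out]",   # the composited 1920x1080 video
        "-map", "0:a?",    # keep the recording's audio if it has any
        "-c:a", "copy",
        dst,
    ]


if __name__ == "__main__":
    cmd = build_command("recording.mkv", "letsplay_1080p.mp4", 480, 1420, 790)
    print(shlex.join(cmd))
```

Running it prints one ffmpeg command line to paste into a terminal. With these example values the 480x270 webcam lands near the bottom-right corner of the 1920x1080 game frame; `scale=480:-2` keeps the webcam's aspect ratio while rounding the height to an even number, which most encoders expect.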
Recording a canvas this large (twice the pixel count of a normal 1080p recording) puts quite a strain on a single graphics card if you use it both to encode your video and to render your game. Since many gamers also care a lot about keeping their case cool, your best options are either to encode the video in software (x264 in OBS; this uses a large chunk of CPU power, so you'll need a powerful CPU), or to add a second, not necessarily very powerful, graphics card that is used only to encode the video. OBS does support using a different graphics card as the encoder than the one driving your display; you just have to fiddle with the encoding settings. If you game on an NVIDIA GeForce GTX 1080, for example, you could likely get away with a GTX 1050, or perhaps something even less powerful, doing the encoding, since encoding is not nearly as strenuous for the card as rendering and displaying a game's graphics at high settings. That said, if you were recording the game footage in 4K, you would need something closer in power to your primary card, because much more data is fed into the encoder at that resolution: even though the output video will be 1080p, the source is 4K, and the encoder has to compress far more into that space than it usually would.
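To put rough numbers on how much more data reaches the encoder, here is a back-of-the-envelope pixel-rate comparison, assuming 60 fps. Raw pixels per second is only a proxy for encoder load (the real cost also depends on the encoder, its preset and the content), but it shows the scale of the difference between the setups mentioned above.

```python
FPS = 60  # assumed frame rate for the comparison


def pixels_per_second(width: int, height: int, fps: int) -> int:
    """Raw pixels the encoder has to take in every second."""
    return width * height * fps


SIZES = {
    "1080p": (1920, 1080),
    "3840x1080 canvas": (3840, 1080),
    "4K UHD": (3840, 2160),
}

if __name__ == "__main__":
    base = pixels_per_second(1920, 1080, FPS)
    for name, (w, h) in SIZES.items():
        pps = pixels_per_second(w, h, FPS)
        print(f"{name:>18}: {pps:>12,} px/s ({pps / base:.0f}x 1080p)")
```

The double-wide 3840x1080 canvas works out to 2x the pixel rate of plain 1080p, and 4K UHD (3840x2160) to 4x.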
As a final option, and the one most of us had to use until now, you can record the "webcam" footage on a GoPro or another camcorder with removable or transferable storage, and then edit the two recordings into a single video. Doing this means you have to sync the two videos with each other as best you can, which is very tricky. The best way to do it is to record your voice on a separate audio track from the game audio in the game footage, line up your voice in both videos, and then mute the voice audio from whichever source sounds worse.
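Lining the voice tracks up by ear works, but the offset can also be estimated numerically with cross-correlation: slide one track against the other and keep the shift where they match best. Here is a minimal pure-Python sketch, assuming you have already exported both voice tracks as mono sample lists at the same sample rate; `best_lag` is a hypothetical helper, not part of OBS or any editor. It is brute force, so for real recordings you would want to downsample first or switch to an FFT-based correlation.

```python
from typing import Sequence


def best_lag(reference: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the shift (in samples) that best aligns `other` with `reference`.

    A positive result means the same sound occurs `lag` samples later in
    `other` than in `reference`."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Unnormalized cross-correlation at this shift, over the overlap only.
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:
            best, best_score = lag, score
    return best


if __name__ == "__main__":
    # Synthetic example: the camcorder voice is the game-track voice delayed by 5 samples.
    game_voice = [0.0] * 50
    game_voice[10], game_voice[11], game_voice[25] = 1.0, -0.5, 0.8
    cam_voice = [0.0] * 5 + game_voice[:-5]
    print(best_lag(game_voice, cam_voice, max_lag=10))  # prints 5
```

Divide the returned lag by the sample rate to get the offset in seconds; for example, a lag of 24,000 samples at 48 kHz is half a second.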