I'd still go for the 24-core setup myself.
The CPU Boildown is using is a quad-core with hyperthreading (so 8 virtual cores), which is why the encoder tries to open 12 threads (1.5x virt-cores), apparently overloading the system.
The issue isn't necessarily non-linear scaling; it's that he doesn't HAVE 12 cores available, so the threads end up 'fighting' over each core, potentially getting descheduled and bounced between cores, which I'd expect to cause a noticeable performance drop. I'd also expect that running only 8 threads would hit a point where other processes grab CPU time away from the encoder (especially as ALL CORES would be in use by the encoder, and the OS still needs processing time for its own overhead), which lowers performance and leads to the 10-thread compromise.
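To put numbers on the oversubscription, here's a rough sketch of the 1.5x rule as described above. The function names are made up for illustration, and the 1.5 factor is just the figure quoted in this thread, not something checked against the encoder's source.

```python
import os

def default_encoder_threads(logical_cores: int, factor: float = 1.5) -> int:
    """Thread count under the 1.5x-logical-cores rule quoted in the thread (assumed)."""
    return int(logical_cores * factor)

def oversubscription(threads: int, logical_cores: int) -> float:
    """How many encoder threads compete for each logical core."""
    return threads / logical_cores

# Quad-core with hyperthreading: 8 logical cores.
quad_ht = 8
threads = default_encoder_threads(quad_ht)
print(threads, oversubscription(threads, quad_ht))  # 12 1.5

# Same rule applied to whatever machine this runs on.
local = os.cpu_count() or 1
print(local, default_encoder_threads(local))
```

Anything above 1.0 threads per logical core means the scheduler has to time-slice the encoder against itself, before the OS gets a look in.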
With a 24-core system, the encoder would attempt to open 36 threads. If you manually limited it to 22 threads (the max recommended for efficiency) on 24 cores, you'd likely get the majority of that 46GHz-equivalent processing power (minus SMP losses), with two cores left free for the OS to shove its processes onto. On the 16-core, you'd run into the same issue of the OS fighting the encoder for CPU cycles.
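As a back-of-the-envelope check on the 22-thread cap, here's a small sketch. It assumes the 46GHz figure is spread evenly over all 24 cores and ignores SMP losses entirely; the helper names and the two-core OS reserve are just illustrative.

```python
def capped_threads(cores: int, reserved_for_os: int = 2) -> int:
    """Leave a couple of cores free for the OS (assumed reserve of 2)."""
    return max(1, cores - reserved_for_os)

def usable_aggregate_ghz(total_ghz: float, cores: int, threads: int) -> float:
    """Aggregate clock the encoder can reach, assuming an even per-core split."""
    per_core = total_ghz / cores
    return per_core * min(threads, cores)

threads = capped_threads(24)
print(threads)  # 22
print(round(usable_aggregate_ghz(46.0, 24, threads), 1))  # 42.2
```

So roughly 42 of the 46GHz-equivalent would be on the table for the encoder, with the rest held back for the OS.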
Again, if it was my own money I was spending, I'd go for the 24-core.
Even with the additional synchronization overhead, it's not going to be enough to eliminate the 12% greater total processing speed, and that's without factoring in the benefit of having two dedicated, encoder-untouched cores for the OS, which aren't even counted toward that 12% figure... it'd be roughly a 20% increase if you did count them.
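If you want to sanity-check how the 22-core and 24-core figures relate, here's a rough sketch. It assumes the 12% applies to the 22 encoder cores and that throughput scales linearly with core count, which it won't exactly; the function name is made up.

```python
def scale_gain(gain_pct: float, cores_counted: int, cores_total: int) -> float:
    """Rescale a percentage gain from cores_counted to cores_total (linear assumption)."""
    ratio = 1 + gain_pct / 100
    return (ratio * cores_total / cores_counted - 1) * 100

print(round(scale_gain(12, 22, 24), 1))  # 22.2
```

That lands in the same ballpark as the 20% figure above; the gap is well within the rounding of a forum estimate.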
Spitballing, I'd expect somewhere between 10% and 18% more performance from the 24-core system, before any overclocking, depending on how much overhead is generated by sync'ing the additional threads, and how much of that is offset by not having the OS fighting with the encoder for CPU time.
Additionally, I'd expect a multiplicatively greater gain from any overclock on the 24-core system: every 0.1GHz you push it higher becomes an effective extra 2.4GHz (or 2.2GHz counting the 22-thread limit), instead of only 1.6GHz with the 16-core. It also tends to be easier to overclock a part that starts at a lower operating frequency, so it'd be more likely to handle a larger relative OC and gain.
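The overclock arithmetic is just per-core step times active cores. A quick sketch, with a made-up helper name and ignoring any scaling losses:

```python
def aggregate_oc_gain(step_ghz: float, active_cores: int) -> float:
    """Extra aggregate GHz from raising every active core by step_ghz."""
    return step_ghz * active_cores

for label, cores in [
    ("24-core, all cores", 24),
    ("24-core, 22 threads", 22),
    ("16-core", 16),
]:
    print(f"{label}: +{aggregate_oc_gain(0.1, cores):.1f} GHz")
# 24-core, all cores: +2.4 GHz
# 24-core, 22 threads: +2.2 GHz
# 16-core: +1.6 GHz
```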