Question / Help For video encoding: More cores or more GHz?

Videophile

Elgato
Simple question:

Is it better to have more cores (like 24 cores at 2.4GHz) or more GHz (like 8 cores at 4.2GHz) for video encoding?

Thanks,

-Shrimp
 

paibox

heros in an halfshel
In the case of x264, the developers don't recommend using more than 22 threads, since it's currently not possible to stay efficient when trying to sync up that many threads.

Thus 24 cores wouldn't give you the full benefit of the theoretically available processing power, and if the alternative is 8 cores with two threads each (16 threads) at the higher clock, the 8-core option would definitely give you better performance.
 

FerretBomb

Active Member
Case 1: 2.1GHz * 22 = 46.2
Case 2: 2.5GHz * 16 = 40

Theoretically, the first case gives you more raw processing power, since video encoding CAN be distributed across multiple cores. Additionally, the two unused cores could handle system processes (OS-level and other on-system tasks), whereas in the second case those would bite into the processing power available for encoding. That said, I don't know what kind of overhead is incurred by adding each extra encoding thread... I doubt it would be enough to eat up the gains, though (and some overhead is avoided by having two idle cores). Also, any single-threaded processes would run better on the second setup.

If you're going all-out on an encoding-only box and choosing between the 2.1 and 2.5GHz setups, I'd go for case 1.
If the second option were actually a dual-CPU quad-core+HT setup at 4.2GHz (as in the original question), I'd go for case 2 (in that case it'd provide more overall processing power, at 4.2 * 16 = 67.2, not factoring in the SMP losses).
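If it helps to see that spelled out, here's a rough sketch of the "clock times usable threads" heuristic in Python (my own illustration, not anything from x264 itself; it ignores memory bandwidth, sync overhead, and the like):

    # Crude throughput heuristic: aggregate "GHz" = clock * usable threads.
    # The 22-thread cap is the x264 efficiency limit mentioned above.
    X264_MAX_EFFICIENT_THREADS = 22

    def aggregate_ghz(clock_ghz, threads, reserve_for_os=0):
        usable = min(threads - reserve_for_os, X264_MAX_EFFICIENT_THREADS)
        return clock_ghz * usable

    print(round(aggregate_ghz(2.1, 24, reserve_for_os=2), 1))  # case 1 -> 46.2
    print(round(aggregate_ghz(2.5, 16), 1))                    # case 2 -> 40.0
    print(round(aggregate_ghz(4.2, 16), 1))                    # 'spitball' -> 67.2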
 

Boildown

Active Member
There's another recommendation of at least 40 vertical pixels per thread: http://mewiki.project357.com/wiki/X264_Settings#threads . Doing the math, that's really only a factor if you stream at 864p or smaller.

And remember that x264 opens 1.5 threads per virtual core by default, so on your 12-core (24 with hyperthreading) setup it's going to open 36 threads unless you use the "threads=x" option to hold it back.

With my 2600K it wants to open 12 threads, but in testing with a single OBS session, I find that 9 or 10 threads yields the highest performance, higher than letting it open all 12.

Anyway, based on all that, I'd go with the smaller number of cores, as it "fits" better when multiplied out, and you get the added speed.
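For anyone who wants to run the numbers on their own setup, here's a small Python sketch of those two rules of thumb (the 1.5x-logical-cores default and the ~40-vertical-pixels-per-thread guideline); treat it as a rough guide rather than exact x264 behaviour:

    # Rule-of-thumb thread counts, per the guidelines discussed above.
    def default_threads(logical_cores):
        # x264 opens roughly 1.5 threads per logical core by default.
        return int(logical_cores * 1.5)

    def max_useful_threads(vertical_resolution):
        # Guideline: keep at least ~40 vertical pixels per thread.
        return vertical_resolution // 40

    print(default_threads(24))      # 12C/24T box -> 36 threads by default
    print(default_threads(8))       # 2600K (4C/8T) -> 12 threads
    print(max_useful_threads(864))  # 864p -> 21 threads
    print(max_useful_threads(720))  # 720p -> 18 threads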
 

FerretBomb

Active Member
Boil, the multiplication up there was already holding back to x264's efficiency limit (22 threads), and using the theoretical numbers.

2.1GHz @ 24 threads = 50.4GHz (some of which will go to OS utilization rather than encoding, freeing the rest up entirely for encoding)
2.5GHz @ 16 threads = 40GHz (the actual 'fewer cores, higher GHz' processor he's looking at; 4/5 the performance for multithreaded apps, and OS-level operations will cut into that further)
4.2GHz @ 16 threads = 67.2GHz (the initial 'spitball' he isn't actually considering; it was just thrown out for the sake of a somewhat random comparison)

Of the three, I'd still go for the 2.1@24: it allows the most processing power to be devoted to encoding, with over 4GHz set aside for OS operations. The only case in which I'd go for the 2.5@16 is if it was going to be a single-system gaming and encoding rig, AND you planned to play CPU-intensive games that aren't optimized to take advantage of multithreading.
For a standalone encoder, the 2.1@24 stands out as the hands-down winner in my mind, compared to the 2.5@16. The (theoretical) 4.2@16 would blow both out of the water, of course, if it were an actual option and not just for the sake of argument.
 

Videophile

Elgato
Boildown, thanks for that.

I think there should be a sticky somewhere detailing the better performance you get using 10 threads via "threads=10" versus letting OBS do its own thing. I tried "threads=10" and yes, per-core usage goes up, but the encoder is faster and more efficient. I guess scaling past a certain number of cores is not linear.
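(For what it's worth, one back-of-the-envelope way to think about that sublinear scaling is Amdahl's law. The sketch below is purely illustrative; the 95% parallel fraction is a made-up number, not a measured x264 figure.)

    # Amdahl's law: if only part of the work parallelizes, speedup flattens
    # out as threads are added, so the last few threads buy very little.
    def speedup(threads, parallel_fraction=0.95):
        return 1.0 / ((1 - parallel_fraction) + parallel_fraction / threads)

    for n in (4, 8, 10, 12, 16, 24):
        print(n, round(speedup(n), 2))
    # With a 95%-parallel workload: 8 threads ~5.9x, 16 threads ~9.1x,
    # 24 threads ~11.2x -- well short of linear.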

That being said, I think I'll go for either the 2.5GHz@16threads, or a 4930K (6 cores, 12 threads) and OC it a bit.

Ferret, any input on that?

Thanks,

-Shrimp
 

FerretBomb

Active Member
I'd still go for the 24-core setup myself.
The CPU Boildown is using is a quad-core with hyperthreading (so 8 virtual cores), which is why x264 tries to open 12 threads (1.5x the virtual cores), apparently overloading the system.
It's not necessarily non-linear scaling so much as the fact that he doesn't HAVE 12 cores available, so the threads will be 'fighting' over each core, potentially being descheduled and swapped between cores, which I'd expect to cause a noticeable performance drop. I'd also expect that using only 8 threads runs into other processes grabbing CPU time away, leading to lower performance (especially as ALL cores will be in use by the encoder, and the OS still needs processing time for overhead), hence the 10-thread compromise.

With a 24-core system, it would attempt to open 36 threads. If you manually limited it to 22 threads (the max recommended for efficiency) on 24 cores, you'd likely get the majority of that 46GHz-equivalent processing power (minus SMP losses), with two cores left available for the OS to shove its processes onto. On the 16-core, by contrast, you'd run into the same issue of the OS fighting with the encoder for CPU cycles.

Again, if it was my own money I was spending, I'd go for the 24-core.
Even with the additional synchronization overhead incurred, it's not going to be enough to eliminate the 12% greater combined processing speed, and that's without factoring in the benefit of having two dedicated, encoder-untouched cores reserved for the OS, which aren't even counted toward that 12% figure... it'd be a 20% increase if you did count them.

Spitballing, I'd expect 10-18% more performance from the 24-core system before any overclocking, depending on how much overhead is generated by syncing the additional threads, and how much of that is offset by not having the OS fighting with the encoder for CPU time.
Additionally, I'd expect a multiplicatively greater gain from any overclock on the 24-core system (every 0.1GHz you push it higher becomes an effective extra 2.4GHz, instead of only 1.6GHz with the 16-core, or 2.2GHz counting the 22-thread limit); it also tends to be easier to overclock a part that starts at a lower operating frequency, so it'd be more likely to handle a larger relative OC and gain.
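For what that overclock multiplication looks like written out (plain arithmetic, assuming the encoder actually keeps that many threads busy):

    # Effective aggregate gain per +0.1GHz of overclock.
    def oc_gain_ghz(threads, oc_step_ghz=0.1):
        return threads * oc_step_ghz

    print(round(oc_gain_ghz(24), 1))  # 24 threads -> +2.4GHz-equivalent per 0.1GHz step
    print(round(oc_gain_ghz(22), 1))  # capped at 22 encoder threads -> +2.2
    print(round(oc_gain_ghz(16), 1))  # 16-thread setup -> +1.6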
 

Videophile

Elgato
Alright, thanks for the answers guys!

I think, based on my own research as well, that I will go with the 24-core setup. I will be cooling each CPU with an H80i... so OC'ing shouldn't be a problem (if it's possible to OC).

-Shrimp
 

Boildown

Active Member
I hope you post some performance metrics! It'll be fascinating to see how it performs with different numbers of threads active, where diminishing returns kick in, etc.

I take it you're using an Elgato device for your video source... which one? And which motherboard are you going with?
 

Videophile

Elgato
Haha, you'd expect me to use a Game Capture HD. For streaming I'm actually using the USB 3.0 Xcapture-1.

Once I acquire all the parts and do some testing on my own, I will probably write a guide. It is true that artificially setting lower thread counts in OBS has made a difference in performance.
 

TheOne647

New Member
I have an 8-core 2138 Gamer Ultra desktop. The current one on the market differs from the one I have, since my mobo is their previous version (I got it from Newegg a few years ago), but I am wondering if I am really unlocking the full potential of my cores if it's only running at 3.1GHz. :/ I would love some help or information to understand this better, please. I also have a Dual-X R9 280 graphics card, and 3.1GHz is more than enough to run games, but I am wondering if I am really using the PC's full potential.
 

alpinlol

Active Member
Technically the FX 8xxx series is also a quad-core, with AMD's module-based threads instead of the hyperthreading on Intel's side.

And the FX 8xxx series also performs worse than i7s.

As of right now, with Intel's latest releases, there's the i7 5820K with a shitton of cache and 6 hyperthreaded cores (12 threads) for 50 bucks more than a 4790K.
 

Sphinctone1

New Member
An old(er) thread, but still a very relevant and interesting topic, as I can't seem to locate much info on exactly how many cores (threads) is "maxed out" for the current versions of x264 (and/or x265)!
On a side note, I would also be interested in knowing whether there would be any performance gains at all from using a new(er) M.2 or PCIe SSD as the main system drive? (No bottleneck in read/write speeds might offer some small gains in shortening encoding times, no?)
 

Boildown

Active Member
I don't think there's any bottleneck at all on the hard drive side. Live encoding bitrates are much less than the sustained write speed of even low-end laptop platter hard drives.
 

FerretBomb

Active Member
No change in the thread efficiency / point of diminishing returns since this thread started, as far as I'm aware.
The hard drive isn't a bottleneck; even a spinning platter drive has more than enough headroom. Even a fairly economy drive will be able to write at 60MB/s (480Mbps, ~480,000kbps), and recording rates, even for uncompressed video, are going to be under that without going to something (currently) ridiculous like 4K@60fps.

It's even less of an issue because if you're going significantly parallel, you're going to be compressing the video down greatly.
The uncompressed video frames are stored in VRAM; it doesn't touch the hard drive prior to compression.
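As a quick sanity check on the disk side (back-of-the-envelope arithmetic; the example bitrates below are just illustrative, not figures from this thread):

    # Compare typical encoded bitrates to a modest platter drive's
    # sustained write speed of ~60 MB/s (~480 Mbps).
    DRIVE_MBPS = 60 * 8  # megabytes/s -> megabits/s

    for name, kbps in [("Twitch-style stream", 3500),
                       ("high-bitrate local recording", 50000)]:
        mbps = kbps / 1000
        print(f"{name}: {mbps} Mbps = {mbps / DRIVE_MBPS:.1%} of drive bandwidth")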

.265 probably won't see real-time use within the next 5-10 years, even if it's the current buzzword.
 

Boildown

Active Member
HEVC (H.265) will show up a lot sooner than that for commercial video: live streams of sporting events, etc. Hobbyists live-streaming HEVC to Twitch and YouTube may be a long way off, though.
 

Sphinctone1

New Member
FerretBomb, just curious if you're aware they have announced that Blu-ray releases in native 4K are supposed to be out sometime in the first quarter of 2016! (some even claiming X-mas 2015), with the new UHD Blu-ray players around New Year as well.
Many 4K TVs can easily do 4K@60fps for less than $1,500 (U.S.), and most if not all of the new 170-series motherboards (socket 1151) can also display at that level without a graphics card of any kind.
The reason I brought any of this up is that I'm planning on building a new "encoding" machine. I'm still running an i3 (2.8GHz), which has served me well(ish) for x264, but I wouldn't attempt x265 on it.
So far it's looking like an i7 "Skylake" setup, but an older-model "workstation" with dual Xeons can be had relatively cheap too, so my reading/searching continues.
 

FerretBomb

Active Member
A majority of the 4K panels I've seen top out at 30fps, with a few doing 30->60 interpolation. I'm aware that some can do 'true' 4K@60, but they're prohibitively expensive, and this thread is aimed less at that and more at CPU efficiency. I was simply using full-frame uncompressed 4K@60 video as a worst-case example for disk recording. :b

Skylake is pointed more at consumer/gaming-grade setups. It's going to perform better in games, which lean almost entirely on single-thread performance.
For real-time encoding, which can more fully utilize separate threads, a single 4-8 core Skylake machine at 4-5GHz isn't going to be able to keep up with a dual-Xeon 10-core setup at 2.5GHz, when it comes down to brass tacks. Heck, even a 6-core 5820K at 3.3GHz is likely going to outperform a 6700K, much less a 5960X, if we want to keep things to present-day releases.

Honestly, I'm trying to make a decision on this myself: whether to pull the trigger on a 5820K, a 5960X, or go whole-hog and jump to an SMP Xeon setup. I'd looked at Skylake and discarded it for streaming purposes.

To the question that revived this thread from a year ago, though: disk read/write speeds are almost never the bottleneck when it comes to either livestreaming or local recording.
 