#1 |
"/X\(‘-‘)/X\"
Jan 2013
6023₈ Posts |
It seems Intel is going up to 18 cores on HEDT.
http://wccftech.com/intel-core-x-sky...ore-36-thread/
Still only 4 memory channels though, so hopefully it lowers the prices of the lower-core-count chips. |
#2 |
∂²ω=0
Sep 2002
República de California
5×2,351 Posts |
Those sound like real monsters - the 18-core flagship i9 is roughly the same as the soon-to-arrive Xeon Skylake server chips (which will power AWS' new C5 instances, among other things), but presumably at a much lower price point than a server CPU. (Albeit still very high relative to run-of-the-mill Skylake quads and such.)
Let's do an Intel vs AMD head-to-head compare based on what both companies have revealed to date:
o AVX-512 (in the i9s) vs AVX-256;
o True AVX-512 support vs each AVX-256 instruction broken into two 128-bit uops;
o Similar core counts and clock speeds.
But I worry about the i9 memory subsystems' ability to keep those data-hungry vector units fed, even with those yuuge L2/3 caches. And the second bullet point above actually plays out less detrimentally for AMD than one might think, because breaking a wide vector-op into two half-width uops helps hide latency.
E.g. say I have 8 independent AVX-256 vector MULs I need to do, and assume 2 can start per cycle with a 5-cycle latency.
Intel: 2 MULs start on each of clocks 0-3, but then we idle until cycle 5 waiting for the first results to become available.
AMD: 2 half-width MULs start on each of clocks 0-7, and ensuing instructions can start using the early-issued-MUL results before the late-issued ones have even begun.
I'm seeing this play out in my Mlucas runs on Ryzen, where I get better than 50% of the per-core, per-cycle throughput of my Haswell, i.e. better total throughput for the Ryzen 8-core than for the Intel quad. It'll be interesting to do similar head-to-head compares - not just of total throughput but also of FLOPS-per-watt-and-hardware-dollar - once both vendors' new CPUs hit market, that's for sure. |
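The issue-timing argument above can be sketched as a toy model. This is only an illustration under the post's stated assumptions (8 independent MULs, 2 uops started per cycle, 5-cycle latency, strictly in-order issue); the function names `issue_schedule` and `summarize` are made up for this sketch, and real schedulers, port contention and dependent instructions are ignored.

```python
def issue_schedule(num_ops, uops_per_op, issue_width, latency):
    """Return (issue_cycle, ready_cycle) for each uop, issued in program order."""
    total_uops = num_ops * uops_per_op
    return [(i // issue_width, i // issue_width + latency)
            for i in range(total_uops)]


def summarize(schedule):
    """Report when issue ends, when the first result is ready, and the gap between."""
    last_issue = max(issue for issue, _ in schedule)
    first_ready = min(ready for _, ready in schedule)
    # Cycles after the last issue during which no result is available yet.
    stall = max(0, first_ready - (last_issue + 1))
    return {"last_issue": last_issue, "first_ready": first_ready, "stall": stall}


# Full-width case: each MUL is a single uop.
full_width = summarize(issue_schedule(num_ops=8, uops_per_op=1,
                                      issue_width=2, latency=5))
# Half-width case: each MUL is split into two uops.
half_width = summarize(issue_schedule(num_ops=8, uops_per_op=2,
                                      issue_width=2, latency=5))

print("full-width:", full_width)
print("half-width:", half_width)
```

With these numbers the full-width case finishes issuing on cycle 3 and has nothing ready until cycle 5, while the half-width case is still issuing on cycle 7 with its first results already available from cycle 5 on, so there is no gap to fill.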
#3 |
Sep 2016
2·5·37 Posts |
Not to spoil anything, but there are conflicting rumors that the HEDT Skylake X processors will not have true AVX512 but rather double-cycled 256-bit execution units.
Of the leaked benchmarks that I've seen so far:
It's already known that not all the server Xeons will have full-throughput AVX512. The question is which (if any) of the HEDT Skylakes will have it. If we assume that the AVX512 units take up a significant amount of die area as well as a lot of TDP, it makes sense for Intel to selectively disable them to improve yields. The resulting market segmentation probably plays in their favor if they want to milk people for more money to get the full AVX512. |
#4 |
Feb 2016
UK
3·149 Posts |
http://www.anandtech.com/show/11464/...umers-for-1999
Interesting times. I was reading up at the link above but haven't fully digested it yet. Anyone care to discuss what the new cache arrangement might mean for performance? 1MB/core L2 and 1.375MB/core non-inclusive L3 is quite a change. |
#5 |
Feb 2016
UK
3·149 Posts |
Ian Cutress from Anandtech has confirmed with Intel that each core will have an AVX512 unit.
#6 | ||
∂²ω=0
Sep 2002
República de California
5·2,351 Posts |
Quote:
Quote:
o genuine AVX-512 [high-end i9]
o emulated AVX-512 [low-end i9]
o genuine AVX-256 [old and new i7]
o emulated AVX-256 [AMD]
Hopefully it won't be too long before we have actual i9 hardware to play on. |
#7 | |
Sep 2016
2×5×37 Posts |
Quote:
I have benchmarks from a 40-core Skylake Gold system, which I can't really disclose since the source doesn't even know if he's under NDA. Based on the small-data scaling, I'm about 90% sure that model has the full-throughput AVX512. However, the AVX2 -> AVX512 scaling for large data is so hilariously bad that it makes Knights Landing look good. Part of the problem is likely NUMA, since the source said he has no access to the BIOS to enable node-interleaving, nor did he mention anything about "numactl --interleave=all".
Last fiddled with by Mysticial on 2017-05-30 at 23:10 |
#8 | |
Sep 2016
2×5×37 Posts |
NDAs lift today. According to this: http://www.anandtech.com/show/11550/...7800x-tested/3
Quote:
This raises a bunch of questions:
Agner Fog is gonna have some fun with these. |
#9 | |
"/X\(‘-‘)/X\"
Jan 2013
3091₁₀ Posts |
Also interesting:
Quote:
#10 |
Sep 2016
562₈ Posts |
Originally, I had assumed that all of the LCC Skylake X chips would have only half-throughput AVX512.
So my plan was to get the 8-core one for development and do correctness testing on all the AVX512 code that I've accumulated since 2013. Then come October, trade it up for the 16- or 18-core one for proper performance tuning (especially around the anticipated memory bottleneck). Since the full-throughput chip is coming out now, I'll get that so I can start early. But I'm not sure if I still want to double-dip on another high-end chip just 4 months from now. |
#11 | |
∂²ω=0
Sep 2002
República de California
5·2,351 Posts |
Quote:
The article only mentioned the two-half-width-uops implementation for 512-bit FMA ... that surely also includes pure-FMUL, but are they also lumping FADD in with that? If vector add were able to execute 2-per-cycle at full 512-bit width that would give a nice boost to FFT arithmetic, which is add-dominated. Last fiddled with by ewmayer on 2017-06-20 at 03:03 |
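To put a rough number on "add-dominated", here is a naive operation count for a textbook radix-2 FFT. It is only a sketch: it assumes schoolbook complex multiplies (4 real MULs + 2 real ADDs), counts every twiddle multiply including the trivial ones, and ignores FMA fusion and the higher-radix or real-data tricks a tuned code would use. The function names are invented for this example.

```python
# Real-operation cost of one schoolbook complex multiply:
# (a+bi)(c+di) = (ac - bd) + (ad + bc)i
COMPLEX_MUL = {"mul": 4, "add": 2}
# One complex add or subtract is two real adds.
COMPLEX_ADD = {"mul": 0, "add": 2}


def radix2_butterfly_cost():
    """One twiddle multiply, then one complex add and one complex subtract."""
    return {
        "mul": COMPLEX_MUL["mul"] + 2 * COMPLEX_ADD["mul"],
        "add": COMPLEX_MUL["add"] + 2 * COMPLEX_ADD["add"],
    }


def radix2_fft_cost(n):
    """Naive real-op count for a length-n radix-2 FFT (n a power of two)."""
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two >= 2")
    stages = n.bit_length() - 1          # log2(n)
    butterflies = (n // 2) * stages      # n/2 butterflies per stage
    per_butterfly = radix2_butterfly_cost()
    return {op: count * butterflies for op, count in per_butterfly.items()}


print(radix2_butterfly_cost())
print(radix2_fft_cost(1024))
```

Under these assumptions each butterfly costs 4 real MULs against 6 real ADDs, so the add count stays 1.5x the multiply count at every transform length, which is why the throughput of a standalone vector add matters here.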