#1 |
∂²ω=0
Sep 2002
República de California
2³·3²·163 Posts |
Starting late last summer I ran a p-1 stage 1 to b1 = 10^7 on F33 on my Knights Landing cheapie-refurb mini-workstation. After installing a big wad of server DIMM RAM I've been running 10^8-sized stage 2 intervals using the stage 1 residue, with a view to starting a distributed effort of this kind among interested forumites with suitable hardware.

But before risking wasting others' runtime, it's important to be sure the stage 1 result is correct. Our own Mike/Xyzzy has been kindly running a separate stage 1 computation on his Intel 18c36t i9, which was roughly 70% done (~10M stage 1 iterations of the needed 14427494) when he recently shut said machine down and sold it off. The problem is that, like most Intel manycore offerings, his machine had woefully inadequate memory bandwidth to keep those cores fed on a big-footprint (~4GB) FFT-modmul running data-hungry avx-512 8-fold-double code: using 16c32t he was getting 900-1000 ms/iter at 512M FFT, roughly half the speed of my KNL running out of the onboard 16GB HBM.

If one were targeting F33 stage 1 work, I wonder what the most bang-for-buck-ish non-KNL avx-512 option would be. One would want at least 4 cores but no more than 8 due to memory-bandwidth constraints, as large an L3 cache as possible (on the KNL the MCDRAM acts as such) and, lacking any kind of HBM, a mobo which supports fast high-bandwidth RAM, with DIMM slots filled with low-capacity but very fast sticks, say 16-32GB total. Maybe a 1-2-year-old used CPU, if the newer ones don't really offer much more max-throughput for the above type of big-footprint workloads?

If you have such a machine and are willing to do some timings, here's how:

o Get and build the current version of Mlucas, using instructions here. If your system has < 24GB RAM, you'll need a couple of post-build tweaks to reduce the memory footprint; PM me for those once your automated 'bash makemake.sh' parallel build completes.

o 512M FFT will have a strong preference for power-of-2 threadcounts, and 2-threads-per-core assuming it's a hyperthreaded Intel CPU (AFAIK no AMD chips have avx-512 support at present). Assuming your machine has N physical cores and P = largest power of 2 <= N, you want to pin 2*P threads to the same subset of P physical cores. Using the Intel core numbering convention:

./Mlucas -iters 100 -fft 512M -f 33 -shift 0 -cpu 0:P-1,N:N+P-1

Thus e.g. on a 6c12t CPU, the args to the latter flag would be '-cpu 0:3,6:9'. The resulting timing captured in the fermat.cfg file will be ~10% pessimistic due to data-and-thread-init overhead. |
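The P and N bookkeeping for the -cpu flag is easy to get wrong by hand, so here is a minimal Python sketch of it. The helper name mlucas_cpu_arg is made up for illustration and is not part of Mlucas; it also assumes the Intel numbering described above, where logical CPUs i and i+N share physical core i, which is worth checking on your own box before trusting the output.

```python
def mlucas_cpu_arg(n_cores: int) -> str:
    """Build the '-cpu' argument 0:P-1,N:N+P-1 for N physical cores.

    Hypothetical helper, not part of Mlucas. Assumes Intel-style numbering
    in which logical CPUs i and i+N are the two hyperthreads of core i.
    """
    if n_cores < 1:
        raise ValueError("need at least one physical core")
    # P = largest power of 2 that is <= N
    p = 1 << (n_cores.bit_length() - 1)
    # First hyperthread of cores 0..P-1, then their siblings N..N+P-1
    return f"0:{p - 1},{n_cores}:{n_cores + p - 1}"


if __name__ == "__main__":
    for n in (4, 6, 8):
        print(f"{n} cores: -cpu {mlucas_cpu_arg(n)}")
```

For the 6c12t case this reproduces the '0:3,6:9' given above.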
#2 | |
Sep 2002
Database er0rr
3²·467 Posts |
Quote:
Newer chips run cooler, not like Skylake. |
|
#3 | |
∂²ω=0
Sep 2002
República de California
2DD8₁₆ Posts |
Quote:
Admittedly, it's a niche sort of optimization problem, and quite possibly a cheap used RAM-less KNL will prove the best option; I mainly wanted a sense of whether there were any consumer-grade Intel offerings which could provide similar total memory bandwidth at comparable cost. |
|
#4 | ||
Sep 2002
Database er0rr
3²×467 Posts |
Quote:
Quote:
I looked at NewEgg: an Intel 12700K ($400) plus an Asus Strix motherboard ($500) and 64GB of dual-channel DDR5 (~$600). The chip will run AVX-512 if the motherboard allows the E-cores to be disabled, leaving 8 cores.
Last fiddled with by paulunderwood on 2022-03-02 at 07:24 |
||
#5 |
Aug 2002
North San Diego County
1011011101₂ Posts |
Just a note on AVX-512 vs FMA3 on a dual-channel board. I used CpuSupportsAVX512F=0 or 1 to toggle AVX-512.
Code:
3200K FFT DCLL on 60198527 @ 4700-4698 MHz on all cores, as reported by CPU-Z for both FMA3 and AVX-512 runs.

AVX-512         FMA3
2.88 ms/iter    3.025 ms/iter
2.87 ms/iter    3.022 ms/iter
2.91 ms/iter    3.018 ms/iter

1 worker 8 cores on all.
Last fiddled with by sdbardwick on 2022-03-02 at 20:10 |
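For a quick read of those six timings, a small Python sketch that averages each column and takes the ratio. The numbers are copied from the table above; treating three samples per mode as representative is an assumption, and the variable names are illustrative only.

```python
from statistics import mean

# ms/iter at 3200K FFT, copied from the timings above
avx512_ms = [2.88, 2.87, 2.91]
fma3_ms = [3.025, 3.022, 3.018]

avx512_avg = mean(avx512_ms)
fma3_avg = mean(fma3_ms)

# Throughput ratio: how many AVX-512 iterations complete per FMA3 iteration
speedup = fma3_avg / avx512_avg

if __name__ == "__main__":
    print(f"AVX-512 mean: {avx512_avg:.3f} ms/iter")
    print(f"FMA3 mean:    {fma3_avg:.3f} ms/iter")
    print(f"AVX-512 throughput gain: {100 * (speedup - 1):.1f}%")
```

On these samples the gain works out to a bit under 5%, which fits a workload limited by memory bandwidth rather than by vector width.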
#6 |
Aug 2002
2·19·223 Posts |
We are currently running this job on a 32GB NUC.
Code:
top - 17:08:22 up 5 days, 11 min, 1 user, load average: 5.10, 5.27, 4.73
Tasks: 281 total, 1 running, 278 sleeping, 0 stopped, 2 zombie
%Cpu(s): 1.0 us, 0.3 sy, 50.0 ni, 48.2 id, 0.0 wa, 0.4 hi, 0.1 si, 0.0 st
GiB Mem : 31.1 total, 12.8 free, 8.1 used, 10.1 buff/cache
GiB Swap: 0.0 total, 0.0 free, 0.0 used. 22.1 avail Mem

   PID USER  PR  NI   VIRT   RES   SHR S  %CPU  %MEM    TIME+ COMMAND
121965 m     30  10  22.3g  5.0g  4.0m S 400.0  16.2  5230:10 ./mlucas -cpu 0:3 |
#7 | |
Aug 2002
North San Diego County
1011011101₂ Posts |
Quote:
|
|
#8 | |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
2×5×599 Posts |
Quote:
|
|
#9 |
Aug 2002
North San Diego County
2DD₁₆ Posts |
According to Intel Extreme Tuning Utility, Package TDP for AVX-512 is 200W, for FMA3 176W. Both stabilize at 4.7GHz, with AVX-512 running at an extra 0.1 V of core voltage.
Last fiddled with by sdbardwick on 2022-03-04 at 15:44 |
#10 | |
∂²ω=0
Sep 2002
República de California
2³×3²×163 Posts |
Quote:
Update: Paul Underwood has kindly agreed to run the stage 1 DC to completion, taking over from Mike/Xyzzy around iteration 9.4M. He's getting 502 ms/iter @ 512M FFT running 64c128t on his KNL, right around what I expected based on the 470 ms/iter I got on my KNL, which at 1.4 GHz clocks 0.1 GHz higher than his. At that rate, with ~5M iterations left to go, ETA for the DC is 29 days from now, assuming uninterrupted 24/7 running.
Last fiddled with by ewmayer on 2022-03-04 at 22:25 |
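The 29-day figure follows directly from the numbers quoted in this thread; a minimal Python sketch of the arithmetic is below. The function name eta_days is illustrative, and 9.4M is only the approximate handover iteration, so the result is good to a fraction of a day at best.

```python
TOTAL_STAGE1_ITERS = 14_427_494  # stage 1 iteration count given in post #1


def eta_days(done_iters: int, ms_per_iter: float,
             total_iters: int = TOTAL_STAGE1_ITERS) -> float:
    """Days of uninterrupted 24/7 running left at a fixed per-iteration time."""
    remaining = total_iters - done_iters
    seconds = remaining * ms_per_iter / 1000.0
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    # ~9.4M iterations already done, 502 ms/iter on the second KNL
    print(f"ETA: {eta_days(9_400_000, 502):.1f} days")
```

Plugging in the 470 ms/iter from the faster KNL instead gives a sense of how much the 0.1 GHz clock difference costs over the remaining run.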
|
#11 |
"Marv"
May 2009
near the Tannhäuser Gate
2³×3²×11 Posts |
Intel plans to fuse off AVX-512 support in Alder Lake CPUs even though it is on the chip. Previously they were kinda, possibly, maybe going to support it but have changed their minds. The link below is to my favorite leaks and rumors channel, Gamer Meld. I have found them to be brand agnostic and very accurate. The Alder Lake section starts at 1:59.
https://www.youtube.com/watch?v=LNQVX1YP7m4&t=207s
Up yours Intel !! |