mersenneforum.org AVX512 hardware recommendations?

2020-05-31, 22:22   #23
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·1,279 Posts

Quote:
 Originally Posted by ewmayer That seems quite promising in terms of getting useful work out of both CPU and GPU ... did you observe the TDP for these 3 states? 1. System powered up but otherwise idle; 2. Prime95 running in max-throughput configuration; 3. Both Prime95 and gpuowl running in max-throughput configuration.
Only informally. More now: floor-level ambient ~81F.
gpu: mfakto active (with prime95); GPU-Z reports:
gpu chip 22.5 W
vddc 17 W
vddci 3 W
gpu 1124 MHz, 71 C
cpu 89 C, 7.7 ms/iter at M56610787 LLDC

CPUID HWMonitor reports 15 W cpu package power (the rated TDP).
With prime95 stopped, CPUID reports ~2 W, and the cpu drops to ~60 C.

Switched to gpuowl:
gpu chip power 20 W, vddc 15 W, vddci 3 W

43% system memory utilization.
Shut down, inserted a wattmeter at the power plug. The following are plug-draw readings:

~15-20W draw during boot, 9-12W editing this text file
prime95 running jacobi check 17W total input;
iterating, 33W
stopped prime95, mfakto running, 33W
stopped mfakto, 9W
gpuowl resumed 33W
prime95 resumed 57W input
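The incremental draw of each workload can be tallied from these plug readings. A minimal sketch (not part of the thread), taking the ~9 W text-editing figure as the idle baseline:

```python
# Minimal sketch: incremental wall-plug draw per workload, from the
# measured totals above and the ~9 W idle (text-editing) baseline.
IDLE_W = 9

measured_total_w = {
    "prime95 iterating": 33,
    "mfakto": 33,
    "gpuowl": 33,
    "prime95 + gpuowl": 57,
}

def incremental_w(total_w, idle_w=IDLE_W):
    """Watts attributable to the workload(s), above the idle baseline."""
    return total_w - idle_w

for state, total in measured_total_w.items():
    print(f"{state}: +{incremental_w(total)} W over idle")

# prime95 (+24 W) and gpuowl (+24 W) stack almost exactly:
# 9 + 24 + 24 = 57 W, matching the measured combined draw.
```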
Quote:
 Re. heat, did you try popping the plastic top panel like I suggested? My i3-NUC has yet to arrive, but if the chassis is similarly designed as my Broadwell NUC, there's a flat sheet-metal panel underneath the plastic cap which can serve as a radiator, and is also a tempting target for affixing a heatsink-possible-with-fan.
No. Hadn't even removed the plastic finish protector film yet. Just took the film off. It was so flaky in the early going I thought I would be returning it shortly. It took ~a dozen restarts from crashes or hangs to get through OS install completion, initial configure and setup, including at least 4 during the prime95 benchmark. Temperatures don't look bad to me today, and I won't be modding it during the returns period.
Quote:
 My, you've been busy. :) Not sure what I'm supposed to be seeing in the plots on page 5 and 6 - in 5 you do best-fits using a monomial which gives an x^(-1.08...) best-fit behavior, I would be interested in seeing how that compares to a best-fit of a straight line to the data scaled as (iters/sec)*(n log n), i.e. the expected throughput based on FFT opcount.
Look at the local scatter. What causes the variation from one point to its immediate neighbors? George's rounding and threshold decisions? Or something intrinsic to 2-smooth vs. 3-smooth vs. n-smooth FFT lengths? Added 2 columns, and a graph on page 8.
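One way to probe the smoothness question is to tag each FFT length k with its largest prime factor. A minimal sketch (not part of the thread; the specific lengths below are taken from the benchmark range, the classification is just illustration):

```python
def largest_prime_factor(n):
    """Largest prime factor of n by trial division (fine for FFT-length-sized n)."""
    largest, p = 1, 2
    while n > 1:
        if n % p == 0:
            largest = p
            while n % p == 0:
                n //= p
        p += 1
    return largest

# FFT lengths in K: 2048K is 2-smooth, 2304K = 2^8 * 3^2 is 3-smooth,
# while e.g. 2816K = 2^8 * 11 requires a radix-11 pass somewhere.
for k in (2048, 2304, 2560, 2816, 3328):
    print(k, "->", largest_prime_factor(k))
```

Grouping the per-iteration timings by this factor is one way to test whether the local scatter tracks smoothness or something else entirely.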
Quote:
 ...Sounds like you're having fun, in your own distinctive Krieselian "data ... must have ... more data" fashion. :)
"MORE INPUT!" https://en.wikipedia.org/wiki/Short_Circuit_(1986_film)
Data -> analysis -> sometimes improved understanding -> sometimes higher performance
(share, feedback, iterate)
"data scaled as (iters/sec)*(n log n)" -> more columns -> tinier print -> more moaning about readability? ;)

2020-06-03, 21:53   #24
ewmayer
∂2ω=0

Sep 2002
República de California

2^2·3·7^2·19 Posts

My NUC arrived a couple of days ago; I was too busy finishing up my new multi-GPU build to unpack it until late yesterday afternoon. AFAICT the system is not just "like new" but in fact brand-new - the shrink wrap on the box looks like the professional factory variety, and there is no sign of anything inside having been previously touched. First thing was to pop the plastic decorative cap - yep, more or less the same nice-looking but horribly heat-trapping design as my older Broadwell NUC (but see below - it only *looks* the same).

Clean Ubuntu 19.10 install (bye, bye, Windows), no problems. Mlucas v19 built in both avx2 and avx512 SIMD modes; here are the mlucas.cfg-file timings captured via the standard '-s m [-cpu ...]' self-tests. Without qualification, 'core' refers to a physical core (pcore), of which there are 2, alongside 2 additional logical cores (lcores) by way of hyperthreading. Thus e.g. '1c2t' in the table means 1 physical core was overloaded with 2 threads by assigning 1 thread to each of the 2 logical cores mapping to that physical core. All timings in ms/iter.
The Max-Thruput column is based on the AVX-512 2c4t data immediately to its left:
Code:
            AVX2 build:                AVX-512 build:            Max Thruput
FFT(k)   1c1t   1c2t   2c2t   2c4t   1c1t   1c2t   2c2t   2c4t   iters/sec
2048    16.80  15.78   9.10   8.63  13.35  12.60   7.70   7.02     142.5
2304    19.88  18.11  10.49  11.11  15.76  13.99   8.46   7.71     129.7
2560    21.65  19.84  11.37  12.10  17.56  15.39   9.52   8.53     117.2
2816    25.37  22.66  13.18  13.82  20.06  17.96  10.79   9.71     103.0
3072    26.48  25.09  14.12  15.38  21.10  20.24  11.94  10.68      93.6
3328    29.56  26.95  15.56  16.34  24.55  21.55  12.92  11.75      85.1
3584    30.61  28.26  17.27  17.09  25.06  22.01  13.78  13.79      72.5
3840    34.21  31.85  19.49  19.17  27.50  24.40  15.70  15.17      65.9
4096    35.53  32.30  19.91  19.88  28.59  26.00  17.48  16.34      61.2
4608    39.95  37.30  22.61  22.63  33.88  29.65  19.27  17.66      56.6
5120    44.74  41.29  25.41  25.50  37.35  33.66  21.32  20.89      47.9
5632    51.99  47.33  29.23  28.74  43.02  39.19  24.62  23.85      41.9
6144    58.16  54.36  32.45  32.85  45.24  43.59  27.91  26.38      37.9
6656    62.43  56.27  35.37  34.99  52.49  46.99  29.88  28.85      34.7
7168    64.12  58.93  36.17  36.23  53.56  48.22  31.48  29.89      33.5
7680    72.02  65.98  40.89  40.01  58.34  53.55  33.63  32.60      30.7

Thus for 1 core we see a ~10% gain from 2 threads via hyperthreading, but for 2 cores the boost from running 4 threads is much more modest, and sometimes even negative for the AVX2 build. For the AVX-512 build the 2c2t -> 2c4t boost is consistently of the desired sign, but highly FFT-length-dependent, ranging from ~10% to 0. The gain from using AVX-512 over AVX2 is a modest 1.2-1.4x, depending on FFT length and core|thread count; said modestness likely reflects the half-speed AVX-512 vector-MUL support on this CPU. Per Ken's numbers, Max Thruput for George's code ranges from 210 iters/sec @2048K to 52 iters/sec @7680K, so George is faster, as expected, but not by a huge margin. I fired up a 2c4t job on a 5.5M-FFT exponent and let it run overnight; timing was rock-steady around 24.7 ms/iter, matching the value in the table.
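The (iters/sec)*(n log n) scaling asked about earlier can be computed directly from the Max-Thruput column. A minimal sketch (not part of the thread), using three sample rows from the table:

```python
import math

# Sample (FFT length in K, iters/sec) pairs from the Max Thruput column.
samples = [(2048, 142.5), (4096, 61.2), (7680, 30.7)]

for k, ips in samples:
    n = k * 1024                     # FFT length in doubles
    scaled = ips * n * math.log2(n)  # expected-opcount-normalized throughput
    print(f"{k}K: {scaled:.3e}")

# If per-iteration work truly grows as n log n, these values should be roughly
# flat; any residual slope reflects cache effects and per-length radix choices.
```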
Thus throttling is not an issue despite the fact that it was a warm night, and this a.m. the metal surface under the decorative plastic cap I'd popped off yesterday was merely lukewarm to the touch, suggesting that Intel solved the cap-traps-heat design problem I see in my Broadwell NUC. Gonna leave it off, though, until I get a chance to run gpuOwl on the AMD Radeon 540 GPU, to see if that affects the heat equation.

I also tried running 2 jobs, each using the 1c2t setup (using Intel's logical-core numbering convention, lcores 0,2 map to pcore 0 and lcores 1,3 map to pcore 1; thus the Mlucas core affinities for said 2 jobs were -cpu 0,2 and -cpu 1,3 (note: comma, not colon!), respectively). Throughput was indistinguishable from 2c4t, i.e. each of the 2 jobs' per-iter times was 2x that of the 1-job timing. The max throughput of this NUC is ~1.8x that of my older Broadwell NUC running AVX2 code, also in 2c4t mode.

To set up for gpuOwl running, I followed the same recipe as I did on my Ubuntu 19.10 systems hosting Radeon VII cards ... all went smoothly until the link step:
Code:
ewmayer@ewmayer-NUC8i3CYS:~/RUN$ git clone https://github.com/preda/gpuowl && cd gpuowl && make
Cloning into 'gpuowl'...
remote: Enumerating objects: 159, done.
remote: Counting objects: 100% (159/159), done.
remote: Compressing objects: 100% (106/106), done.
remote: Total 5303 (delta 95), reused 95 (delta 53), pack-reused 5144
Receiving objects: 100% (5303/5303), 12.67 MiB | 1.79 MiB/s, done.
Resolving deltas: 100% (3801/3801), done.
./tools/expand.py < gpuowl.cl > gpuowl-expanded.cl
cat head.txt gpuowl-expanded.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.11-311-gfa76bd9"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT AllocTrac.o -MMD -MP -MF .d/AllocTrac.Td -Wall -O2 -std=c++17 -c -o AllocTrac.o AllocTrac.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -MT sha3.o -MMD -MP -MF .d/sha3.Td -Wall -O2 -std=c++17 -c -o sha3.o sha3.cpp
g++ -o gpuowl Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o
Signal.o FFTConfig.o AllocTrac.o gpuowl-wrap.o sha3.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm-3.3.0/opencl/lib/x86_64 -L/opt/rocm-3.1.0/opencl/lib/x86_64 -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L.
/usr/bin/ld: cannot find -lOpenCL
collect2: error: ld returned 1 exit status
make: *** [Makefile:19: gpuowl] Error 1

'apt list --installed | grep libopencl1' shows this:
Code:
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

1349:ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed,automatic]
Last fiddled with by ewmayer on 2020-06-03 at 22:43

2020-06-03, 22:50   #25
kriesel
"TF79LL86GIMPS96gpu17"

Mar 2017
US midwest

EFD_16 Posts

Ernst, run yours 24/7 and let's see how stable it is. Mine's poor: longest uptime maybe 2 days between hangs, crashes, bugchecks, bluescreens, and sometimes it fails to start/restart; I've had it take as little as a minute between stops. It appears to be at least partly OS-independent; I've had multiple POST fails also. Found it this afternoon displaying a solid green screen and completely unresponsive except to pressing the power button for four seconds. Several stops and a little useful work later, it's probably going back soon for a refund.

2020-06-03, 23:04   #26
ewmayer
∂2ω=0

Sep 2002
República de California

10101110100100_2 Posts

I've been running nearly 24 hours w/o any problems ... suggest you request a return/refund on yours and order a like-new (= brand-new in my case) one from the same seller I bought from:
https://www.amazon.com/Intel-BOXNUC8.../dp/B07HHB2YLG

That item is listed via Amzn for $343 + free shipping; the listing says "Available at a lower price from other sellers that may not offer free Prime shipping" ... but if you click on the 'other sellers' embedded link, at top is OEM XS INC, $255 + free shipping. I could not be more pleased with mine; I simply want to double my total throughput by also crunching on the GPU.
Last fiddled with by ewmayer on 2020-06-03 at 23:04

2020-06-03, 23:27   #27
kriesel
"TF79LL86GIMPS96gpu17"

Mar 2017
US midwest

3×1,279 Posts

Quote:
 Originally Posted by ewmayer I've been running nearly 24 hours w/o any problems
Thanks. Try running prime95 and gpuowl on it together; that tips mine over, sometimes within minutes. My seller's tech support asked me to run Seagate SSD diagnostics on the rotating HD! I get a variety of Windows stop codes, rarely a solid green screen, and often POST fails on restart. I suspect system RAM. It's on a UPS. The NUC is on the floor, and that's often 80-85F ambient.

Last fiddled with by kriesel on 2020-06-03 at 23:32

2020-06-03, 23:39   #28
ewmayer
∂2ω=0

Sep 2002
República de California

2BA4_16 Posts

Quote:
 Originally Posted by kriesel Thanks. Try running prime95 and gpuowl on it together.
I'd like to, but as noted above my attempt to compile gpuOwl fails with an OpenCL-related link error.

2020-06-04, 00:20   #29
kriesel
"TF79LL86GIMPS96gpu17"

Mar 2017
US midwest

3·1,279 Posts

Quote:
 Originally Posted by ewmayer I'd like to, but as noted above my attempt to compile gpuOwl fails with an OpenCL-related link error.
So presumably mfakto and cllucas would also have link problems. Try a prebuilt image, maybe something from the mersenne.ca mirror. Or cross-compile on your Haswell? There have been gpuowl builds and other things compiled for linux and posted on the forum.

Last fiddled with by kriesel on 2020-06-04 at 00:21

2020-06-04, 01:05   #30
preda
"Mihai Preda"

Apr 2015

3^2×5×23 Posts

Quote:
 Originally Posted by ewmayer /usr/bin/ld: cannot find -lOpenCL collect2: error: ld returned 1 exit status make: *** [Makefile:19: gpuowl] Error 1 'apt list --installed | grep libopencl1' shows this: WARNING: apt does not have a stable CLI interface. Use with caution in scripts. 1349:ocl-icd-libopencl1/eoan,now 2.2.11-1ubuntu1 amd64 [installed,automatic]
Does clinfo work? Which OpenCL provider did you install -- e.g.
ROCm (does ROCm support that GPU?) or amdgpu-pro. The first step is to get clinfo to report at least one OpenCL device (other than the CPU). Next you can search for libOpenCL:
$ sudo updatedb
$ locate OpenCL
If needed, edit the Makefile and add the path with libOpenCL to -L (the libs search path for linking).

2020-06-04, 01:34   #31
ewmayer
∂2ω=0

Sep 2002
República de California

2^2·3·7^2·19 Posts

Quote:
 Originally Posted by paulunderwood Something like this should fix it: Code: sudo ln -s /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0 /usr/lib/x86_64-linux-gnu/libOpenCL.so Check that the first file exists -- it might be 1.0 or something else.
You nailed it:
Code:
ewmayer@ewmayer-NUC8i3CYS:~$ ll /usr/lib/x86_64-linux-gnu/libOpenCL.so*
lrwxrwxrwx 1 root root    18 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1 -> libOpenCL.so.1.0.0
-rw-r--r-- 1 root root 43072 Apr  5  2017 /usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
The ensuing git-clone/cd-into-gpuowl/make succeeds ... but after creating a worktodo file with a couple of PRP assignments, when I try running, I get the expected usual startup stuff but am left hanging - ctrl-c and ctrl-z have no effect, and 'pidof gpuowl' in a separate window comes up empty:
Code:
2020-06-03 18:26:55 gpuowl v6.11-311-gfa76bd9
2020-06-03 18:26:55 device 0, unique id ''
[then nothing, empty space where occasional checkpoint output should be]
Quote:
 Originally Posted by preda does clinfo work? which OpenCL provider did you install -- e.g. ROCm (does ROCm support that GPU?) or amdgpu-pro. First step is to get clinfo to report at least one OpenCL device (other than the CPU). Next you can search for libOpenCL: $ sudo updatedb
Gives 'sudo: updatedb: command not found'.
Quote:
 $ locate OpenCL
After installing the locate package, this gives
Code:
/etc/OpenCL
/etc/OpenCL/vendors
/etc/OpenCL/vendors/amdocl64.icd
/opt/rocm-3.5.0/lib/libOpenCL.so
/opt/rocm-3.5.0/lib/libOpenCL.so.1
/opt/rocm-3.5.0/lib/libOpenCL.so.1.2
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1
/opt/rocm-3.5.0/opencl/lib/libOpenCL.so.1.2
/usr/lib/x86_64-linux-gnu/libOpenCL.so
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1
/usr/lib/x86_64-linux-gnu/libOpenCL.so.1.0.0
/usr/share/doc/ocl-icd-libopencl1/html/libOpenCL.html
/usr/share/man/man7/libOpenCL.7.gz
/usr/share/man/man7/libOpenCL.so.7.gz
4th-from-bottom is the ...1.0.0 Paul suggested looking for.
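Mihai's "add the path with libOpenCL to -L" step can be checked mechanically. A minimal sketch (hypothetical, not part of the thread): scan candidate directories from the locate output for the unversioned .so, which is what 'ld -lOpenCL' actually wants, before editing the Makefile:

```python
import os

def opencl_link_flags(candidate_dirs):
    """Return -L flags for each candidate directory that holds a linkable
    libOpenCL.so (the plain, unversioned name is what 'ld -lOpenCL' resolves)."""
    return ["-L" + d for d in candidate_dirs
            if os.path.isfile(os.path.join(d, "libOpenCL.so"))]

# Directories from the locate output above; only those containing the
# unversioned libOpenCL.so will satisfy the linker.
candidates = [
    "/opt/rocm-3.5.0/lib",
    "/opt/rocm-3.5.0/opencl/lib",
    "/usr/lib/x86_64-linux-gnu",
]
print(opencl_link_flags(candidates))
```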

Oh, /opt/rocm/bin/rocm-smi shows
Code:
GPU  Temp   AvgPwr  SCLK    MCLK    Fan     Perf  PwrCap  VRAM%  GPU%
0    48.0c  3.214W  214Mhz  300Mhz  18.82%  auto  25.0W     1%   0%
Is ROCm appropriate for this model AMD GPU?

Last fiddled with by ewmayer on 2020-06-04 at 01:39

2020-06-04, 01:55   #32
ewmayer
∂2ω=0

Sep 2002
República de California

10101110100100_2 Posts

Forgot to answer Mihai's Q re. clinfo - that shows no valid device:
Code:
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.0 AMD-APP (3137.0)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 0

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)            No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)             No platform
  clCreateContext(NULL, ...) [default]                      No platform
  clCreateContext(NULL, ...) [other]                        No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)     No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)         No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)         No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)      No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)         No devices found in platform

ROCm, OTOH, seems to see a valid device.
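When clinfo reports a platform but zero devices, one useful check is which ICD vendor files the OpenCL loader consults - on this NUC, /etc/OpenCL/vendors held only amdocl64.icd. A minimal sketch (hypothetical, not part of the thread) listing them:

```python
import glob
import os

def registered_icds(vendor_dir="/etc/OpenCL/vendors"):
    """Map each .icd file to the vendor driver library it names; the ICD
    loader dlopens that library, and the driver decides which devices to
    expose. A platform with 0 devices means the driver loaded but declined
    to claim the GPU."""
    icds = {}
    for path in glob.glob(os.path.join(vendor_dir, "*.icd")):
        with open(path) as f:
            icds[os.path.basename(path)] = f.read().strip()
    return icds

print(registered_icds())  # e.g. {'amdocl64.icd': 'libamdocl64.so'} if AMD's ICD is installed
```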
2020-06-04, 02:36   #33
paulunderwood

Sep 2002
Database er0rr

5×643 Posts

I think you are out of luck:
https://github.com/RadeonOpenCompute...ftware-Support
https://en.wikipedia.org/wiki/Radeon_RX_500_series
