mersenneforum.org  

Old 2020-07-10, 02:38   #1
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

13224₈ Posts
Default Integrated graphics processors, how to run GIMPS software on them, and why you may not want to

This thread is intended to hold only reference material specifically for integrated graphics processors, mostly Intel IGPs, which are what I have to experiment with.
(Suggestions are welcome. Discussion posts in this thread are not encouraged. Please use the reference material discussion thread http://www.mersenneforum.org/showthread.php?t=23383. Off-topic posts may be moved or removed, without warning or recourse, to keep the reference threads clean, tidy, and useful.)

IGPs typically have OpenCL and OpenGL capable drivers. CUDA applications do not run on them, so Mfaktc, CUDAPm1, and CUDALucas are not options. Performance is typically low, by design. Using IGPs requires short assignments, long expiration dates, and great patience.

A wide variety of results were obtained. Depending on IGP model:
both mfakto and early versions of gpuowl usable
mfakto but not gpuowl
neither
mfakto usable temporarily but motherboard permanently damaged!
Running prime95 and mfakto in combination increases total GhzD/day
Running prime95 and gpuowl in combination lowers total GhzD/day


Table of contents
  1. This post
  2. Intel i7-7500U/HD620 https://www.mersenneforum.org/showpo...48&postcount=2
  3. Intel i7-8750H/UHD630 https://www.mersenneforum.org/showpo...51&postcount=3
  4. Intel i7-4790/HD4600 https://www.mersenneforum.org/showpo...52&postcount=4
  5. Intel i5-1035G1/UHD920 https://www.mersenneforum.org/showpo...83&postcount=5
  6. other possibilities https://www.mersenneforum.org/showpo...84&postcount=6
  7. Intel Celeron G1840/HD https://www.mersenneforum.org/showpo...91&postcount=7
  8. Intel i3-4170/HD4400 https://www.mersenneforum.org/showpo...59&postcount=8
  9. Intel i7-1165G7/Iris Xe https://www.mersenneforum.org/showpo...16&postcount=9

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-13 at 19:11 Reason: added i7-1165G7/Iris Xe results
Old 2020-07-10, 03:38   #2
kriesel
 
Default Intel i7-7500U/HD620

It's possible to run gpuowl on the HD620 IGP, but only with very old versions, and it's not worthwhile except as a learning experiment. In my limited testing, running gpuowl subtracts much more from prime95's throughput on the cpu than gpuowl on the HD620 produces.
Mfakto along with prime95 produces the most computing credit.

To summarize results:
Prime95 alone, 9.58 GhzD/day PRP
Prime95 + gpuowl V0.5, 2.3 + 2.36 = 4.66 GhzD/day PRP
Prime95 + gpuowl V1.9, 1.71 + 2.92 = 4.63 GhzD/day PRP
Prime95 + mfakto v0.15pre6, 4.8 GhzD/day PRP + 20 GhzD/day TF = 24.8 total *
Mfakto v0.15pre6 on IGP only, 18 GhzD/day TF
Gpuowl V1.9 on IGP only, 3.28 GhzD/day PRP (M74207281 4M DP FFT)
Gpuowl V1.9 on CPU only, 0.33 GhzD/day PRP (M74207281 4M DP FFT)


The system came with the OpenCL driver and Windows 10 OS preinstalled. All tests described here were performed under Windows 10.

Mfakto
Installation of mfakto from a prebuilt zip file is straightforward.
Create a folder. Unzip the file there.
Customize the mfakto.ini file
Create worktodo.txt and put at least one assignment in it.
If you need more guidance on installation, follow the howto guide for mfakto. Or see https://mersenneforum.org/showpost.p...1&postcount=77
It's useful for increasing total GhzDays/day computing credit from the laptop, and completing small TF tasks.
Its TF throughput is less than 1% of a modern GPU such as an RTX2080, so don't expect much from it. If a fast discrete GPU is available, tune that first, since that is where most of the productivity is.

Gpuowl
Installation of gpuowl from a prebuilt zip file is straightforward.
Create a folder. Unzip the file there.
Create a suitable config.txt for versions that support that if you like.
Create worktodo.txt and put at least one assignment in it.
I generally create also a small batch file with a very short name in the folder to launch Gpuowl with just a few keystrokes from the command line. A desktop shortcut to launch cmd /k g05.bat or whatever in that folder completes the setup. Since I run a variety of versions I generally encode version number in the batch file name and in the shortcut name as a precaution against misplaced runs.
The batch file might look something like
Code:
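rem Show the working folder in the window title, as a check against misplaced runs
title %cd%
rem Save the option list to help.txt once, the first time this folder is used
if not exist help.txt gpuowl -h >help.txt
gpuowl
Look in help.txt, which will generally contain a list of the OpenCL devices that gpuowl identified. (Some early versions don't list them.)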

Prime95 v29.8b6 was already running M116.9M (539.85 GhzDays) at ~42 ms/iter before launching any gpuowl version on the IGP. (~24 iter/sec; 9.58 GhzD/day; estimated 56 days to PRP)
This IGP is known to be capable of running Mfakto. (At the cost of about half the prime95 throughput, the igp gives ~20 GhzD/day trial factoring throughput; about 18 without prime95 running at all.)
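
The GhzD/day figures quoted for prime95 and gpuowl in these posts can be reproduced from the iteration time, the exponent, and the total credit for the exponent. The following is only a minimal sketch of that arithmetic; the function names are mine, and it assumes a PRP or LL test needs about as many iterations as the exponent.

```python
SECONDS_PER_DAY = 86400


def eta_days(exponent, ms_per_iter):
    """Estimated run time in days, assuming ~exponent iterations in total."""
    return exponent * (ms_per_iter / 1000.0) / SECONDS_PER_DAY


def ghzd_per_day(credit_ghzd, exponent, ms_per_iter):
    """Credit rate: total credit for the exponent divided by run time in days."""
    return credit_ghzd / eta_days(exponent, ms_per_iter)


# gpuowl v0.5 on the HD620: exponent 70100200, 187.42 GhzDays, 97.7 ms/iter
print(round(ghzd_per_day(187.42, 70100200, 97.7), 2))
# gpuowl v1.9 on the HD620 at about 107 ms/iter: M96758209, 350.48 GhzDays
print(round(eta_days(96758209, 107.0)), "days")
```

The credit values themselves come from the mersenne.ca credit calculator linked at the end of the post; this sketch does not compute them.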

gpuowl-v6.11-318 (then the latest commit) failed to launch on the hd620 IGP.

gpuowl v2.0-dbc5a01 failed to launch on the hd620 IGP.

gpuowl V1.9-74f1a38 will run on it. That offers only PRP, in just 2M, 4M, and 8M fft length, in SP, DP, M31, and M61 transforms. Includes GEC, does not include proof generation capability.
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Command line options:

-size 2M|4M|8M : override FFT size.
-fft DP|SP|M61|M31  : choose FFT variant [default DP]:
                DP  : double precision floating point.
                SP  : single precision floating point.
                M61 : Fast Galois Transform (FGT) modulo M(61).
                M31 : FGT modulo M(31).
-user <name>  : specify the user name.
-cpu  <name>  : specify the hardware name.
-legacy       : use legacy kernels
-dump <path>  : dump compiled ISA to the folder <path> that must exist.
-verbosity <level> : change amount of information logged. [0-2, default 0].
-device <N>   : select specific device among:
    0 : AMD Radeon VII 60 @d:0.0, gfx906 1801MHz
    1 : AMD Radeon VII 60 @7:0.0, gfx906 1801MHz
    2 : Intel(R) HD Graphics 4600, 20x1200MHz
    3 : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz,  8x3600MHz
There are no available PRP or PRP-DC assignments suitable for 2M or 4M fft length. SP and M31 are not useful. M61 was marginally useful, as an independent check, or for running 4M size up to slightly higher exponents than DP can handle, at slightly faster iteration times than 8M DP would provide. (See https://www.mersenneforum.org/showpo...31&postcount=8)
Running M96758209 (350.48 GhzDays credit) in 8M DP, along with prime95, gpuowl manages 106-109 ms/iter using 580MB on the IGP calculation. That timing corresponds to an estimated 120 days ETA, 9.35 iter/sec and 2.92 GhzD/day. (Using an 8M fft on that exponent is not very efficient.)
Prime95 iteration time rises to 216-253 ms/iter (drops to ~4-5 iterations/second, 17-20% of solo throughput) and %cpu utilization shown in task manager drops to a fluctuating value ranging 6 to 21% while gpuowl runs. This is equivalent to 4.27 iterations/second and 1.71 GhzD/day.
Gpuowl v1.9 initially failed to launch on the cpu as an OpenCL device on M96758209. Subsequent attempts for M74207281, 4M DP FFT gave 728.6 ms/iter using all 4 hyperthreads, or 0.33 GhzD/day!

Gpuowl v0.5 will run on the HD620 IGP. That version offers only 4M DP LL with offset and no Jacobi check.
Observed time was 97.7 ms/iteration on the test exponent 70100200 (187.42 GhzDays credit), for 10.2 iterations/second (2.36 GhzD/day) on the IGP. It takes up most of the cpu package's TDP, and slows prime95 to a widely fluctuating ~170 ms/iteration (~6 iterations/second, about 1/4 of solo throughput, ~2.3 GhzD/day).
The combined throughput is a considerable reduction from prime95-only.

gpuowl V2.0-dbc5a01 failing to launch on the hd620:
Code:
v2.0>gpuowl -device 0 -user kriesel -cpu falcon-hd620
gpuOwL v2.0- GPU Mersenne primality checker
Intel(R) HD Graphics 620-24x1050-
Note: using short, fused carry and fused tail kernels
OpenCL compilation in 5829 ms, with " -DEXP=83871443u  -I. -cl-fast-relaxed-math -cl-kernel-arg-info "
PRP-3: FFT 5000K (625 * 4096 * 2) of 83871443 (16.38 bits/word) [2020-06-20 19:24:23 Central Daylight Time]
Starting at iteration 3391500
error -54 (fft4K)
Assertion failed!

Program: C:\Users\User\Documents\gpuowl\v2.0\gpuowl.exe
File: clwrap.h, Line 267

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())
Gpuowl v1.9-74f1a38 failing to launch using the i7-7500U cpu:
Code:
gpuowl-v1.9-74f1a38>gpuowl -user kriesel -cpu falcon-hd620 -verbosity 2 -device 1
gpuOwL v1.9- GPU Mersenne primality checker
Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz,  4x2700MHz
Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel <fftW> was not vectorized
Kernel <fftH> was not vectorized
Kernel <fftP> was not vectorized
Kernel <carryA> was successfully vectorized (8)
Kernel <carryM> was successfully vectorized (8)
Kernel <carryB> was successfully vectorized (8)
Kernel <square> was successfully vectorized (8)
Kernel <multiply> was successfully vectorized (8)
Kernel <carryConv> was not vectorized
Kernel <tail> was not vectorized
Kernel <transposeW> was not vectorized
Kernel <transposeH> was not vectorized
Done.
OpenCL compilation in 3210 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=96758209u -DWIDTH=2048u -DHEIGHT=2048u -DLOG_NWORDS=23u -DFP_DP=1 "
Note: using long carry kernels
PRP-3: FFT 8M (2048 * 2048 * 2) of 96758209 (11.53 bits/word) [2020-06-20 17:06:07 Central Daylight Time]
Starting at iteration 10500
error -5 (fftP)
Assertion failed!

Program: C:\Users\User\Documents\gpuowl\v1.9\gpuowl-v1.9-74f1a38\gpuowl.exe
File: clwrap.h, Line 230

Expression: check(clEnqueueNDRangeKernel(queue, kernel, 1, __null, &workSize, &groupSize, 0, __null, __null), name.c_str())
Gpuowl v6.11-318 failing to launch on the i7-7500U's HD620 IGP:
Code:
2020-06-20 16:23:28 config: -cpu falcon/hd620 -user kriesel -device 2 -use NO_ASM -safeMath
2020-06-20 16:23:28 device 2, unique id ''
2020-06-20 16:23:28 falcon/hd620 1398269 FFT: 128K 256:1:256 (10.67 bpw)
2020-06-20 16:23:28 falcon/hd620 Expected maximum carry32: D0000
2020-06-20 16:23:28 falcon/hd620 OpenCL args "-DEXP=1398269u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DWEIGHT_STEP=0xa.12080bedffap-3 -DIWEIGHT_STEP=0xc.b5e196139dc98p-4 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-06-20 16:23:29 falcon/hd620 ASM compilation failed, retrying compilation using NO_ASM
2020-06-20 16:23:30 falcon/hd620 OpenCL compilation error -11 (args -DEXP=1398269u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DWEIGHT_STEP=0xa.12080bedffap-3 -DIWEIGHT_STEP=0xc.b5e196139dc98p-4 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only  -DNO_ASM=1)
2020-06-20 16:23:30 falcon/hd620 1:58:26: warning: unsupported OpenCL extension 'cl_khr_int64_base_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                         ^
1:59:26: warning: unsupported OpenCL extension 'cl_khr_int64_extended_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable
                         ^
1:1916:29: error: call to 'atom_add' is ambiguous
if (get_local_id(0) == 0) { atom_add(&out[0], sum); }
                            ^~~~~~~~
opencl-c.h:13346:12: note: candidate function
int __ovld atom_add(volatile __global int *p, int val);
           ^
opencl-c.h:13347:21: note: candidate function
unsigned int __ovld atom_add(volatile __global unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13350:12: note: candidate function
int __ovld atom_add(volatile __local int *p, int val);
           ^
opencl-c.h:13351:21: note: candidate function
unsigned int __ovld atom_add(volatile __local unsigned int *p, unsigned int val);
                    ^
1:2315:1: error: call to 'atom_add' is ambiguous
atom_add((global ulong *) &carryStats[0], carryMax);
^~~~~~~~
opencl-c.h:13347:21: note: candidate function
unsigned int __ovld atom_add(volatile __global unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13351:21: note: candidate function
unsigned int __ovld atom_add(volatile __local unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13346:12: note: candidate function
int __ovld atom_add(volatile __global int *p, int val);
           ^
opencl-c.h:13350:12: note: candidate function
int __ovld atom_add(volatile __local int *p, int val);
           ^

2:58:26: warning: unsupported OpenCL extension 'cl_khr_int64_base_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                         ^
2:59:26: warning: unsupported OpenCL extension 'cl_khr_int64_extended_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable
                         ^
2:1916:29: error: call to 2020-06-20 16:23:30 falcon/hd620 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:246 build
2020-06-20 16:23:30 falcon/hd620 Bye
The error specifics are somewhat different for v6.11-318 on the i7-7500U cpu:
Code:
2020-06-20 17:14:19 gpuowl v6.11-318-g3109989
2020-06-20 17:14:19 config: -cpu falcon/hd620 -user kriesel -device 1 -use NO_ASM -safeMath
2020-06-20 17:14:19 device 1, unique id ''
2020-06-20 17:14:19 falcon/hd620 1398269 FFT: 128K 256:1:256 (10.67 bpw)
2020-06-20 17:14:19 falcon/hd620 Expected maximum carry32: D0000
2020-06-20 17:14:20 falcon/hd620 OpenCL args "-DEXP=1398269u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DWEIGHT_STEP=0xa.12080bedffap-3 -DIWEIGHT_STEP=0xc.b5e196139dc98p-4 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-06-20 17:14:20 falcon/hd620 ASM compilation failed, retrying compilation using NO_ASM
2020-06-20 17:14:20 falcon/hd620 OpenCL compilation error -11 (args -DEXP=1398269u -DWIDTH=256u -DSMALL_HEIGHT=256u -DMIDDLE=1u -DPM1=0 -DWEIGHT_STEP=0xa.12080bedffap-3 -DIWEIGHT_STEP=0xc.b5e196139dc98p-4 -DNO_ASM=1  -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only  -DNO_ASM=1)
2020-06-20 17:14:20 falcon/hd620 Compilation started
1:58:26: warning: unsupported OpenCL extension 'cl_khr_int64_base_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_base_atomics : enable
                         ^
1:59:26: warning: unsupported OpenCL extension 'cl_khr_int64_extended_atomics' - ignoring
#pragma OPENCL EXTENSION cl_khr_int64_extended_atomics : enable
                         ^
1:1916:29: error: call to 'atom_add' is ambiguous
if (get_local_id(0) == 0) { atom_add(&out[0], sum); }
                            ^~~~~~~~
opencl-c.h:13346:12: note: candidate function
int __ovld atom_add(volatile __global int *p, int val);
           ^
opencl-c.h:13347:21: note: candidate function
unsigned int __ovld atom_add(volatile __global unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13350:12: note: candidate function
int __ovld atom_add(volatile __local int *p, int val);
           ^
opencl-c.h:13351:21: note: candidate function
unsigned int __ovld atom_add(volatile __local unsigned int *p, unsigned int val);
                    ^
1:2315:1: error: call to 'atom_add' is ambiguous
atom_add((global ulong *) &carryStats[0], carryMax);
^~~~~~~~
opencl-c.h:13347:21: note: candidate function
unsigned int __ovld atom_add(volatile __global unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13351:21: note: candidate function
unsigned int __ovld atom_add(volatile __local unsigned int *p, unsigned int val);
                    ^
opencl-c.h:13346:12: note: candidate function
int __ovld atom_add(volatile __global int *p, int val);
           ^
opencl-c.h:13350:12: note: candidate function
int __ovld atom_add(volatile __local int *p, int val);
           ^
Compilation failed

2020-06-20 17:14:20 falcon/hd620 Exception gpu_error: BUILD_PROGRAM_FAILURE clBuildProgram at clwrap.cpp:246 build
  2020-06-20 17:14:20 falcon/hd620 Bye
(GhzDays credit calculated with https://www.mersenne.ca/credit.php?w...onent=70100200 etc.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-13 at 16:09
Old 2020-07-10, 03:43   #3
kriesel
 
Default Intel i7-8750H/UHD630

Mfakto:
Install by unzipping a pre-built package into a user's working directory. (NOT in Program Files or Windows directory, or system owned space on Linux)
Review the mfakto.ini file and make any necessary edits, such as for your system name and PrimeNet username.

My Mfakto worktodo file for the UHD630 contains the following comments at the end, summarizing some issues seen in the lengthy self-test:
# there were several self test errors all above 82 bits,
# so restrict assignments to finish at or below 81

Prime95 on an i7-8750H CPU (6 cores + HT), PRP 113090209 (496.34 GhzDays credit) FMA3 6M fft took:
Prime95 solo 19.64 ms/it = 50.93 iterations/sec for 19.31 GhzD/day.
Prime95 with Mfakto v0.15pre6 also running, 30.146 ms/it = 33.17 iterations/second for 12.58 GhzD/day
Running mfakto with prime95 cost 17.76 iterations/second of prime95 throughput, or 6.73 GhzD/day of prime95 throughput.
Mfakto produced 10.863 GhzD/day running with prime95; combined prime95 and Mfakto credit is 23.44 GhzD/day.
Mfakto produced 20.05 GhzD/day running while prime95 is stopped.

Factor=(aid),111027737,73,74 was the Mfakto assignment run for test purposes. This will take most of a day without prime95 running, or nearly 2 days with prime95 running.
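
The assignment line above has the form Factor=<assignment id>,<exponent>,<starting bit level>,<ending bit level>. As a rough sketch, a worktodo file could be checked against the 81-bit limit noted in the comments before mfakto is started. The function names and the MAX_BITS constant are hypothetical and not part of mfakto; the sketch assumes lines starting with # are comments, as in the worktodo file described above.

```python
MAX_BITS = 81  # self-test errors were seen at higher bit levels on this UHD630


def parse_factor_line(line):
    """Return (exponent, bit_min, bit_max), or None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    keyword, _, rest = line.partition("=")
    if keyword != "Factor":
        raise ValueError("not a trial factoring assignment: " + line)
    # The assignment id is optional, so take the last three fields.
    exponent, bit_min, bit_max = (int(field) for field in rest.split(",")[-3:])
    return exponent, bit_min, bit_max


def within_limit(line, max_bits=MAX_BITS):
    """True if the line is a comment or finishes at or below max_bits."""
    parsed = parse_factor_line(line)
    return parsed is None or parsed[2] <= max_bits


print(parse_factor_line("Factor=(aid),111027737,73,74"))
print(within_limit("Factor=(aid),111027737,73,74"))
```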

Gpuowl:
Some IGPs won't run Gpuowl at all. IGPs are quite slow generally, whether Gpuowl or Mfakto are used.
In my experience Intel IGPs such as HD620 or UHD630 are no better than ~20 GhzD/day in TF, and considerably lower in PRP or LL.
Neither seems capable of running Gpuowl v2.0 or newer, based on limited testing.
Very slow IGP, using power inefficiently, plus slow old limiting software, is an unproductive combination, fit only for brief experimentation to satisfy curiosity.
Previous testing on a different system with an HD620 was similarly unproductive; only very old Gpuowl versions ran, and were not worthwhile.
Getting a recent Gpuowl version to work on IGPs seems a poor allocation of programming time, given the disappointing power and speed performance of early versions compared to the prime95 or Mlucas alternative for using the package power budget, and other possibilities for improving Gpuowl further for discrete GPUs. So Gpuowl on IGPs is not likely to improve soon, perhaps ever.

Running old versions on IGPs may be useful in very limited circumstances. Perhaps some newer IGPs perform better. Some AMD IGPUs are reportedly about as fast as an RX550.

To try it anyway:
If not already present, install OpenCL support for the Intel IGP. This normally is part of the driver installation and OS installation by the computer manufacturer if sold with preinstalled Windows or preinstalled Linux. (I have no hands-on experience with preinstalled Linux, or running Gpuowl on an IGP under Linux. Its performance may be mildly better, but not change conclusions.)
Confirm there is a working OpenCL computing environment with a test utility.

Obtain and install a version of Gpuowl. Very early versions suggested: V0.5 LL (which includes offset but no Jacobi check), v0.6 4M fft LL with Jacobi check but not offset, v1.9 PRP with GEC, 2, 4 or 8M fft size, DP, SP, M31 or M61 transforms. From the readme file, "Make sure that the Gpuowl.cl file is in the same folder as the executable". Obtain pre-compiled from links at https://www.mersenneforum.org/showth...539#post488539 or https://download.mersenne.ca/gpuowl
Pre-compiled versions for Windows available for download are many; equivalents for Linux are rare.

Or build from source, following the usual Linux git clone and make practice, with https://github.com/preda/gpuowl
For more detail on building for Windows or for Linux see https://www.mersenneforum.org/showpo...4&postcount=21 or the Linux-specific thread https://mersenneforum.org/showthread.php?t=25601
Create a worktodo.txt and put an assignment in it for test.
Optionally create a batch file or script to run Gpuowl, or use the command line.
Beware that syntax and options change frequently with Gpuowl version.

Test results
The following was observed while running Windows 10. Multiple other Windows applications were loaded but not being actively used at the time. I chose not to disrupt the usual usage of this laptop during the experiment: accessing other worker systems by remote desktop, using the internet, writing things like this post, etc. Those remained active during the test, since that is representative of how I would expect to use this IGP, if at all.

In V1.9-74f1a38 Gpuowl a 4M test exponent 77936867 PRP (243.2 GhzDays credit) took 182 msec/iteration with the more efficient DP transform, producing 5.5 iterations/second while prime95 is already running, yielding 1.48 GhzDays/day on the IGP.
Without prime95 running, Gpuowl took 117.5 ms/iter, so produced 8.5 iteration/second, yielding 2.29 GhzDays/day on the IGP.

Prime95 on an i7-8750h CPU (6 cores + HT), PRP 95038813 (344.25 GhzDays credit) 5M FFT took:
solo 12.7 ms/it = 78.7 iterations/sec for 24.63 GhzD/day.
with Gpuowl v1.9, 15.5ms/it = 64.5 iterations/second for 20.19 GhzD/day
Running Gpuowl with prime95 cost 14.2 iterations/second, or 4.44 GhzD/day, of prime95 throughput.

The net loss is 8.7 iterations/second (14.2 lost by prime95, less 5.5 gained on the IGP; actually more, because the iterations lost are larger-fft) by running Gpuowl too, or 70+ iterations/sec loss by running Gpuowl instead (again, effectively more because of the fft-size difference). A 5M fft iteration is at least 25% more effort, and so more valuable, than a 4M fft iteration. One might make the case they are (5M log 5M) / (4M log 4M) times as valuable. The GhzD/day calculations supposedly account for that.
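
For a sense of scale, the (5M log 5M) / (4M log 4M) ratio can be evaluated directly. This is a small sketch that takes M as 2^20 words and uses the natural log; it only illustrates the n log n argument and is not how GhzD credit is actually computed.

```python
import math


def fft_work(n_words):
    """Relative cost of one transform under the usual n log n model."""
    return n_words * math.log(n_words)


M = 2 ** 20
ratio = fft_work(5 * M) / fft_work(4 * M)
print(round(ratio, 3))
```

Under that model a 5M iteration comes out at roughly 27% more work than a 4M iteration, only slightly above the 25% from fft length alone.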

To summarize, don't run Gpuowl on the UHD630 if you want maximum throughput.
Prime95 only, 24.63 GhzD/day (maximum throughput case)
Prime95 on CPU and Gpuowl on IGP, 20.19 + 1.48 = 21.67 GhzD/day (88% of maximum; 12% loss)
Gpuowl on IGP only, 2.29 GhzD/day (9.3% of maximum; 90.7% loss)
idle system 0 GhzD/day

If a more recent version of Gpuowl could be made to run on the IGP and had double the performance of v1.9, it would still not be worth running on the IGP.
Prime95: 24.63
mixed extrapolation: 20.19 + 1.48 x 2 = 23.15
Gpuowl-only extrapolation: 2.29 x 2 = 4.58
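
The same extrapolation can be turned around to ask how much faster Gpuowl would need to be on this IGP before the mixed case merely matched prime95 alone. This is a sketch under a strong assumption of mine: that prime95's shared throughput would stay at 20.19 GhzD/day however fast Gpuowl ran. The names are hypothetical.

```python
PRIME95_SOLO = 24.63    # GhzD/day, prime95 alone
PRIME95_SHARED = 20.19  # GhzD/day, prime95 while Gpuowl v1.9 runs on the IGP
GPUOWL_SHARED = 1.48    # GhzD/day, Gpuowl v1.9 on the IGP while prime95 runs


def mixed_total(speedup):
    """Combined GhzD/day if Gpuowl were `speedup` times as fast as v1.9."""
    return PRIME95_SHARED + GPUOWL_SHARED * speedup


# Speedup at which the mixed total equals prime95 running alone.
break_even = (PRIME95_SOLO - PRIME95_SHARED) / GPUOWL_SHARED
print(round(mixed_total(2), 2), round(break_even, 2))
```

On these figures Gpuowl would need about three times the v1.9 throughput just to break even.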

It seems as if the Intel IGP is a particularly inefficient GPU implementation.
Getting a current version of Gpuowl to work on an Intel IGP would not be a good use of Mihai's time. Running Gpuowl on an Intel IGP seems not to be a good use of laptop power budget.

The IGP power consumption was indicated in GPU-Z as about 11W running with prime95, so about 2 Joules / 4Mfft-iteration.
Running prime95 but not Gpuowl, the package power consumption was indicated in CPUID HWMonitor as 40.5W, so about 0.515 J / 5Mfft-iteration. Prime95 on the cpu could be regarded as much more efficient than that, since without prime95 or Gpuowl, the observed "idle"-package power is around 35 W. Considering only the incremental power consumption of prime95, 5.5W / 78.7 iterations/sec = 70 millijoules / 5Mfft-iteration!
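
The energy-per-iteration figures above are simply power divided by iteration rate. A minimal sketch of that arithmetic, using the wattages and rates quoted in this post (the function name is mine):

```python
def joules_per_iteration(watts, iterations_per_second):
    """Energy per iteration: watts are joules per second."""
    return watts / iterations_per_second


igp = joules_per_iteration(11.0, 5.5)                   # Gpuowl on IGP, 4M fft
package = joules_per_iteration(40.5, 78.7)              # prime95, whole package
incremental = joules_per_iteration(40.5 - 35.0, 78.7)   # prime95 above "idle"

print(igp, round(package, 3), round(incremental * 1000), "mJ")
```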

Gpuowl v2.0 launched and hung before any appreciable progress was observed in the file system or at the console or in the log. GPU-Z showed the IGP sitting at idle conditions. The entire log contained:
Code:
gpuOwL v2.0- GPU Mersenne primality checker
Intel(R) UHD Graphics 630-24x1100- 
Note: using long carry and fused tail kernels
OpenCL compilation in 18087 ms, with " -DEXP=83871443u  -I. -cl-fast-relaxed-math -cl-kernel-arg-info "
PRP-3: FFT 5000K (625 * 4096 * 2) of 83871443 (16.38 bits/word) [2020-06-18 02:07:40 Central Daylight Time]
Starting at iteration 3391500
Gpuowl V6.5-61-g5c0db85 produces an "error on load" a few minutes after launch, seemingly independent of the -fft value, +0 through +3 (next fft length).
Code:
C:\Users\kkrie\Documents\gpuowl-65-test>gpuowl-win -device 0 -fft +0 -carry long -use ORIG_X2
2020-06-17 16:43:25 gpuowl v6.5-61-g5c0db85
2020-06-17 16:43:25 Note: no config.txt file found
2020-06-17 16:43:25 config: -device 0 -fft +0 -carry long -use ORIG_X2
2020-06-17 16:43:25 87398387 FFT 5120K: Width 256x4, Height 64x4, Middle 10; 16.67 bits/word
2020-06-17 16:43:25 using long carry kernels
2020-06-17 16:43:27 OpenCL args "-DEXP=87398387u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=10u -DFRAC=12357831637820925542ul -DWEIGHT_STEP=0xa.0e81d99e13ac8p-3 -DIWEIGHT_STEP=0xc.ba55dbe3e5aep-4 -DWEIGHT_BIGSTEP=0x9.837f0518db8a8p-3 -DIWEIGHT_BIGSTEP=0xd.744fccad69d68p-4 -DINVWEIGHT_LIMIT=0xc.cccccccccccdp-29 -DORIG_X2=1  -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2020-06-17 16:44:11 OpenCL compilation in 44604 ms
2020-06-17 16:44:13 87398387.owl loaded: k 87000000, block 1000, res64 d2d69bc89926f0a4
2020-06-17 16:46:52 87398387 EE loaded: 87000000, blockSize 1000, c89b639632165de5 (expected d2d69bc89926f0a4)
2020-06-17 16:46:52 Exiting because "error on load"
2020-06-17 16:46:52 Bye
(GhzDays credit computed using https://www.mersenne.ca/credit.php?w...onent=95038813 etc.)


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-06-08 at 15:59 Reason: cosmetic
Old 2020-07-10, 03:58   #4
kriesel
 
Default Intel i7-4790/HD4600

Attempts to run Mfakto on the HD4600 IGP had been repeatedly thwarted by conflicts between its OpenCL driver installation and OpenCL driver installations for AMD or for NVIDIA GPUs. Getting this IGP to even hold an OpenCL functional state long enough to benchmark was a long effort. It finally began working after the first motherboard on this build failed with arcing and flame, and the motherboard was replaced. Quite unexpectedly, the HD, ram and cpu from the failed system produced working Mfakto on the second motherboard without even trying, and with two Radeon VIIs running gpuowl at the same time.

Note, the first motherboard failure may have damaged some gpus installed at the time. The second motherboard (also an Asrock H81 BTC Pro v2.0) failed in a similar fashion, while running prime95 and mfakto together on the i7-4790 & HD4600. They may just be more current draw than the motherboard design can handle on a continuous basis.

To summarize:
Prime95 alone, 30.3 GhzD/day PRP
Prime95 with Mfakto 31.47 + 4.77 = 36.24 GhzD/day (but not recommended; second motherboard failed quickly running this combination)
Mfakto alone, 16.9 GhzD/day TF
gpuowl on HD4600 produced opencl compile errors and a quick halt; no gpuowl throughput.

The IGP Mfakto throughput while prime95 runs on the cpu is very small, <0.2% of a modern gpu such as the RTX2080. Run time of a typical TF wavefront assignment consequently can be very long.

While prime95 V29.8b6 runs PRP on 95932673 FMA fft 5M (347.49 GhzDays credit), 10.33ms/iter = 30.30 GhzD/day, and 2 Radeon VIIs run gpuowl PRP3 & LL,
mfakto v0.15-pre6 exp=110044411 bit_min=75 bit_max=76 (69.54 GHz-days)
gave 8.0 GhzD/day on mfakto in the first class to finish (782 seconds), running at perhaps a mix of 1200Mhz and 350Mhz / 2.9 W; after halting prime95, the hd4600 upshifts to 1200Mhz and 15 Watts. The second class to complete is 6.04 GhzD/day and 1036 seconds, completing after prime95 was stopped. The third class is 370.25 sec, 16.9 GhzD/day, with prime95 stopped for most or all of it. Left running so that more Mfakto-only classes could complete, it yielded 16.89 and 16.91 GhzD/day (while the cpu still serviced gpuowl on Radeon VII 1 and 2 as required, for Jacobi or GEC), so 16.90 GhzD/day seems to be its throughput. During this period IGP temp is 57C, cpu temp 45C.
Prime95 was then resumed at 20:53 July 8, to observe the change in Mfakto throughput;
cpu and IGP go to 99C; the cpu quickly begins thermal throttling, so prime95 timing lengthens from 9.2 to 10.4 ms/iter;
next class is 12GhzD/day, probably transitional; HD4600 has gone back to 3W / 350Mhz.

Next class is 4.77GhzD/day, 1312.6 seconds. Prime95 throughput seems unaffected, or even improved, at 9.945ms/iter.
Next class is 4.78GhzD/day.
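
The per-class GhzD/day figures above can be cross-checked from the class run times. This sketch assumes the 69.54 GHz-day assignment is split into 960 equal classes; that count is my assumption, chosen because it reproduces the quoted figures, and it may differ with other mfakto settings.

```python
SECONDS_PER_DAY = 86400
CLASSES = 960  # assumed number of classes actually run per assignment


def class_rate(assignment_ghzd, class_seconds, classes=CLASSES):
    """GhzD/day implied by the wall-clock time of one class."""
    credit_per_class = assignment_ghzd / classes
    return credit_per_class * SECONDS_PER_DAY / class_seconds


for seconds in (782, 1036, 370.25, 1312.6):
    print(seconds, round(class_rate(69.54, seconds), 2))
```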

Gpuowl v1.9 on the hd4600 failed as shown below. Other versions were not tried.

Code:
\gpuowl-v1.9>gpuowl -device 2
gpuOwL v1.9- GPU Mersenne primality checker
Intel(R) HD Graphics 4600, 20x1200MHz
OpenCL compilation error -43 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77973559u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math  -DEXP=77973559u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
In file included from 1:1:
.\gpuowl.cl:67:26: error: OpenCL extension 'cl_khr_fp64' is unsupported
.\gpuowl.cl:135:52: error: use of undeclared identifier 'M_SQRT1_2'
.\gpuowl.cl:136:52: error: use of undeclared identifier 'M_SQRT1_2'
.\gpuowl.cl:225:61: error: no matching function for call to 'fabs'
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:370:61: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:371:62: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float2' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:372:62: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float3' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:373:62: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float4' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:374:62: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float8' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:375:63: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'float16' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:423:85: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'half2' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:424:50: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'half3' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:425:19: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'half4' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:425:74: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'half8' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:426:44: note: candidate function not viable: no known conversion from 'T2' (aka 'double2') to 'half16' for 1st argument
c:/j/workspace/llvm/llvm/tools/clang/lib/cclang\<stdin>:423:31: note: candidate function not viable: call to __host__ function from __host__ function

 Bye
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-08-07 at 18:20
Old 2020-07-11, 17:04   #5
kriesel
 
Default Intel i5-1035G1/UHD920

HWInfo64 v6.28 identifies the IGP on the i5-1035G1 as a UHD920.
OpenCL 2.1 and the Windows 10 OS came preinstalled on the hardware purchased used.
GPU-Z indicates numerous OpenCL parameters in the Advanced tab, OpenCL type.

Mfakto:

This IGP fails to run Mfakto v0.15pre6 correctly; it passes only 14 of the 30 brief startup self-tests, and then aborts.
That was the outcome for every attempt, made in the following order:
  1. install, use all default settings, add some work, try a run.
  2. change ini file to GPUType=Intel, retry
  3. change BIOS to turn off C-states, retry
  4. change Vectorsize to 1, gridsize to 1, numstreams to 1, retry
  5. switch from GPU sieving to CPU sieving, retry

Gpuowl:
V1.9 failed to compile properly. The failure messages go on at great length, about 120KB, and include:
./gpuowl.cl:69:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
The GPU-Z Advanced tab OpenCL data includes this nugget: "DP Capability None". So it seems incapable of doing the sorts of computations any version of gpuowl would require.

V0.5 also fails quite verbosely, with about 59KB of messages.

So, no way has been found to use this IGP for GIMPS.
If someone finds a way to get mfakto to work on this IGP, please post how.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-06-08 at 16:03
Old 2020-07-11, 17:16   #6
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·5·17² Posts
Default Other possibilities

AMD:
Mfakto
There are reports of running Mfakto on AMD iGPUs, at 45 or over 120 GhzD/day. https://www.mersenneforum.org/showpost.php?p=547866&postcount=1618
https://www.mersenneforum.org/showpo...0&postcount=11
(HD 6550D / A8 3850) two instances at 20 each: https://www.mersenneforum.org/showpo...&postcount=671
Ryzen 3400G APU 200+GhzD/day https://www.mersenneforum.org/showpo...postcount=1632
AMD A8-9600 (Radeon R7) https://www.mersenneforum.org/showpo...postcount=1659

Gpuowl
https://www.mersenneforum.org/showpost.php?p=553381&postcount=1620
Radeon Vega 8 iGPU of a Ryzen 3 3200G, 18.1ms on 111M gpuowl PRP, corresponding to 487.654 GhzD/(.0181*111111113/86400) = 20.95 GhzD/day. https://www.amd.com/en/products/apu/amd-ryzen-3-3200g
https://www.mersenneforum.org/showpo...postcount=1625
Ryzen 4700u https://mersenneforum.org/showpost.p...postcount=2504
AMD A8-9600 (Radeon R7) https://www.mersenneforum.org/showpo...postcount=2615
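
The Vega 8 figure above is just the total credit for the test divided by the projected run time in days. A minimal sketch of that arithmetic follows; the function name and layout are my own for illustration, and the 487.654 GhzD credit, 18.1 ms/iter timing, and exponent are the values quoted in the line above.

```python
SECONDS_PER_DAY = 86400

def ghzd_per_day(total_ghzd: float, sec_per_iter: float, iterations: int) -> float:
    """Throughput in GHz-days/day for a test needing `iterations` iterations."""
    run_days = sec_per_iter * iterations / SECONDS_PER_DAY  # projected run time
    return total_ghzd / run_days

# Values quoted above for the Ryzen 3 3200G / Vega 8 gpuowl PRP timing.
vega8 = ghzd_per_day(487.654, 0.0181, 111_111_113)
print(f"{vega8:.2f} GhzD/day")
```

The same function can be used to compare other IGP timings, provided the GhzD credit for the exponent is looked up separately.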


Mobile devices:
As far as I know, there is no code to run GIMPS computations on smartphone GPUs at this time.


Other Intel IGP models:
Mfakto:
HD3000 apparently no; https://www.mersenneforum.org/showthread.php?t=17636
HD4000 was supported as of mfakto v0.12 https://mersenneforum.org/showpost.p...8&postcount=10
Iris 640 on MacOS was problematic https://mersenneforum.org/showpost.p...8&postcount=74
https://mersenneforum.org/showpost.php?p=528953&postcount=76 until a driver update made it work https://mersenneforum.org/showpost.p...9&postcount=81
In case of difficulty, try different values of the mfakto parameters as described at https://www.mersenneforum.org/showpo...postcount=1177 or https://www.mersenneforum.org/showpo...&postcount=674
i5-4670k (HD4600), i3-8100 (UHD630) https://www.mersenneforum.org/showpo...postcount=1651
HD530 on Linux https://www.mersenneforum.org/showpo...8&postcount=90 and benchmark tbd


Gpuowl:
HD530 on Linux howto https://www.mersenneforum.org/showpo...8&postcount=90 and benchmark https://mersenneforum.org/showpost.p...postcount=2678
UHD610 https://mersenneforum.org/showpost.p...postcount=2685


No joy on gpuowl V6.11-380, Celeron J4105 with UHD 600 https://mersenneforum.org/showpost.p...postcount=2503
Old IGPs lacking DP will be unable to run any version of gpuowl that requires it. Try -fft M61 of gpuowl v1.9 as a last resort.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-01-12 at 18:19 Reason: added igp specific links
Old 2020-08-19, 20:12   #7
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×5×17² Posts
Default Intel Celeron G1840/HD

No useful results were obtained in attempts on the Celeron G1840 IGP under Windows 10.

Mfakto installed ok and passed the 30-item self-test, then went idle instead of processing the worktodo contents.

Gpuowl v0.5 produces a shower of errors and will not perform its self-test.

Gpuowl V1.9 explicitly skips every line of the worktodo file, even those copied and pasted from the readme.md file.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-13 at 19:06
Old 2021-06-08, 16:10   #8
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·5·17² Posts
Default Intel i3-4170 / HD4400

Mfakto 0.15pre7-x64 was installed and configured on Windows 10.

An extensive Mfakto self-test ran with no errors.
Code:
...
######### testcase 34071/34071 (M112404491[91-92]) #########
Starting trial factoring M112404491 from 2^91 to 2^92 (4461450.54 GHz-days)
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 08 00:12 | 1848   0.1% |  1.437    n.a. |      n.a.    81206    0.00%
M112404491 has a factor: 3941616367695054034124905537 (91.670846 bits, 2992945.937358 GHz-d)

found 1 factor for M112404491 from 2^91 to 2^92 [mfakto 0.15pre7-MGW cl_barrett32_92_gs_2]
selftest for M112404491 passed (cl_barrett32_92_gs)!
tf(): total time spent:  1.437s

Selftest statistics                                    
  number of tests           335250
  successful tests          335250

selftest PASSED!
Test worktodo entry: Factor=(aid),116159489,75,76
Without prime95 running, mfakto indicates ~10.24 GHzD/day. With prime95 running on the CPU, the IGP droops to ~8.75 GHzD/day in mfakto. Package power is ~53 W and CPU temp is ~82C with both running. That's about half the power of the i7-4790 that blew out motherboards. Estimated time to complete the single bit level assignment above is 6.4 days without prime95 active or 7.4 days with prime95 running.

With mfakto already running, launching prime95 on M61242641 DC yielded ~9.07 ms/iter. Mfakto throughput dropped to 4.667 GHzD/day. I have no explanation for the difference from the ~8.75 GHzD/day in the previous paragraph.
After halting Mfakto, prime95 timings improved very slightly, to ~8.95 ms/iter.
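
To put the figures above side by side, here is a small sketch that expresses each change as a percentage. The helper name is hypothetical, and using 10.24 GHzD/day as the baseline for the second mfakto observation is my assumption, since that run did not record its own mfakto-only rate.

```python
def pct_change(before: float, after: float) -> float:
    """Signed percent change going from `before` to `after`."""
    return (after - before) / before * 100.0

# Figures quoted above for the i3-4170 / HD4400.
mfakto_alone = 10.24     # GHzD/day, prime95 stopped
mfakto_shared = 8.75     # GHzD/day, prime95 running (first observation)
mfakto_shared_2 = 4.667  # GHzD/day, prime95 running (second observation)
p95_alone = 8.95         # ms/iter, mfakto halted
p95_shared = 9.07        # ms/iter, mfakto running

mfakto_change = pct_change(mfakto_alone, mfakto_shared)
# Assumption: same 10.24 GHzD/day baseline as the first observation.
mfakto_change_2 = pct_change(mfakto_alone, mfakto_shared_2)
p95_change = pct_change(p95_alone, p95_shared)

print(f"mfakto throughput, first observation:  {mfakto_change:+.1f}%")
print(f"mfakto throughput, second observation: {mfakto_change_2:+.1f}%")
print(f"prime95 iteration time:                {p95_change:+.1f}%")
```

On these numbers mfakto loses roughly 15% or 54% of its throughput depending on the observation, while prime95 iteration time rises by only about 1.3%.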

Gpuowl has not been attempted on this IGP.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2021-09-13 at 19:07
Old 2021-09-13, 16:45   #9
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1011010010100₂ Posts
Default Intel i7-1165G7/Iris Xe

This is an 11th-generation laptop-oriented CPU and IGP combination. Testing was performed on a single sample with 16GB RAM in a Dell Inspiron 3501.



lsgpu output is:
Code:
lsgpu, derived/modified from https://gist.github.com/CptFoobar/bcb513d87e574e69c2db
1 Platform found.

Platform 0
1 Device: Intel(R) Iris(R) Xe Graphics
  1.1 Vendor: Intel(R) Corporation
  1.2 Type: CL_DEVICE_TYPE_GPU
  1.3 Hardware version: OpenCL 3.0 NEO
  1.4 Software version: 27.20.100.9365
  1.5 OpenCL version: OpenCL C 1.2
  1.6 Little Endian: Yes
  1.7 Max Clock frequency: 1300 MHz
  1.8 Image support available: Yes
  1.9 Parallel compute units: 96
  1.10 OpenCL Device Availability: Yes
  1.11 OpenCL Compiler Availability: Yes
  1.12 OpenCL Linker Availability: Yes
Mfakto:
Complete failure so far on this IGP on Windows. I tried my usual configuration with mfakto v0.15pre7; then pre6 with the usual configuration (GPUType auto); then VectorSize=1 with GPUType set to Auto, Intel, and GCN in turn. I suspect the issue is OpenCL v3.0 not being recognized or not handled by mfakto. The kernel build output below reports "Unrecognized build options: -O3".
The IGP driver is 27.20.100.9365 DCH/Win10 64 (per GPU-Z v2.40.0)
Code:
mfakto 0.15pre6-Win (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  kriesel
  ComputerID                martinella-IrisXeIGP
  TimeStampInResults        yes
  VectorSize                1
WARNING: Unknown setting "Intel" for GPUType, using default (AUTO)
  GPUType                   AUTO
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
Compiletime options

Select device - Get device info:
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Intel(R) Iris(R) Xe Graphics (Intel(R) Corporation)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto.ini to select a GPU type yourself to avoid this warning.
WARNING: VectorSize=1 is known to fail on AMD GPUs and drivers. If the selftest fails, please increase VectorSize to 2 at least. See http://devgurus.amd.com/thread/167571 for latest news about this issue.
OpenCL device info
  name                      Intel(R) Iris(R) Xe Graphics (Intel(R) Corporation)
  device (driver) version   OpenCL 3.0 NEO  (27.20.100.9365)
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 96 (6144 compute elements)
  clock rate                1300MHz

Automatic parameters
  threads per grid          0
  optimizing kernels for    GCN

Compiling kernels.
 
    BUILD OUTPUT

Unrecognized build options: -O3
     END OF BUILD OUTPUT
ERROR: load_kernels(0) failed
mfakto -d01 --CLtest output:
Code:
mfakto -d 01 --CLtest
mfakto 0.15pre6-Win (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  kriesel
  ComputerID                martinella-IrisXeIGP
  TimeStampInResults        yes
  VectorSize                1
  GPUType                   GCN
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
OpenCL Platform 1/1: Intel(R) Corporation, Version: OpenCL 3.0
Error: No platform found
Error -32 (Invalid platform): clCreateContextFromType(GPU)
Error -34 (Invalid context): clGetContextInfo(CL_CONTEXT_NUM_DEVICES) - assuming one device
Error -34 (Invalid context): clGetContextInfo(numdevs)
Error: Out of memory.
Error -34 (Invalid context): clGetContextInfo(devices)
mfakto -d11 --CLtest output:
Code:
mfakto -d 11 --CLtest
mfakto 0.15pre6-Win (64bit build)


Runtime options
  Inifile                   mfakto.ini
  Verbosity                 1
  SieveOnGPU                yes
  MoreClasses               yes
  GPUSievePrimes            81157
  GPUSieveProcessSize       24Ki bits
  GPUSieveSize              96Mi bits
  FlushInterval             0
  WorkFile                  worktodo.txt
  ResultsFile               results.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  Stages                    enabled
  StopAfterFactor           class
  PrintMode                 full
  V5UserID                  kriesel
  ComputerID                martinella-IrisXeIGP
  TimeStampInResults        yes
  VectorSize                1
  GPUType                   GCN
  SmallExp                  no
  UseBinfile                mfakto_Kernels.elf
OpenCL Platform 1/1: Intel(R) Corporation, Version: OpenCL 3.0
Device 1/1: Intel(R) Iris(R) Xe Graphics (Intel(R) Corporation),
device version: OpenCL 3.0 NEO , driver version: 27.20.100.9365
Extensions: cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_command_queue_families cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_intel_subgroups_char cl_intel_subgroups_long cl_khr_il_program cl_intel_mem_force_host_memory cl_khr_subgroup_extended_types cl_khr_subgroup_non_uniform_vote cl_khr_subgroup_ballot cl_khr_subgroup_non_uniform_arithmetic cl_khr_subgroup_shuffle cl_khr_subgroup_shuffle_relative cl_khr_subgroup_clustered_reduce cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_intel_unified_shared_memory_preview cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_planar_yuv cl_intel_packed_yuv cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_3d_image_writes cl_intel_media_block_io cl_khr_gl_sharing cl_khr_gl_depth_images cl_khr_gl_event cl_khr_gl_msaa_sharing cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d10_sharing cl_khr_d3d11_sharing cl_intel_d3d11_nv12_media_sharing cl_intel_unified_sharing cl_intel_subgroup_local_block_io cl_intel_simultaneous_sharing
Global memory:6755676160, Global memory cache: 1048576, local memory: 65536, workgroup size: 256, Work dimensions: 3[256, 256, 256, 0, 0] , Max clock speed:1300, compute units:96
Compiling kernels (build options: "-I. -DVECTOR_SIZE=1 -O3 -DMORE_CLASSES -DCL_GPU_SIEVE").
        BUILD OUTPUT

Unrecognized build options: -O3
        END OF BUILD OUTPUT
Error -11 (Build program failure): clBuildProgram
.Error -45 (Invalid program executable): Creating Kernel test_k from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel mfakto_cl_71 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel mfakto_cl_63 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_79 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_77 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_76 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_92 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_88 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett32_87 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_73 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_69 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_70 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_71 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_88 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_83 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_82 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_barrett15_74 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_mg62 from program. (clCreateKernel)
.Error -45 (Invalid program executable): Creating Kernel cl_mg88 from program. (clCreateKernel)
Error -48 (Invalid kernel): Setting kernel argument. (hi)
Error -48 (Invalid kernel): Setting kernel argument. (lo)
Error -48 (Invalid kernel): Setting kernel argument. (q)
Error -48 (Invalid kernel): Setting kernel argument. (qr)
Error -48 (Invalid kernel): Setting kernel argument. (RES)
loop 1:
Error -48 (Invalid kernel): Enqueuing kernel(clEnqueueNDRangeKernel)
Gpuowl:
V1.9 with DP transform fails verbosely, raining down 2124 lines of text. The startup, the unceremonious exit, and the first 10 and last 10 errors are shown below, in 67 lines of output:
Code:
gpuOwL v1.9- GPU Mersenne primality checker
Intel(R) Iris(R) Xe Graphics, 96x1300MHz
OpenCL compilation error -11 (args -I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77936867u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFP_DP=1 )
In file included from 1:1:
./gpuowl.cl:67:26: warning: unsupported OpenCL extension 'cl_khr_fp64' - ignoring
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
                         ^
./gpuowl.cl:68:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
typedef double T;
        ^
./gpuowl.cl:69:9: error: unknown type name 'double2'; did you mean 'double'?
typedef double2 T2;
        ^~~~~~~
        double
./gpuowl.cl:69:9: error: use of type 'double' requires cl_khr_fp64 extension to be enabled
./gpuowl.cl:84:7: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
T2 U2(T a, T b) { return (T2)(a, b); }
      ^
./gpuowl.cl:84:12: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
T2 U2(T a, T b) { return (T2)(a, b); }
           ^
./gpuowl.cl:84:1: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
T2 U2(T a, T b) { return (T2)(a, b); }
^
./gpuowl.cl:84:27: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
T2 U2(T a, T b) { return (T2)(a, b); }
                          ^
./gpuowl.cl:117:7: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
T neg(T x) { return -x; }
      ^
./gpuowl.cl:117:1: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
T neg(T x) { return -x; }
^
...
./gpuowl.cl:789:46: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
KERNEL(256) tail(P(T2) io, Trig smallTrig, P(T2) bigTrig) {
                                             ^
./gpuowl.cl:790:9: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
  local T lds[HEIGHT];
        ^
./gpuowl.cl:791:3: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
  T2 u[N_HEIGHT];
  ^
./gpuowl.cl:792:3: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
  T2 v[N_HEIGHT];
  ^
./gpuowl.cl:796:27: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
KERNEL(256) transposeW(CP(T2) in, P(T2) out, Trig bigTrig) {
                          ^
./gpuowl.cl:796:37: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
KERNEL(256) transposeW(CP(T2) in, P(T2) out, Trig bigTrig) {
                                    ^
./gpuowl.cl:797:9: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
  local T lds[4096];
        ^
./gpuowl.cl:801:27: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
KERNEL(256) transposeH(CP(T2) in, P(T2) out, Trig bigTrig) {
                          ^
./gpuowl.cl:801:37: error: use of type 'T2' (aka 'double') requires cl_khr_fp64 extension to be enabled
KERNEL(256) transposeH(CP(T2) in, P(T2) out, Trig bigTrig) {
                                    ^
./gpuowl.cl:802:9: error: use of type 'T' (aka 'double') requires cl_khr_fp64 extension to be enabled
  local T lds[4096];
        ^


Bye
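
All of the errors above trace back to the device not advertising cl_khr_fp64, which is also absent from the extension list in the mfakto --CLtest output earlier in this post. A minimal sketch of checking an extension string for it before trying a DP transform follows; the function name is hypothetical, the first sample string is an abbreviated subset of the Iris Xe list above, and the second is a made-up device included only for contrast.

```python
def supports_fp64(extensions: str) -> bool:
    """True if a space-separated OpenCL extension string lists cl_khr_fp64."""
    return "cl_khr_fp64" in extensions.split()

# Abbreviated subset of the Iris Xe extension list reported by mfakto --CLtest.
iris_xe_subset = (
    "cl_khr_byte_addressable_store cl_khr_fp16 "
    "cl_khr_global_int32_base_atomics cl_khr_int64_base_atomics"
)

# Hypothetical device that does advertise double precision (for contrast only).
hypothetical_dp_device = "cl_khr_byte_addressable_store cl_khr_fp64"

for name, ext in [("Iris Xe (subset)", iris_xe_subset),
                  ("hypothetical DP device", hypothetical_dp_device)]:
    verdict = "DP transform possible" if supports_fp64(ext) else "no DP; try -fft M61"
    print(f"{name}: {verdict}")
```

The extension string itself can be copied from GPU-Z, clinfo, or the mfakto --CLtest output.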
A retry with -fft M61 appears to work, so it may be usable for PRP DC, or for PRP DC of an LL first test, although it lacks proof generation and so would require two PRP tests. Run time estimates are long:
Code:
\gpuowl-v1.9-74f1a38>gpuowl -device 0 -user kriesel -cpu martinella-IrisXeIGP -fft M61
gpuOwL v1.9- GPU Mersenne primality checker
Intel(R) Iris(R) Xe Graphics, 96x1300MHz

OpenCL compilation in 45840 ms, with "-I. -cl-fast-relaxed-math -cl-std=CL2.0  -DEXP=77936867u -DWIDTH=1024u -DHEIGHT=2048u -DLOG_NWORDS=22u -DFGT_61=1 -DLOG_ROOT2=49u "
Note: using long carry kernels
PRP-3: FFT 4M (1024 * 2048 * 2) of 77936867 (18.58 bits/word) [2021-09-13 12:35:42 Central Daylight Time]
Starting at iteration 0
OK        0 / 77936867 [ 0.00%], 0.00 ms/it; ETA 0d 00:00; 0000000000000003 [12:36:04]
OK     1000 / 77936867 [ 0.00%], 40.50 ms/it; ETA 36d 12:52; 9711fce020e74461 [12:37:07]
OK     5000 / 77936867 [ 0.01%], 40.58 ms/it; ETA 36d 14:22; 31d8d3401e6fe48d [12:40:11]
OK    10000 / 77936867 [ 0.01%], 39.37 ms/it; ETA 35d 12:13; fc4f135f7cf4ad29 [12:43:49]
OK    20000 / 77936867 [ 0.03%], 38.88 ms/it; ETA 35d 01:36; 3cd1bd9d5e09cbc5 [12:50:40]
OK    40000 / 77936867 [ 0.05%], 38.25 ms/it; ETA 34d 11:34; dffe1b1b0d748128 [13:03:46]
OK    60000 / 77936867 [ 0.08%], 38.51 ms/it; ETA 34d 17:01; 0945da4dc08bdd95 [13:16:56]
Averaging the last two lines gives 38.38 ms/it, which corresponds to ~7.03 GHzD/day alongside prime95. There is also a severe associated reduction in prime95 throughput, to ~31% of its prime95-only rate. The prime95-only benchmark at the 4200K FFT length required for that same exponent was 7 ms/iter (142.84 iter/sec) as 4 cores, 1 worker, corresponding to 38.51 GHzD/day. So running both would be a net loss of 38.51*0.69 - 7.03 = 19.54 GHzD/day, a 51% net loss.
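
The net-loss arithmetic above can be sketched as follows. The variable names are mine, and the inputs are the figures quoted in the preceding paragraph and gpuowl log, not new measurements.

```python
# Figures quoted above for the i7-1165G7 / Iris Xe.
p95_alone = 38.51        # GHzD/day, prime95 only (4 cores, 1 worker)
p95_lost_fraction = 0.69 # prime95 falls to ~31% of that while gpuowl runs
igp_rate = 7.03          # GHzD/day from gpuowl -fft M61 on the IGP

# Average of the last two gpuowl timing lines in the log above.
avg_ms_per_iter = (38.25 + 38.51) / 2

# CPU throughput given up, minus what the IGP contributes in return.
net_loss = p95_alone * p95_lost_fraction - igp_rate
net_loss_pct = net_loss / p95_alone * 100.0

print(f"average gpuowl timing: {avg_ms_per_iter:.2f} ms/it")
print(f"net loss: {net_loss:.2f} GHzD/day ({net_loss_pct:.0f}% of prime95 alone)")
```

A positive net loss means the system does less total work with the IGP in use than with prime95 running alone.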

Last fiddled with by kriesel on 2021-09-13 at 19:10

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.