#1 |
"Mark"
Apr 2003
Between here and the
2²·7·223 Posts |
I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.
My personal goal is to search all bases < 10000 up to n = 10000. If I can get the sieve working on OpenCL, that would go a long way toward accomplishing it, as the new sieve will support multiple bases, whereas gcwsieve only supports one at a time and MultiSieve is too slow (IMO) compared to what the GPU can do. |
#2 | |
"Mark"
Apr 2003
Between here and the
14144₈ Posts |
![]() Quote:
|
|
#3 |
"Mark"
Apr 2003
Between here and the
2²·7·223 Posts |
In testing the code, sometimes it returns good results, sometimes it doesn't. It doesn't return invalid factors when that happens, but it misses factors. I'm at a loss regarding the behavior. One would expect an uninitialized variable, but that doesn't appear to be the problem. I suspect something is writing beyond a memory boundary, but I can't see that either. If anyone wants to take a look at the kernel, maybe you will see what I am doing wrong.
Note that when this does eventually work it will be many times faster than MultiSieve and gcwsiev-smallp for p < n. I haven't run it enough for higher p, but based upon some extrapolations it was 5x faster on my MacBook Pro.
Last fiddled with by rogue on 2020-09-24 at 19:47 |
#4 |
"Mark"
Apr 2003
Between here and the
2²×7×223 Posts |
The problem is with the code to compute the inverse. The computed inverse is wrong. It appears to have something to do with using unsigned inputs, but when I change them to signed, the OpenCL compiler complains at compile time. Note that I used code posted in another thread on this forum. That code works fine in C. It just doesn't work fine in OpenCL C. I've a slower version that appears to be returning the correct results.
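For readers who hit the same symptom, here is a minimal sketch in plain C of one way to compute a modular inverse using only unsigned arithmetic. It is not the kernel code from this thread, nor the code borrowed from the other thread, and whether it matches the actual bug is unknown. The textbook extended Euclidean algorithm lets its coefficients go negative, which silently wraps with unsigned types; the sketch instead tracks coefficient magnitudes plus a sign flag. The function name `invmod_u64` and the return value 0 for "no inverse" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns x in [1, p-1] with (a * x) % p == 1,
 * or 0 when no inverse exists (gcd(a, p) != 1). */
uint64_t invmod_u64(uint64_t a, uint64_t p)
{
    assert(p != 0);                /* a % 0 would be undefined */

    uint64_t r0 = p, r1 = a % p;   /* remainders of the Euclidean algorithm */
    uint64_t s0 = 0, s1 = 1;       /* magnitudes of the coefficients of a */
    int s1_negative = 0;           /* sign of s1; signs alternate each step */

    while (r1 > 1) {
        uint64_t q  = r0 / r1;
        uint64_t r2 = r0 - q * r1;
        /* Signed form is s2 = s0 - q*s1. Because the signs alternate,
         * the magnitude is s0 + q*s1, which never exceeds p. */
        uint64_t s2 = s0 + q * s1;

        r0 = r1; r1 = r2;
        s0 = s1; s1 = s2;
        s1_negative = !s1_negative;
    }

    if (r1 == 0)
        return 0;                  /* a and p share a factor */

    /* Map a negative coefficient back into [1, p-1]. */
    return s1_negative ? p - s1 : s1;
}
```

Since it only uses 64-bit unsigned add, multiply, divide, and compare, the same logic should carry over to OpenCL C with `ulong` in place of `uint64_t` and the `assert` dropped, though that is untested here.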
Last fiddled with by rogue on 2014-11-26 at 20:21 |
#5 |
"Mark"
Apr 2003
Between here and the
2²×7×223 Posts |
Now that the computation of the modular inverse is correct, I can release the code as a beta. The factors that are found all appear to be valid. I have a hacked version of pfgw that can take the .log file and verify the factors in it. I need to release that soon.
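The check itself is simple to state: p is a valid factor of n*b^n+c exactly when n*b^n ≡ -c (mod p). A rough sketch in C follows. It is not the hacked pfgw, and it does not read the .log file (whose format isn't described here); the names `powmod_u32` and `divides_gcw_term` are made up for illustration. It assumes p < 2^32 so that every product fits in 64 bits, and that c is +1 (Cullen) or -1 (Woodall).

```c
#include <assert.h>
#include <stdint.h>

/* base^exp mod p by square-and-multiply; requires p < 2^32 so that
 * (p-1)*(p-1) cannot overflow a 64-bit unsigned integer. */
static uint64_t powmod_u32(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1 % p;
    base %= p;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % p;
        base = (base * base) % p;
        exp >>= 1;
    }
    return result;
}

/* Returns 1 if p divides n*b^n + c, 0 otherwise (c must be +1 or -1). */
int divides_gcw_term(uint64_t p, uint64_t n, uint64_t b, int c)
{
    assert(p > 1 && p < (1ULL << 32));
    assert(c == 1 || c == -1);

    uint64_t t = ((n % p) * powmod_u32(b, n, p)) % p;  /* n*b^n mod p */
    uint64_t target = (c == 1) ? p - 1 : 1;            /* -c mod p */
    return t == target;
}
```

For example, 3*2^3+1 = 25, so `divides_gcw_term(5, 3, 2, 1)` returns 1 while `divides_gcw_term(7, 3, 2, 1)` returns 0.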
Here is an example of what you might see when it runs, in this case on an Intel HD Graphics 4000. Note that I fixed that first string of output. I just didn't fix it in what I attached.
Code:
gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.1, a GPU program to find factors numbers of the form k*b^n+c where k, b, and n are fixed
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.1
Device 0 is an Intel(R) Corporation Intel(R) HD Graphics 4000
workGroupSize = 51200 = 200 * 16 * 16 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 315 MB in CPU, 82 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms
Sieve complete: 3 <= p < 1000000 78498 primes tested
Clock time: 34.81 seconds at 2255 p/sec. Factors found: 26031
Processor time: 39.89 sec. (5.46 init + 34.43 sieve).
Seconds spent in CPU and GPU: 13.41 (cpu), 35.72 (gpu)
Percent of time spent in CPU vs. GPU: 27.29 (cpu), 72.71 (gpu)
CPU/GPU utilization: 1.15 (cores), 1.00 (devices)
Started with 36589 terms and sieved to 1000000.
10558 remaining terms written to gcw_201.pfgw

I will be very curious to know how well this works and how fast (or slow) it is compared to gcwsieve on your systems.
Last fiddled with by rogue on 2020-09-24 at 19:47 |
#6 |
"Mark"
Apr 2003
Between here and the
2²·7·223 Posts |
I've fixed various issues:
Code:
Fix factor rate calculation.
Fix writing ABC file as the wrong line was generated.
Fix reading ABC file since it always failed.
Various other code cleanup issues.

Last fiddled with by rogue on 2020-09-24 at 19:47 |
#7 |
"Mark"
Apr 2003
Between here and the
2²·7·223 Posts |
It appears that it doesn't work on my Mac Pro because of a broken driver for AMD GPUs.
|
#8 |
Sep 2013
Perth, Au.
1100010₂ Posts |
I tried gcwsievecl_1.0.2 without success.
GPU crashes when it starts sieving:
Code:
>gcwsievecl64.exe -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (1573.4)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 409600 = 200 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 268 MB in CPU, 93 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Running Catalyst 14.9 with a 7970 AMD GPU.
Bug: Woodall formula in -h help incorrect.
Bug: -t1 says "Running with 1 threads", should be "Running with 1 thread".
Was able to compile with Visual Studio 2012 no problem.
Keen to use this app if it can be made to work. |
#9 |
Sep 2013
Perth, Au.
2·7² Posts |
Tried new Catalyst 14.12 without success. Did this app ever work with any version of Catalyst?
|
#10 |
"Mark"
Apr 2003
Between here and the
2²·7·223 Posts |
Tahiti is probably slower than the CPU. I have another computer at home with a Tahiti, so I'll try running it on that.
|
#11 |
Dec 2014
1 Posts |
If I did this right, then it looks fine on Windows 7 x64 with an NVIDIA GTX 760.
Code:
D:\...\testing>gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.1 CUDA 6.5.12
Device 0 is a NVIDIA Corporation GeForce GTX 760
workGroupSize = 38400 = 200 * 32 * 6 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 25 MB in CPU, 9 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms
Sieve complete: 3 <= p < 1000000 116898 primes tested
Clock time: 2.02 seconds at 57810 p/sec. Factors found: 26022
Processor time: 1.28 sec. (0.27 init + 1.01 sieve).
Seconds spent in CPU and GPU: 2.76 (cpu), 1.76 (gpu)
Percent of time spent in CPU vs. GPU: 61.10 (cpu), 38.90 (gpu)
CPU/GPU utilization: 0.63 (cores), 0.87 (devices)
Started with 36589 terms and sieved to 1000000.
10567 remaining terms written to gcw_201.pfgw

Let me know if I did this right, or if any other tests I do can be of any help...
Thanks,
-1998golfer |