mersenneforum.org Generalized Cullen/Woodall Sieving Software

 2014-11-05, 00:38 #1 rogue     "Mark" Apr 2003 Between here and the 1866₁₆ Posts Generalized Cullen/Woodall Sieving Software I hope that within the next two weeks I can get back to the sieve. It has been sitting for a few weeks because the kernel has a bug from the last change I made, and I've been busy. I'll be traveling, so while I'm away I'll have fewer distractions in the evenings, which should give me an opportunity to work on it. My personal goal is to search all bases < 10000 up to n = 10000. If I can get the sieve working with OpenCL, that would go a long way toward accomplishing it, since the new sieve will support multiple bases, whereas gcwsieve supports only one base at a time and MultiSieve is too slow (IMO) compared to what the GPU can do.
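To make the multi-base idea concrete, here is a minimal CPU sketch in C of the work one prime contributes when several bases are sieved together. It is only an illustration, not the gcwsievecl kernel: the function name `count_divisible` is hypothetical, it assumes an odd prime p below 2^32, and it simply walks b^n mod p one step at a time without any of the program's optimizations.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference sketch, not the gcwsievecl kernel.
   For one odd prime p, count how many generalized Cullen terms n*b^n+1
   and Woodall terms n*b^n-1 with bmin <= b <= bmax and nmin <= n <= nmax
   are divisible by p. Assumes 2 < p <= UINT32_MAX (so products of two
   residues fit in 64 bits) and bmax, nmax < UINT32_MAX (so loops end). */
void count_divisible(uint64_t p, uint32_t bmin, uint32_t bmax,
                     uint32_t nmin, uint32_t nmax,
                     uint64_t *cullen_hits, uint64_t *woodall_hits)
{
    assert(p > 2 && p <= UINT32_MAX);
    assert(bmax < UINT32_MAX && nmax < UINT32_MAX);
    *cullen_hits = 0;
    *woodall_hits = 0;

    for (uint32_t b = bmin; b <= bmax; b++) {
        uint64_t bm = b % p;
        if (bm == 0)
            continue;               /* p | b, so n*b^n +/- 1 is +/-1 mod p */

        /* bn = b^nmin mod p by square-and-multiply */
        uint64_t bn = 1, sq = bm;
        for (uint32_t e = nmin; e > 0; e >>= 1) {
            if (e & 1)
                bn = bn * sq % p;
            sq = sq * sq % p;
        }

        /* Walk n upward, multiplying by b once per step. */
        for (uint32_t n = nmin; n <= nmax; n++) {
            uint64_t t = (n % p) * bn % p;   /* n*b^n mod p */
            if (t == p - 1)
                (*cullen_hits)++;            /* p | n*b^n + 1 */
            if (t == 1)
                (*woodall_hits)++;           /* p | n*b^n - 1 */
            bn = bn * bm % p;                /* b^n -> b^(n+1) */
        }
    }
}
```

For example, with p = 3, b = 2 and 1 <= n <= 7 it counts three Cullen hits (3, 9, 897) and two Woodall hits (63, 159). Because n is simply reduced mod p, nothing in this sketch requires p > n. A real sieve would also have to avoid removing a term that equals p itself, such as 1*2^1+1 = 3.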
2014-11-07, 18:29   #2
rogue

"Mark"
Apr 2003
Between here and the

2×3²×347 Posts

Quote:
 Originally Posted by rogue I hope that in the next two weeks I can get back to the sieve. It's been sitting for a few weeks because the kernel has a bug from the last change I made and I've been busy. I'll be traveling so when I'm gone I'll have fewer distractions in the evening which should give me an opportunity to work on it.
Unfortunately, the laptop I will be taking has an issue that crashes my application on Windows. I installed a Windows patch over the summer and it hasn't worked since. The problem is either in OpenCL itself or in Windows.

 2014-11-11, 18:51 #3 rogue     "Mark" Apr 2003 Between here and the 2·3²·347 Posts In testing the code, sometimes it returns good results and sometimes it doesn't. When it fails, it doesn't report invalid factors; it misses factors. I'm at a loss to explain the behavior. One would suspect an uninitialized variable, but that doesn't appear to be the problem. I suspect something is writing beyond a memory boundary, but I can't find that either. If anyone wants to take a look at the kernel, maybe you will see what I am doing wrong. Note that when this does eventually work, it will be many times faster than MultiSieve and gcwsiev-smallp for p < n. I haven't run it enough for higher p, but based on some extrapolations it was 5x faster on my MacBook Pro. Last fiddled with by rogue on 2020-09-24 at 19:47
 2014-11-26, 20:15 #4 rogue     "Mark" Apr 2003 Between here and the 2×3²×347 Posts The problem is in the code that computes the modular inverse: the computed inverse is wrong. It appears to be related to the use of unsigned inputs, but when I change them to signed, the OpenCL compiler rejects the code at compile time. Note that I used code posted in another thread on this forum. That code works fine in C; it just doesn't work in OpenCL C. I have a slower version that appears to return the correct results. Last fiddled with by rogue on 2014-11-26 at 20:21
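A common way to sidestep this kind of signed/unsigned trouble is to keep the remainders unsigned but carry the Bézout coefficient in a signed type. Below is a minimal C sketch of the extended Euclidean algorithm written that way. The name `mod_inverse` is hypothetical; this is not the code from the other thread or from the kernel, and it assumes 1 < p <= INT64_MAX. In OpenCL C the same structure should carry over using `ulong` and `long` in place of `uint64_t` and `int64_t` (without the `assert`).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: inverse of a modulo p by the extended Euclidean
   algorithm. The remainders stay unsigned, while the Bezout coefficient t
   is kept signed so that negative intermediate values are not wrapped.
   Every |t| produced stays <= p, so q * new_t cannot overflow int64_t
   when p <= INT64_MAX. Returns 0 if gcd(a, p) != 1. */
uint64_t mod_inverse(uint64_t a, uint64_t p)
{
    assert(p > 1 && p <= (uint64_t)INT64_MAX);
    uint64_t r = p, new_r = a % p;
    int64_t t = 0, new_t = 1;

    while (new_r != 0) {
        uint64_t q = r / new_r;
        uint64_t next_r = r - q * new_r;
        int64_t next_t = t - (int64_t)q * new_t;
        r = new_r;
        new_r = next_r;
        t = new_t;
        new_t = next_t;
    }
    if (r != 1)
        return 0;               /* a and p are not coprime: no inverse */
    if (t < 0)
        t += (int64_t)p;        /* move the result into [0, p) */
    return (uint64_t)t;
}
```

For instance, `mod_inverse(3, 7)` returns 5, since 3 * 5 = 15 = 2 * 7 + 1.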
 2014-11-26, 20:47 #5 rogue     "Mark" Apr 2003 Between here and the 2×3²×347 Posts Now that the modular inverse is computed correctly, I can release the code as a beta. The factors it finds all appear to be valid. I have a hacked version of pfgw that can take the .log file and verify the factors in it; I need to release that soon. Here is an example of what you might see when it runs, in this case on an Intel HD Graphics 4000. Note that I have since fixed the first line of output; I just didn't fix it in the version I attached.
Code:
gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.1, a GPU program to find factors numbers of the form k*b^n+c where k, b, and n are fixed
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Intel(R) Corporation Intel(R) OpenCL, version OpenCL 1.1
Device 0 is an Intel(R) Corporation Intel(R) HD Graphics 4000
workGroupSize = 51200 = 200 * 16 * 16 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 315 MB in CPU, 82 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms
Sieve complete: 3 <= p < 1000000 78498 primes tested
Clock time: 34.81 seconds at 2255 p/sec. Factors found: 26031
Processor time: 39.89 sec. (5.46 init + 34.43 sieve).
Seconds spent in CPU and GPU: 13.41 (cpu), 35.72 (gpu)
Percent of time spent in CPU vs. GPU: 27.29 (cpu), 72.71 (gpu)
CPU/GPU utilization: 1.15 (cores), 1.00 (devices)
Started with 36589 terms and sieved to 1000000. 10558 remaining terms written to gcw_201.pfgw
Unlike gcwsieve, this program doesn't restrict p (gcwsieve requires min p > max n). In fact, I could modify gcwsieve to remove that restriction, since I used a little trick in this code to eliminate it, but if this program works correctly, I don't think that will be necessary.
I am very curious to know how well this works and how fast (or slow) it is compared to gcwsieve on your systems. Last fiddled with by rogue on 2020-09-24 at 19:47
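The pfgw-based checker mentioned in this post is not shown in the thread. As a rough illustration of what verifying one reported factor involves, here is a small C sketch that tests whether p divides k*b^n + c by modular exponentiation; for Cullen and Woodall terms, k = n and c = +1 or -1. The function names are hypothetical, and the sketch assumes p <= 2^32 - 1 so that every product of two residues fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration, not pfgw's code.
   Assumes 0 < p <= UINT32_MAX, so every product of two residues below p
   fits in 64 bits. */
static uint64_t powmod(uint64_t base, uint64_t exp, uint64_t p)
{
    uint64_t result = 1 % p;
    base %= p;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % p;
        base = base * base % p;
        exp >>= 1;
    }
    return result;
}

/* Returns 1 if p divides k*b^n + c, else 0. For Cullen/Woodall terms,
   pass k = n and c = +1 or -1. */
int divides_term(uint64_t p, uint64_t k, uint64_t b, uint64_t n, int64_t c)
{
    assert(p > 0 && p <= UINT32_MAX);
    uint64_t t = (k % p) * powmod(b, n, p) % p;               /* k*b^n mod p */
    /* Reduce c into [0, p) using unsigned arithmetic only. */
    uint64_t mag = c >= 0 ? (uint64_t)c : 0 - (uint64_t)c;    /* |c| */
    uint64_t cm = mag % p;
    if (c < 0 && cm != 0)
        cm = p - cm;
    return (t + cm) % p == 0;
}
```

For example, `divides_term(7, 5, 2, 5, 1)` returns 1 because 5*2^5+1 = 161 = 7*23, while `divides_term(7, 3, 2, 3, 1)` returns 0 because 3*2^3+1 = 25.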
 2014-12-02, 16:06 #6 rogue     "Mark" Apr 2003 Between here and the 14146₈ Posts I've fixed various issues:
Code:
Fix factor rate calculation.
Fix writing ABC file as the wrong line was generated.
Fix reading ABC file since it always failed.
Various other code cleanup issues.
Last fiddled with by rogue on 2020-09-24 at 19:47
 2014-12-14, 19:09 #7 rogue     "Mark" Apr 2003 Between here and the 2·3²·347 Posts It appears that the reason it doesn't work on my Mac Pro is a broken driver for AMD GPUs.
 2014-12-22, 03:29 #8 TheCount     Sep 2013 Perth, Au. 2·7² Posts I tried gcwsievecl_1.0.2 without success. The GPU crashes when it starts sieving:
Code:
>gcwsievecl64.exe -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is an Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (1573.4)
Device 0 is an Advanced Micro Devices, Inc. Tahiti
workGroupSize = 409600 = 200 * 64 * 32 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Running with 2 threads
Allocated memory (prior to sieving): 268 MB in CPU, 93 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms
I tried one thread and fewer blocks, but got the same issue. I'm running Catalyst 14.9 with an AMD 7970 GPU.
Bug: the Woodall formula in the -h help is incorrect.
Bug: -t1 says "Running with 1 threads"; it should say "Running with 1 thread".
I was able to compile with Visual Studio 2012 with no problem. Keen to use this app if it can be made to work.
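For reference when checking the -h text, these are the standard definitions of the generalized Cullen and Woodall numbers (the n*b^n+c form in the program banner with c = +1 and c = -1), together with the congruence a sieve tests for each prime p:

```latex
\begin{aligned}
C_{n,b} &= n \cdot b^{n} + 1, & p \mid C_{n,b} &\iff n \cdot b^{n} \equiv -1 \pmod{p},\\
W_{n,b} &= n \cdot b^{n} - 1, & p \mid W_{n,b} &\iff n \cdot b^{n} \equiv 1 \pmod{p}.
\end{aligned}
```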
 2014-12-22, 14:03 #9 TheCount     Sep 2013 Perth, Au. 2·7² Posts I tried the new Catalyst 14.12 without success. Has this app ever worked with any version of Catalyst?
 2014-12-22, 14:17 #10 rogue     "Mark" Apr 2003 Between here and the 2×3²×347 Posts Tahiti is probably slower than the CPU. I have another computer at home with a Tahiti, so I'll try running it on that.
2014-12-23, 02:42   #11
1998golfer

Dec 2014

1₂ Posts

If I did this right, it looks fine on Windows 7 x64 with an NVIDIA GTX 760.

Code:
D:\...\testing>gcwsievecl64 -v -C -W -b200 -B201 -n2 -t2 -N1e5 -P1e6
gcwsievecl v1.0.2, a GPU program to find factors of Cullen and Woodall numbers (n*b^n+c where b and n are fixed)
Quick elimination of terms info (in order of check):
99998 because the term is even
63411 because the term is divisible by a prime < 100
Platform 0 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.1 CUDA 6.5.12
Device 0 is a NVIDIA Corporation GeForce GTX 760
workGroupSize = 38400 = 200 * 32 * 6 (blocks * workGroupSizeMultiple * deviceComputeUnits)
Allocated memory (prior to sieving):  25 MB in CPU, 9 MB in GPU
Sieve started: (cmdline) 0 <= p < 1000000 with 36589 terms

Sieve complete: 3 <= p < 1000000  116898 primes tested
Clock time: 2.02 seconds at 57810 p/sec.  Factors found: 26022
Processor time: 1.28 sec. (0.27 init + 1.01 sieve).
Seconds spent in CPU and GPU: 2.76 (cpu), 1.76 (gpu)
Percent of time spent in CPU vs. GPU: 61.10 (cpu), 38.90 (gpu)
CPU/GPU utilization: 0.63 (cores), 0.87 (devices)
Started with 36589 terms and sieved to 1000000.  10567 remaining terms written to gcw_201.pfgw
Attached is a zip containing gcwsieve.log and gcw_201.pfgw

Let me know if I did this right, or if there are any other tests I can run that would help.

Thanks,
-1998golfer
Attached Files
 gcwsieve_results_win7x64_gtx760.zip (437.0 KB, 101 views)
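Each run in this thread prints a workGroupSize line of the form blocks * workGroupSizeMultiple * deviceComputeUnits (51200 = 200 * 16 * 16 on the HD Graphics 4000, 409600 = 200 * 64 * 32 on Tahiti, 38400 = 200 * 32 * 6 on the GTX 760). A small C sketch of that arithmetic follows. The function name is hypothetical, and the OpenCL queries named in the comment are an assumption about where a host program would typically get the two device-dependent factors, not something stated in the posts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the product printed on the "workGroupSize = ..."
   line: blocks * workGroupSizeMultiple * deviceComputeUnits. In an OpenCL
   host program the last two factors could come from
   clGetKernelWorkGroupInfo(..., CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE, ...)
   and clGetDeviceInfo(..., CL_DEVICE_MAX_COMPUTE_UNITS, ...); that is an
   assumption, not gcwsievecl's confirmed code. */
size_t global_work_size(size_t blocks, size_t multiple, size_t compute_units)
{
    assert(multiple != 0 && compute_units != 0);
    /* Guard against size_t overflow before multiplying. */
    assert(blocks <= SIZE_MAX / multiple / compute_units);
    return blocks * multiple * compute_units;
}
```

Scaling the total by the number of compute units keeps every unit busy regardless of device size, which is why the same blocks value of 200 yields very different totals on the three GPUs.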


