#716
Dec 2011
After 1.58M nines:)
Did my testing show you where the problem is? My CPU is still faster than the RTX 2060!
Any other ideas? A different setup?
#717
"Mark"
Apr 2003
Between here and the
Quote:
Unlike some of the other sieves using the framework, srsieve2cl uses a lot more GPU memory, as each thread has to maintain its own set of tables in memory. When it comes to discrete logs, you can use less memory, but then the computation time for each p varies significantly. The discrete log used by srsieve2cl uses a method that "flattens the curve", keeping the calculation time roughly the same regardless of p, but it requires more memory. It might be possible to modify the algorithm to use less GPU memory, but that could lead to other issues. One of the worst things about the current algorithm is that it has many conditionals, and the remaining loops can't really be unrolled. Getting more speed out of it would likely require a completely different algorithm.
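To illustrate the memory/time tradeoff described above, here is a textbook baby-step giant-step discrete log in plain Python. This is a minimal sketch of the general technique, not srsieve2cl's actual OpenCL kernel (the `BABY_STEPS` name in the build log later in the thread hints at something similar, but that is only an inference). The function name `bsgs` and the parameter `m` are hypothetical: a larger `m` means a bigger table held in memory per thread and fewer giant steps, while a smaller `m` saves memory but costs more steps per prime.

```python
from math import isqrt


def bsgs(g, h, p, m=None):
    """Return some x with pow(g, x, p) == h % p, or None if no such x exists.

    Hypothetical illustration only. p must be prime and g not divisible by p.
    m is the number of baby steps: the table stores m entries, and at most
    ceil((p - 1) / m) giant steps are taken.
    """
    n = p - 1  # order of the multiplicative group mod p
    if m is None:
        m = isqrt(n) + 1  # balanced choice: about sqrt(n) memory and steps
    # Baby steps: remember g^j -> j for j in [0, m). Memory grows with m.
    table = {}
    e = 1
    for j in range(m):
        table.setdefault(e, j)  # keep the smallest j for each value
        e = e * g % p
    # Giant-step multiplier g^(-m) mod p (negative exponent needs Python 3.8+).
    factor = pow(g, -m, p)
    gamma = h % p
    for i in range(-(-n // m)):  # ceil(n / m) giant steps
        j = table.get(gamma)
        if j is not None:
            return i * m + j  # g^(i*m + j) == h (mod p)
        gamma = gamma * factor % p
    return None


if __name__ == "__main__":
    # Same answer with a large table (default m) or a small one (m=2).
    print(bsgs(2, 9, 11), bsgs(2, 9, 11, m=2))
```

Both calls above find x = 6 (since 2^6 = 64 ≡ 9 mod 11); the m=2 call just trades a smaller table for more loop iterations.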
#718
Dec 2011
After 1.58M nines:)
Quote:
I agree with all that, but then what is the purpose of the OpenCL sieve? The RTX 2060 is not the most powerful card, but it is not bad at all. I expected a huge difference in speed. And as ryanp says, "I don't know how else to explain an A100 being the same speed as a regular 64 core machine." The A100 is a beast of a GPU...
#719
"Mark"
Apr 2003
Between here and the
Quote:
|
|
#720
Jun 2012
Boulder, CO
Quote:
|
|
#721
Sep 2011
Germany
It looks like srsieve2cl cannot run on my HD7950:
Code:
Sieving with generic logic for p >= 1000000000
Split 27683 base 486 sequences into 27683 base 486^1 sequences.
OpenCL Error: Program build failure in call to clBuildProgram
"C:\Users\user\AppData\Local\Temp\OCL2224T5.cl", line 165: warning: statement is unreachable
      resBM64 = mmmPowmod(resBM64, BABY_STEPS, thePrime, _q, _one);
      ^
"C:\Users\user\AppData\Local\Temp\OCL2224T5.cl", line 238: warning: statement is unreachable
      return 0;
      ^
Error:E013:Insufficient Private Resources!

Does anyone have a test line? I think the file is too large to handle. It was running on my old Win7 PC. The card has 3 GB of VRAM.
#722
"Mark"
Apr 2003
Between here and the
Quote:
|
|
#723
"Mark"
Apr 2003
Between here and the
Quote:
There are many factors that impact the speed of the GPU. The GPU is great for parallelizing tasks, but how much of a speed bump you actually get depends on the size of each task and how much memory each task needs. The discrete log algorithm is much larger than typical GPU tasks and requires a lot more memory. The driver is also a factor, as it is responsible for compiling the OpenCL C in the kernel to machine code, and some drivers are better at that than others. Based upon my testing across various computers with GPUs, -G1 is anywhere from 5x to 10x faster than -W1. So if -G1 is only 5x faster than -W1 on your machine and you compare it with -W8, the CPU will clearly be faster; but -G1 -W5 will be twice as fast as -W5 alone.
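The arithmetic in the last two sentences can be checked with a toy model. This is a sketch under a loudly stated assumption: throughput is measured in units of one CPU worker (-W1 = 1), each -W thread adds one unit, each -G worker adds `gpu_speedup` units, and the contributions simply add up. Real runs will not scale this cleanly, and the function name `relative_rate` is hypothetical, not part of mtsieve.

```python
def relative_rate(gpu_workers, cpu_workers, gpu_speedup):
    """Toy throughput model in units of one -W1 CPU worker.

    Assumption for illustration only: CPU and GPU workers add linearly,
    and each GPU worker is worth `gpu_speedup` CPU workers.
    """
    return cpu_workers + gpu_workers * gpu_speedup


if __name__ == "__main__":
    speedup = 5  # low end of the 5x-10x range quoted above
    print("-G1     :", relative_rate(1, 0, speedup))
    print("-W8     :", relative_rate(0, 8, speedup))
    print("-W5     :", relative_rate(0, 5, speedup))
    print("-G1 -W5 :", relative_rate(1, 5, speedup))
```

With a 5x GPU, -G1 alone scores 5 against 8 for -W8, while -G1 -W5 scores 10, twice the 5 of -W5 alone.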
#724
Sep 2011
Germany
Quote:
I think the file was too large. I tested a sieve file with 2k's in it and it ran. The question is: how many k's per sieve file can srsieve2cl handle?
#725
Jun 2012
Boulder, CO
Quote:
|
|
#726
"Mark"
Apr 2003
Between here and the
Quote:
I can only suggest -W36 -G1, as -W36 alone will not use the GPU. It will take trial and error to determine whether you need to use a higher value for -G or the default for -g.