mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve enhancements (https://www.mersenneforum.org/showthread.php?t=25486)

henryzz 2021-01-07 22:21

What is holding srsieve2 back from reaching sr2sieve's speed? How come the sieving % was only 23%? How does srsieve2cl compare when running on the cpu? I assume sr1sieve is still much faster than srsieve2 for one sequence.

rogue 2021-01-08 01:21

[QUOTE=henryzz;568685]What is holding srsieve2 back from reaching sr2sieve's speed? How come the sieving % was only 23%? How does srsieve2cl compare when running on the cpu? I assume sr1sieve is still much faster than srsieve2 for one sequence.[/QUOTE]

There are significant differences between sr2sieve and srsieve. srsieve2 only handles what srsieve can handle, which is fairly general. I will have to write and test a lot of code before srsieve2 can do what sr2sieve does. I work on it in bits and pieces, when I am motivated, but sr2sieve is a mess from the coding perspective. Until today I didn't think I could write code to compete (speed wise) with sr2sieve and I didn't just want to "copy and paste" its code into srsieve2 because it isn't quite that easy. I was given an ask to support p > 2^52 on sr1sieve for non-x86 CPUs. I know how to do that so I think I also have a change of pulling sr2sieve into srsieve2 and getting better performance without all of myriad code paths that sr2sieve supports.

LaurV 2021-01-08 04:35

To clarify, srsieve is "general" tool, srXsieve are more "specific" tools. Of course the last are faster, but they can't handle all the cases the former can.

But what you (Mark) could do from my "lazy man" perspective here, is to call sr1sieve or sr2sieve directly from srsieve and srsieve2, if the conditions suffice (i.e. only one or few k's, sufficient high n, etc). Also, consider pre-calling srfile, if the format does not match (i need to use it sometimes, as sr2sieve does only support a limited number of formats for the input file).

rogue 2021-01-08 13:04

[QUOTE=LaurV;568710]To clarify, srsieve is "general" tool, srXsieve are more "specific" tools. Of course the last are faster, but they can't handle all the cases the former can.

But what you (Mark) could do from my "lazy man" perspective here, is to call sr1sieve or sr2sieve directly from srsieve and srsieve2, if the conditions suffice (i.e. only one or few k's, sufficient high n, etc). Also, consider pre-calling srfile, if the format does not match (i need to use it sometimes, as sr2sieve does only support a limited number of formats for the input file).[/QUOTE]

One could write a simple script to do as you suggest.

My long term goal is that srsieve2(cl) as "one-stop shopping". It will start with the generic case then switch to the specialized cases (sr2sieve logic, sr1sieve logc, opencl logic) when the correct conditions are met. This way you don't need to run multiple programs to get the results you currently need to run before PRP testing. The framework supports this. I just have to code to sr1sieve/sr2sieve logic, but that is a lot easier said than done. They rely on global variables and the code is really difficult to navigate. I just need enough motivation to complete the task. The main motivator will be believing that I can make srsieve2 faster than the sr1sieve/sr2sieve as that will be the motivator for people to switch. The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.

KEP 2021-01-09 10:46

[QUOTE=rogue;568733]The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.[/QUOTE]

Just take your time. What you did with srsieve2, in practical almost doubled our sievespeed compared to srsieve - that has been a huge motivator on the 1000+ k conjectures. Also the multithread function that srsieve2 supports and handles well, was a great motivator for most Windows users to switch to your excellant program.

As Ian said, he would rather do a slower and correct job than a fast and faulty job. It was not his exact words, but the point is, that you have to do as you do now, in stead of rushing and risk throwing out a bad program that we risk believing for ever is doing a great job. Primegrid can confirm the price it cost, when a sieve program misses factors and a lot of work has to be redone :smile: Point being, keep doing a great job, even if it means the rest of us has to wait a little longer for the end product :smile:

Take care, stay safe and stay along for a long time, WE really need you and your skills :smile:

rogue 2021-01-19 18:30

I have finally taken the time and have a build of srsieve2 with sr1sieve logic. There are a couple of TODOs left before I can start testing. There was a lot of refactoring of the srsieve2 to support what sr1sieve does. This means that I have to retest the "generic" code paths. I don't expect that to yield any problems that I have introduced, but one never knows.

sr1sieve and sr2sieve have a lot of conditionally compiled code which makes figuring out the "normal" case hard. The task was made harder by the fact that sr1sieve and sr2sieve have a lot of global variables which do not have meaningful names so understanding this usage is frustratingly hard since I go to great lengths to avoid global variables in the framework. There is also "vectorized" code for mulmods which I'm not supporting at this time because I'm using Montgomery mulmod logic which supports larger p than sr1sieve.

I do not expect it to be as fast as sr1sieve. I'm expecting about 10% slower out of the box, but I won't know until I get to the point and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.

When I get this working, the next step is to support the sr1sieve logic in OpenCL. There is one four-dimensional array in sr1sieve that has to be flattened to three dimensions before I can use it in the GPU. That shouldn't be too hard, but I also have to be cognizant of memory usage in the GPU, but I'm feeling confident that it won't be an issue. Based upon what I have seen with the generic sieving logic in OpenCL, I would expect a 5x or 6x speed bump, but I won't make any promises for that.

The step following that is sr2sieve logic. I have the pieces in place (due to the refactoring) to add support for that. It should go a lot faster than the sr1sieve logic, but I'm uncertain if about putting that in the GPU without spending more time in the sr2sieve code.

Happy5214 2021-01-19 23:08

[QUOTE=rogue;569686]I do not expect it to be as fast as sr1sieve. I'm expecting about 10% slower out of the box, but I won't know until I get to the point and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.[/QUOTE]

Do you know what's keeping it from reaching the speed of sr1sieve?

rogue 2021-01-19 23:18

[QUOTE=Happy5214;569693]Do you know what's keeping it from reaching the speed of sr1sieve?[/QUOTE]

I don't know what speed it will get. I haven't gotten that far in testing. I have made some changes that will either help the speed or hurt the speed and those changes might help for some sequences, but hurt for others. I won't know for a while yet. If I do get within 10% with the current code, I believe that I can get it as fast or faster than sr1sieve (in the CPU), but that will take some experimentation.

The 10% is based upon the Montgomery mulmod that sr1sieve uses for the non-x86 code path. I am using Montgomery mulmod for srsieve2 because it requires no asm code.

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.

ET_ 2021-01-20 12:13

[QUOTE=rogue;569695]

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.[/QUOTE]

:bow: :bow: :bow:

pepi37 2021-01-21 22:00

Rogue,excuse me for stupid question, but I must ask.
I enjoy using yours tools from mtsieve especially twinsieve (using it most of times).
I also read about your efforts for srsieve2.
Question: if you can make twinsieve fast as it is now, can you build sieve that will replace sr1sieve with same speed. I dont know math, and doesnot know will and in what percentage speed will be changes if sieve search for fixed n or for fixed k.
If answer is in srsieve2, then forget what I ask, I will wait :)

rogue 2021-01-21 22:46

[QUOTE=pepi37;569794]Rogue,excuse me for stupid question, but I must ask.
I enjoy using yours tools from mtsieve especially twinsieve (using it most of times).
I also read about your efforts for srsieve2.
Question: if you can make twinsieve fast as it is now, can you build sieve that will replace sr1sieve with same speed. I dont know math, and doesnot know will and in what percentage speed will be changes if sieve search for fixed n or for fixed k.
If answer is in srsieve2, then forget what I ask, I will wait :)[/QUOTE]

twinsieve is "fixed n and variable k" and the srsieve programs are "fixed k and variable n". They require completely different logic.

In theory if one has a very small range of n, then twinsieve could be modified to handle multiple n and could be faster than sr1sieve. I don't know how small the range of n would need to be, but I'm guessing less than 1000 between n_min and n_max.

rogue 2021-02-01 18:59

I have committed my first cut at the GPU code for the CisOne logic (sr1sieve). Preliminary testing shows that it is missing some factors. On the plus side it isn't reporting any invalid factors, so the mistake might be trivial. Also on the plus side is that it appears to be about 4x faster than the CPU code, which means that it is about 3x faster than sr1sieve (on the laptop I am testing on). This could change when the bug causing the missing factors is found, but I do not know if it will make it faster or slower than it is now.

To support this I had to do some refactoring on the CisOne log to replace a multi-dimensional array with a single-dimensional array which is used by both the CPU and GPU logic. If you are brave enough to look at the sources you will see how similar the CPU and GPU code are. This should make it easier for me to track down the problem. If anyone is willing to take a look at the code in an effort to see where I have gone amiss, I would appreciate it. I just don't know how long it will take to track down and squash the bug on my own.

rogue 2021-02-17 17:17

I have committed some changes to gfndsieve/gfndsievecl to support ppsieve type sieving. Although I can only compare against an old version of OpenCL enabled ppsieve (version cl-0.2.3e), it is about 3x faster than it. I do not have any way to compare its speed against the speed of the CUDA enabled ppsieve. If someone has a current version of OpenCL enabled ppsieve, please share.

Is anyone interesting is doing some testing to compare the speed of either the CUDA or OpenCL ppsieve with gfndsievecl?

I posted a Windows build of gfndsievecl at sourceforge. Use the -r option to bypass building of the bitmap.

I have some more changes to boost the speed further, but I want to see where gnfdsievecl stacks up first.

pepi37 2021-02-17 20:11

[QUOTE=rogue;571816]I have committed some changes to gfndsieve/gfndsievecl to support ppsieve type sieving. Although I can only compare against an old version of OpenCL enabled ppsieve (version cl-0.2.3e), it is about 3x faster than it. I do not have any way to compare its speed against the speed of the CUDA enabled ppsieve. If someone has a current version of OpenCL enabled ppsieve, please share.

Is anyone interesting is doing some testing to compare the speed of either the CUDA or OpenCL ppsieve with gfndsievecl?

I posted a Windows build of gfndsievecl at sourceforge. Use the -r option to bypass building of the bitmap.

I have some more changes to boost the speed further, but I want to see where gnfdsievecl stacks up first.[/QUOTE]
You can download currently used app in [URL="http://www.primegrid.com/download/"]http://www.primegrid.com/download/ [/URL]

rogue 2021-02-18 00:17

[QUOTE=pepi37;571837]You can download currently used app in [URL="http://www.primegrid.com/download/"]http://www.primegrid.com/download/ [/URL][/QUOTE]

Those builds are from 2011. I thought that they had more recent builds.

I've asked in the PrimeGrid developers-corner over at discord, but no response.

[rant]
This is one of the beefs I have with PrimeGrid. The are in a hurry to get people to build stuff for them, but they have a lot of difficulty with people that live outside of their world of influence, which includes me. I create much faster sievers than anything they have used and they basically ignore what I produce. They won't even discuss it. Very frustrating.
[/rant]

MisterBitcoin 2021-02-18 16:28

[QUOTE=Happy5214;568456]I'm not able to create new sieve files using srsieve2 (SVN rev 84):
[code]$ ./srsieve2 -W 4 -n 5e4 -N 145e3 -P "1e9" -o 't17_b2.prp' -f B -s "34767*2^n-1"
srsieve2 v1.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic
Fatal Error: Expected 95001 terms, but only set 0
[/code][/QUOTE]


Strangly enough i got this one on the latest version:


[CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 1>srsieve2.exe -s remain.txt -n 25000 -N 100000 -P 15e9
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va
riable k and n
Sieving with generic logic for p >= 3
Fatal Error: Expected 825011 terms, but only set 0
[/CODE]


I also got a bug with using multithreading on 1.4 (which was fixed); so i am back using v1.1 which works fine. Creating the sieve file on my main pc now. :D

rogue 2021-02-18 17:29

[QUOTE=MisterBitcoin;571911]Strangly enough i got this one on the latest version:


[CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 1>srsieve2.exe -s remain.txt -n 25000 -N 100000 -P 15e9
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va
riable k and n
Sieving with generic logic for p >= 3
Fatal Error: Expected 825011 terms, but only set 0
[/CODE]


I also got a bug with using multithreading on 1.4 (which was fixed); so i am back using v1.1 which works fine. Creating the sieve file on my main pc now. :D[/QUOTE]

I'll dig into it. I'm expecting it to be easy to track down and fix.

rogue 2021-02-19 14:38

[QUOTE=MisterBitcoin;571911]Strangly enough i got this one on the latest version:


[CODE]C:\Users\Administrator\Documents\cllr S649\r15\thread 1>srsieve2.exe -s remain.txt -n 25000 -N 100000 -P 15e9
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and va
riable k and n
Sieving with generic logic for p >= 3
Fatal Error: Expected 825011 terms, but only set 0
[/CODE]


I also got a bug with using multithreading on 1.4 (which was fixed); so i am back using v1.1 which works fine. Creating the sieve file on my main pc now. :D[/QUOTE]

I have not been able to reproduce. Can you post the remain.txt that you used for input?

KEP 2021-02-19 15:56

[QUOTE=rogue;572017]I have not been able to reproduce. Can you post the remain.txt that you used for input?[/QUOTE]

I saw exactly same error on my machine. I in fact had to roll back to the version of srsieve2 that Reb has on his BoincConfederation website. This was when resuming and not when starting from scratch. It was for base S561. One big difference between 2.2.0 and between Rebs version was that 2.2.0 used 638000 KB while Rebs version used only 68000 KB. Maybe that can give you a lead :smile:

rogue 2021-02-19 16:40

[QUOTE=KEP;572025]I saw exactly same error on my machine. I in fact had to roll back to the version of srsieve2 that Reb has on his BoincConfederation website. This was when resuming and not when starting from scratch. It was for base S561. One big difference between 2.2.0 and between Rebs version was that 2.2.0 used 638000 KB while Rebs version used only 68000 KB. Maybe that can give you a lead :smile:[/QUOTE]

I suspect that there was a compile issue or linking issue as I cannot reproduce. I put a new build of srsieve2.7z over at sourceforge. Please try that. If that still fails on Windows, let me know. If using a file of sequences, please post that file.

KEP 2021-02-20 08:44

[QUOTE=rogue;572027]I suspect that there was a compile issue or linking issue as I cannot reproduce. I put a new build of srsieve2.7z over at sourceforge. Please try that. If that still fails on Windows, let me know. If using a file of sequences, please post that file.[/QUOTE]

It solved one problem, but there must be a problem with -W or -w as you see in this copy from the command prompt:

C:\20210101_S561_startup\srsieve2_n=3073-25K>srsieve2 -p 96467343541 -W 4 -w 10000 -i b561_n.abcd
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Cannot use Legendre tables because square-free part of k is too large
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96467343541
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Sieve started: 96467343541 < p < 2^62 with 7810916 terms (9001 < n < 25000, k*561^n+c)
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96468358331
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96469371701
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96470381507
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96471394813
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96472407233
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
p=0, nanM p/sec, no factors found
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96473417713
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96474430733
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96475448009
Split 5604 base 561 sequences into 5604 base 561^1 sequences.

Without -W and -w everything checksout fine:

C:\20210101_S561_startup\srsieve2_n=3073-25K>srsieve2 -p 96467343541 -i b561_n.abcd
srsieve2 v1.5.1, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Cannot use Legendre tables because square-free part of k is too large
Must use generic sieving logic because there is more than one sequence
Sieving with generic logic for p >= 96467343541
Split 5604 base 561 sequences into 5604 base 561^1 sequences.
Sieve started: 96467343541 < p < 2^62 with 7810916 terms (9001 < n < 25000, k*561^n+c)
p=96469028651, 1.092K p/sec, 5 factors found at 17.30 sec per factor (last 1 min)

So somewthing got fixed while something got broken unfortunantly :cry:

Keep up the good work :smile:

rogue 2021-02-20 15:10

I'm fairly certain I know the cause of this.


All times are UTC. The time now is 22:41.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.