mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve enhancements (https://www.mersenneforum.org/showthread.php?t=25486)

LaurV 2021-01-08 04:35

To clarify, srsieve is "general" tool, srXsieve are more "specific" tools. Of course the last are faster, but they can't handle all the cases the former can.

But what you (Mark) could do from my "lazy man" perspective here, is to call sr1sieve or sr2sieve directly from srsieve and srsieve2, if the conditions suffice (i.e. only one or few k's, sufficient high n, etc). Also, consider pre-calling srfile, if the format does not match (i need to use it sometimes, as sr2sieve does only support a limited number of formats for the input file).

rogue 2021-01-08 13:04

[QUOTE=LaurV;568710]To clarify, srsieve is "general" tool, srXsieve are more "specific" tools. Of course the last are faster, but they can't handle all the cases the former can.

But what you (Mark) could do from my "lazy man" perspective here, is to call sr1sieve or sr2sieve directly from srsieve and srsieve2, if the conditions suffice (i.e. only one or few k's, sufficient high n, etc). Also, consider pre-calling srfile, if the format does not match (i need to use it sometimes, as sr2sieve does only support a limited number of formats for the input file).[/QUOTE]

One could write a simple script to do as you suggest.

My long term goal is that srsieve2(cl) as "one-stop shopping". It will start with the generic case then switch to the specialized cases (sr2sieve logic, sr1sieve logc, opencl logic) when the correct conditions are met. This way you don't need to run multiple programs to get the results you currently need to run before PRP testing. The framework supports this. I just have to code to sr1sieve/sr2sieve logic, but that is a lot easier said than done. They rely on global variables and the code is really difficult to navigate. I just need enough motivation to complete the task. The main motivator will be believing that I can make srsieve2 faster than the sr1sieve/sr2sieve as that will be the motivator for people to switch. The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.

KEP 2021-01-09 10:46

[QUOTE=rogue;568733]The other half is finding enough time to dedicate to the task rather than "an hour here" and "an hour there". It takes larger chunks of my time to do the work and those are hard to come by.[/QUOTE]

Just take your time. What you did with srsieve2, in practical almost doubled our sievespeed compared to srsieve - that has been a huge motivator on the 1000+ k conjectures. Also the multithread function that srsieve2 supports and handles well, was a great motivator for most Windows users to switch to your excellant program.

As Ian said, he would rather do a slower and correct job than a fast and faulty job. It was not his exact words, but the point is, that you have to do as you do now, in stead of rushing and risk throwing out a bad program that we risk believing for ever is doing a great job. Primegrid can confirm the price it cost, when a sieve program misses factors and a lot of work has to be redone :smile: Point being, keep doing a great job, even if it means the rest of us has to wait a little longer for the end product :smile:

Take care, stay safe and stay along for a long time, WE really need you and your skills :smile:

rogue 2021-01-19 18:30

I have finally taken the time and have a build of srsieve2 with sr1sieve logic. There are a couple of TODOs left before I can start testing. There was a lot of refactoring of the srsieve2 to support what sr1sieve does. This means that I have to retest the "generic" code paths. I don't expect that to yield any problems that I have introduced, but one never knows.

sr1sieve and sr2sieve have a lot of conditionally compiled code which makes figuring out the "normal" case hard. The task was made harder by the fact that sr1sieve and sr2sieve have a lot of global variables which do not have meaningful names so understanding this usage is frustratingly hard since I go to great lengths to avoid global variables in the framework. There is also "vectorized" code for mulmods which I'm not supporting at this time because I'm using Montgomery mulmod logic which supports larger p than sr1sieve.

I do not expect it to be as fast as sr1sieve. I'm expecting about 10% slower out of the box, but I won't know until I get to the point and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.

When I get this working, the next step is to support the sr1sieve logic in OpenCL. There is one four-dimensional array in sr1sieve that has to be flattened to three dimensions before I can use it in the GPU. That shouldn't be too hard, but I also have to be cognizant of memory usage in the GPU, but I'm feeling confident that it won't be an issue. Based upon what I have seen with the generic sieving logic in OpenCL, I would expect a 5x or 6x speed bump, but I won't make any promises for that.

The step following that is sr2sieve logic. I have the pieces in place (due to the refactoring) to add support for that. It should go a lot faster than the sr1sieve logic, but I'm uncertain if about putting that in the GPU without spending more time in the sr2sieve code.

Happy5214 2021-01-19 23:08

[QUOTE=rogue;569686]I do not expect it to be as fast as sr1sieve. I'm expecting about 10% slower out of the box, but I won't know until I get to the point and I'm sure that there are many bugs in the code that have to be squashed before I even get that far.[/QUOTE]

Do you know what's keeping it from reaching the speed of sr1sieve?

rogue 2021-01-19 23:18

[QUOTE=Happy5214;569693]Do you know what's keeping it from reaching the speed of sr1sieve?[/QUOTE]

I don't know what speed it will get. I haven't gotten that far in testing. I have made some changes that will either help the speed or hurt the speed and those changes might help for some sequences, but hurt for others. I won't know for a while yet. If I do get within 10% with the current code, I believe that I can get it as fast or faster than sr1sieve (in the CPU), but that will take some experimentation.

The 10% is based upon the Montgomery mulmod that sr1sieve uses for the non-x86 code path. I am using Montgomery mulmod for srsieve2 because it requires no asm code.

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.

ET_ 2021-01-20 12:13

[QUOTE=rogue;569695]

Note that this means that I have the tools I need to port many of the sievers built upon the framework to ARM.[/QUOTE]

:bow: :bow: :bow:

pepi37 2021-01-21 22:00

Rogue,excuse me for stupid question, but I must ask.
I enjoy using yours tools from mtsieve especially twinsieve (using it most of times).
I also read about your efforts for srsieve2.
Question: if you can make twinsieve fast as it is now, can you build sieve that will replace sr1sieve with same speed. I dont know math, and doesnot know will and in what percentage speed will be changes if sieve search for fixed n or for fixed k.
If answer is in srsieve2, then forget what I ask, I will wait :)

rogue 2021-01-21 22:46

[QUOTE=pepi37;569794]Rogue,excuse me for stupid question, but I must ask.
I enjoy using yours tools from mtsieve especially twinsieve (using it most of times).
I also read about your efforts for srsieve2.
Question: if you can make twinsieve fast as it is now, can you build sieve that will replace sr1sieve with same speed. I dont know math, and doesnot know will and in what percentage speed will be changes if sieve search for fixed n or for fixed k.
If answer is in srsieve2, then forget what I ask, I will wait :)[/QUOTE]

twinsieve is "fixed n and variable k" and the srsieve programs are "fixed k and variable n". They require completely different logic.

In theory if one has a very small range of n, then twinsieve could be modified to handle multiple n and could be faster than sr1sieve. I don't know how small the range of n would need to be, but I'm guessing less than 1000 between n_min and n_max.

rogue 2021-02-01 18:59

I have committed my first cut at the GPU code for the CisOne logic (sr1sieve). Preliminary testing shows that it is missing some factors. On the plus side it isn't reporting any invalid factors, so the mistake might be trivial. Also on the plus side is that it appears to be about 4x faster than the CPU code, which means that it is about 3x faster than sr1sieve (on the laptop I am testing on). This could change when the bug causing the missing factors is found, but I do not know if it will make it faster or slower than it is now.

To support this I had to do some refactoring on the CisOne log to replace a multi-dimensional array with a single-dimensional array which is used by both the CPU and GPU logic. If you are brave enough to look at the sources you will see how similar the CPU and GPU code are. This should make it easier for me to track down the problem. If anyone is willing to take a look at the code in an effort to see where I have gone amiss, I would appreciate it. I just don't know how long it will take to track down and squash the bug on my own.

rogue 2021-02-17 17:17

I have committed some changes to gfndsieve/gfndsievecl to support ppsieve type sieving. Although I can only compare against an old version of OpenCL enabled ppsieve (version cl-0.2.3e), it is about 3x faster than it. I do not have any way to compare its speed against the speed of the CUDA enabled ppsieve. If someone has a current version of OpenCL enabled ppsieve, please share.

Is anyone interesting is doing some testing to compare the speed of either the CUDA or OpenCL ppsieve with gfndsievecl?

I posted a Windows build of gfndsievecl at sourceforge. Use the -r option to bypass building of the bitmap.

I have some more changes to boost the speed further, but I want to see where gnfdsievecl stacks up first.


All times are UTC. The time now is 09:00.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.