mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   XYYXF Project (https://www.mersenneforum.org/forumdisplay.php?f=110)
-   -   Leyland Primes (x^y+y^x primes) (https://www.mersenneforum.org/showthread.php?t=19347)

XYYXF 2014-06-03 09:12

[QUOTE=rogue;373744]Multisieve also supports x^y-y^x and for that to be > 0, x must be less than y.[/QUOTE]I've always used version 5.5 which requires x > y :)

[url]http://www.primefan.ru/stuff/soft/ms.zip[/url]

rogue 2014-06-03 11:53

[QUOTE=XYYXF;374917]I've always used version 5.5 which requires x > y :)

[url]http://www.primefan.ru/stuff/soft/ms.zip[/url][/QUOTE]

That was before I was asked to modify MultiSieve to support x^y - y^x.

rogue 2014-06-12 17:31

With both MultiSieve and its replacement for x^y+/-y^x, the more squarelike the search area, the more efficient the software will be. The new software will be even more efficient based upon squareness of the search area. By "squareness" I mean xmax - xmin is close to ymax - ymin. This is because the software has to calculate both x^y mod p and y^x mod p. The software gets its efficiency by computing x^miny mod p for each x and y^minx mod p for each y then multiplying to get the next term. The software won't prohibit a flat rectangle, i.e. minx = maxx or miny = maxy, but it won't be very fast.

rogue 2014-06-13 12:04

Nevermind my previous post.

I was looking at what I was working on and realized that it couldn't easily be done in a GPU because of the memory requirements. As I turned the problem over in my head (more than a few times) I realized that my approach to the sieve was inefficient. Although more efficient than MultiSieve, there was an obvious optimization that I completely missed.

The good thing is that this optimization also simplifies the code (code I haven't written yet) quite a bit.

rogue 2014-06-13 19:09

And due to that simplification the code is now written. Now I have the fun of testing it...

Batalov 2014-06-13 19:29

I will be glad to help you beta-test.

As you can see from my interest for these (which is predating the creation of this forum) that I have a soft spot for these PRPs (as, undoubtedly, also does Selevich). The real burden of course is the PRP testing, but an improved sieve could lift maybe 1/3 of that weight - then, that's great!

rogue 2014-06-15 13:36

[QUOTE=rogue;375713]Nevermind my previous post.

I was looking at what I was working on and realized that it couldn't easily be done in a GPU because of the memory requirements. As I turned the problem over in my head (more than a few times) I realized that my approach to the sieve was inefficient. Although more efficient than MultiSieve, there was an obvious optimization that I completely missed.

The good thing is that this optimization also simplifies the code (code I haven't written yet) quite a bit.[/QUOTE]

That optimization was a fail (and I know why). So I'm focused on getting the code at least equivalent to MultiSieve and then focus on optimizations later. That being said it shouldn't be too difficult to make that happen.

rogue 2014-06-30 12:42

Almost ready for release. I have to do some more testing. On my laptop (which has a slow GPU), it is only about 2x as fast as MultiSieve. I would expect a desktop with a bigger GPU to be much faster.

rogue 2014-07-07 14:00

1 Attachment(s)
Here is a 64-bit beta (Windows). I have good news and bad news about this beta. First the bad. It appears that MultiSieve removes terms for which it does not have a factor. I've looked at the code and have not been able to figure out where that is occurring. This means that ranges sieved in the past should be sieved again presuming I can't track down why it is doing it, which I have little desire to do.

Now the good. The attached program is about 4x faster than MultiSieve on my laptop (for the range below), which has an NVIDIA NVS 5200M. The performance is somewhat based upon the range of y. The larger the range of y, the better the performance.

Here are the runtime options:

[code]
E:\stuff\sieves\xyyxsieve_1.0.0\visualstudio>xyyxsievecl64 -h
-h --help Print this help
-v --verbose Verbose output (memory usage)
-q --quiet Quiet output (no banner or stats)
-p --pmin=P0 Sieve start: 3 <= P0 <= p (default P0=3)
-P --pmax=P1 Sieve end: p < P1 <= 2^62 (default P1=P0+10^9)
-t --nthreads=N Start N threads (default N=1)
-l --list List available platforms and devices
-b --blocks=B Force B blocks per device
-f --platform=F Use platform F instead of 0
-d --device=D Use device D instead of 0
-x --min_x=x Minimum x to search
-X --max_x=X Maximum x to search
-y --min_y=y Minimum y to search
-Y --max_y=Y Maximum y to search
-s --step=s steps iterated per call to GPU
-f --factorsinrate=f Number of factors to use when computing removal rate (default 50)
-m --minimum X Do not report factors smaller than X (default 100000)
-i --inputfile=i PFGW file in ABC format
-o --outputfile=o PFGW file of remaining candidates in ABC format
-S --sign=+/-/b sign to sieve for
[/code]

Here is an command line example: xyyxsievecl64 -x100 -X200 -y3000 -Y4000 -P2e6 -t2 -s3000 -S+ -b32 -oxyyx.pfgw

You will want to vary options t, s, and b to maximize the performance of your machine.

I'll provide source later as this can be built for Linux and OS X.

wombatman 2014-07-07 14:41

Runs great on a GTX570, but I did encounter an error as documented below:
[CODE]$ xyyxsievecl64.exe -x100 -X200 -y3000 -Y10000 -P2e6 -t2 -s3000 -S+ -oxyyx.pfgw

xyyxsievecl v1.0.0, a GPU program to find factors numbers of the form x^y+y^x
Sieve started: (cmdline) 0 <= p < 2000000 with 287349 terms
OpenCL Error: Out of resources
in call to clEnqueueReadBuffer
argument: counter[/CODE]

My GPU has ~1.2GB of RAM and usually about 1GB actually available.

Batalov 2014-07-07 15:05

Option -f (platform and factorsinrate) is used twice, and doesn't seem to come through (as --platform=1)
[CODE]$ ./xyyxsievecl64.exe --platform=1 -d0 -x100 -X200 -y3000 -Y4000 -P2e6 -t2 -s3000 -S+ -b32 -oxyyx.pfgw
xyyxsievecl v1.0.0, a GPU program to find factors numbers of the form x^y+y^x
List of available platforms and devices
Platform 0 has no available devices.
Here is a list of platforms and devices:
Platform 0 is a Advanced Micro Devices, Inc. AMD Accelerated Parallel Processing, version OpenCL 1.2 AMD-APP (938.2)
No devices
Platform 1 is a NVIDIA Corporation NVIDIA CUDA, version OpenCL 1.1 CUDA 6.0.1
Device 0 is a NVIDIA Corporation GeForce GTX 570
[/CODE]


All times are UTC. The time now is 09:42.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.