mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   mtsieve (https://www.mersenneforum.org/showthread.php?t=23042)

pepi37 2022-08-18 21:41

i5-9600K, 24 GB RAM, Windows 10


With the -W 6 option:


[CODE]e:\MTSIEVE\MTSIEVE-2-3-3>srsieve2 -W 6 -P 1e12 -n 3e6 -N 10e6 -o ferm81_3M_10M.txt -s "81*2^n+1"
srsieve2 v1.6.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
(b2) Removed 1750000 algebraic factors for 81*2^n+1 of the form (3^2)*2^(n/2)-3*2^((n+2)/4))+1 when n%4=2
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e12 with 5250001 terms (3000000 < n < 10000000, k*2^n+1) (expecting 5041260 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 384 base 2^720 sequences.
Legendre summary: Approximately 2 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
295200 bytes used for congruent qs and ladders
Increasing worksize to 256000 since each chunk is tested in less than a second
p=1589282789, 1.319M p/sec, 4524325 factors found at 4.361K f/sec (last 1 min), 0.2% done. ETC 2022-08-19 10:12
CTRL-C accepted. Threads will stop after sieving to 2020583611
Sieve interrupted at p=2020583611.
CPU time: 416.06 sec. (0.47 sieving) (5.59 cores)
717815 terms written to ferm81_3M_10M.txt
Primes tested: 99184016. Factors found: 4532186. Remaining terms: 717815. Time: 74.44 seconds.[/CODE]
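The figures in this log can be cross-checked against each other. The sketch below is only a sanity check on the numbers printed above, not part of srsieve2. The variable names are made up for illustration, and it assumes the n range is inclusive at both ends.

```python
# Cross-check of the numbers reported in the srsieve2 log above.
# Assumption: the range 3e6..10e6 is inclusive at both ends.
n_min, n_max = 3_000_000, 10_000_000

# Every n in the range is a candidate term before algebraic factors are removed.
total_n = n_max - n_min + 1

# The log removes terms with n % 4 == 2 as algebraic factors.
first = n_min + ((2 - n_min) % 4)            # first n >= n_min with n % 4 == 2
algebraic = len(range(first, n_max + 1, 4))

terms_at_start = total_n - algebraic         # log: "5250001 terms"

factors_found = 4_532_186                    # log: "Factors found"
remaining = terms_at_start - factors_found   # log: "Remaining terms"

cpu_seconds, wall_seconds = 416.06, 74.44    # log: "CPU time" and "Time"
cores_used = cpu_seconds / wall_seconds      # log: "(5.59 cores)"

print(algebraic, terms_at_start, remaining, round(cores_used, 2))
```

The ratio of CPU time to wall time shows that the six worker threads were kept almost fully busy in this run.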
Linux, Ryzen 3900X


With -W 6:


[CODE]srsieve2 v1.6.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
(b2) Removed 1750000 algebraic factors for 81*2^n+1 of the form (3^2)*2^(n/2)-3*2^((n+2)/4))+1 when n%4=2
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e12 with 5250001 terms (3000000 < n < 10000000, k*2^n+1) (expecting 5041260 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 384 base 2^720 sequences.
Legendre summary: Approximately 2 B needed for Legendre tables
1 total sequences
1 are eligible for Legendre tables
0 are not eligible for Legendre tables
1 have Legendre tables in memory
0 cannot have Legendre tables in memory
0 have Legendre tables loaded from files
1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
295200 bytes used for congruent qs and ladders
Decreasing worksize to 8000 since each chunk needs more than 5 seconds to test
Increasing worksize to 128000 since each chunk is tested in less than a second
corrupted double-linked list (not small)
./a: line 1: 9550 Aborted ./srsieve2 -W 6 -P 1e12 -n 3e6 -N 10e6 -o ferm81_3M_10M.txt -s "81*2^n+1"[/CODE]

[I][B]-W 4 works OK![/B][/I]

rogue 2022-08-19 12:47

There are no doubly linked lists in the code, so that is a mystery to me.

kruoli 2022-08-19 12:54

That's an error message from GLIBC itself; it uses doubly linked lists internally.

chalsall 2022-08-20 01:04

[QUOTE=kruoli;611742]That's an error message from GLIBC itself, it uses double-linked lists internally.[/QUOTE]

Why? A sparse matrix?

Sincere question.

kruoli 2022-08-20 17:48

Why they use it, I cannot say for sure (mostly because I do not know enough about the internals). But I can [URL="http://github.com/bminor/glibc/blob/master/malloc/malloc.c#L1599"]link[/URL] you to the GLIBC code where the [URL="http://github.com/bminor/glibc/blob/master/malloc/malloc.c#L1617"]error message[/URL] comes from.
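As a rough illustration of the kind of check behind that message: glibc's malloc keeps free chunks in doubly linked lists, and before unlinking a chunk it verifies that the neighbours still point back at it. The Python toy below only models that invariant. The class and function names are invented for this sketch and it is not glibc code. What actually overwrote memory in srsieve2 is not stated in this thread.

```python
class Chunk:
    """Toy stand-in for a free heap chunk; fd/bk mimic forward/back pointers."""

    def __init__(self, name):
        self.name = name
        self.fd = None
        self.bk = None


def link_ring(chunks):
    """Link the chunks into a circular doubly linked list."""
    n = len(chunks)
    for i, chunk in enumerate(chunks):
        chunk.fd = chunks[(i + 1) % n]
        chunk.bk = chunks[(i - 1) % n]


def unlink(p):
    """Remove p from its list, refusing if the neighbours disagree about p."""
    if p.fd.bk is not p or p.bk.fd is not p:
        raise RuntimeError("corrupted double-linked list")
    p.fd.bk = p.bk
    p.bk.fd = p.fd


def demo_corruption():
    """Simulate a stray write that clobbers a neighbour's forward pointer."""
    x, y, z = Chunk("x"), Chunk("y"), Chunk("z")
    link_ring([x, y, z])
    x.fd = z  # stands in for a buffer overrun overwriting list metadata
    try:
        unlink(y)
    except RuntimeError as err:
        return str(err)
    return "no error"


# A healthy list unlinks cleanly.
a, b, c = Chunk("a"), Chunk("b"), Chunk("c")
link_ring([a, b, c])
unlink(b)

print(demo_corruption())
```

So the program itself does not need to use linked lists for this abort to appear. It is enough for some write to land on the allocator's bookkeeping, and the damage is only detected later when the allocator walks its own lists.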

rogue 2022-08-22 16:22

I do not know when I will get around to resolving the issue with srsieve2. I have a more pressing issue with srsieve2cl that is perplexing me. It was introduced when I started making the changes for Metal support. For some reason CL_KERNEL_PRIVATE_MEM_SIZE has increased dramatically even though the kernel itself has barely changed. Due to this increase, the generic kernel won't run at all when one has thousands of sequences. Once that is resolved, I will look into the other issue with srsieve2.

rogue 2022-08-23 20:10

srsieve2 and srsieve2cl code has been updated. I have not posted new builds yet.

The segfault with srsieve2 should be fixed. The issues with srsieve2cl should also be fixed. My test with srsieve2cl is sieving over 170,000 sequences at a time on the GPU. It is not super fast, but much faster than srsieve2.

pepi37 2022-08-23 20:58


If you can, please post your command line as a reference for further tuning on the GPU.
Thanks

rogue 2022-08-23 21:34

[QUOTE=pepi37;611951]If you can please post command line as reference for further tuning on GPU
Thanks[/QUOTE]

On a GPU you will want to tune mainly with -K, -g, and -M. If you have too many sequences to sieve, use -K to break them up. So if you have 6000 sequences and use -K2, it will make two GPU calls with 3000 sequences each. If so many factors are found that it fills the buffer, use -M to adjust the buffer size. You will want to play around with -g to determine if changing it from the default value improves performance. Use srsieve2cl -h to get the defaults for -g and -M.

If starting to sieve a set of sequences, I suggest sieving to 1e6 (which will be all CPU), then starting from the .abcd file so that you don't lose the progress of sieving to 1e6. I had to use -M1000 for what I was testing. You might not need to change it at all.

-H will provide details about memory utilization. If srsieve2cl won't run on the GPU, you will likely need to set -K higher or -g lower. I suggest adjusting -K first. On various GPUs CL_KERNEL_PRIVATE_MEM_SIZE seems to top out at 500000. Anything above that gives an error.
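The effect of -K described above can be sketched as a simple split of the sequence count. This is a hedged illustration only: the function name is hypothetical, and how srsieve2cl actually distributes a remainder when the count does not divide evenly is an assumption here.

```python
def split_for_gpu(total_sequences, k):
    """Sequences per GPU call for a given -K value.

    Assumption: the split is as even as possible, with any remainder
    spread over the first few calls.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(total_sequences, k)
    return [base + (1 if i < extra else 0) for i in range(k)]


# The example from the post: 6000 sequences with -K2.
print(split_for_gpu(6000, 2))
```

Raising -K makes each GPU call smaller, which fits the advice to increase it first when the kernel will not run.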

pepi37 2022-08-23 22:20


What is your command line? Or is it a top-secret one? :)

rogue 2022-08-24 01:07

[QUOTE=pepi37;611957]What is your command line? Or it is top secret one? :)[/QUOTE]

For this it was: srsieve2cl -ir63.abcd -K6 -M1000

This was a file presieved to 1e6.

