mersenneforum.org mtsieve

2022-05-17, 14:13   #628
rogue

"Mark"
Apr 2003
Between here and the
2³·3·5²·11 Posts

I tried switching to the cygwin g++ compiler, but it won't link. It appears to be a bug in primesieve or in the linker or compiler used by cygwin. So I am now trying the llvm-mingw compiler; it appears that I can debug programs built with it using lldb or gdb. This will require a bit of testing. It seems to crash when resetting the rounding mode for SSE, but only one sieve uses SSE, so I will probably just modify that sieve to use different logic and thus remove SSE assembler from the code completely. It is possible that something else I changed between releases triggered this crash, but it is odd that only srsieve2 seems to be affected. Without use of gdb on msys2 builds, I cannot diagnose the root cause of the crash.
2022-05-23, 13:30   #629
ryanp

Jun 2012
Boulder, CO

383 Posts

Found what seems to be a bug in the latest SVN code. I have not had time to recompile or test with gdb.

Code:
$ ./srsieve2 -P 1e14 -W 64 -s "39*2^n+1" -o ferm39_10M_20M_sv1e14.txt -n 10e6 -N 20e6
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e14 with 10000001 terms (10000000 < n < 20000000, k*2^n+1) (expecting 9659200 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 408 base 2^720 sequences.
Legendre summary:  Approximately 40 B needed for Legendre tables
   1 total sequences
   1 are eligible for Legendre tables
   0 are not eligible for Legendre tables
   1 have Legendre tables in memory
   0 cannot have Legendre tables in memory
   0 have Legendre tables loaded from files
   1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
Unable to lock mutex thread_6_worker.  Exiting.

Interestingly, this only seems to happen for 39*2^n+1 and not 37*2^n+1 or 41*2^n+1.

Last fiddled with by ryanp on 2022-05-23 at 13:31 Reason: added some details

2022-05-23, 15:00   #630
rogue

"Mark"
Apr 2003
Between here and the
2³×3×5²×11 Posts

Quote:
 Originally Posted by ryanp Found what seems to be a bug in the latest SVN code. I have not had time to recompile or test with gdb.
Code:
$ ./srsieve2 -P 1e14 -W 64 -s "39*2^n+1" -o ferm39_10M_20M_sv1e14.txt -n 10e6 -N 20e6
srsieve2 v1.6.2, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 3
Sieve started: 3 < p < 1e14 with 10000001 terms (10000000 < n < 20000000, k*2^n+1) (expecting 9659200 factors)
Sieving with single sequence c=1 logic for p >= 257
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 2 sequence into 408 base 2^720 sequences.
Legendre summary:  Approximately 40 B needed for Legendre tables
   1 total sequences
   1 are eligible for Legendre tables
   0 are not eligible for Legendre tables
   1 have Legendre tables in memory
   0 cannot have Legendre tables in memory
   0 have Legendre tables loaded from files
   1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
Unable to lock mutex thread_6_worker.  Exiting.

Interestingly, this only seems to happen for 39*2^n+1 and not 37*2^n+1 or 41*2^n+1.
srsieve2 doesn't seem to perform nicely with large numbers of threads. I think fixing that would require a rethinking of the framework. You are probably better off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread; that might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on.

I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.
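The splitting suggested above can be scripted. A minimal sketch, not from the thread: it assumes srsieve2's -p/-P options take the minimum and maximum prime of a range, and the input/output file names are made up for illustration. The script only prints the commands so the split can be inspected before anything runs.

```shell
#!/bin/sh
# Sketch: split one large prime range across several srsieve2 instances,
# each with a modest -W, instead of one instance with many threads.
# -p/-P (range bounds) and the file names are assumptions for illustration.
PMIN=100000000          # 1e8
PMAX=1000000000000      # 1e12
INSTANCES=4
STEP=$(( (PMAX - PMIN) / INSTANCES ))

i=0
while [ $i -lt $INSTANCES ]; do
  lo=$(( PMIN + i * STEP ))
  hi=$(( lo + STEP ))
  # Print the command rather than running it, so the split is easy to check:
  echo "srsieve2 -p $lo -P $hi -W 4 -w 5e6 -i terms.txt -o terms_part$i.txt"
  i=$(( i + 1 ))
done
```

Each instance then avoids contending on the same worker queue, at the cost of merging the per-range output files afterwards.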

Last fiddled with by rogue on 2022-05-23 at 15:13

2022-05-23, 17:16   #631
ryanp

Jun 2012
Boulder, CO

17F₁₆ Posts

Quote:
 Originally Posted by rogue srsieve2 doesn't seem to perform nicely for large numbers of threads. I think it would require a rethinking of the framework to do that. You are probably off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread. That might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on. I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.
I've been able to repro this on multiple machines. Observations:

* it fails consistently and only with -s "39*2^n+1"
* it fails even with -W 4, though in that case it produces one of the following:

Code:
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
corrupted size vs. prev_size
Aborted

Code:
518400 bytes used for congruent q and ladder indices
311200 bytes used for congruent qs and ladders
Aborted
Here's the result of debugging with gdb:

Code:
[New Thread 0x7ffff6ef9640 (LWP 206359)]
corrupted size vs. prev_size

__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff7a4a546 in __GI_abort () at abort.c:79
#2  0x00007ffff7aa1eb8 in __libc_message (action=action@entry=do_abort,
fmt=fmt@entry=0x7ffff7bbfa78 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7aa991a in malloc_printerr (
str=str@entry=0x7ffff7bbd714 "corrupted size vs. prev_size")
at malloc.c:5628
av=<optimized out>) at malloc.c:1608
av=av@entry=0x7ffff7bf6ba0 <main_arena>, bytes=bytes@entry=112)
at malloc.c:4263
#6  0x00007ffff7aae4b1 in __GI___libc_malloc (bytes=112) at malloc.c:3237
#7  0x00007ffff7e0e6bc in operator new(unsigned long) ()
from /lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x000055555556f308 in __gnu_cxx::new_allocator<primesieve::SievingPrime>::allocate (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/ext/new_allocator.h:127
#9  0x000055555556efb2 in std::allocator_traits<std::allocator<primesieve::SievingPrime> >::allocate (__a=..., __n=14)
at /usr/include/c++/11/bits/alloc_traits.h:464
#10 0x000055555556ea3c in std::_Vector_base<primesieve::SievingPrime, std::allocator<primesieve::SievingPrime> >::_M_allocate (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/bits/stl_vector.h:346
#11 0x000055555556e6c4 in std::vector<primesieve::SievingPrime, std::allocator<primesieve::SievingPrime> >::reserve (this=0x7fffffffd5f0, __n=14)
at /usr/include/c++/11/bits/vector.tcc:78
#12 0x000055555556ba1c in primesieve::EratSmall::init (this=0x7fffffffd5d0,
stop=1021020, l1CacheSize=17017, maxPrime=17) at sieve/EratSmall.cpp:57
#13 0x000055555556f9d4 in primesieve::PreSieve::initBuffer (
this=0x55555583efd8, maxPrime=17, primeProduct=510510)
at sieve/PreSieve.cpp:86
#14 0x000055555556f8dc in primesieve::PreSieve::init (this=0x55555583efd8,
start=11924379, stop=23704475) at sieve/PreSieve.cpp:68
#15 0x0000555555564079 in primesieve::Erat::init (this=0x55555583ec70,
start=11924379, stop=23704475, sieveSize=512, preSieve=...)
at sieve/Erat.cpp:79
#16 0x000055555557793b in primesieve::PrimeGenerator::initErat (
this=0x55555583ec70) at sieve/PrimeGenerator.cpp:159
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:147
#18 0x0000555555577bab in primesieve::PrimeGenerator::sieveSegment (
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:232
#19 0x0000555555577d56 in primesieve::PrimeGenerator::fill (
this=0x55555583ec70,
primes=std::vector of length 256, capacity 256 = {...},
size=0x7fffffffd8f8) at sieve/PrimeGenerator.cpp:291
#20 0x00005555555850d5 in primesieve::iterator::generate_next_primes (
this=0x7fffffffd8f0) at sieve/iterator.cpp:67
#21 0x0000555555560110 in primesieve::iterator::next_prime (
this=0x7fffffffd8f0) at sieve/primesieve/iterator.hpp:69
#22 0x0000555555560632 in primesieve::store_n_primes<std::vector<unsigned long, std::allocator<unsigned long> > > (n=216804, start=1257,
primes=std::vector of length 783196, capacity 1000000 = {...})
at sieve/primesieve/StorePrimes.hpp:87
#23 0x000055555556028c in primesieve::generate_n_primes<unsigned long> (
n=1000000, start=1258, primes=0x5555556fb390)
at core/../sieve/primesieve.hpp:62
#24 0x0000555555561fff in Worker::ProcessNextPrimeChunk (this=0x5555556fb380,
startFrom=1257, maxPrimeForChunk=1257) at core/Worker.cpp:155
#25 0x000055555555cbe1 in App::Sieve (this=0x5555555d98e0) at core/App.cpp:450
#26 0x000055555555ca3b in App::Run (this=0x5555555d98e0) at core/App.cpp:405
#27 0x0000555555563288 in main (argc=13, argv=0x7fffffffdba8)
at core/main.cpp:87
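A side note on the crash signature (my observation, not from the thread): "corrupted size vs. prev_size" is glibc's malloc detecting that heap chunk metadata has been overwritten, typically some time after the actual out-of-bounds write, which is why the backtrace ends inside an unrelated allocation. Rebuilding with AddressSanitizer usually reports the bad write at its source. The make variables below are assumptions about the mtsieve build, and the snippet only echoes the commands rather than running them.

```shell
# Hedged sketch: rebuild with AddressSanitizer to catch the out-of-bounds
# write where it happens instead of at the next malloc().
# CXXFLAGS/LDFLAGS names are assumptions about the mtsieve makefile.
BUILD='make clean && make CXXFLAGS="-g -O1 -fsanitize=address" LDFLAGS="-fsanitize=address" srsieve2'
RUN='./srsieve2 -P 1e9 -W 4 -s "39*2^n+1" -n 10e6 -N 20e6'
echo "$BUILD"
echo "$RUN"
```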

Last fiddled with by ryanp on 2022-05-23 at 17:18 Reason: add gdb backtrace

2022-05-23, 18:15   #632
pepi37

Dec 2011
After milion nines:)

7²·31 Posts

What is the range of n?

Quote:
 Originally Posted by ryanp I've been able to repro this on multiple machines. Observations: * it fails consistently and only with -s "39*2^n+1" * it fails even with -W 4, though this produces:
Last fiddled with by pepi37 on 2022-05-23 at 18:48
2022-05-23, 19:29   #633
rogue

"Mark"
Apr 2003
Between here and the
14710₈ Posts

When I have some time I will play around with this using the latest build, as it uses a different compiler. I have also upgraded to the latest primesieve code. One of those changes might fix this issue.
2022-05-23, 21:58   #634
pepi37

Dec 2011
After milion nines:)

2757₈ Posts

Fix for the problem (temporary): make the initial sieve with srsieve, then use srsieve2 version 1.5.3. Working on a 6-core CPU without problems:

Code:
srsieve2 -P 1446900638801 -W 6 -w5e6 -i a.txt -o b.txt -O fact39.txt
srsieve2 v1.5.3, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with generic logic for p >= 100000000
Split 1 base 2 sequence into 204 base 2^360 sequences.
712822 bytes used for congruence tables
522 bytes used for Legendre tables
Sieve started: 1e8 < p < 1446900638801 with 1353444 terms (11924391 < n < 23704473, k*2^n+1) (expecting 463052 factors)
p=731561923, 679.7K p/sec, 142804 factors found at 476.8 f/sec (last 1 min), 0.0% done.  ETC 2022-05-25 14:45
2022-05-23, 22:00   #635
pepi37

Dec 2011
After milion nines:)

5EF₁₆ Posts

Quote:
 Originally Posted by rogue srsieve2 doesn't seem to perform nicely for large numbers of threads. I think it would require a rethinking of the framework to do that. You are probably off running multiple instances with a smaller number of threads, each with its own range. You can also bump -w to change the number of primes per worker thread. That might alleviate some of the thread contention. I do not have a CPU with more than 8 cores to run a test like this on. I do not recall if sr1sieve is faster (compared to srsieve2). I know it is slower than srsieve2cl.

I noticed this on a 12-core Ryzen (HT is off). When I set 12 cores, I only see about 785% CPU utilization (Linux).
