2022-11-17, 00:35  #760
"Mark"
Apr 2003
Between here and the
13×557 Posts 
Quote:


2022-11-17, 01:06  #761
"Seth"
Apr 2019
17·29 Posts 
I believe this has to do with mtsieve not handling the end point strictly.
Code:
five:~/Projects/mtsievegit$ diff 4b9m7_n_20e8_clean.abc 4b9m7_n_20e8_post.abc
1c1
< ABC 4*9^$a-7 // Sieved to 2000000063
---
> ABC 4*9^$a-7 // Sieved to 2000288291
Last fiddled with by SethTro on 2022-11-17 at 01:23
2022-11-17, 01:47  #762
"Seth"
Apr 2019
17·29 Posts 
Quote:
Code:
$ primesieve 1e9
Primes: 50847534
$ primesieve -n $(expr 50847534 + 47432100)
Nth prime: 2001223151
$ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 20e8 -P 2001223151 -i 4b9m7_n_20e8_clean.abc -o temp.abc
Primes tested: 73344. Factors found: 1. Remaining terms: 20477. Time: 0.10 seconds.
$ diff 4b9m7_n_20e8_clean.abc temp.abc
1c1
< ABC 4*9^$a-7 // Sieved to 2000000063
---
> ABC 4*9^$a-7 // Sieved to 2001565729
629d628
< 104493
$ python
>>> for p in range(20 * 10 ** 8 + 1, 2001288307, 2):
...     if (4 * pow(9, 104493, p) - 7) % p == 0:
...         print(p)
...
2001059303
$ echo "4*9^104493-7" | ecm -pm1 14650 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-assert] [P-1]
Input number is 4*9^104493-7 (99713 digits)
Using B1=14650, B2=0, polynomial x^1, x0=2592079789
Step 1 took 38713ms
********** Factor found in step 1: 2001059303
I looked briefly into what it would take to make the worksize variable so that the bounds could be strict, but it's not pretty. It would probably make more sense to pad the last batch with the last prime a bunch of times (and verify that if that prime divides a sequence, the factor isn't counted multiple times).
Last fiddled with by SethTro on 2022-11-17 at 01:49
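The padding idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not mtsieve's code: `pad_batch`, `sieve_batch`, and the tiny work size are hypothetical names and values, and the brute-force divisibility test stands in for the real sieving kernel. The example uses the thread's form 4*9^n-7 with small numbers (4*9^1-7 = 29) so it can be checked by hand.

```python
def pad_batch(primes, worksize):
    """Pad a short final batch to exactly `worksize` entries by repeating its last prime.

    Hypothetical helper: the fixed batch size stays intact while no prime
    beyond the requested upper bound is ever tested.
    """
    if not primes:
        raise ValueError("cannot pad an empty batch")
    if len(primes) > worksize:
        raise ValueError("batch is larger than worksize")
    return primes + [primes[-1]] * (worksize - len(primes))


def sieve_batch(batch, k, b, c, exponents):
    """Return the set of (p, n) pairs where p divides k*b^n + c.

    Collecting into a set means a repeated (padded) prime reports each
    factor once, however many times it appears in the batch.
    """
    found = set()
    for p in batch:
        for n in exponents:
            if (k * pow(b, n, p) + c) % p == 0:
                found.add((p, n))
    return found


# Final batch holds only two primes but the (assumed) work size is 4.
padded = pad_batch([23, 29], worksize=4)
factors = sieve_batch(padded, k=4, b=9, c=-7, exponents=range(1, 5))
print(padded)
print(factors)
```

The same check applied to the thread's real case would use p = 2001059303 and n = 104493, the factor reported by both the Python loop and GMP-ECM above.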

2022-11-17, 13:46  #763
"Mark"
Apr 2003
Between here and the
13·557 Posts 
Quote:


2022-11-17, 14:06  #764
"Mark"
Apr 2003
Between here and the
13·557 Posts 
Upon review, this is for a single sequence and you have doubled the size of the hash table. Have you done any testing with multiple sequences? I am concerned that the change to GenericGpuWorker will negatively affect memory requirements for the GPU when one has hundreds or thousands of sequences.

2022-11-17, 17:00  #765
"Seth"
Apr 2019
17·29 Posts 
Quote:
I'm happy to test with more sequences. Do you have a recommendation for how many sequences I should test at a time, and how large those sequences should be? You can test with just the one-line change, with the one-line change plus the refactor, or with all of my hash table changes. On my GPU (1080 Ti) I consistently see a 5-10% performance improvement.
Code:
$ ./srsieve2cl_clean --platform=1 -W0 -G1 -fA -p 1e9 -P 20e8 -i 4b9m7_n_1e9.abc
Sieve started: 1e9 < p < 2e9 with 21137 terms (100012 < n < 249988, k*9^n-7) (expecting 684 factors)
CPU time: 35.73 sec. (0.64 sieving) (0.98 cores) GPU time: 34.83 sec.
Primes tested: 47423488. Factors found: 659. Remaining terms: 20478. Time: 36.45 seconds.
Segmentation fault (core dumped)
$ ./srsieve2cl --platform=1 -W0 -G1 -fA -p 1e9 -P 20e8 -i 4b9m7_n_1e9.abc
Sieve started: 1e9 < p < 2e9 with 21137 terms (100012 < n < 249988, k*9^n-7) (expecting 684 factors)
CPU time: 33.44 sec. (0.60 sieving) (0.98 cores) GPU time: 32.26 sec.
Primes tested: 47423488. Factors found: 659. Remaining terms: 20478. Time: 33.84 seconds.

2022-11-17, 19:43  #766
"Mark"
Apr 2003
Between here and the
13×557 Posts 
I suggest testing with 1, 10, 100, and 1000 sequences. You can generate sequences, or you can use the CRUS page to find sequences that will provide more meaningful results. Use -n1e5 -N2e5. You can then test -p1e9 -P1e10. It will take longer to run, but it reduces the impact of other things you might be doing on your computer.
I would like to see before and after for CPU and GPU: with srsieve2, no -W; with srsieve2cl, no -G. Assuming all looks good, can you email me the files that you have changed? That would make it easier for me to evaluate your changes. Also send me the starting sequence files so that I can do my own testing. I hope that I am not asking too much of you. I do intend to incorporate your changes into the code base. I could also make you a developer on the project so that you can commit directly to SourceForge.
2022-11-17, 21:41  #767
"Seth"
Apr 2019
755_{8} Posts 
Quote:
Quote:
When I run `./srsieve2 -fA -p 1e8 -p 2e8 -n1e5 -N2e5 -s"118392*51^n+1"` I get an assertion failure:
Code:
srsieve2: sierpinski_riesel/CisOneWithOneSequenceHelper.cpp:282: void CisOneWithOneSequenceHelper::MakeLadder(uint16_t*, uint32_t): Assertion `qListLen+1 < ii_BestQ' failed. 

2022-11-17, 21:59  #768
"Mark"
Apr 2003
Between here and the
13·557 Posts 
I suggest that you start by sieving the sequence to 1e8, then continue sieving with the output file that was generated. For example:
./srsieve2 -P1e6 -n1e5 -N2e5 -s"118392*51^n+1"
followed by:
./srsieve2 -P2e8 -ib51_n.abcd
The first execution will remove most terms. The second will provide the stats we care about. What you input might be a problem because you are overriding the starting value for p. At p=1e6 it recomputes bestQ.
2022-11-17, 22:05  #769
"Seth"
Apr 2019
111101101_{2} Posts 
Results
I pulled down a few CRUS sequences and chose 1, 10, 100, and 1000 k values for random bases (the files are attached). All of this was run on a Ryzen 3900X with no other load. I tested -p 1e8 to -P 2e8 to get results faster, but I can test larger ranges if this doesn't convince you.
Code:
$ wc crus_seqs_rand*
 1000  3000 15978 crus_seqs_rand1000.txt
  100   300  1534 crus_seqs_rand100.txt
   10    30   158 crus_seqs_rand10.txt
    1     3    13 crus_seqs_rand1.txt
$ cat crus_seqs_randX.txt | awk -F", " '{ print "-s\"" $1 "*" $2 "^n" $3 "\"" }' | tr '\n' ' ' > seqsX.txt
$ cat seqs10.txt
-s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1"
$ echo "./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs10.txt)"
./srsieve2 -fA -p 1e8 -p 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1"
$ eval "./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs10.txt)"
Code:
$ ./srsieve2_clean -fA -p 1e8 -P 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1"
Sieve started: 1e8 < p < 2e8 with 1000010 terms (100000 < n < 200000, k*3^n-1) (expecting 36264 factors)
CPU time: 58.77 sec. (0.05 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 36406. Remaining terms: 963604. Time: 58.60 seconds.
$ ./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1"
Sieve started: 1e8 < p < 2e8 with 1000010 terms (100000 < n < 200000, k*3^n-1) (expecting 36264 factors)
CPU time: 47.23 sec. (0.05 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 36406. Remaining terms: 963604. Time: 47.19 seconds.
# With -W8
CPU time: 62.63 sec. (0.50 sieving) (7.84 cores)
vs
CPU time: 48.75 sec. (0.54 sieving) (7.79 cores)
Code:
$ eval ./srsieve2_clean -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs100.txt)
Sieve started: 1e8 < p < 2e8 with 10000100 terms (100000 < n < 200000, k*31^n+1) (expecting 362645 factors)
p=130433617, 26.90K p/sec, 142003 factors found at 2.321K f/sec (last 1 min), 30.4% done. ETC 2022-11-17 13:48
p=193974931, 27.34K p/sec, 347298 factors found at 1.488K f/sec (last 1 min), 93.9% done. ETC 2022-11-17 13:48
CPU time: 195.54 sec. (0.05 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 362592. Remaining terms: 9637508. Time: 194.95 seconds.
$ eval ./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs100.txt)
Sieve started: 1e8 < p < 2e8 with 10000100 terms (100000 < n < 200000, k*31^n+1) (expecting 362645 factors)
p=136261487, 32.20K p/sec, 165361 factors found at 2.720K f/sec (last 1 min), 36.2% done. ETC 2022-11-17 13:52
p=173591807, 32.33K p/sec, 290747 factors found at 2.048K f/sec (last 1 min), 73.5% done. ETC 2022-11-17 13:52
CPU time: 164.43 sec. (0.04 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 362592. Remaining terms: 9637508. Time: 163.99 seconds.
# With -W8
CPU time: 206.01 sec. (0.55 sieving) (7.92 cores)
vs
CPU time: 172.62 sec. (0.50 sieving) (7.90 cores)
Code:
$ eval ./srsieve2_clean -W8 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs1000.txt)
p=146758463, 41.42K p/sec, 2043151 factors found at 4.210K f/sec (last 1 min), 46.7% done. ETC 2022-11-17 13:59
p=194516929, 41.36K p/sec, 3489737 factors found at 2.971K f/sec (last 1 min), 94.5% done. ETC 2022-11-17 13:59
CPU time: 1026.07 sec. (0.61 sieving) (7.96 cores)
Primes tested: 5317484. Factors found: 3626818. Remaining terms: 96374182. Time: 128.82 seconds.
$ eval ./srsieve2 -W8 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs1000.txt)
p=150323629, 44.44K p/sec, 2167391 factors found at 4.458K f/sec (last 1 min), 50.3% done. ETC 2022-11-17 14:02
CPU time: 957.16 sec. (0.71 sieving) (7.96 cores)
Primes tested: 5317484. Factors found: 3626818. Remaining terms: 96374182. Time: 120.17 seconds.
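For readers who prefer not to chain cat, awk, and tr, the conversion from CRUS rows to sequence arguments shown in this post can be sketched in Python. This assumes each row has the form `k, b, c` with a signed `c` (as the awk field separator `", "` and the `wc` counts suggest); the function names and the second sample row are illustrative, not part of mtsieve or the attached files.

```python
def crus_row_to_arg(row):
    """Turn one 'k, b, c' row, e.g. '3677878, 3, -1', into -s"k*b^n+c" form.

    The sign is taken verbatim from the third field, mirroring the awk
    one-liner, so '-1' yields ...^n-1 and '+1' yields ...^n+1.
    """
    k, b, c = [field.strip() for field in row.split(",")]
    return f'-s"{k}*{b}^n{c}"'


def crus_rows_to_args(rows):
    """Join all non-empty rows into one space-separated argument string."""
    return " ".join(crus_row_to_arg(row) for row in rows if row.strip())


# First row is taken from seqs10.txt above; the second is a made-up row
# in the assumed format, using the 118392*51^n+1 sequence from the thread.
rows = ["3677878, 3, -1", "118392, 51, +1"]
args = crus_rows_to_args(rows)
print(args)
```

The resulting string can be appended to a srsieve2 command line in the same way as `$(cat seqs10.txt)` is used above.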
2022-11-17, 23:07  #770
"Seth"
Apr 2019
17×29 Posts 
Better test runner script
Code:
rm temp*.out
for N in {1,10,100,1000}; do
    # Convert "k, b, c" rows into -s"k*b^n+c" arguments on a single line
    cat "crus_seqs_rand${N}.txt" | awk -F", " '{ print "-s\"" $1 "*" $2 "^n" $3 "\"" }' | tr '\n' ' ' > "seqs${N}.txt"
    wc "seqs${N}.txt"
    # Pre-sieve to 1e8 so the timed runs start from the same input file
    echo eval ./srsieve2_clean -P 1e8 -n1e5 -N2e5 $(cat seqs${N}.txt) -o temp_${N}.in
    # Before (clean build) and after (patched build), with 1 and 8 workers
    echo eval "./srsieve2_clean -P 2e8 -i temp_${N}.in -o temp_${N}_befor1.out"
    echo eval "./srsieve2_clean -P 1e9 -W8 -i temp_${N}.in -o temp_${N}_befor8.out"
    echo eval "./srsieve2 -P 2e8 -i temp_${N}.in -o temp_${N}_after1.out"
    echo eval "./srsieve2 -P 1e9 -W8 -i temp_${N}.in -o temp_${N}_after8.out"
    echo -e "\n\n"
done
Code:
1 Sequence (using 10x higher bounds; 1e9, 1e10)
Before
CPU time: 21.90 sec. (0.26 sieving) (1.20 cores)
Primes tested: 40351310. Factors found: 225. Remaining terms: 2731. Time: 18.20 seconds.
CPU time: 219.52 sec. (27.57 sieving) (8.05 cores)
Primes tested: 444556287. Factors found: 517. Remaining terms: 2439. Time: 27.26 seconds.
After
CPU time: 20.96 sec. (0.27 sieving) (1.20 cores)
Primes tested: 40351310. Factors found: 225. Remaining terms: 2731. Time: 17.45 seconds.
CPU time: 212.39 sec. (28.63 sieving) (8.03 cores)
Primes tested: 444556287. Factors found: 517. Remaining terms: 2439. Time: 26.41 seconds.

10 Sequences
Before
CPU time: 25.28 sec. (0.06 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 757. Remaining terms: 19928. Time: 25.23 seconds.
CPU time: 231.81 sec. (3.57 sieving) (7.91 cores) (This is 9X the work so expected that it takes longer)
Primes tested: 45086080. Factors found: 2277. Remaining terms: 18408. Time: 29.30 seconds.
After
CPU time: 22.14 sec. (0.05 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 757. Remaining terms: 19928. Time: 22.13 seconds.
CPU time: 202.03 sec. (4.27 sieving) (7.91 cores)
Primes tested: 45086080. Factors found: 2277. Remaining terms: 18408. Time: 25.52 seconds.

100 Sequences
Before
CPU time: 135.06 sec. (0.05 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 21073. Remaining terms: 563114. Time: 134.57 seconds.
CPU time: 1223.59 sec. (4.01 sieving) (7.96 cores)
Primes tested: 45086080. Factors found: 64678. Remaining terms: 519509. Time: 153.59 seconds.
After
CPU time: 127.34 sec. (0.06 sieving) (1.00 cores)
Primes tested: 5317484. Factors found: 21073. Remaining terms: 563114. Time: 126.80 seconds.
CPU time: 1155.34 sec. (4.17 sieving) (7.96 cores)
Primes tested: 45086080. Factors found: 64678. Remaining terms: 519509. Time: 144.99 seconds.

1000 Sequences (sieved to 11e7 for no -W, 2e8 for -W8)
Before
CPU time: 84.85 sec. (0.00 sieving) (1.02 cores)
Primes tested: 541856. Factors found: 23720. Remaining terms: 4588834. Time: 83.03 seconds.
CPU time: 869.99 sec. (0.42 sieving) (7.97 cores)
Primes tested: 5317484. Factors found: 167203. Remaining terms: 4445351. Time: 109.13 seconds.
After
CPU time: 85.95 sec. (0.00 sieving) (1.02 cores)
Primes tested: 541856. Factors found: 23720. Remaining terms: 4588834. Time: 84.07 seconds.
CPU time: 848.70 sec. (0.51 sieving) (7.97 cores)
Primes tested: 5317484. Factors found: 167203. Remaining terms: 4445351. Time: 106.48 seconds.
I ran the single sequence to higher bounds (1e9, 1e10). The 10- and 100-sequence results are faster; the 1- and 1000-sequence results are roughly equal. I suspect the single sequence shows no speedup because it is the only case that doesn't use the generic logic. With 1000 sequences I may be hitting cache limits (I could profile with perf if it is important to verify that hypothesis).
Last fiddled with by SethTro on 2022-11-17 at 23:18
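To make the before/after comparison easier to read, here is a small Python sketch that turns the single-worker wall-clock "Time:" values reported above into percentage reductions. Only the timings come from the post; the dictionary layout and the `percent_reduction` helper are illustrative choices, and single runs like these carry some noise.

```python
# Wall-clock seconds from the single-worker runs above: (before, after).
single_worker_times = {
    1: (18.20, 17.45),
    10: (25.23, 22.13),
    100: (134.57, 126.80),
    1000: (83.03, 84.07),
}


def percent_reduction(before, after):
    """Percentage by which the run time dropped (negative means slower)."""
    return (before - after) / before * 100.0


reductions = {
    nseq: percent_reduction(before, after)
    for nseq, (before, after) in single_worker_times.items()
}

for nseq, pct in reductions.items():
    print(f"{nseq:>4} sequence(s): {pct:+.1f}% time reduction")
```

On these figures the 10-sequence case shows the largest reduction and the 1000-sequence case is marginally slower, which matches the summary in the post.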