![]() |
![]() |
#760 | |
"Mark"
Apr 2003
Between here and the
13×557 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#761 |
"Seth"
Apr 2019
17·29 Posts |
![]()
I believe this has to do with mtsieve not handling the end point strictly.
Code:
five:~/Projects/mtsieve-git$ diff 4b9m7_n_20e8_clean.abc 4b9m7_n_20e8_post.abc 1c1 < ABC 4*9^$a-7 // Sieved to 2000000063 --- > ABC 4*9^$a-7 // Sieved to 2000288291 Last fiddled with by SethTro on 2022-11-17 at 01:23 |
![]() |
![]() |
![]() |
#762 | |
"Seth"
Apr 2019
17·29 Posts |
![]() Quote:
Code:
$ primesieve 1e9 Primes: 50847534 $ primesieve -n $(expr 50847534 + 47432100) Nth prime: 2001223151 $ ./srsieve2cl --platform=1 -W1 -G1 -fA -p 20e8 -P 2001223151 -i 4b9m7_n_20e8_clean.abc -o temp.abc Primes tested: 73344. Factors found: 1. Remaining terms: 20477. Time: 0.10 seconds. $ diff 4b9m7_n_20e8_clean.abc temp.abc 1c1 < ABC 4*9^$a-7 // Sieved to 2000000063 --- > ABC 4*9^$a-7 // Sieved to 2001565729 629d628 < 104493 $ python >>> for p in range(20 * 10 ** 8 + 1, 2001288307, 2): ... if (4 * pow(9, 104493, p) - 7) % p == 0: ... print(p) ... 2001059303 $ echo "4*9^104493-7" | ecm -pm1 14650 0 GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-assert] [P-1] Input number is 4*9^104493-7 (99713 digits) Using B1=14650, B2=0, polynomial x^1, x0=2592079789 Step 1 took 38713ms ********** Factor found in step 1: 2001059303 I looked briefly into what it would take to make the worksize variable so that the bounds could be strict but it's not pretty. It would probably make more sense to pad the last batch with the last prime a bunch of times (and verify that if that primes divides a sequence it doesn't count multiple times) Last fiddled with by SethTro on 2022-11-17 at 01:49 |
|
![]() |
![]() |
![]() |
#763 | |
"Mark"
Apr 2003
Between here and the
13·557 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#764 |
"Mark"
Apr 2003
Between here and the
13·557 Posts |
![]()
Upon review, this is for a single sequence and you have doubled the size of the hash table. Have you done any testing with multiple sequences? I am concerned that the change to GenericGpuWorker will negatively affect memory requirements for the GPU when one has hundreds or thousands of sequences.
|
![]() |
![]() |
![]() |
#765 | |
"Seth"
Apr 2019
17·29 Posts |
![]() Quote:
I'm happy to test with more sequences, do you have a recommendation for how many sequences at a time I should test? and how large those sequences should be? You can test with just the one line change, or the one line change + refactor, or with all my hash table changes. On my GPU (1080ti) I consistently see a +5-10% performance improvement. Code:
$ ./srsieve2cl_clean --platform=1 -W0 -G1 -fA -p 1e9 -P 20e8 -i 4b9m7_n_1e9.abc Sieve started: 1e9 < p < 2e9 with 21137 terms (100012 < n < 249988, k*9^n-7) (expecting 684 factors) CPU time: 35.73 sec. (0.64 sieving) (0.98 cores) GPU time: 34.83 sec. Primes tested: 47423488. Factors found: 659. Remaining terms: 20478. Time: 36.45 seconds. Segmentation fault (core dumped) $ ./srsieve2cl --platform=1 -W0 -G1 -fA -p 1e9 -P 20e8 -i 4b9m7_n_1e9.abc Sieve started: 1e9 < p < 2e9 with 21137 terms (100012 < n < 249988, k*9^n-7) (expecting 684 factors) CPU time: 33.44 sec. (0.60 sieving) (0.98 cores) GPU time: 32.26 sec. Primes tested: 47423488. Factors found: 659. Remaining terms: 20478. Time: 33.84 seconds. |
|
![]() |
![]() |
![]() |
#766 |
"Mark"
Apr 2003
Between here and the
13×557 Posts |
![]()
Is suggest testing with 1, 10, 10, and 1000 sequences. You can generate sequences or you can use the CRUS page to find sequences that will provide more meaningful results. Use -n1e5 -N2e5. You can then test -p1e9 -p1e10. It will take longer to run, but reduces impact of other things you might be doing on your computer.
I would like to see before and after for CPU and GPU. With srsieve2, no -W. With srsieve2cl, no -G. Assuming all looks good, can you e-mail me the files that you have changed. It would make it easier for me to evaluate what you have changed. Also send me the starting sequence files so that I can do my own testing. I hope that I am not asking too much from you. I do intend to incorporate your changes into the code base. I could also make you a developer on the project so that you can commit directly to sourceforge. |
![]() |
![]() |
![]() |
#767 | ||
"Seth"
Apr 2019
7558 Posts |
![]() Quote:
Quote:
When i run `./srsieve2 -fA -p 1e8 -p 2e8 -n1e5 -N2e5 -s"118392*51^n+1" ` I get an error with Code:
srsieve2: sierpinski_riesel/CisOneWithOneSequenceHelper.cpp:282: void CisOneWithOneSequenceHelper::MakeLadder(uint16_t*, uint32_t): Assertion `qListLen+1 < ii_BestQ' failed. |
||
![]() |
![]() |
![]() |
#768 |
"Mark"
Apr 2003
Between here and the
13·557 Posts |
![]()
I suggest that you start by sieving the sequence to 1e8, then continue sieving with the output file that was generated. For example:
./srsieve2 -P1e6 -n1e5 -N2e5 -s"118392*51^n+1" followed by: ./srsieve2 -P2e8 -ib51_n.abcd The first execution will remove most terms. The second will provide the stats we care about. What you input might be a problem because you are overriding the starting value for p. At p=1e6 it re-computes bestQ. |
![]() |
![]() |
![]() |
#769 |
"Seth"
Apr 2019
1111011012 Posts |
![]()
Results
I pulled down a few CRUS sequences and choose 1, 10, 100, 1000 k for random bases (I attach these files) All of this was run on a Ryzen 3900x with no other load, I tested -p 1e8 to -P 2e8 (to get results faster) but can test larger ranges if this doesn't convince you. Code:
$ wc crus_seqs_rand* 1000 3000 15978 crus_seqs_rand1000.txt 100 300 1534 crus_seqs_rand100.txt 10 30 158 crus_seqs_rand10.txt 1 3 13 crus_seqs_rand1.txt $ cat crus_seqs_randX.txt | awk -F", " '{ print "-s\"" $1 "*" $2 "^n" $3 "\"" }' | tr '\n' ' ' > seqsX.txt $ cat seqs10.txt -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1" $ echo "./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs10.txt)" ./srsieve2 -fA -p 1e8 -p 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1" $ eval "./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs10.txt)" Code:
$ ./srsieve2_clean -fA -p 1e8 -P 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1" Sieve started: 1e8 < p < 2e8 with 1000010 terms (100000 < n < 200000, k*3^n-1) (expecting 36264 factors) CPU time: 58.77 sec. (0.05 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 36406. Remaining terms: 963604. Time: 58.60 seconds. $ ./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 -s"3677878*3^n-1" -s"6793112*3^n-1" -s"10463066*3^n-1" -s"10789522*3^n-1" -s"11033634*3^n-1" -s"16874152*3^n-1" -s"18137648*3^n-1" -s"20379336*3^n-1" -s"21368582*3^n-1" -s"29140796*3^n-1" Sieve started: 1e8 < p < 2e8 with 1000010 terms (100000 < n < 200000, k*3^n-1) (expecting 36264 factors) CPU time: 47.23 sec. (0.05 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 36406. Remaining terms: 963604. Time: 47.19 seconds. # With -W8 CPU time: 62.63 sec. (0.50 sieving) (7.84 cores) vs CPU time: 48.75 sec. (0.54 sieving) (7.79 cores) Code:
$ eval ./srsieve2_clean -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs100.txt) Sieve started: 1e8 < p < 2e8 with 10000100 terms (100000 < n < 200000, k*31^n+1) (expecting 362645 factors) p=130433617, 26.90K p/sec, 142003 factors found at 2.321K f/sec (last 1 min), 30.4% done. ETC 2022-11-17 13:48 p=193974931, 27.34K p/sec, 347298 factors found at 1.488K f/sec (last 1 min), 93.9% done. ETC 2022-11-17 13:48 CPU time: 195.54 sec. (0.05 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 362592. Remaining terms: 9637508. Time: 194.95 seconds. $ eval ./srsieve2 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs100.txt) Sieve started: 1e8 < p < 2e8 with 10000100 terms (100000 < n < 200000, k*31^n+1) (expecting 362645 factors) p=136261487, 32.20K p/sec, 165361 factors found at 2.720K f/sec (last 1 min), 36.2% done. ETC 2022-11-17 13:52 p=173591807, 32.33K p/sec, 290747 factors found at 2.048K f/sec (last 1 min), 73.5% done. ETC 2022-11-17 13:52 CPU time: 164.43 sec. (0.04 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 362592. Remaining terms: 9637508. Time: 163.99 seconds. # With -W8 CPU time: 206.01 sec. (0.55 sieving) (7.92 cores) vs CPU time: 172.62 sec. (0.50 sieving) (7.90 cores) Code:
$ eval ./srsieve2_clean -W8 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs1000.txt) p=146758463, 41.42K p/sec, 2043151 factors found at 4.210K f/sec (last 1 min), 46.7% done. ETC 2022-11-17 13:59 p=194516929, 41.36K p/sec, 3489737 factors found at 2.971K f/sec (last 1 min), 94.5% done. ETC 2022-11-17 13:59 CPU time: 1026.07 sec. (0.61 sieving) (7.96 cores) Primes tested: 5317484. Factors found: 3626818. Remaining terms: 96374182. Time: 128.82 seconds. $ eval ./srsieve2 -W8 -fA -p 1e8 -P 2e8 -n1e5 -N2e5 $(cat seqs1000.txt) p=150323629, 44.44K p/sec, 2167391 factors found at 4.458K f/sec (last 1 min), 50.3% done. ETC 2022-11-17 14:02 CPU time: 957.16 sec. (0.71 sieving) (7.96 cores) Primes tested: 5317484. Factors found: 3626818. Remaining terms: 96374182. Time: 120.17 seconds. |
![]() |
![]() |
![]() |
#770 |
"Seth"
Apr 2019
17×29 Posts |
![]()
Better test runner script
Code:
rm temp*.out for N in {1,10,100,100}; do cat "crus_seqs_rand${N}.txt" | awk -F", " '{ print "-s\"" $1 "*" $2 "^n" $3 "\"" }' | tr '\n' ' ' > "seqs${N}.txt" wc "seqs${N}.txt" echo eval ./srsieve2_clean -P 1e8 -n1e5 -N2e5 $(cat seqs${N}.txt) -o temp_${N}.in echo eval "./srsieve2_clean -P 2e8 -i temp_${N}.in -o temp_${N}_befor1.out echo eval "./srsieve2_clean -P 1e9 -W8 -i temp_${N}.in -o temp_${N}_befor8.out echo eval "./srsieve2 -P 2e8 -i temp_${N}.in -o temp_${N}_after1.out echo eval "./srsieve2 -P 1e9 -W8 -i temp_${N}.in -o temp_${N}_after8.out echo -e "\n\n" done Code:
1 Sequence (using 10x higher bounds; 1e9, 1e10) Before CPU time: 21.90 sec. (0.26 sieving) (1.20 cores) Primes tested: 40351310. Factors found: 225. Remaining terms: 2731. Time: 18.20 seconds. CPU time: 219.52 sec. (27.57 sieving) (8.05 cores) Primes tested: 444556287. Factors found: 517. Remaining terms: 2439. Time: 27.26 seconds. After CPU time: 20.96 sec. (0.27 sieving) (1.20 cores) Primes tested: 40351310. Factors found: 225. Remaining terms: 2731. Time: 17.45 seconds. CPU time: 212.39 sec. (28.63 sieving) (8.03 cores) Primes tested: 444556287. Factors found: 517. Remaining terms: 2439. Time: 26.41 seconds. 10 Sequences Before CPU time: 25.28 sec. (0.06 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 757. Remaining terms: 19928. Time: 25.23 seconds. CPU time: 231.81 sec. (3.57 sieving) (7.91 cores) (This is 9X the work so expected that it takes longer) Primes tested: 45086080. Factors found: 2277. Remaining terms: 18408. Time: 29.30 seconds. After CPU time: 22.14 sec. (0.05 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 757. Remaining terms: 19928. Time: 22.13 seconds. CPU time: 202.03 sec. (4.27 sieving) (7.91 cores) Primes tested: 45086080. Factors found: 2277. Remaining terms: 18408. Time: 25.52 seconds. 100 Sequence Before CPU time: 135.06 sec. (0.05 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 21073. Remaining terms: 563114. Time: 134.57 seconds. CPU time: 1223.59 sec. (4.01 sieving) (7.96 cores) Primes tested: 45086080. Factors found: 64678. Remaining terms: 519509. Time: 153.59 seconds. After CPU time: 127.34 sec. (0.06 sieving) (1.00 cores) Primes tested: 5317484. Factors found: 21073. Remaining terms: 563114. Time: 126.80 seconds. CPU time: 1155.34 sec. (4.17 sieving) (7.96 cores) Primes tested: 45086080. Factors found: 64678. Remaining terms: 519509. Time: 144.99 seconds. 1000 Sequence (sieved to 11e7 for no -W, 2e8 for -W8) Before CPU time: 84.85 sec. (0.00 sieving) (1.02 cores) Primes tested: 541856. Factors found: 23720. Remaining terms: 4588834. Time: 83.03 seconds. CPU time: 869.99 sec. (0.42 sieving) (7.97 cores) Primes tested: 5317484. Factors found: 167203. Remaining terms: 4445351. Time: 109.13 seconds. After CPU time: 85.95 sec. (0.00 sieving) (1.02 cores) Primes tested: 541856. Factors found: 23720. Remaining terms: 4588834. Time: 84.07 seconds. CPU time: 848.70 sec. (0.51 sieving) (7.97 cores) Primes tested: 5317484. Factors found: 167203. Remaining terms: 4445351. Time: 106.48 seconds. I ran the single sequence to high bounds (1e9, 1e10). It is the only sequence not using generic logic and doesn't show a speedup. The 10/100 sequence results are faster, the 1/1000 sequence results are equal. I suspect the single sequence is because it's not using the generic logic. Maybe with 1000 sequences I'm hitting cache limits (I could profile with perf if it was important to verify that thesis) Last fiddled with by SethTro on 2022-11-17 at 23:18 |
![]() |
![]() |