2019-11-10, 04:10  #1
"Ed Hall"
Dec 2009
Adirondack Mtns
111000110010_{2} Posts 
AVX2 Troubles with Colab Instance
I seem to be having trouble with YAFU on a Colab instance when I build it with USE_AVX2=1. I'm getting quite regular failures of the following sort:
Code:
./yafu 83627958813331634770105456990581223975460530782647023599500689759334189187309703
fac: factoring 83627958813331634770105456990581223975460530782647023599500689759334189187309703
fac: using pretesting plan: normal
fac: no tune info: using qs/gnfs crossover of 93 digits
div: primes less than 10000
rho: x^2 + 3, starting 200 iterations on C80
rho: x^2 + 2, starting 200 iterations on C80
rho: x^2 + 1, starting 200 iterations on C80
pm1: starting B1 = 150K, B2 = gmp-ecm default on C80
ecm: 30/30 curves on C80, B1=2K, B2=gmp-ecm default
ecm: 74/74 curves on C80, B1=11K, B2=gmp-ecm default
ecm: 188/188 curves on C80, B1=50K, B2=gmp-ecm default, ETA: 0 sec
starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703

==== sieving in progress ( 2 threads): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====

The CPU is:
Code:
Intel(R) Xeon(R) CPU @ 2.00GHz

If I compile with USE_SSE41=1 and not USE_AVX2=1, I only see a failure very rarely. I am not including msieve or NFS at all. Any help appreciated...
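One quick sanity check is to confirm at runtime that the Colab CPU actually advertises the instruction sets the build targets. This is a minimal sketch assuming GCC or Clang (it relies on their `__builtin_cpu_supports` builtin); the flag names and helper functions are hypothetical, not part of YAFU. BMI2 is included because `_pdep_u64` shows up in a later build error in this thread. Note that if AVX2 were missing, the usual symptom would be an "Illegal instruction" (SIGILL) rather than a segmentation fault, so this mainly rules that possibility out.

```c
#include <stdio.h>

/* Hypothetical bit flags for the extensions of interest here. */
enum { FEAT_SSE41 = 1, FEAT_AVX2 = 2, FEAT_BMI2 = 4 };

/* Query CPUID at runtime through GCC/Clang builtins and return a bitmask
 * of the features found. */
static int cpu_feature_mask(void)
{
    int mask = 0;
    /* Only strictly required when called before constructors run
       (e.g. from an ifunc resolver); harmless in ordinary code. */
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.1"))
        mask |= FEAT_SSE41;
    if (__builtin_cpu_supports("avx2"))
        mask |= FEAT_AVX2;
    if (__builtin_cpu_supports("bmi2"))
        mask |= FEAT_BMI2;
    return mask;
}

/* Print a one-line summary suitable for pasting into a forum post. */
static void print_cpu_features(void)
{
    int m = cpu_feature_mask();
    printf("sse4.1=%d avx2=%d bmi2=%d\n",
           (m & FEAT_SSE41) != 0, (m & FEAT_AVX2) != 0, (m & FEAT_BMI2) != 0);
}
```

Calling `print_cpu_features()` from a throwaway `main` needs no `-m` flags, since the builtins query CPUID when the program runs rather than at compile time.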
2019-11-10, 19:16  #2
"Ed Hall"
Dec 2009
Adirondack Mtns
2×23×79 Posts 
Here is a pretty reproducible run with more details:
Command used:
Code:
./yafu "siqs(83627958813331634770105456990581223975460530782647023599500689759334189187309703)" v v v

This returned immediately:
Code:
11/10/19 19:04:36 v1.34.5 @ c39f9954850d, System/Build Info:
Using GMP-ECM 7.0.5-dev, Powered by GMP 6.1.2
detected Intel(R) Xeon(R) CPU @ 2.00GHz
detected L1 = 32768 bytes, L2 = 40370176 bytes, CL = 64 bytes
measured cpu frequency ~= 42.000000
using 1 random witnesses for Rabin-Miller PRP checks

===============================================================
======= Welcome to YAFU (Yet Another Factoring Utility) =======
======= bbuhrow@gmail.com =======
======= Type help at any time, or quit to quit =======
===============================================================
cached 78498 primes. pmax = 999983

>> starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703
static memory usage:
initial cycle hashtable: 16777216 bytes
initial cycle table: 160000 bytes
factor base: 960640 bytes
allocated 1784 bytes for roots
allocated 0 bytes for lower mod prime
allocated 458752 bytes for sieve lines
time to compute linear sieve roots = 0.00
starting root computation over 446 to 446
starting root computation over 446 to 446
time to compute bucket sieve roots = 0.00
allocated 1784 bytes for offsets for 446 sieving primes
allocated 1784 bytes for offsets for 446 sieving primes
finding requested range 0 to 10000000
sieving range 0 to 11010048
using 446 primes, max prime = 3162
using 2 residue classes
lines have 229376 bytes and 1835008 flags
lines broken into = 7 blocks of size 32768
blocks contain 262144 flags and cover 1572864 primes
using 465328 bytes for sieving storage
thread 0 finding primes from byte offset 0 to 114688
thread 1 finding primes from byte offset 114688 to 229376
allocating temporary space for 443347 primes between 0 and 5505024
allocating temporary space for 405442 primes between 5505024 and 11010048
computing: 85%adding 380909 primes found in thread 0
adding 283466 primes founfb bounds
  small: 1024
  SPV: 33
  10bit: 96
  11bit: 152
  12bit: 272
  13bit: 504
  32k div 3: 664
  14bit: 944
  15bit: 1768
  med: 2528
  large: 16624
  all: 48032
start primes
  SPV: 241
  10bit: 1087
  11bit: 2027
  12bit: 4157
  13bit: 8221
  32k div 3: 11059
  14bit: 16417
  15bit: 32789
  med: 49393
  large: 392981
memory usage during sieving:
  curr_poly structure: 131152 bytes
  relation buffer: 1310720 bytes
  factor bases: 1698816 bytes
  update data: 624416 bytes
  sieve: 32768 bytes
  bucket data: 1376963 bytes
memory usage during sieving:
  curr_poly structure: 131152 bytes
  relation buffer: 1310720 bytes
  factor bases: 1698816 bytes
  update data: 624416 bytes
  sieve: 32768 bytes
  bucket data: 1376963 bytes
==== sieve params ====
n = 81 digits, 269 bits
factor base: 48032 primes (max prime = 1241407)
single large prime cutoff: 117933665 (95 * pmax)
double large prime range from 41 to 49 bits
double large prime range from 1541091339649 to 338024385079292
allocating 7 large prime slices of factor base
buckets hold 2048 elements
using AVX2 enabled 32k sieve core
sieve interval: 12 blocks of size 32768
polynomial A has ~ 10 factors
using multiplier of 7
using SPV correction of 20 bits, starting at offset 33
trial factoring cutoff at 88 bits

==== sieving in progress ( 2 threads): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====

Code:
11/10/19 19:04:36 v1.34.5 @ c39f9954850d, starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703
11/10/19 19:04:36 v1.34.5 @ c39f9954850d, random seeds: 2503899283, 1201291079

Code:
. . .
==== sieving in progress (1 thread): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====
Segmentation fault (core dumped)

Last fiddled with by EdH on 2019-11-10 at 19:24
2019-11-11, 17:21  #3
"Ben"
Feb 2007
3,371 Posts 
Do you get the same error if you run with the /branches/wip/ version of yafu instead of trunk with AVX2?

2019-11-11, 18:13  #4
"Ed Hall"
Dec 2009
Adirondack Mtns
2×23×79 Posts 
Quote:
Code:
In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
factor/squfof.o: In function `_lead_zcnt64':
/content/yafu/include/arith.h:110: undefined reference to `_BitScanReverse64'
arith/arith3.o: In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
top/eratosthenes/primes.o: In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
collect2: error: ld returned 1 exit status
Makefile:359: recipe for target 'all' failed
make: *** [all] Error 1

2019-11-11, 22:44  #5
"Ed Hall"
Dec 2009
Adirondack Mtns
7062_{8} Posts 
With my limited knowledge, I haven't been able to get past the errors above.
GCC is version 7.4.0. I commented out the line "CC = gcc7.3.0" in the Makefile, because it was aborting the compile.
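The undefined references to `_BitScanForward64` and `_BitScanReverse64` are a clue: those are MSVC intrinsics (declared in `<intrin.h>`), and GCC does not provide them, so it treats the calls as ordinary external functions that the linker then cannot resolve. One plausible reading, not confirmed in the thread, is that the MSVC branch of the helpers in `include/arith.h` is being compiled under GCC. A minimal sketch of the usual portable pattern maps them onto GCC/Clang's `__builtin_ctzll` / `__builtin_clzll`; the function names are hypothetical and this is not YAFU's actual code.

```c
#include <stdint.h>

#if defined(_MSC_VER)
#include <intrin.h>   /* _BitScanForward64 / _BitScanReverse64 live here */
#endif

/* Hypothetical helper: number of trailing zero bits of a NONZERO word.
 * The result is undefined for x == 0 on both branches. */
static inline int trail_zcnt64_portable(uint64_t x)
{
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanForward64(&idx, x);   /* index of lowest set bit */
    return (int)idx;
#else
    return __builtin_ctzll(x);    /* GCC/Clang builtin */
#endif
}

/* Hypothetical helper: number of leading zero bits of a NONZERO word. */
static inline int lead_zcnt64_portable(uint64_t x)
{
#if defined(_MSC_VER)
    unsigned long idx;
    _BitScanReverse64(&idx, x);   /* index of highest set bit */
    return 63 - (int)idx;
#else
    return __builtin_clzll(x);
#endif
}
```

The `#if defined(_MSC_VER)` test keys on the compiler rather than the OS, which is what matters here: the intrinsics belong to MSVC, not to Windows.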
2019-11-12, 01:49  #6
"Ed Hall"
Dec 2009
Adirondack Mtns
2·23·79 Posts 
I tried going back a couple of revisions, but still had no luck with AVX2; only the SSE41 build works.
Code:
top/eratosthenes/primes.c:354:11: note: called from here
  _pdep_u64(x2, 0xaaaaaaaaaaaaaaaa);
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:83:0,
                 from include/soe.h:27,
                 from top/eratosthenes/primes.c:15:
/usr/lib/gcc/x86_64-linux-gnu/7/include/bmi2intrin.h:69:1: error: inlining failed in call to always_inline ‘_pdep_u64’: target specific option mismatch
 _pdep_u64 (unsigned long long __X, unsigned long long __Y)
 ^~~~~~~~~
top/eratosthenes/primes.c:353:12: note: called from here
  return _pdep_u64(x1, 0x5555555555555555)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<builtin>: recipe for target 'top/eratosthenes/primes.o' failed
make: *** [top/eratosthenes/primes.o] Error 1
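The `_pdep_u64` failure is a different problem. GCC declares its BMI2 intrinsics `always_inline` and refuses to inline them into a function that is not itself compiled with BMI2 enabled; that is what "target specific option mismatch" means. Compiling the file with `-mbmi2` (or a `-march` that includes BMI2, such as `-march=haswell`) is the straightforward fix. The sketch below shows an alternative, assuming GCC or Clang: enable BMI2 for a single function with `__attribute__((target("bmi2")))`, pick it at runtime, and fall back to a software bit-deposit loop otherwise. All function names are hypothetical; this is not how YAFU's `primes.c` is written.

```c
#include <stdint.h>
#include <immintrin.h>

/* Software fallback with PDEP semantics: deposit the low-order bits of
 * src, in order, into the positions of the set bits of mask. */
static uint64_t pdep_u64_soft(uint64_t src, uint64_t mask)
{
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                      /* clear that bit and move on */
    }
    return result;
}

/* Only this function is compiled with BMI2 enabled, so GCC may inline
 * _pdep_u64 here even when the file is built without -mbmi2. */
static __attribute__((target("bmi2"))) uint64_t
pdep_u64_hw(uint64_t src, uint64_t mask)
{
    return _pdep_u64(src, mask);
}

/* Runtime dispatch: use the hardware instruction only if the CPU has BMI2. */
static uint64_t pdep_u64_dispatch(uint64_t src, uint64_t mask)
{
    if (__builtin_cpu_supports("bmi2"))
        return pdep_u64_hw(src, mask);
    return pdep_u64_soft(src, mask);
}
```

The software loop is much slower than a single `pdep` instruction, so it only makes sense as a fallback; with `-mbmi2` applied to the whole file, `_pdep_u64` can be called directly as in the original source.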