mersenneforum.org (https://www.mersenneforum.org/index.php)
-   YAFU (https://www.mersenneforum.org/forumdisplay.php?f=96)
-   -   AVX2 Troubles with Colab Instance (https://www.mersenneforum.org/showthread.php?t=24928)

EdH 2019-11-10 04:10

AVX2 Troubles with Colab Instance
 
I am having trouble with YAFU on a Colab instance when I build with the USE_AVX2=1 option. I'm getting quite regular failures of the following sort:
[code]./yafu 83627958813331634770105456990581223975460530782647023599500689759334189187309703

fac: factoring 83627958813331634770105456990581223975460530782647023599500689759334189187309703
fac: using pretesting plan: normal
fac: no tune info: using qs/gnfs crossover of 93 digits
div: primes less than 10000
rho: x^2 + 3, starting 200 iterations on C80
rho: x^2 + 2, starting 200 iterations on C80
rho: x^2 + 1, starting 200 iterations on C80
pm1: starting B1 = 150K, B2 = gmp-ecm default on C80
ecm: 30/30 curves on C80, B1=2K, B2=gmp-ecm default
ecm: 74/74 curves on C80, B1=11K, B2=gmp-ecm default
ecm: 188/188 curves on C80, B1=50K, B2=gmp-ecm default, ETA: 0 sec

starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703

==== sieving in progress ( 2 threads): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====
[/code]and then it simply returns to the prompt.

The CPU is:
[code]
Intel(R) Xeon(R) CPU @ 2.00GHz
[/code]and I'm using the trunk branch.

If I compile with USE_SSE41=1 and without USE_AVX2=1, I see a failure only very rarely. The build does not include msieve or NFS at all.

Any help appreciated. . .
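
One way to rule out the hardware is to ask the compiler's runtime CPU detection whether the instance actually reports AVX2. The following is only a minimal sketch, assuming GCC or Clang on x86-64; `cpu_has_avx2` and `report_cpu_features` are hypothetical helper names and are not part of YAFU.

```c
#include <stdio.h>

/* Hypothetical helper (not part of YAFU): returns 1 if the running CPU
 * reports AVX2 and 0 otherwise. Uses the GCC/Clang x86 builtins. */
int cpu_has_avx2(void)
{
    __builtin_cpu_init();   /* populate the compiler's CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
}

/* Prints a short report for the features that come up in this thread. */
void report_cpu_features(void)
{
    __builtin_cpu_init();
    printf("sse4.1: %s\n", __builtin_cpu_supports("sse4.1") ? "yes" : "no");
    printf("avx2:   %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
    printf("bmi2:   %s\n", __builtin_cpu_supports("bmi2")   ? "yes" : "no");
}
```

Calling `report_cpu_features()` from a small `main` on both the Colab instance and the home machine would show whether the two CPUs differ in what they advertise. It says nothing about whether the AVX2 sieve code itself is at fault.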

EdH 2019-11-10 19:16

Here is a fairly reproducible run with more detail:

Command used:

[code]
./yafu "siqs(83627958813331634770105456990581223975460530782647023599500689759334189187309703)" -v -v -v
[/code][COLOR=#000000][FONT=monospace]
This returned immediately:
[code]
11/10/19 19:04:36 v1.34.5 @ c39f9954850d, System/Build Info:
Using GMP-ECM 7.0.5-dev, Powered by GMP 6.1.2
detected Intel(R) Xeon(R) CPU @ 2.00GHz
detected L1 = 32768 bytes, L2 = 40370176 bytes, CL = 64 bytes
measured cpu frequency ~= 42.000000
using 1 random witnesses for Rabin-Miller PRP checks

===============================================================
======= Welcome to YAFU (Yet Another Factoring Utility) =======
======= bbuhrow@gmail.com =======
======= Type help at any time, or quit to quit =======
===============================================================
cached 78498 primes. pmax = 999983


>>
starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703
static memory usage:
initial cycle hashtable: 16777216 bytes
initial cycle table: 160000 bytes
factor base: 960640 bytes
allocated 1784 bytes for roots
allocated 0 bytes for lower mod prime
allocated 458752 bytes for sieve lines
time to compute linear sieve roots = 0.00
starting root computation over 446 to 446
starting root computation over 446 to 446
time to compute bucket sieve roots = 0.00
allocated 1784 bytes for offsets for 446 sieving primes
allocated 1784 bytes for offsets for 446 sieving primes
finding requested range 0 to 10000000
sieving range 0 to 11010048
using 446 primes, max prime = 3162
using 2 residue classes
lines have 229376 bytes and 1835008 flags
lines broken into = 7 blocks of size 32768
blocks contain 262144 flags and cover 1572864 primes
using 465328 bytes for sieving storage
thread 0 finding primes from byte offset 0 to 114688
thread 1 finding primes from byte offset 114688 to 229376
allocating temporary space for 443347 primes between 0 and 5505024
allocating temporary space for 405442 primes between 5505024 and 11010048
computing: 85%adding 380909 primes found in thread 0
adding 283466 primes founfb bounds
small: 1024
SPV: 33
10bit: 96
11bit: 152
12bit: 272
13bit: 504
32k div 3: 664
14bit: 944
15bit: 1768
med: 2528
large: 16624
all: 48032
start primes
SPV: 241
10bit: 1087
11bit: 2027
12bit: 4157
13bit: 8221
32k div 3: 11059
14bit: 16417
15bit: 32789
med: 49393
large: 392981
memory usage during sieving:
curr_poly structure: 131152 bytes
relation buffer: 1310720 bytes
factor bases: 1698816 bytes
update data: 624416 bytes
sieve: 32768 bytes
bucket data: 1376963 bytes
memory usage during sieving:
curr_poly structure: 131152 bytes
relation buffer: 1310720 bytes
factor bases: 1698816 bytes
update data: 624416 bytes
sieve: 32768 bytes
bucket data: 1376963 bytes

==== sieve params ====
n = 81 digits, 269 bits
factor base: 48032 primes (max prime = 1241407)
single large prime cutoff: 117933665 (95 * pmax)
double large prime range from 41 to 49 bits
double large prime range from 1541091339649 to 338024385079292
allocating 7 large prime slices of factor base
buckets hold 2048 elements
using AVX2 enabled 32k sieve core
sieve interval: 12 blocks of size 32768
polynomial A has ~ 10 factors
using multiplier of 7
using SPV correction of 20 bits, starting at offset 33
trial factoring cutoff at 88 bits

==== sieving in progress ( 2 threads): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====
[/code]This is factor.log:[/FONT][/COLOR][code]11/10/19 19:04:36 v1.34.5 @ c39f9954850d, starting SIQS on c80: 83627958813331634770105456990581223975460530782647023599500689759334189187309703
11/10/19 19:04:36 v1.34.5 @ c39f9954850d, random seeds: 2503899283, 1201291079
[/code]EDIT: I tried this on a home machine:
[code]
. . .
==== sieving in progress (1 thread): 48096 relations needed ====
==== Press ctrl-c to abort and save state ====
Segmentation fault (core dumped)
[/code]AVX2 doesn't need GCC 7, does it?

bsquared 2019-11-11 17:21

Do you get the same error if you run with the /branches/wip/ version of yafu instead of trunk with AVX2?

EdH 2019-11-11 18:13

[QUOTE=bsquared;530299]Do you get the same error if you run with the /branches/wip/ version of yafu instead of trunk with AVX2?[/QUOTE]
I can't get it to compile and have to run at the moment. I'll experiment more later. The link step fails with:
[code] In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
factor/squfof.o: In function `_lead_zcnt64':
/content/yafu/include/arith.h:110: undefined reference to `_BitScanReverse64'
arith/arith3.o: In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
top/eratosthenes/primes.o: In function `_trail_zcnt64':
/content/yafu/include/arith.h:102: undefined reference to `_BitScanForward64'
collect2: error: ld returned 1 exit status
Makefile:359: recipe for target 'all' failed
make: *** [all] Error 1
[/code]
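
For context, `_BitScanForward64` and `_BitScanReverse64` look like the MSVC intrinsic names, which GCC 7 on Linux does not appear to provide; that would explain why the compile step passes and the link fails. GCC and Clang offer `__builtin_ctzll` and `__builtin_clzll` for the same job. The sketch below is an assumption about what a portable replacement could look like, not the actual contents of YAFU's arith.h; the function names are hypothetical, and returning 64 for a zero input is a choice made here because the builtins are undefined for zero.

```c
#include <stdint.h>

/* Hypothetical stand-in for a trailing-zero count on 64-bit values.
 * __builtin_ctzll is undefined for 0, so that case is handled explicitly. */
int trail_zcnt64_portable(uint64_t x)
{
    return x ? __builtin_ctzll(x) : 64;
}

/* Hypothetical stand-in for a leading-zero count on 64-bit values.
 * __builtin_clzll is likewise undefined for 0. */
int lead_zcnt64_portable(uint64_t x)
{
    return x ? __builtin_clzll(x) : 64;
}
```

For example, `trail_zcnt64_portable(8)` gives 3 and `lead_zcnt64_portable(1)` gives 63.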

EdH 2019-11-11 22:44

With my limited knowledge I haven't been able to get past the above error(s).

GCC here is version 7.4.0. The "CC = gcc-7.3.0" line in the Makefile was aborting the compile, so I commented it out.

EdH 2019-11-12 01:49

I tried going back a couple of revisions, but still had no luck with AVX2; only SSE41 builds.
[code]
top/eratosthenes/primes.c:354:11: note: called from here
| _pdep_u64(x2, 0xaaaaaaaaaaaaaaaa);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/lib/gcc/x86_64-linux-gnu/7/include/immintrin.h:83:0,
from include/soe.h:27,
from top/eratosthenes/primes.c:15:
/usr/lib/gcc/x86_64-linux-gnu/7/include/bmi2intrin.h:69:1: error: inlining failed in call to always_inline ‘_pdep_u64’: target specific option mismatch
_pdep_u64 (unsigned long long __X, unsigned long long __Y)
^~~~~~~~~
top/eratosthenes/primes.c:353:12: note: called from here
return _pdep_u64(x1, 0x5555555555555555)
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<builtin>: recipe for target 'top/eratosthenes/primes.o' failed
make: *** [top/eratosthenes/primes.o] Error 1
[/code]
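
GCC reports "target specific option mismatch" when an always_inline intrinsic such as `_pdep_u64` is called from code compiled without the matching instruction set enabled, here BMI2. The usual remedies are to add `-mbmi2` (or a suitable `-march=`) to the compile flags, or to enable BMI2 for just the function that uses it. The following is a minimal sketch of the second approach, assuming GCC or Clang on x86-64; all function names are hypothetical and this is not YAFU's code. A plain C fallback is included so the result can be checked on a CPU without BMI2.

```c
#include <stdint.h>
#include <immintrin.h>

/* Software reference: deposit the low bits of src, in order, into the
 * bit positions that are set in mask (the operation PDEP performs). */
uint64_t pdep64_soft(uint64_t src, uint64_t mask)
{
    uint64_t out = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        uint64_t lowest = mask & (~mask + 1);  /* isolate lowest set mask bit */
        if (src & bit)
            out |= lowest;
        mask &= mask - 1;                      /* clear that mask bit */
    }
    return out;
}

/* BMI2 is enabled for this one function only, so the intrinsic can be
 * inlined even if the rest of the file is built without -mbmi2. */
__attribute__((target("bmi2")))
static uint64_t pdep64_bmi2(uint64_t src, uint64_t mask)
{
    return _pdep_u64(src, mask);
}

/* Runtime dispatch: use the instruction only when the CPU reports BMI2. */
uint64_t pdep64(uint64_t src, uint64_t mask)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("bmi2") ? pdep64_bmi2(src, mask)
                                          : pdep64_soft(src, mask);
}
```

With the masks from the error output, `pdep64(0x3, 0x5555555555555555)` yields 0x5 and `pdep64(0xF, 0xaaaaaaaaaaaaaaaa)` yields 0xAA on either path.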


All times are UTC.