2021-09-09, 20:28   #34
Xyzzy ("Mike")
We are not sure if this is interesting or not.

13_2_909m1 - Near-Cunningham - SNFS(274)

This is a big (33-bit?) job. The msieve.dat file, uncompressed and with duplicates and bad relations removed, is 49 GB.
Code:
$ ls -lh
total 105G
-rw-rw-r--. 1 m m  36G Sep  8 20:33 13_2_909m1.dat.gz
drwx------. 2 m m   50 Aug  4 12:17 cub
-r--------. 1 m m  29K Aug  4 12:16 lanczos_kernel.ptx
-r-x------. 1 m m 3.4M Aug  4 12:16 msieve
-rw-rw-r--. 1 m m  49G Sep  8 22:02 msieve.dat
-rw-rw-r--. 1 m m 4.2G Sep  9 14:17 msieve.dat.bak.chk
-rw-rw-r--. 1 m m 4.2G Sep  9 14:54 msieve.dat.chk
-rw-rw-r--. 1 m m 969M Sep  9 12:11 msieve.dat.cyc
-rw-rw-r--. 1 m m  12G Sep  9 12:11 msieve.dat.mat
-rw-rw-r--. 1 m m  415 Sep  2 19:15 msieve.fb
-rw-rw-r--. 1 m m  13K Sep  9 15:10 msieve.log
-r--------. 1 m m 108K Aug  4 12:16 stage1_core.ptx
-rw-rw-r--. 1 m m  264 Sep  2 19:15 worktodo.ini
There are ~442M relations. Setting block_nnz to 500M resulted in an OOM error, so we used 1B instead.
Code:
commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 27521024 x 27521194 (12901.7 MB) with weight 3687594306 (133.99/col)
sparse part has weight 3106904079 (112.89/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 27520784 x 27521194 (12034.4 MB) with weight 2848207923 (103.49/col)
sparse part has weight 2714419599 (98.63/col)
using GPU 0 (Quadro RTX 8000)
selected card has CUDA arch 7.5
Nonzeros per block: 1000000000
converting matrix to CSR and copying it onto the GPU
1000000013 27520784 9680444
1000000057 27520784 11295968
714419529 27520784 6544782
1039631367 27521194 100000
917599197 27521194 3552480
757189035 27521194 23868304
commencing Lanczos iteration
vector memory use: 5879.2 MB
dense rows memory use: 839.9 MB
sparse matrix memory use: 21339.3 MB
memory use: 28058.3 MB
Allocated 123.0 MB for SpMV library
Allocated 127.8 MB for SpMV library
linear algebra at 0.0%, ETA 49h57m
checkpointing every 570000 dimensions   
linear algebra completed 925789 of 27521194 dimensions (3.4%, ETA 45h13m)    
received signal 2; shutting down
linear algebra completed 926044 of 27521194 dimensions (3.4%, ETA 45h12m)    
lanczos halted after 3628 iterations (dim = 926044)
BLanczosTime: 5932
elapsed time 01:38:53

current factorization was interrupted
So the LA step is under 50 hours, which seems pretty fast! (We have no plans to complete it since it is assigned to VBCurtis.)

We have the raw files saved if there are other configurations worth investigating. If so, just let us know!
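
For anyone wondering what block_nnz actually controls: it caps how many matrix nonzeros go into each block that gets copied to the card. The block-size lines in the log above are such splits; the first three (1000000013 + 1000000057 + 714419529) sum exactly to the sparse weight of 2714419599. The sketch below is only a rough illustration of that kind of nnz-capped row partitioning over a plain CSR matrix, with made-up names; msieve's actual GPU data structures and splitting code are different.
Code:
#include <cstdint>
#include <vector>

// Hypothetical illustration only: split the rows of a CSR matrix into
// consecutive blocks so that each block holds at most block_nnz nonzeros.
// This mirrors the idea behind the "Nonzeros per block" setting, not
// msieve's actual GPU format.
struct RowBlock {
    uint32_t row_start;   // first row in the block
    uint32_t row_end;     // one past the last row
    uint64_t nnz;         // nonzeros covered by the block
};

static std::vector<RowBlock> split_by_nnz(const std::vector<uint64_t> &row_ptr,
                                          uint64_t block_nnz) {
    std::vector<RowBlock> blocks;
    const uint32_t nrows = (uint32_t)row_ptr.size() - 1;
    uint32_t start = 0;
    while (start < nrows) {
        uint32_t end = start;
        // grow the block while it stays under the nonzero cap
        while (end < nrows && row_ptr[end + 1] - row_ptr[start] <= block_nnz)
            end++;
        if (end == start)         // a single row exceeds the cap; take it anyway
            end = start + 1;
        blocks.push_back({start, end, row_ptr[end] - row_ptr[start]});
        start = end;
    }
    return blocks;
}
A smaller cap simply produces more, smaller blocks; why 500M ran the card out of memory here while 1B fit is presumably down to per-block overhead on the GPU, but that is a guess.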

Attached Files: msieve.log (12.1 KB)
2021-09-09, 21:00   #35
VBCurtis ("Curtis")
It's a 32/33 hybrid, with a healthy amount of oversieving (I wanted a matrix below 30M dimensions; success!).

I'm impressed that it fits on your card, and 50 hr is pretty amazing. I just started the matrix a few hours ago on a 10-core Ivy Bridge; the ETA is 365 hr.

If you have the free cycles to run it, please be my guest! The 20+ core-weeks saved would be enough to ECM the next candidate.
2021-09-13, 05:22   #36
frmky

I spent time with Nsight Compute looking at the SpMV kernel. As expected for SpMV, it's memory-bandwidth limited, so increasing occupancy to hide latency should help. I adjusted parameters to reduce both register and shared memory use, which increased the occupancy. This yielded a runtime improvement of only about 5% on the V100, but it may differ on other cards. I also increased the default block_nnz to 1750M to reduce global memory use a bit.
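
For readers not familiar with what this kernel is doing: in block Lanczos the SpMV over GF(2) just XORs a VBITS-bit vector entry into an accumulator for every nonzero, so there is essentially no arithmetic to hide the memory traffic behind. The kernel below is a minimal one-thread-per-row sketch of that access pattern, with our own names and layout, not the kernel shipped in msieve; it only illustrates why the routine is bandwidth-bound and why the per-thread accumulator (and hence register pressure and occupancy) scales with VBITS.
Code:
#include <cstdint>

#define VBITS  256
#define VWORDS (VBITS / 64)          // 64-bit words per VBITS-bit entry

// Minimal GF(2) block-vector SpMV over a CSR matrix: y[row] ^= v[col]
// for every nonzero (row, col). One thread per row; the VWORDS-word
// accumulator lives in registers, which is one reason register pressure
// (and hence occupancy) depends on VBITS.
__global__ void spmv_gf2_csr(uint32_t nrows,
                             const uint32_t *__restrict__ row_ptr,
                             const uint32_t *__restrict__ col_idx,
                             const uint64_t *__restrict__ v,
                             uint64_t *__restrict__ y)
{
    uint32_t row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= nrows)
        return;

    uint64_t acc[VWORDS] = {0};
    for (uint32_t j = row_ptr[row]; j < row_ptr[row + 1]; j++) {
        const uint64_t *src = v + (uint64_t)col_idx[j] * VWORDS;
        for (int w = 0; w < VWORDS; w++)
            acc[w] ^= src[w];        // GF(2): add == XOR, no multiplies
    }

    uint64_t *dst = y + (uint64_t)row * VWORDS;
    for (int w = 0; w < VWORDS; w++)
        dst[w] = acc[w];
}
Each nonzero costs a 4-byte column index plus VBITS/8 bytes of vector reads and does nothing but XOR with them, so throughput is set almost entirely by memory bandwidth; keeping more warps resident to cover that latency is where the occupancy tuning pays off.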
2021-09-16, 06:00   #37
frmky

Today I expanded the allowed values of VBITS to any of 64, 128, 192, 256, 320, 384, 448, or 512. This works on both CPUs and GPUs, but I don't expect much, if any, speedup on CPUs. As a GPU benchmark, I tested a 42.1M matrix on two NVLink-connected V100's. Here are the results.
Code:
VBITS   Time (hours)
  64    109.5
 128     63.75
 192     50
 256     40.25
 320     40.25
 384     37.75
 448     40.25
 512     37.25
Combined with the new SpMV parameters, I get the best times with VBITS of 384 and 512, but 384 uses less memory. Overall, I get about 6% better performance than with VBITS=256.
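
As a rough sanity check on the memory side (our own back-of-the-envelope numbers, not from the benchmark): each block vector in block Lanczos holds one VBITS-bit word per matrix dimension, and msieve keeps several such vectors alongside the matrix, so per-vector cost scales linearly with VBITS.
Code:
#include <cstdint>
#include <cstdio>

// Back-of-the-envelope vector storage for block Lanczos: one VBITS-bit word
// per matrix dimension per block vector. msieve holds several such vectors
// plus the matrix itself, so real totals are a multiple of this.
int main() {
    const uint64_t n = 42100000ULL;   // ~42.1M dimensions, as in the benchmark above
    for (int vbits = 64; vbits <= 512; vbits += 64) {
        double gb = (double)n * (vbits / 8) / 1e9;
        printf("VBITS=%3d -> %5.2f GB per block vector\n", vbits, gb);
    }
    return 0;
}
By this measure a block vector at 512 bits costs about a third more than at 384 (roughly 2.7 GB vs 2.0 GB here) for essentially the same runtime in the table, which is consistent with preferring 384 when memory is the constraint.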