2019-10-20, 07:24  #1 
Feb 2019
2_{8} Posts 
GNFS matrix vector multiplications using GPUs
As this paper mentioned:
The Block Wiedemann (BW) and Block Lanczos (BL) algorithms are frequently used to solve sparse linear systems over GF(2), and iterative sparse matrix-vector multiplication is the most time-consuming operation in these approaches. On the other hand, LightSpMV is a CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm. Now, my question is this: does there exist any implementation of the GNFS algorithm using GPU SpMV? What's your opinion of the speed of such an implementation compared with non-GPU SpMV? 
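For concreteness, here is a minimal CPU sketch of what SpMV over GF(2) looks like (my own illustration, not code from LightSpMV or msieve; all names are mine): each matrix entry is a bit, so a row stores only its column indices, a block of 64 vectors is packed into one 64-bit word per coordinate (as in block Lanczos/Wiedemann), and "addition" is XOR.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch: sparse matrix times a 64-wide block of GF(2) vectors.
   Entries are bits, so only column indices are stored (CSR-like). */
typedef struct {
    size_t nrows;
    const size_t *row_ptr;   /* row offsets into col_idx, length nrows+1 */
    const size_t *col_idx;   /* column index of each nonzero bit */
} gf2_csr;

static void gf2_spmv(const gf2_csr *A, const uint64_t *x, uint64_t *y)
{
    for (size_t r = 0; r < A->nrows; r++) {
        uint64_t acc = 0;
        for (size_t k = A->row_ptr[r]; k < A->row_ptr[r + 1]; k++)
            acc ^= x[A->col_idx[k]];   /* GF(2) addition is XOR */
        y[r] = acc;
    }
}
```

The irregular, data-dependent inner loop is exactly what GPU SpMV libraries try to load-balance across threads.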
2019-10-20, 15:58  #2 
"Curtis"
Feb 2005
Riverside, CA
2·3·733 Posts 
Nobody has (publicly) implemented GNFS matrix-solving on GPU.
The biggest problem, as I understand it, is that the matrices of interesting problems range from 10 to 50+ GB, while GPUs are rather memory-limited. Either data transfer is a hurdle that kills speed gains, or the GPU can only solve matrices that already don't take all that long (say, under GNFS-150). The speedup for small problems would be welcome, but doesn't advance the state of the art. 
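A back-of-envelope sketch of where that 10-50+ GB range comes from (the row count and density below are illustrative assumptions, not measurements): storing one 32-bit column index per nonzero bit gives roughly rows × nonzeros-per-row × 4 bytes.

```c
/* Illustrative estimate only: size of an NFS matrix stored as one
   32-bit column index per nonzero bit. Inputs are assumptions. */
static double matrix_gb(double rows, double nonzeros_per_row)
{
    return rows * nonzeros_per_row * 4.0 / 1e9;  /* 4 bytes per index */
}
/* e.g. ~40 million rows at ~100 entries per row gives ~16 GB,
   squarely inside the range quoted above. */
```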
2019-10-20, 16:49  #3 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2^{7}×37 Posts 
https://www.mersenneforum.org/showthread.php?t=24190 (small matrices)
Also Greg managed to run msieve on those Intel GPUs... can't find the thread. Last fiddled with by pinhodecarlos on 2019-10-20 at 16:54 
2019-10-20, 20:39  #4 
Tribal Bullet
Oct 2004
3489_{10} Posts 
There is a branch of Msieve that uses CUDA to implement block Lanczos on Nvidia GPUs. We were making steady progress before I had to put it down in 2013; the smaller size of GPU memory can be worked around by tiling the matrix across a GPU cluster. Early measurements indicated the performance on a hot GPU at the time was pretty encouraging, but getting the full benefit of CUDA would have required a second restructuring of the code (over and above the first one that let CUDA be used).
One complication with block Lanczos is that we need both regular and transpose matrix multiplies, and the consensus at the time was that maximum performance would require two versions of the matrix to be stored, in regular and transpose order. Obviously that makes the memory capacity issues twice as bad. 
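A sketch of that "store both orders" trade-off (my own minimal code, not msieve's): given the bit matrix in row-major CSR form as a list of column indices, precompute the column-major (transposed) form as well, so both the regular and transpose multiplies become sequential passes. Every nonzero is then stored twice, which is the memory doubling described above.

```c
#include <stddef.h>
#include <stdlib.h>

/* Build the transpose of a GF(2) CSR matrix (a counting sort of the
   nonzeros by column). Caller provides t_ptr[ncols+1] and
   t_idx[nnz]; on return, t_ptr/t_idx describe A^T in the same
   CSR-of-column-indices format as the input. */
static void gf2_transpose(size_t nrows, size_t ncols,
                          const size_t *row_ptr, const size_t *col_idx,
                          size_t *t_ptr, size_t *t_idx)
{
    size_t nnz = row_ptr[nrows];
    for (size_t c = 0; c <= ncols; c++) t_ptr[c] = 0;
    for (size_t k = 0; k < nnz; k++) t_ptr[col_idx[k] + 1]++;  /* count per column */
    for (size_t c = 0; c < ncols; c++) t_ptr[c + 1] += t_ptr[c]; /* prefix sum */
    size_t *fill = calloc(ncols, sizeof *fill);
    for (size_t r = 0; r < nrows; r++)
        for (size_t k = row_ptr[r]; k < row_ptr[r + 1]; k++) {
            size_t c = col_idx[k];
            t_idx[t_ptr[c] + fill[c]++] = r;  /* row index becomes the entry */
        }
    free(fill);
}
```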
2019-10-21, 20:36  #5  
Feb 2019
2 Posts 
Thanks. But as this paper mentioned:
2019-10-22, 00:05  #6 
Tribal Bullet
Oct 2004
3×1,163 Posts 
Sparse matrix multiplies on GPU have become a lot more sophisticated since 2011 (incidentally $75 for an ebook is about the most expensive I have ever seen). See older versions of the moderngpu library and the CUB library, though the latter would need modifications to work in GF(2).

2019-10-22, 01:27  #7 
"Curtis"
Feb 2005
Riverside, CA
1000100101110_{2} Posts 
Good! I look forward to testing this code, when it is made public. If RSA-170 can be done in 1.5 GB, the software will be quite useful.

2019-10-22, 06:22  #8 
Romulan Interpreter
Jun 2011
Thailand
3×2,957 Posts 
We would personally donate $75 for Jason P's book, if this would result in improving msieve/yafu to use our GPUs, which are always plentiful (as opposed to CPUs, which are always scarce!). Send us the instructions.

2019-10-22, 07:17  #9 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2^{7}×37 Posts 
Anyone, please send me the DOIs and I'll see what I can do.

2019-10-22, 07:47  #10  
Oct 2018
3×5 Posts 
That is a later version of the paper, compared to the conference paper in the original post. The paper does seem to focus on block Wiedemann, but parts should still be of interest for block Lanczos. 

2019-10-22, 11:32  #11  
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
4736_{10} Posts 
https://drive.google.com/file/d/0B6W...w?usp=drivesdk Last fiddled with by pinhodecarlos on 2019-10-22 at 12:02 

Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
ARM SVE 2048-bit vector  hansl  Hardware  4  2019-06-07 13:13 
Matrix data for GNFS 190+ on NFS@home  VBCurtis  NFS@Home  5  2019-04-28 05:59 
GNFS matrix stage on GPU  fivemack  GPU Computing  0  2016-07-14 15:44 
Initial vector in Lanczos  Robert Holmes  Miscellaneous Math  4  2010-09-11 01:34 
Intel Advanced Vector Extensions (256-bit SSE)  nuutti  Programming  3  2008-04-08 18:01 