mersenneforum.org  

Old 2019-10-20, 07:24   #1
abmalakh
 
GNFS matrix-vector multiplications using GPUs

As this paper mentions:
The Block Wiedemann (BW) and the Block Lanczos (BL) algorithms are frequently used to solve sparse linear systems over GF(2), and iterative sparse matrix-vector multiplication is the most time-consuming operation of these approaches.
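The quoted point can be made concrete with a minimal, illustrative Python sketch of an iterative SpMV over GF(2) (the function name and the CSR-style layout here are my assumptions, not taken from the paper): each vector entry packs a block of columns into one machine word, so addition mod 2 becomes a word-wide XOR, which is exactly what the "block" variants of Wiedemann and Lanczos exploit.

```python
# Illustrative sketch: y = A*x over GF(2), with A a sparse 0/1 matrix in a
# CSR-like layout and x a block of vector columns packed into one integer
# word per row. Over GF(2), addition is XOR, so each matrix row reduces to
# XOR-ing the x-words selected by its nonzero columns.

def spmv_gf2(row_ptr, col_idx, x_words):
    """Return y = A*x over GF(2). row_ptr/col_idx describe A in CSR form;
    x_words[j] packs a block of vector columns for row j into one int."""
    n = len(row_ptr) - 1
    y = [0] * n
    for i in range(n):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= x_words[col_idx[k]]  # each nonzero contributes x_j, summed mod 2
        y[i] = acc
    return y

# 3x3 example: A = [[1,1,0],[0,1,0],[1,0,1]], two packed vector columns
row_ptr = [0, 2, 3, 5]
col_idx = [0, 1, 1, 0, 2]
x = [0b01, 0b11, 0b10]
print(spmv_gf2(row_ptr, col_idx, x))  # [2, 3, 3]
```

On a GPU, the inner loop over a row's nonzeros is what LightSpMV-style kernels parallelize, assigning rows or row fragments to threads and warps.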

On the other hand, LightSpMV is a CUDA-compatible sparse matrix-vector multiplication (SpMV) algorithm.

Now, my question is: does any implementation of the GNFS linear algebra using GPU SpMV exist? What is your opinion on the speed of such an implementation compared with a non-GPU SpMV?
Old 2019-10-20, 15:58   #2
VBCurtis
 

Nobody has (publicly) implemented GNFS matrix-solving on GPU.
The biggest problem, as I understand it, is that the matrices of interesting problems range from 10 to 50+ GB, while GPUs are rather memory-limited.

Either data transfer is a hurdle that kills speed gains, or the GPU can only solve matrices that already don't take all that long (say, under GNFS-150). The speedup for small problems would be welcome, but doesn't advance the state of the art.
Old 2019-10-20, 16:49   #3
pinhodecarlos
 

https://www.mersenneforum.org/showthread.php?t=24190 (small matrices)

Also, Greg managed to run msieve on those Intel GPUs... I can't find the thread.

Old 2019-10-20, 20:39   #4
jasonp
Tribal Bullet
 

There is a branch of Msieve that uses CUDA to implement block Lanczos on Nvidia GPUs. We were making steady progress before I had to put it down in 2013; the smaller size of GPU memory can be worked around by tiling the matrix across a GPU cluster. Early measurements indicated the performance on a hot GPU at the time was pretty encouraging, but getting the full benefit of CUDA would have required a second restructuring of the code (over and above the first one that let CUDA be used).

One complication with block Lanczos is that we need both regular and transpose matrix multiplies, and the consensus at the time was that maximum performance would require two versions of the matrix to be stored, in regular and transpose order. Obviously that makes the memory capacity issues twice as bad.
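The regular-plus-transpose complication described above can be sketched in a few lines of Python (this is an illustration of the general point, not msieve's actual code; all names here are mine). With a single CSR copy of A, the product y = A·x is a cache-friendly gather along each stored row, but z = Aᵀ·w must scatter with random writes into z; keeping a second copy of A stored in transpose order makes both products gather-style, at twice the memory.

```python
# Illustrative sketch over GF(2): block Lanczos needs both y = A*x and
# z = A^T*w each iteration. With one CSR copy of A, the transpose product
# scatters (random writes); a second CSR copy of A^T would make it a gather
# instead, which is why the "store the matrix twice" option doubles memory.

def spmv_gf2(row_ptr, col_idx, x):
    """y = A*x over GF(2): gather along each stored row; add = XOR."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc ^= x[col_idx[k]]
        y.append(acc)
    return y

def spmv_gf2_transpose(row_ptr, col_idx, w, ncols):
    """z = A^T*w using A's own CSR layout: scatter, with random writes."""
    z = [0] * ncols
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            z[col_idx[k]] ^= w[i]  # the random-write pattern a GPU dislikes
    return z

# A = [[1,0,1],[0,1,1]] in CSR; vectors hold word-packed column blocks
row_ptr, col_idx = [0, 2, 4], [0, 2, 1, 2]
print(spmv_gf2(row_ptr, col_idx, [5, 6, 3]))            # A * x
print(spmv_gf2_transpose(row_ptr, col_idx, [5, 6], 3))  # A^T * w
```

On a GPU the scatter version needs atomic XORs or a reordering pass, so storing both A and Aᵀ trades memory for uniform gather kernels.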
Old 2019-10-21, 20:36   #5
abmalakh
 

Quote:
Originally Posted by VBCurtis View Post
Nobody has (publicly) implemented GNFS matrix-solving on GPU.
The biggest problem, as I understand it, is that the matrices of interesting problems range from 10 to 50+ GB, while GPUs are rather memory-limited.

Either data transfer is a hurdle that kills speed gains, or the GPU can only solve matrices that already don't take all that long (say, under GNFS-150). The speedup for small problems would be welcome, but doesn't advance the state of the art.

Thanks. But as this paper mentions:

Quote:
In this paper we derive an efficient CUDA implementation of this operation using a newly designed hybrid sparse matrix format. This leads to speedups between 4 and 8 on a single GPU for a number of tested NFS matrices compared to an optimized multi-core implementation.
Experimental results in this paper show that using CUDA SpMV, we can improve the linear algebra step. For example, one of their tables is evaluated using RSA-170 on an NVIDIA Tesla C2070 with 6 GB RAM and an NVIDIA GTX 580 with 1.5 GB RAM.
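The "hybrid sparse matrix format" the paper mentions can be illustrated with a rough sketch (the paper's actual format differs; the cutoff constant and layout below are my assumptions). NFS matrices typically have a small number of very dense rows plus a long sparse tail, so a hybrid scheme stores dense rows as packed bitmaps and the rest as column-index lists, letting a GPU kernel handle each part with the layout that suits it.

```python
# Illustrative sketch of a hybrid split: rows denser than a threshold are
# stored as packed bitmaps (one bit per column), the remaining sparse rows
# as sorted column-index lists. DENSE_FRACTION is an assumed tuning knob.

DENSE_FRACTION = 0.25  # assumed cutoff: fraction of nonzeros per row

def split_hybrid(rows, ncols):
    """rows: list of column-index lists for each matrix row.
    Returns (dense_bitmaps, sparse_rows)."""
    dense, sparse = [], []
    for cols in rows:
        if len(cols) > DENSE_FRACTION * ncols:
            bitmap = 0
            for j in cols:
                bitmap |= 1 << j  # set one bit per nonzero column
            dense.append(bitmap)
        else:
            sparse.append(sorted(cols))
    return dense, sparse

# 8-column example: first row is dense, the other two stay sparse
rows = [[0, 1, 2, 3, 5, 6], [2], [0, 7]]
dense, sparse = split_hybrid(rows, 8)
print(bin(dense[0]), sparse)  # 0b1101111 [[2], [0, 7]]
```

The dense part then multiplies with word-wide AND/popcount-style kernels while the sparse part uses a CSR-style gather, which is one way such a split can beat a single uniform format.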
Old 2019-10-22, 00:05   #6
jasonp
Tribal Bullet
 

Sparse matrix multiplies on GPU have become a lot more sophisticated since 2011 (incidentally $75 for an e-book is about the most expensive I have ever seen). See older versions of the moderngpu library and the CUB library, though the latter would need modifications to work in GF(2).
Old 2019-10-22, 01:27   #7
VBCurtis
 

Quote:
Originally Posted by abmalakh View Post
For example, one of their tables is evaluated using RSA-170 on an NVIDIA Tesla C2070 with 6 GB RAM and an NVIDIA GTX580 with 1.5 GB RAM.
Good! I look forward to testing this code when it is made public. If RSA-170 can be done in 1.5 GB, the software will be quite useful.
Old 2019-10-22, 06:22   #8
LaurV
Romulan Interpreter
 

We would personally donate $75 for jasonp's book if this would result in improving msieve/yafu to use our (always plentiful) GPUs (as opposed to CPUs, which are always scarce!). Send us the instructions.
Old 2019-10-22, 07:17   #9
pinhodecarlos
 

Anyone, please send me the DOIs and I'll see what I can do.
Old 2019-10-22, 07:47   #10
Branger
 

Quote:
Originally Posted by pinhodecarlos View Post
Anyone, please send me the DOIs and I'll see what I can do.
https://onlinelibrary.wiley.com/doi/....1002/cpe.2896

This is a later version of the paper compared to the conference paper in the original post. It does seem to focus on block Wiedemann, but parts should still be of interest for block Lanczos.
Old 2019-10-22, 11:32   #11
pinhodecarlos
 

Quote:
Originally Posted by Branger View Post
https://onlinelibrary.wiley.com/doi/....1002/cpe.2896

This is a later version of the paper compared to the conference paper in the original post. It does seem to focus on block Wiedemann, but parts should still be of interest for block Lanczos.
Got it, but I can't attach it through my iPhone. Please use the link below to access the paper.

https://drive.google.com/file/d/0B6W...w?usp=drivesdk
