2022-03-27, 16:19  #89
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts 
Quote:
I did get the microSD card added as swap, but I needed help from kruoli in the linux subforum. top now shows nearly 40G for swap space, but it is all totally free for this c145. 

2022-03-27, 17:36  #90
Jul 2003
So Cal
2^{5}·3^{4} Posts 
You almost had enough. It ran out while trying to allocate working memory for the spmv library. Recompile with VBITS=128 and it should fit, even if it's not optimal. (Don't forget to copy the .ptx and .so files.)

2022-03-27, 21:13  #91
"Ed Hall"
Dec 2009
Adirondack Mtns
12132_{8} Posts 
Quote:
I still need to do some more testing and find out where the crossover is, but all this is encouraging. 

2022-04-08, 13:11  #92
"Ed Hall"
Dec 2009
Adirondack Mtns
1010001011010_{2} Posts 
Sorry if these questions are annoying:

I've been playing with my K20Xm card for a little while now and, of course, it isn't "good enough." I can get more of them at reasonable prices, but why would I, if they aren't? Also, most of my GPU-capable machines don't have a spare fan connector, which a K20Xm would need. Compared with a GTX 980, the memory is the same, so I still wouldn't be able to run larger matrices.

Is the matrix size increase proportional in a manner I could estimate? E.g., do 5 more digits double the matrix? If a GTX 1080 with 11GB would only give me 5 more digits, I couldn't consider it worth the cost. Is there a similar estimate for target_density? I currently use t_d 70 so the CADO-NFS clients can move to a subsequent server sooner, but I haven't empirically determined whether that is best.

I'm not sure if this might be a typo, but while the 980 shows much better performance overall, its FP64 (double) performance is listed as only 189.4 GFLOPS (1:32), while for the K20Xm it is 1,312 GFLOPS (1:3). Would that be of significance in LA solving?

It's been mentioned that the K80 consists of two cards that are each a little better than the K20Xm. How much larger a matrix might I be able to run with MPI across both sections of a 24GB card?
2022-04-09, 12:42  #93
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts 
Any familiarity with the Tesla M40 24GB for Msieve LA? That would be about 4x memory for 2x cost over the K20Xm.

2022-04-09, 19:45  #94
Jul 2003
So Cal
2^{5}·3^{4} Posts 
I sieve enough to use a target_density of at least 100-110, as it brings down the matrix size. An 11 GB card can likely handle matrices with about 10M rows (GNFS-175-ish), whereas a 24 GB card would take you up to around 20M rows (GNFS-184-ish). With enough system memory, the newer Tesla M40 would let you go a bit higher, at a significant performance penalty, by storing the matrix in system memory and transferring it onto the GPU as needed.
GPU LA is entirely integer code and doesn't depend on FP64 performance. It's written using 64-bit integer operations, but even on the latest GPUs those are implemented with 32-bit integer instructions. You lose some speed and memory efficiency splitting the matrix across the two GPUs in a K80, but you should still be able to handle 9M rows or so (GNFS-174-ish).
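These rules of thumb can be turned into a rough estimator. A hedged sketch, calibrated only to the two data points above (~10M rows in 11 GB at GNFS-175, ~20M rows in 24 GB at GNFS-184); the average row weight of ~125 nonzeros and 4 bytes per entry are assumptions chosen so a 20M-row matrix lands near the ~10 GB figure quoted later in the thread, not values taken from msieve itself:

```python
# Rough capacity estimator for msieve GPU LA, calibrated to the
# figures in this thread: ~10M rows fits an 11 GB card (GNFS-175-ish)
# and ~20M rows fits a 24 GB card (GNFS-184-ish).
# ASSUMPTIONS: average matrix weight ~125 nonzeros per row,
# 4 bytes per matrix entry (a column index); real weights vary
# with target_density and the filtering run.

def matrix_bytes(rows, avg_weight=125, bytes_per_entry=4):
    """Approximate size of the GF(2) matrix alone, excluding
    vectors and spmv working memory."""
    return rows * avg_weight * bytes_per_entry

def rows_for_gnfs(digits):
    """Rows vs. GNFS difficulty, interpolated from the two points
    above: the matrix roughly doubles every ~9 digits."""
    return 10e6 * 2 ** ((digits - 175) / 9)

if __name__ == "__main__":
    for d in (175, 180, 184):
        r = rows_for_gnfs(d)
        print(f"GNFS-{d}: ~{r / 1e6:.0f}M rows, "
              f"~{matrix_bytes(r) / 2**30:.1f} GiB matrix")
```

On this interpolation, 5 extra digits costs roughly a 1.5x larger matrix rather than 2x, so an 11 GB card buys more headroom than the "5 digits doubles it" guess would suggest.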
2022-04-09, 20:29  #95
"Ed Hall"
Dec 2009
Adirondack Mtns
2×5×521 Posts 
Thanks! That helps me a bunch. My new interest is the M40 24GB. But I'm not quite ready, because the machines I'd like to use don't have any spare fan connectors. I'm considering powering a fan another way, possibly from an older PATA power cable.

2022-05-01, 02:08  #96
"Ed Hall"
Dec 2009
Adirondack Mtns
2×5×521 Posts 
I'm hoping to set up a machine to primarily do GPU LA with an M40 24GB card.
- Will a Core2 Duo 3.16GHz be better (or much worse) than a slower quad core?
- When running the GPU, is LA doing anything with more than one CPU core? I only see one core in use via top.
- Will 8GB of machine RAM be insufficient to feed the 24GB card?
- If insufficient, would a large swap file, via a 32GB microSD, ease the memory limit?
2022-05-01, 17:33  #97
Jul 2003
So Cal
2^{5}·3^{4} Posts 
GPU LA uses only a single CPU core to do a very small part of each iteration. Likewise, filtering and traditional sqrt use only a single core. The Core2 Duo should be fine.
With a 24GB card, you should be able to solve matrices up to around 20M x 20M, which would be about 10GB in size. While transferring the matrix to the card, you need to store the entire matrix in COO format plus a portion of it in CSR format, so 8 GB would not be enough. 16 GB plus a swap file should be enough, but leave room for expansion later if needed.
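A back-of-the-envelope sketch of why 8 GB of host RAM falls short during that transfer. It assumes 4-byte indices (so 8 bytes per nonzero in COO: a row index plus a column index, and 4 bytes per nonzero plus row pointers in CSR); the ~125 nonzeros-per-row weight and the fraction held in CSR at any moment are assumptions, the weight chosen so a 20M-row matrix matches the ~10 GB figure above:

```python
# Host-memory sketch for staging an msieve GF(2) matrix onto the GPU.
# ASSUMPTIONS: 4-byte row/column indices, ~125 nonzeros per row
# (picked to match the ~10 GB on-card size quoted for a 20M-row
# matrix), and a quarter of the matrix in CSR form at once.

GIB = 2**30

def staging_bytes(rows, avg_weight=125, csr_fraction=0.25):
    nnz = rows * avg_weight
    coo = nnz * 8                    # (row, col) pairs, 4 bytes each
    csr = nnz * 4 + (rows + 1) * 4   # column indices + row pointers
    # Entire matrix in COO plus a portion already converted to CSR:
    return coo + csr_fraction * csr

if __name__ == "__main__":
    for rows in (10_000_000, 20_000_000):
        print(f"{rows / 1e6:.0f}M rows: "
              f"~{staging_bytes(rows) / GIB:.1f} GiB host RAM during transfer")
```

Under these assumptions a 20M-row matrix wants roughly 20 GiB of host memory at its peak, which is why 16 GB works only with a swap file to absorb the overflow.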
2022-05-01, 18:50  #98
"Ed Hall"
Dec 2009
Adirondack Mtns
1010001011010_{2} Posts 
Thanks! I think it would be too costly to bring the Core2 up to 16 GB, so I'll look at other options. I appreciate all the help!

2022-05-09, 20:09  #99
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts 
Sorry to annoy, but I'm having trouble getting an M40 to run. The system sees it, but nvidia-smi and Msieve do not. This machine runs the K20X and an NVS 510 fine. Do I need to reinstall CUDA with the M40 in place, perhaps?
