#89
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts
Quote:
I did get the microSD card added as swap, but I needed help from kruoli in the linux sub-forum. top now shows nearly 40G for swap space, but it is all totally free for this c145.
#90 |
Jul 2003
So Cal
2⁵·3⁴ Posts
You almost had enough. It ran out while trying to allocate working memory for the spmv library. Recompile with VBITS=128 and it should fit, even if it's not optimal. (Don't forget to copy the .ptx and .so files.)
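In case it helps explain why a smaller VBITS frees memory: in the GPU block Lanczos the dense work vectors are each matrix_rows × VBITS bits, so dropping VBITS from 256 to 128 roughly halves the memory those vectors take, while the sparse matrix itself is unaffected. Here is a minimal back-of-the-envelope sketch; the vector count and the example row count are assumptions for illustration, not numbers from msieve's source.

```python
# Rough, illustrative estimate of GPU memory used by the block-Lanczos work
# vectors. Assumption (not from msieve's source): the solver keeps on the
# order of a dozen dense vectors, each matrix_rows x VBITS bits.

def vector_memory_gib(rows, vbits, num_vectors=12):
    """Approximate memory taken by the dense work vectors, in GiB."""
    bytes_per_vector = rows * vbits // 8
    return num_vectors * bytes_per_vector / 2**30

rows = 10_000_000  # arbitrary example size, not the c145's actual matrix
for vbits in (256, 128):
    print(f"VBITS={vbits}: ~{vector_memory_gib(rows, vbits):.1f} GiB in work vectors")
```

On a 6GB card like the K20Xm, that difference alone can decide whether the run fits.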
#91
"Ed Hall"
Dec 2009
Adirondack Mtns
12132₈ Posts
Quote:
I still need to do some more testing and find out where the crossover is, but all this is encouraging.
#92 |
"Ed Hall"
Dec 2009
Adirondack Mtns
1010001011010₂ Posts
Sorry if these questions are annoying:
I've been playing with my K20Xm card for a little bit now and, of course, it isn't "good enough." I can get more of them at reasonable prices, but why bother if they aren't good enough? Also, most of my card-capable machines don't have an extra fan connector, which a K20Xm would need. Comparing it to a GTX 980, the memory is the same, so I still wouldn't be able to run larger matrices.

Is the matrix size increase proportional in a manner I could estimate? E.g., do 5 more digits double the matrix? If a GTX 1080 with 11GB would only give me 5 more digits, I couldn't consider it worth the cost.

Is there a similar estimation for target_density? I currently use t_d 70 so the CADO-NFS clients can move to a subsequent server sooner, but I haven't empirically determined whether that is best.

I'm not sure if this might be a typo, but while the 980 shows much better performance overall, its FP64 (double) performance is listed as only 189.4 GFLOPS (1:32), while the K20Xm is shown at 1,312 GFLOPS (1:3). Would that be significant for LA solving?

It's been mentioned that the K80 consists of two cards that are each a little better than the K20Xm. How much larger a matrix might I be able to run with MPI across both sections of a 24GB card?
#93 |
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts
Any familiarity with the Tesla M40 24GB for Msieve LA? That would be about 4x memory for 2x cost over the K20Xm.
#94 |
Jul 2003
So Cal
2⁵·3⁴ Posts
I sieve enough to use a target_density of at least 100-110, as it brings down the matrix size. An 11GB card can likely handle matrices with about 10M rows (GNFS-175ish), whereas a 24GB card would take you up to around 20M rows (GNFS-184ish). With enough system memory, the newer Tesla M40 would let you go a bit higher, at a significant performance penalty, by storing the matrix in system memory and transferring it onto the GPU as needed.

GPU LA is entirely integer code and doesn't depend on FP64 performance. It's written using 64-bit integer operations, but even on the latest GPUs those are implemented with 32-bit integer instructions. You lose some speed and memory efficiency by splitting the matrix across the two GPUs in a K80, but you should still be able to handle 9M rows or so (GNFS-174ish).
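As a very rough sanity check on those row counts (this is not a formula from msieve, just an illustration of the scaling), GPU memory is dominated by the packed nonzero entries plus the dense work vectors from the sketch above. The average column weight, the bytes per nonzero (allowing for something like a second, transposed copy of the matrix), and the vector count are all assumptions.

```python
# Back-of-the-envelope GPU memory estimate for an msieve LA matrix.
# Assumptions (illustrative only, not from msieve's source):
#   - ~70 nonzeros per column at target_density ~100-110
#   - ~8 bytes per nonzero, allowing for roughly two packed copies of the
#     matrix (e.g. one per multiply direction) at 4-byte indices each
#   - about a dozen dense vectors of rows x VBITS bits each

def gpu_memory_gib(rows, avg_weight=70, bytes_per_nonzero=8,
                   vbits=256, num_vectors=12):
    matrix_bytes = rows * avg_weight * bytes_per_nonzero
    vector_bytes = num_vectors * rows * vbits // 8
    return (matrix_bytes + vector_bytes) / 2**30

for rows in (10_000_000, 20_000_000):
    print(f"{rows // 1_000_000}M rows: ~{gpu_memory_gib(rows):.1f} GiB")
```

With those guesses, ~10M rows lands just under 11GB and ~20M rows comfortably under 24GB, which at least agrees with the figures above.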
#95 |
"Ed Hall"
Dec 2009
Adirondack Mtns
2×5×521 Posts
Thanks! That helps me a bunch. My new interest now is the M40 24GB. But I'm not quite ready, because the machines I'd like to use don't have any extra fan connectors. I'm considering powering a fan another way, possibly from an older PATA power cable.
#96 |
"Ed Hall"
Dec 2009
Adirondack Mtns
2×5×521 Posts
I'm hoping to set up a machine to primarily do GPU LA with an M40 24GB card.
- Will a Core2 Duo 3.16GHz be better (or much worse) than a slower quad core?
- - When running the GPU, is LA doing anything with more than one CPU core? I only see one core in use via top.
- Will 8GB of machine RAM be insufficient to feed the 24GB card?
- - If insufficient, would a large swap file, via a 32GB microSD, ease the memory limit?
#97 |
Jul 2003
So Cal
2⁵·3⁴ Posts
GPU LA uses only a single CPU core to do a very small part of each iteration. Likewise, filtering and traditional sqrt use only a single core. The Core2 Duo should be fine.
With a 24GB card, you should be able to solve up to around 20M×20M matrices, which would be about 10GB in size. While transferring the matrix to the card, you need to store the entire matrix in COO format plus a portion of it in CSR format. 8GB would not be enough. 16GB plus a swap file should be enough, but leave room for expansion later if needed.
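To see why 8GB of host RAM is tight, here's a rough sketch of the host-side staging cost under the same illustrative assumptions as earlier (GF(2) matrix, so only indices are stored); the exact formats and fractions in msieve may differ.

```python
# Rough host-RAM estimate while staging a 20M x 20M matrix for the GPU.
# Assumptions (illustrative, not from msieve's source):
#   - COO holds a (row, col) pair of 32-bit indices per nonzero: 8 bytes each
#   - the partial CSR copy adds ~4 bytes per nonzero for the slice being built
#   - ~70 nonzeros per column on average

def host_memory_gib(rows, avg_weight=70, csr_fraction=0.5):
    nnz = rows * avg_weight
    coo_bytes = nnz * 8                      # entire matrix in COO
    csr_bytes = int(nnz * csr_fraction) * 4  # portion converted to CSR
    return (coo_bytes + csr_bytes) / 2**30

print(f"~{host_memory_gib(20_000_000):.1f} GiB of host RAM during the transfer")
```

That lands around 13 GiB, which lines up with 8GB falling short while 16GB plus a swap file is workable.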
#98 |
"Ed Hall"
Dec 2009
Adirondack Mtns
1010001011010₂ Posts
Thanks! I think it would be too costly to bring the Core2 up to 16 GB, so I'll look at other options. I appreciate all the help!
#99 |
"Ed Hall"
Dec 2009
Adirondack Mtns
2·5·521 Posts
Sorry to annoy, but I'm having trouble getting an M40 to run. The system sees it, but nvidia-smi and Msieve don't. This machine runs the K20X and an NVS-510 fine. Do I need to reinstall CUDA with the M40 in place, perhaps?