mersenneforum.org Msieve GPU Linear Algebra

2022-03-27, 16:19   #89
EdH

"Ed Hall"
Dec 2009

11FF₁₆ Posts

Quote:
 Originally Posted by chris2be8 The OOM killer should put messages into syslog, so check syslog and dmesg output before buying memory or spending a lot of time checking other things. I should have said to do that first in my previous post, sorry. 8 GB should be enough to solve the matrix on the CPU. I've done a GNFS c178 in 16 GB (the system has 32 GB of swap space as well but wasn't obviously paging).
Thanks! I'll play more later, but for now, I'm running a c145 that nvidia-smi reports as using 1802 MiB on the GPU. It is happily stomping the 40-thread CPU machine. The GPU machine started after I copied the files over from the CPU machine, and it is already ahead: ETA 38m vs. ETA 1h 0m.

I did get the microSD card added as swap, but I needed help from kruoli in the linux sub-forum. top now shows nearly 40G for swap space, but it is all totally free for this c145.
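The OOM-killer check and the microSD swap setup mentioned above can be sketched roughly as below. The mount point, file path, and size are illustrative assumptions, not the exact commands used in the thread:

```shell
# Look for OOM-killer activity in the kernel log (may require root)
dmesg | grep -iE "out of memory|oom-kill" || echo "no OOM events found"

# Create and enable a swap file on the microSD card (example path and size)
sudo fallocate -l 32G /mnt/microsd/swapfile
sudo chmod 600 /mnt/microsd/swapfile
sudo mkswap /mnt/microsd/swapfile
sudo swapon /mnt/microsd/swapfile
swapon --show    # verify the new swap space is active
```

Note that swap on a microSD card is slow and wears the card; it only eases hard memory limits, it doesn't substitute for RAM.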

2022-03-27, 17:36   #90
frmky

Jul 2003
So Cal

2,399 Posts

Quote:
 Originally Posted by EdH
Code:
commencing linear algebra
using VBITS=256
...
error (spmv_engine.cu:78): out of memory
You almost had enough. It ran out while trying to allocate working memory for the spmv library. Recompile with VBITS=128 and it should fit, even if it's not optimal. (Don't forget to copy the .ptx and .so files.)
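A rebuild along these lines should do it. The exact make flags depend on the msieve GPU branch in use, and CUDA=35 (the K20Xm's compute capability) and the destination directory are assumptions here, not taken from the thread:

```shell
# Rebuild msieve's GPU linear algebra with smaller vector blocks
cd msieve
make clean
make all CUDA=35 VBITS=128    # CUDA= target compute capability (sm_35 for K20Xm)

# Copy the rebuilt GPU kernels and spmv library next to the msieve binary
cp *.ptx /path/to/work/dir/       # hypothetical working directory
cp libspmv*.so /path/to/work/dir/
```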

2022-03-27, 21:13   #91
EdH

"Ed Hall"
Dec 2009

17·271 Posts

Quote:
 Originally Posted by frmky You almost had enough. It ran out while trying to allocate working memory for the spmv library. Recompile with VBITS=128 and it should fit, even if it's not optimal. (Don't forget to copy the .ptx and .so files.)
Thanks! That did the trick! nvidia-smi is reporting 4999 MiB / 5700 MiB and Msieve is using 2.7G of 8G. The ETA is just over 11 hours, whereas the CPU took 12:30 with 32 threads. I had forgotten to edit Msieve for 40 threads on this machine.

I still need to do some more testing and find out where the crossover is, but all this is encouraging.

2022-04-08, 13:11   #92
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

17×271 Posts

Sorry if these questions are annoying: I've been playing with my K20Xm card for a little while now and, of course, it isn't "good enough." I can get more of them at reasonable prices, but why would I, if they aren't? Also, most of my card-capable machines don't have an extra fan connector, which a K20Xm would need.

Comparing against a GTX 980, the memory is the same, so I still wouldn't be able to run larger matrices. Is the matrix size increase proportional in a manner I could estimate? E.g., do 5 more digits double the matrix? If a GTX 1080 with 11GB would only give me 5 more digits, I couldn't consider it worth the cost.

Is there a similar estimation for target_density? I currently use target_density 70 so the CADO-NFS clients can move to a subsequent server sooner, but I haven't empirically determined whether that is best.

I'm not sure if this might be a typo, but while the 980 shows much better performance overall, its FP64 (double) performance is listed as only 189.4 GFLOPS (1:32), while the K20Xm's is listed as 1,312 GFLOPS (1:3). Would that be of significance in LA solving?

It's been mentioned that the K80 consists of two cards that are each a little better than the K20Xm. How much larger a matrix might I be able to run with MPI across both halves of a 24GB card?
2022-04-09, 12:42   #93
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

17×271 Posts

Any familiarity with the Tesla M40 24GB for Msieve LA? That would be about 4× the memory for 2× the cost of a K20Xm.
2022-04-09, 19:45   #94
frmky

Jul 2003
So Cal

2,399 Posts

I sieve enough to use a target_density of at least 100-110, as it brings down the matrix size. An 11 GB card can likely handle matrices with about 10M rows (GNFS-175-ish), whereas a 24 GB card would take you up to around 20M rows (GNFS-184-ish). With enough system memory, the newer Tesla M40 would let you go a bit higher, at a significant performance penalty, by storing the matrix in system memory and transferring it onto the GPU as needed.

GPU LA is entirely integer code and doesn't depend on FP64 performance. It's written using 64-bit integer operations, but even on the latest GPUs those are implemented with 32-bit integer instructions.

You lose some speed and memory efficiency splitting the matrix across the two GPUs in a K80, but you should still be able to handle 9M rows or so (GNFS-174-ish).
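frmky's figures suggest a rough rule of thumb: the matrix row count roughly doubles for every ~9 additional GNFS digits, and each million rows needs a bit over 1 GB of GPU memory. A small sketch of that estimate, with constants read off the two data points in the post (10M rows/11 GB at GNFS-175, 20M rows/24 GB at GNFS-184) rather than taken from msieve itself:

```python
# Illustrative constants fitted to the two data points above.
ROWS_AT_175 = 10e6
DIGITS_PER_DOUBLING = 9        # 175 -> 184 digits doubles the rows
GB_PER_MILLION_ROWS = 1.15     # between 11 GB/10M and 24 GB/20M

def matrix_rows(gnfs_digits):
    """Estimated matrix rows for a GNFS job of the given size."""
    return ROWS_AT_175 * 2 ** ((gnfs_digits - 175) / DIGITS_PER_DOUBLING)

def gpu_memory_gb(rows):
    """Estimated GPU memory needed to hold the matrix, in GB."""
    return rows / 1e6 * GB_PER_MILLION_ROWS

for digits in (170, 175, 184):
    rows = matrix_rows(digits)
    print(f"GNFS-{digits}: ~{rows / 1e6:.1f}M rows, ~{gpu_memory_gb(rows):.0f} GB")
```

On this fit, an 11 GB card over an 8 GB-class card buys only a handful of digits, which matches EdH's worry about the upgrade being marginal.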
2022-04-09, 20:29   #95
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

17·271 Posts

Thanks! That helps a bunch. My new interest is the M40 24GB. But I'm not quite ready, because the machines I'd like to use don't have any extra fan connectors. I'm considering powering a fan another way, possibly from an older PATA power cable.
2022-05-01, 02:08   #96
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

10777₈ Posts

I'm hoping to set up a machine primarily to do GPU LA with an M40 24GB card.
- Will a Core2 Duo 3.16GHz be better (or much worse) than a slower quad core?
- - When running the GPU, is LA doing anything with more than one CPU core? I only see one core in use via top.
- Will 8GB of machine RAM be insufficient to feed the 24GB card?
- - If insufficient, would a large swap file on a 32GB MicroSD ease the memory limit?
2022-05-01, 17:33   #97
frmky

Jul 2003
So Cal

2,399 Posts

GPU LA uses only a single CPU core to do a very small part of each iteration. Likewise, filtering and the traditional sqrt use only a single core. The Core2 Duo should be fine.

With a 24GB card, you should be able to solve up to around 20M×20M matrices, which would be about 10 GB in size. While transferring the matrix to the card, you need to store the entire matrix in COO format plus a portion of it in CSR format. 8 GB would not be enough. 16 GB plus a swap file should be enough, but leave room for expansion later if needed.
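As a rough sanity check on that host-memory requirement, here is a sketch assuming 32-bit indices (COO: a row and column index per nonzero; CSR: one column index per nonzero plus one row pointer per row) and a nonzero count implied by a ~10 GB, 20M-row matrix. The csr_fraction and per-entry sizes are illustrative assumptions, not msieve's actual storage format:

```python
def host_memory_gb(rows, nnz, csr_fraction=0.25):
    """Rough host RAM needed to stage a sparse matrix for the GPU: a full
    COO copy plus a csr_fraction slice already converted to CSR.
    Assumes 32-bit (4-byte) indices throughout."""
    coo_bytes = nnz * 2 * 4                  # (row, col) index pair per nonzero
    csr_bytes = nnz * 4 + (rows + 1) * 4     # column indices + row pointers
    return (coo_bytes + csr_fraction * csr_bytes) / 1e9

# ~20M x 20M matrix at ~125 nonzeros per row (consistent with ~10 GB on card)
rows = 20_000_000
nnz = rows * 125
print(f"~{host_memory_gb(rows, nnz):.1f} GB of host RAM")  # well above 8 GB
```

Under these assumptions the COO copy alone is about 20 GB, which is why 8 GB of RAM cannot stage a 24 GB card's largest matrices and a swap file has to absorb the overflow.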
2022-05-01, 18:50   #98
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

10777₈ Posts

Thanks! I think it would be too costly to bring the Core2 up to 16 GB, so I'll look at other options. I appreciate all the help!
2022-05-09, 20:09   #99
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

17·271 Posts

Sorry to annoy, but I'm having trouble getting an M40 to run. The system sees it, but neither nvidia-smi nor Msieve does. This machine runs the K20X and an NVS 510 fine. Do I need to reinstall CUDA with the M40 in place, perhaps?
