mersenneforum.org Msieve GPU Linear Algebra

 2021-10-26, 09:35 #67 charybdis     Apr 2020 13×71 Posts What was your general impression of 34-bit vs 33-bit? Will the extra bit allow slightly larger jobs to be run as I'd hoped?
2021-10-26, 15:47   #68
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

5622₁₀ Posts

Quote:
 Originally Posted by frmky 2,2174M is in LA, so here's one more data point. Running on eight NVLink-connected V100s. It'll take a bit longer due to queue logistics, but hopefully it'll be done within the week.
How many relations did you collect? Was the unique ratio better than 2,2174L's? The matrices came out pretty similar in size, so a comparison of relations counts (raw and unique) gives a nice 33-bit vs 34-bit data point.

 2021-10-26, 18:06 #69 frmky     Jul 2003 So Cal 2,593 Posts For 2,2174L we sieved q from 20M to 6B and collected 1.36B relations. This gave 734M uniques, so about 46% duplicates. For 2,2174M we sieved q from 20M to 4B and collected 2.19B relations. This gave 1.29B uniques, so about 41% duplicates. However, we sieved a considerably narrower range of q, and it was overall much faster.
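As a sanity check, the duplicate percentages above follow directly from the raw and unique counts; a quick sketch of the arithmetic (the `dup_rate` helper is just illustration, not an msieve tool):

```shell
# Fraction of raw relations that were duplicates, from the numbers posted above.
dup_rate() {
    awk -v raw="$1" -v uniq="$2" 'BEGIN { printf "%.0f%%\n", 100 * (raw - uniq) / raw }'
}

dup_rate 1.36e9 0.734e9   # 2,2174L: prints 46%
dup_rate 2.19e9 1.29e9    # 2,2174M: prints 41%
```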
 2021-10-27, 03:14 #70 LaurV Romulan Interpreter     "name field" Jun 2011 Thailand 2821₁₆ Posts [offtopic] I changed the thread title. The old one made me nostalgic every time someone posted in it... The new title is easier to search too, as the thread contains a lot of useful info... [/offtopic] Last fiddled with by LaurV on 2021-10-27 at 03:17
2021-10-31, 18:58   #71
frmky

Jul 2003
So Cal

5041₈ Posts

Quote:
 Originally Posted by frmky 2,2174M is in LA, so here's one more data point.
It's done.

 2022-02-20, 15:14 #72 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 2×5×521 Posts I'm contemplating playing with Colab to see if it could be used with smaller matrices, but I wonder if there is really any worth in it. If I do everything but the LA locally and only upload the necessary files for the matrix work, I'm still looking at a pretty large relations file for anything of value. But I'm currently looking at more than a day of local CPU LA for ~c170 candidates. If I could knock that down to a few hours, maybe it would be "fun" to try. The assigned GPUs vary widely as well. My last two experiments (sessions with GPU ECM) yielded a P100 and a K80. I do normally get some longer session times, but that's not guaranteed. Also, I may have only been getting half the card. (I'm still confused on shader/core/SM/etc.) If my source is correct, the K80 is only CUDA compute capability 3.7. Is this current enough to work? Would downloading the checkpoint file at regular intervals be enough to be able to restart a timed-out session later? What else would I need to consider? Sorry for the questions, and thanks for any help. An extra question: since the K80 is only CUDA 3.7 architecture, would it even be worth obtaining one? It seems the current minimum is 3.5, and I'd hate to have another obsolete card right after getting one.
 2022-02-21, 02:38 #73 frmky     Jul 2003 So Cal 2593₁₀ Posts Yes, it will work on a K80. My updated version requires CC 3.5 or greater. You don't need to transfer the large relations file. Do this:
1. Complete the filtering and build the matrix locally. You can stop it manually once you see "commencing Lanczos iteration".
2. Transfer the ini, fb, and mat files (and mat.idx if using multiple GPUs with MPI, not covered here) to the GPU node.
3. On the GPU node, start the LA with options like ./msieve -nc2 skip_matbuild=1 -g 0 -v
4. You can interrupt it and restart it with "-ncr -g 0".
5. Once it's complete, transfer the dep file to the local node and run sqrt with -nc3 as usual.
The local and GPU msieve binaries can be compiled with different values of VBITS since the LA is run entirely using the GPU binary. And yes, you just need the chk file in addition to the other files above to restart. A K80 is a dual-GPU card, so without using MPI you will only be using half the card. And each half is only a little faster than a K20. It will be slower than a P100, as you would expect.
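The steps above can be sketched as a shell session. This is only an illustration: the file names assume msieve's defaults (adjust for your -s setting), the host name gpu-node and paths are placeholders, and the MPI case is omitted as in the post.

```shell
# Local node: filtering, then matrix build. Interrupt the -nc2 run manually
# once the log shows "commencing Lanczos iteration".
./msieve -nc1 -v
./msieve -nc2 -v

# Ship only the small files to the GPU node -- not the relations file.
scp msieve.ini msieve.fb msieve.dat.mat gpu-node:msieve/

# GPU node: run the LA on GPU 0, skipping the matrix build.
./msieve -nc2 skip_matbuild=1 -g 0 -v

# If the session dies, restart from the checkpoint (.chk) file.
./msieve -ncr -g 0 -v

# Back on the local node: fetch the dependency file and run the square root.
scp gpu-node:msieve/msieve.dat.dep .
./msieve -nc3 -v
```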
 2022-02-21, 03:29 #74 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 2·5·521 Posts Thanks frmky! This helps a bunch. I will pursue the Colab session. I also have a 3.5 card to play with, but it only has 2GB; not sure if that's enough to even fit a small matrix. I'm off to study... Last fiddled with by EdH on 2022-02-21 at 13:54 Reason: It's only 3.0! (frown)
2022-02-22, 23:40   #76
EdH

"Ed Hall"
Dec 2009

145A₁₆ Posts

Quote:
 Originally Posted by EdH . . . I may try again later to get Msieve to process the test case, since at this point I have the needed files in Google Drive, but the practicality is in doubt. Thank you for the assistance. I will surely put this to use when I finally acquire a usable CUDA GPU. (I'm even eyeing some K20s ATM.)
I'm going to claim success!

I got a Colab session to run Msieve LA on a Tesla T4! I didn't let it complete, but the log claims:
Code:
Tue Feb 22 22:48:53 2022  linear algebra at 0.0%, ETA 3h44m
The best time I could get on a 40-thread Xeon was about twice that long.

I was able to compress the .mat file to almost half its size, but it still takes an hour to upload it to Google Drive and a little bit of time to decompress it. (Others may be able to upload a lot faster.)
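For reference, the compression round-trip is just a standard compress/upload/decompress cycle; a minimal sketch with gzip, using a throwaway stand-in file so it is runnable (substitute the real .mat file and whatever compressor you prefer):

```shell
# Stand-in for the real matrix file, just so the sketch runs end to end.
head -c 1048576 /dev/zero > demo.mat

gzip -9 -k demo.mat        # writes demo.mat.gz, keeps demo.mat
# ...upload demo.mat.gz to Google Drive, then on the Colab side:
gunzip -kf demo.mat.gz     # restores demo.mat
```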

The actual details are much more complicated than my other sessions, so I need to work quite a bit on them before I can publish them. As to the earlier comments of practicality, I will have to study this further for my use. On one hand, it takes a lot of manual intervention and timely success is not guaranteed. On the other hand, all of this work being done by Colab is letting the local machines perform other work. Perhaps the value can be realized for larger jobs.

I don't seem to be getting the screen output I expected from the -v option.

Is there a way to redirect the checkpoint file? I couldn't find an option that I thought existed.

Thanks again for all the help.

 2022-02-24, 01:24 #77 EdH     "Ed Hall" Dec 2009 Adirondack Mtns 5210₁₀ Posts Sorry if you're tired of these reports, but here's another: I have a full-fledged Colab session that works through completion of the LA. I let a c157 finish today that I had recently run on my 20c/40t Xeon. The times were nearly identical:
Code:
Xeon  04:17:41 elapsed time
Colab 04:19:08 elapsed time
I hope to do the same test with a different GPU to compare.

