#67
Apr 2020
3B6₁₆ Posts
What was your general impression of 34-bit vs 33-bit? Will the extra bit allow slightly larger jobs to be run as I'd hoped?
#68
"Curtis"
Feb 2005
Riverside, CA
165E₁₆ Posts
How many relations did you collect? Was the unique ratio better than 2,2174L's? The matrices came out pretty similar in size, so a comparison of relations counts (raw and unique) gives a nice 33 vs 34 data point.
#69
Jul 2003
So Cal
2,621 Posts
For 2,2174L we sieved from 20M - 6B, and collected 1.36B relations. This gave 734M uniques, so about 46% duplicates.

For 2,2174M we sieved from 20M - 4B, and collected 2.19B relations. This gave 1.29B uniques, so about 41% duplicates. However, we sieved a considerably narrower range of q, and it was overall much faster.
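As a sanity check, the duplicate percentages quoted above follow directly from the raw and unique relation counts (a one-liner sketch; the counts are the ones given in this post, in millions):

```shell
# Duplicate rate = 1 - unique/raw, using the counts from this post.
awk 'BEGIN {
    printf "2,2174L: %.0f%% duplicates\n", 100 * (1 - 734/1360)    # 1.36B raw, 734M unique
    printf "2,2174M: %.0f%% duplicates\n", 100 * (1 - 1290/2190)   # 2.19B raw, 1.29B unique
}'
```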
#70
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000101101₂ Posts
[offtopic] I changed the thread title. The old one made me nostalgic every time someone posted in it... The new title is easier to search too, as the thread contains a lot of useful info... [/offtopic]
Last fiddled with by LaurV on 2021-10-27 at 03:17 |
#72
"Ed Hall"
Dec 2009
Adirondack Mtns
1010100101011₂ Posts
I'm contemplating playing with Colab to see if it could be used with smaller matrices, but I wonder if there is really any worth.

If I do everything but LA locally and only upload the necessary files for the matrix work, I'm still looking at a pretty large relations file for anything of value. But I'm currently looking at more than a day of local CPU LA for ~c170 candidates. If I could knock that down to a few hours, maybe it would be "fun" to try.

The assigned GPUs vary widely as well. My last two experiments (sessions with GPU ECM) yielded a P100 and a K80. I do normally get some longer session times, but it's not guaranteed. Also, I may have only been getting half the card. (I'm still confused on shader/core/SM/etc.)

If my source is correct, the K80 is only CUDA compute capability 3.7. Is this current enough to work? Would downloading the checkpoint file at regular intervals be enough to be able to restart a timed-out session later? What else would I need to consider? Sorry for the questions, and thanks for any help.

An extra question: since the K80 is only CUDA 3.7 architecture, would it even be worth obtaining one? It seems the current minimum is 3.5, and I'd hate to have another obsolete card right after getting one.
#73
Jul 2003
So Cal
2,621 Posts
Yes, it will work on a K80. My updated version requires CC 3.5 or greater.

You don't need to transfer the large relations file. Do this:

1. Complete the filtering and build the matrix locally. You can stop it manually once you see "commencing Lanczos iteration".
2. Transfer the ini, fb, and mat files (and mat.idx if using multiple GPUs with MPI, not covered here) to the GPU node.
3. On the GPU node, start the LA with options like ./msieve -nc2 skip_matbuild=1 -g 0 -v
4. You can interrupt it and restart it with "-ncr -g 0".
5. Once it's complete, transfer the dep file to the local node and run sqrt with -nc3 as usual.

The local and GPU msieve binaries can be compiled with different values for VBITS since the LA is run entirely using the GPU binary. And yes, you just need the chk file in addition to the other files above to restart.

A K80 is a dual-GPU card, so without using MPI you will only be using half the card. And each half is only a little bit faster than a K20. It will be slower than a P100, as you would expect.
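Put together, the steps above might look like the transcript below. This is only a sketch: the hostname gpu-node and the work/ path are placeholders, the file names assume msieve's default msieve.dat job naming, and the ini file mentioned above is part of the poster's setup, so it isn't shown explicitly.

```shell
# --- On the local node: filtering and matrix build ---
./msieve -nc1 -v                      # filtering
./msieve -nc2 -v                      # stop manually after "commencing Lanczos iteration"

# Copy only the small files the GPU node needs (not the big relations file);
# add the job's ini file, and mat.idx if using MPI across multiple GPUs.
scp msieve.fb msieve.dat.mat gpu-node:work/

# --- On the GPU node: run the LA on GPU 0, skipping the matrix build ---
./msieve -nc2 skip_matbuild=1 -g 0 -v
# If the session dies, resume from the chk checkpoint file:
./msieve -ncr -g 0 -v

# --- Back on the local node: fetch the dependency file and finish ---
scp gpu-node:work/msieve.dat.dep .
./msieve -nc3 -v                      # square root stage
```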
#74
"Ed Hall"
Dec 2009
Adirondack Mtns
152B₁₆ Posts
Thanks frmky! This helps a bunch. I will pursue the Colab session.

I'm off to study. . .

Last fiddled with by EdH on 2022-02-21 at 13:54 Reason: It's only 3.0! (frown)
#75
"Ed Hall"
Dec 2009
Adirondack Mtns
5,419 Posts
I'm saddened to report that even had I been successful with my Colab experiments, it would still be impractical.

I was able to compile Msieve for two different GPUs, a K80 (3.7) and a T4 (7.5). However, Msieve refused to understand the options, although I tried all the variations I could think of in both Python and BASH scripts, with and without single/double quotes around various portions, and in a variety of orders. In all cases, Msieve simply displayed all the available options.

In any case, the impracticality is that for a c160, the msieve.dat.mat file is just short of 2GB. The two methods I tested for getting the file into the Colab sessions were SSH and Google Drive. SSH took just under two hours; uploading the file to Google Drive also took just under two hours. The first method held the session open without using the GPU for anything, which Colab complained about, while the second allowed the session to start rather quickly (after the two-hour upload to Google Drive). But since a c160 created a 2GB file, I expect larger matrices will just take much longer to load into a Colab session.

I may try again later to get Msieve to process the test case, since at this point I have the needed files in Google Drive, but the practicality is in doubt.

Thank you for the assistance. I will surely put this to use when I finally acquire a usable CUDA GPU. (I'm even eyeing some K20s ATM.)

Last fiddled with by EdH on 2022-02-22 at 01:07 Reason: spillin'
#76
"Ed Hall"
Dec 2009
Adirondack Mtns
5,419 Posts
I got a Colab session to run Msieve LA on a Tesla T4! I didn't let it complete, but the log claims:

Code:
Tue Feb 22 22:48:53 2022  linear algebra at 0.0%, ETA 3h44m

I was able to compress the .mat file to almost half the size, but it still takes an hour to upload it to Google Drive and a little bit of time to decompress it. (Others may be able to upload a lot faster.)

The actual details are much more complicated than my other sessions, so I need to work quite a bit on them before I can publish them.

As to the earlier comments on practicality, I will have to study this further for my use. On one hand, it takes a lot of manual intervention and timely success is not guaranteed. On the other hand, all of this work being done by Colab lets the local machines perform other work. Perhaps the value can be realized for larger jobs.

I don't seem to be getting the screen output I expected from the -v option. Is there a way to redirect the checkpoint file? I couldn't find an option that I thought existed.

Thanks again for all the help.
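For the compression step, something along these lines would work (a sketch using gzip since it's available everywhere; zstd would be faster if installed; the file name follows msieve's default job naming and the LA doesn't care how the file arrives, only that it is decompressed before the run):

```shell
# On the local node: compress the matrix before uploading,
# keeping the original in case the transfer fails.
gzip -9 -k msieve.dat.mat      # writes msieve.dat.mat.gz alongside the original

# ...upload msieve.dat.mat.gz to Google Drive, then in the Colab session:
gunzip msieve.dat.mat.gz       # restores msieve.dat.mat for the LA run
```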
#77
"Ed Hall"
Dec 2009
Adirondack Mtns
5,419 Posts
Sorry if you're tired of these reports, but here's another:

I have a full-fledged Colab session that works through completion of LA. I let a c157 finish today that I had recently run on my 20c/40t Xeon. The times were nearly identical:

Code:
Xeon  04:17:41 elapsed time
Colab 04:19:08 elapsed time
Thread | Thread Starter | Forum | Replies | Last Post |
Resume linear algebra | Timic | Msieve | 35 | 2020-10-05 23:08 |
use msieve linear algebra after CADO-NFS filtering | aein | Msieve | 2 | 2017-10-05 01:52 |
Has anyone tried linear algebra on a Threadripper yet? | fivemack | Hardware | 3 | 2017-10-03 03:11 |
Linear algebra at 600% | CRGreathouse | Msieve | 8 | 2009-08-05 07:25 |
Linear algebra proof | Damian | Math | 8 | 2007-02-12 22:25 |