#78
Jul 2003
So Cal
2³·5²·13 Posts
#79
"Ed Hall"
Dec 2009
Adirondack Mtns
5×1,051 Posts
Well, I spent quite a bit of time today with a T4, but I didn't let it finish, because I was (unsuccessfully) trying to get a copy of the checkpoint file to work right, so it could be saved past the end of a session. However, the T4 consistently gave estimates of 2:33 for completion. This was the same matrix that took 4:19 to finish on the K80.
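For reference, the sort of thing I was attempting is sketched below. It assumes Drive is already mounted at /content/drive and that the checkpoint files match msieve.dat*.chk; both are assumptions that should be checked against your own session.

Code:
# Sketch: periodically copy Msieve's checkpoint files to mounted Google Drive
# so they survive the end of a Colab session.
# Assumptions: Drive is mounted at /content/drive and the checkpoints
# match msieve.dat*.chk in the working directory.
mkdir -p /content/drive/MyDrive/msieve_ckpt
while true; do
  cp -f msieve.dat*.chk /content/drive/MyDrive/msieve_ckpt/ 2>/dev/null
  sleep 300   # copy every five minutes
done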
#80
"Ed Hall"
Dec 2009
Adirondack Mtns
5·1,051 Posts
Disappointing update. Although Colab successfully completed LA on the test set, the returned msieve.dat.dep file is corrupt according to Msieve on the local machine.
#81
"Ed Hall"
Dec 2009
Adirondack Mtns
1487₁₆ Posts
I have not been playing with Colab for the last few days, because I've been trying to get a Tesla K20Xm working locally. I had it working with GMP-ECM, but couldn't get frmky's Msieve to run. I battled with all kinds of CUDA versions (9, 10.2, 11.x, etc.). All resisted, including the stand-alone CUDA 10.2 runfile installer. For some time, I lost GMP-ECM, too.
But I'm happy to report I finally have all three (GMP-ECM, Msieve and frmky's Msieve) running. I'm using CUDA 11.4 and NVIDIA driver 470.103.71, and I had to install a shared object file from CUDA 9 (that may have been for GMP-ECM, for which I also had to disable some code in the Makefile). In any case, they are all running on the K20Xm! As to performance, limited testing shows the card taking roughly half the time of my 24-thread machine, though the 40-thread machines still have an edge on the K20Xm. In effect, though, it represents an extra machine, since it can free up the others. The good part is that now that I have this local card running, I can get back to my Colab session work with a local card to compare against and help figure things out. Thank you to everyone for all the help in this and other threads!
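For anyone trying the same card, the build itself boiled down to something like the following. This is a sketch: the Makefile flags are as I understand frmky's branch (CUDA= takes the compute capability without the dot) and should be checked against your copy.

Code:
# Sketch: build frmky's Msieve branch for the K20Xm (compute capability 3.5).
# VBITS=256 matches the "using VBITS=256" seen in the logs below;
# treat both flags as examples rather than gospel.
make clean
make all CUDA=35 VBITS=256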
#82
"Ed Hall"
Dec 2009
Adirondack Mtns
5×1,051 Posts
The Colab "How I..." thread is complete. I have tested it directly from the thread and it worked as designed. The latest session was assigned a K80, which was detected correctly and whose Compute Capability was used during the compilation of Msieve.
It can be reviewed at: How I Use a Colab GPU to Perform Msieve Linear Algebra (-nc2)
Thanks everyone for all the help!
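For the curious, the detection step is just a query of the assigned card before the build; something along these lines (the compute_cap query field needs a fairly recent nvidia-smi, so treat this as illustrative rather than the exact cell):

Code:
# Sketch: ask the assigned Colab GPU for its name and compute capability,
# e.g. a K80 reports 3.7, which becomes CUDA=37 in the Msieve build.
nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader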
#83
"Ed Hall"
Dec 2009
Adirondack Mtns
5×1,051 Posts
I've hit a snag playing with my GPU and wonder why:
The machine is a Core2 Duo with 8 GB RAM and the GPU is a K20Xm with 6 GB RAM. The composite is 170 digits and the matrix was built on a separate machine, with msieve.dat.mat, msieve.fb and worktodo.ini supplied from the original, differently named files. I tried this twice. Here is the terminal display for the last try:
Code:
$ ./msieve -nc2 skip_matbuild=1 -g 0 -v

Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022
random seeds: 6dc60c6a 05868252
factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
no P-1/P+1/ECM available, skipping
commencing number field sieve (170-digit input)
R0: -513476789674487020805844014359613
R1: 4613148128511433126577
A0: -638650427125602136382789058618425254350
A1: 413978338424926800646481002860017
A2: 268129428386547641102884323
A3: -15312382615381572243
A4: -8137373995372
A5: 295890
skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5

commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
sparse part has weight 794456977 (68.01/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
sparse part has weight 679062899 (58.13/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
Killed
Code:
Fri Mar 25 09:45:54 2022  Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022  random seeds: 6dc60c6a 05868252
Fri Mar 25 09:45:54 2022  factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
Fri Mar 25 09:45:55 2022  no P-1/P+1/ECM available, skipping
Fri Mar 25 09:45:55 2022  commencing number field sieve (170-digit input)
Fri Mar 25 09:45:55 2022  R0: -513476789674487020805844014359613
Fri Mar 25 09:45:55 2022  R1: 4613148128511433126577
Fri Mar 25 09:45:55 2022  A0: -638650427125602136382789058618425254350
Fri Mar 25 09:45:55 2022  A1: 413978338424926800646481002860017
Fri Mar 25 09:45:55 2022  A2: 268129428386547641102884323
Fri Mar 25 09:45:55 2022  A3: -15312382615381572243
Fri Mar 25 09:45:55 2022  A4: -8137373995372
Fri Mar 25 09:45:55 2022  A5: 295890
Fri Mar 25 09:45:55 2022  skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5
Fri Mar 25 09:45:55 2022
Fri Mar 25 09:45:55 2022  commencing linear algebra
Fri Mar 25 09:45:55 2022  using VBITS=256
Fri Mar 25 09:45:55 2022  skipping matrix build
Fri Mar 25 09:46:24 2022  matrix starts at (0, 0)
Fri Mar 25 09:46:26 2022  matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
Fri Mar 25 09:46:26 2022  sparse part has weight 794456977 (68.01/col)
Fri Mar 25 09:46:26 2022  saving the first 240 matrix rows for later
Fri Mar 25 09:46:30 2022  matrix includes 256 packed rows
Fri Mar 25 09:46:35 2022  matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
Fri Mar 25 09:46:35 2022  sparse part has weight 679062899 (58.13/col)
Fri Mar 25 09:46:35 2022  using GPU 0 (Tesla K20Xm)
Fri Mar 25 09:46:35 2022  selected card has CUDA arch 3.5
#84
Jul 2003
So Cal
5050₈ Posts
That looks like the Linux OOM killer, which would mean it has run out of available system (not GPU) memory.
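If you want to confirm, the kernel log will say so explicitly:

Code:
# The OOM killer logs to the kernel ring buffer:
dmesg | grep -i "out of memory"
# or, on systemd machines:
journalctl -k | grep -i oom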
#85
"Ed Hall"
Dec 2009
Adirondack Mtns
5·1,051 Posts
Quote:
Originally Posted by frmky
That looks like the Linux OOM killer, which would mean it has run out of available system (not GPU) memory.
I'll play a bit more with some sizes in between and see where the limit is. Do you think a large swap file would be of any use?
#86
Sep 2009
4572₈ Posts
Yes. I'd add 16-32 GB of swap space, which should stop the OOM killer from killing jobs when they ask for lots of memory.
But the system could start page thrashing if they try to heavily use more memory than you have RAM. SSDs are faster than spinning disks, but more prone to wearing out if heavily used. Adding more RAM would be the best option, if the system can take it. But that costs money, unless you have some spare RAM to install.
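Creating one is straightforward; a sketch (size and path are examples):

Code:
# Sketch: create and enable a 32 GB swap file.
sudo fallocate -l 32G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
# To keep it across reboots:
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab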
#87
"Ed Hall"
Dec 2009
Adirondack Mtns
5×1,051 Posts
Well, more study seems to say I might not be able to get there with a 32G swap,* although I might see what happens. I tried a matrix that was built with target_density=70 for a c158, to compare times with a 40-thread machine, and I got a little more info. Here's what top says about Msieve:
Code:
  PID USER    PR NI  VIRT   RES   SHR    S %CPU %MEM    TIME+  COMMAND
21349 math55  20  0  33.7g  7.3g  47128  D  3.7 93.3  1:21.89  msieve
Here's what Msieve had to say:
Code:
commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 7793237 x 7793427 (2367.4 MB) with weight 742866434 (95.32/col)
sparse part has weight 534863189 (68.63/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 7792997 x 7793427 (2195.0 MB) with weight 483246339 (62.01/col)
sparse part has weight 450716902 (57.83/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
450716902 7792997 7793427
450716902 7793427 7792997
commencing Lanczos iteration
vector memory use: 1664.9 MB
dense rows memory use: 237.8 MB
sparse matrix memory use: 3498.2 MB
memory use: 5400.9 MB
error (spmv_engine.cu:78): out of memory

* The machine currently has an 8G swap partition and I have a 32G microSD handy that I might try to add to the system, to both test the concept of using such a card as swap and to add the swap space if it works.
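If I do try the microSD, the setup would be something like this (the device name is a guess; I'd check with lsblk first):

Code:
# Sketch: use the 32G microSD as additional swap (device name is an assumption).
lsblk                      # find the card's actual device name
sudo mkswap /dev/mmcblk0   # WARNING: wipes the card
sudo swapon /dev/mmcblk0
swapon --show              # verify both the 8G partition and the card are active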
#88
Sep 2009
2426₁₀ Posts
Quote:
8 GB should be enough to solve the matrix on the CPU. I've done a GNFS c178 in 16 GB (the system has 32 GB of swap space as well, but wasn't obviously paging).

Last fiddled with by chris2be8 on 2022-03-27 at 16:02 Reason: clarify wording
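For reference, running the LA step on the CPU is just a matter of dropping the -g option and giving Msieve some threads, e.g.:

Code:
# Sketch: run the linear algebra on the CPU instead of the GPU.
# The thread count is an example; match it to your machine.
./msieve -nc2 -t 8 -v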