mersenneforum.org Msieve GPU Linear Algebra

2022-02-24, 06:17   #78
frmky

Jul 2003
So Cal

2³·5²·13 Posts

Quote:
 Originally Posted by EdH Sorry if you're tired of these reports
Not at all! I look forward to seeing how you have gotten it to work in Colab!

2022-02-26, 00:50   #79
EdH

"Ed Hall"
Dec 2009

5·1,051 Posts

Quote:
 Originally Posted by EdH . . . I hope to do the same test with a different GPU, to compare.
Well, I spent quite a bit of time today with a T4, but I didn't let it finish, because I was (unsuccessfully) trying to get a file copy for the checkpoint file to work right, so it could be saved past a session end. However, the T4 did consistently give estimates of 2:33 for completion. This was the same matrix that took 4:19 to finish on the K80.
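One way to approach the checkpoint copy described above is to copy to a temporary name and then rename, so a session that ends mid-copy cannot leave a truncated checkpoint in place of a good one. The following is a minimal Python sketch, not the method used in the thread. It assumes the checkpoint is named msieve.dat.chk and that some persistent directory is mounted in the session; both are assumptions, and backup_checkpoint is a hypothetical helper.

```python
import os
import shutil


def backup_checkpoint(src, dest_dir):
    """Copy src into dest_dir without exposing a half-written copy.

    Returns the path of the backup, or None if src does not exist yet.
    The file name and destination are whatever the caller passes in;
    nothing here is specific to Msieve.
    """
    if not os.path.isfile(src):
        return None
    os.makedirs(dest_dir, exist_ok=True)
    final_path = os.path.join(dest_dir, os.path.basename(src))
    tmp_path = final_path + ".partial"
    # If the copy is interrupted, only the .partial file is affected.
    shutil.copy2(src, tmp_path)
    # Rename over the previous backup (atomic on a POSIX filesystem).
    os.replace(tmp_path, final_path)
    return final_path
```

Calling this every few minutes from a second notebook cell or a background thread would be one option. It does not guard against copying while the program is still writing the source file, so keeping the previous backup under a different name may also be worthwhile.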

2022-02-28, 03:39   #80
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5·1,051 Posts

Disappointing update. Although Colab successfully completed LA on the test set, the returned msieve.dat.dep file is corrupt according to Msieve on the local machine.

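A quick way to tell whether a file like this was damaged in transfer, rather than written incorrectly in the first place, is to hash it on both machines and compare the digests. This is a minimal Python sketch under that assumption; sha256_of is a hypothetical helper and the thread does not say which check, if any, was actually used.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run sha256_of("msieve.dat.dep") in the Colab session before downloading and again on the local copy. Matching digests would point at the file contents themselves rather than at the transfer.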
2022-03-02, 23:58   #81
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5·1,051 Posts

I have not been playing with Colab for the last few days, because I have been trying to get a Tesla K20Xm working locally. I had it working with GMP-ECM, but couldn't get frmky's Msieve to run. I battled with all kinds of CUDA versions (9, 10.2, 11.x, etc.). All resisted, including the standalone CUDA 10.2 .run runfile. For some time, I lost GMP-ECM, too.

But, I'm happy to mention I finally have all (GMP-ECM, Msieve and frmky's Msieve) running. I'm using CUDA 11.4, NVIDIA driver 470.103.71 and I had to install a shared object file from CUDA 9 (that may have been for GMP-ECM, in which I also had to disable some code in the Makefile). In any case, they are all running on the K20Xm!

As to performance, the limited testing seems to show nearly a halving of the time taken on my 24 thread machine, but the 40 thread machines still have an edge on the K20Xm. But, in effect, it represents an extra machine, since it can free up the others.

The good part is that now that I have this local card running, I can get back to my Colab session work and have a local card to compare and help figure things out.

Thank you to everyone for all the help in this and other threads!

2022-03-05, 22:58   #82
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

1487₁₆ Posts

The Colab "How I. . ." is complete. I have tested it directly from the thread and it worked as designed. The latest session was assigned a K80, which was detected correctly and its Compute Capability used during the compilation of Msieve.

It can be reviewed at: How I Use a Colab GPU to Perform Msieve Linear Algebra (-nc2)

Thanks everyone for all the help!

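The detection step mentioned above can be pictured as a lookup from the GPU name to its Compute Capability. The Python sketch below is an illustration only and is not the procedure from the "How I. . ." post. The name would typically come from `nvidia-smi --query-gpu=name --format=csv,noheader`; the K20Xm value is the one Msieve reports in this thread, the K80 and T4 values are from NVIDIA's published tables, and sm_arch is a hypothetical helper.

```python
# Compute Capability for the cards mentioned in this thread.
COMPUTE_CAPABILITY = {
    "Tesla K20Xm": "3.5",
    "Tesla K80": "3.7",
    "Tesla T4": "7.5",
}


def sm_arch(gpu_name):
    """Map a GPU name to an 'sm_XY' architecture string, e.g. 'sm_37'."""
    cc = COMPUTE_CAPABILITY.get(gpu_name.strip())
    if cc is None:
        raise KeyError("unknown GPU, add it to the table: " + gpu_name)
    return "sm_" + cc.replace(".", "")
```

How the resulting string is passed to the build depends on the Makefile in use, which is not shown in this thread.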
2022-03-25, 14:17   #83
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5·1,051 Posts

I've hit a snag playing with my GPU and wonder why:

Machine is Core2 Duo with 8GB RAM and GPU is K20Xm with 6GB RAM.
Composite is 170 digits and the matrix was built on a separate machine, with msieve.dat.mat, msieve.fb and worktodo.ini supplied from the original alternate named files.

I tried this twice. Here is the terminal display for the last try:

Code:
$ ./msieve -nc2 skip_matbuild=1 -g 0 -v
Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022
random seeds: 6dc60c6a 05868252
factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
no P-1/P+1/ECM available, skipping
commencing number field sieve (170-digit input)
R0: -513476789674487020805844014359613
R1: 4613148128511433126577
A0: -638650427125602136382789058618425254350
A1: 413978338424926800646481002860017
A2: 268129428386547641102884323
A3: -15312382615381572243
A4: -8137373995372
A5: 295890
skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5
commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
sparse part has weight 794456977 (68.01/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
sparse part has weight 679062899 (58.13/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
Killed

And, here is the log:

Code:
Fri Mar 25 09:45:54 2022 Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022 random seeds: 6dc60c6a 05868252
Fri Mar 25 09:45:54 2022 factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
Fri Mar 25 09:45:55 2022 no P-1/P+1/ECM available, skipping
Fri Mar 25 09:45:55 2022 commencing number field sieve (170-digit input)
Fri Mar 25 09:45:55 2022 R0: -513476789674487020805844014359613
Fri Mar 25 09:45:55 2022 R1: 4613148128511433126577
Fri Mar 25 09:45:55 2022 A0: -638650427125602136382789058618425254350
Fri Mar 25 09:45:55 2022 A1: 413978338424926800646481002860017
Fri Mar 25 09:45:55 2022 A2: 268129428386547641102884323
Fri Mar 25 09:45:55 2022 A3: -15312382615381572243
Fri Mar 25 09:45:55 2022 A4: -8137373995372
Fri Mar 25 09:45:55 2022 A5: 295890
Fri Mar 25 09:45:55 2022 skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5
Fri Mar 25 09:45:55 2022
Fri Mar 25 09:45:55 2022 commencing linear algebra
Fri Mar 25 09:45:55 2022 using VBITS=256
Fri Mar 25 09:45:55 2022 skipping matrix build
Fri Mar 25 09:46:24 2022 matrix starts at (0, 0)
Fri Mar 25 09:46:26 2022 matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
Fri Mar 25 09:46:26 2022 sparse part has weight 794456977 (68.01/col)
Fri Mar 25 09:46:26 2022 saving the first 240 matrix rows for later
Fri Mar 25 09:46:30 2022 matrix includes 256 packed rows
Fri Mar 25 09:46:35 2022 matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
Fri Mar 25 09:46:35 2022 sparse part has weight 679062899 (58.13/col)
Fri Mar 25 09:46:35 2022 using GPU 0 (Tesla K20Xm)
Fri Mar 25 09:46:35 2022 selected card has CUDA arch 3.5

Is it possible the CSR conversion is overrunning memory?

2022-03-25, 23:46   #84
frmky

Jul 2003
So Cal

2³×5²×13 Posts

That looks like the Linux OOM killer, which would mean that it has run out of available system (not GPU) memory.

2022-03-26, 00:18   #85
EdH

"Ed Hall"
Dec 2009

5·1,051 Posts

Quote:
 Originally Posted by frmky That looks like the Linux OOM killer, which would mean that it has run out of available system (not GPU) memory.
Thanks! I wondered, since it seemed the matrix size reported by Msieve was similar to the size reported by nvidia-smi, but that isn't the case with the run I just checked. Msieve says 545 MB and nvidia-smi says 1491 MiB.

I'll play a bit more with some sizes in between and see what may be the limit.

Do you think a large swap file would be of any use?

2022-03-26, 16:45   #86
chris2be8

Sep 2009

2426₁₀ Posts

Quote:
 Originally Posted by EdH Do you think a large swap file would be of any use?
Yes. I'd add 16-32 GB of swap space, which should stop the OOM killer from killing jobs if they ask for lots of memory.

But the system could start page thrashing if they try to heavily use more memory than you have RAM. SSDs are faster than spinning disks but more prone to wearing out if heavily used.

Adding more RAM would be the best option, if the system can take it. But that costs money unless you have some spare RAM to install.
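To see how much headroom a given amount of swap would add, the totals can be read from /proc/meminfo. The Python sketch below shows one way to do that; parse_meminfo and fits_in_ram_plus_swap are hypothetical helpers, and the check only says whether a request could be satisfied at all, not whether the system would thrash while serving it.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines of the form "Name:   12345 kB" (the unit is optional).
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def fits_in_ram_plus_swap(meminfo, needed_gib):
    """True if needed_gib is no larger than MemTotal + SwapTotal."""
    total_kb = meminfo.get("MemTotal", 0) + meminfo.get("SwapTotal", 0)
    return needed_gib * 1024 * 1024 <= total_kb
```

On a Linux machine the input would be the contents of /proc/meminfo, for example `parse_meminfo(open("/proc/meminfo").read())`.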

2022-03-27, 12:31   #87
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

5255₁₀ Posts

Well, more study seems to say I might not be able to get there with a 32G swap,* although I might see what happens. I tried a matrix that was built with t_d=70 for a c158 to compare times with a 40 thread machine and I got a little more info. Here's what top says about Msieve:

Code:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
21349 math55    20   0   33.7g   7.3g  47128 D   3.7 93.3   1:21.89 msieve

The machine only has 8G and it would be very expensive to take it to its max at 16G, which doesn't look sufficient, either.

Here's what Msieve had to say:

Code:
commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 7793237 x 7793427 (2367.4 MB) with weight 742866434 (95.32/col)
sparse part has weight 534863189 (68.63/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 7792997 x 7793427 (2195.0 MB) with weight 483246339 (62.01/col)
sparse part has weight 450716902 (57.83/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
450716902 7792997 7793427
450716902 7793427 7792997
commencing Lanczos iteration
vector memory use: 1664.9 MB
dense rows memory use: 237.8 MB
sparse matrix memory use: 3498.2 MB
memory use: 5400.9 MB
error (spmv_engine.cu:78): out of memory

This looks to me like the card ran out, too. The K20Xm has 6G (displayed as 5700MiB by nvidia-smi).

* The machine currently has an 8G swap partition and I have a 32G microSD handy that I might try to add to the system, to both test the concept of using such a card as swap and to add the swap space if it works.

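The memory figures logged for the c158 matrix can be reproduced by a simple formula, which then gives a rough idea of what the larger c170 matrix would need on the card. The Python sketch below is a fit to the numbers in this one log and is not taken from the Msieve source. The factor of 7 vectors, the 8 bytes per sparse nonzero (read here as 4-byte indices for the matrix and for its transpose), the 4 bytes per row and column, and the reading of "MB" as MiB are all assumptions; estimate_gpu_mib is a hypothetical helper.

```python
MIB = 1024 * 1024


def estimate_gpu_mib(nrows, ncols, sparse_nnz, vbits=256, n_vectors=7):
    """Estimate GPU memory use (MiB) for the Lanczos stage.

    ASSUMPTION: every constant here is fitted to the c158 log above,
    not read from the Msieve source code.
    """
    block_bytes = ncols * vbits // 8       # one VBITS-wide value per column
    vectors = n_vectors * block_bytes      # fitted: 7 such vectors
    dense = block_bytes                    # the 256 packed rows
    # Fitted: 8 bytes per nonzero plus 4 bytes per row and per column.
    sparse = 8 * sparse_nnz + 4 * (nrows + ncols)
    parts = {
        "vectors": vectors / MIB,
        "dense": dense / MIB,
        "sparse": sparse / MIB,
    }
    parts["total"] = sum(parts.values())
    return parts


# Dimensions and sparse weights as logged by Msieve in this thread.
c158 = estimate_gpu_mib(7792997, 7793427, 450716902)
c170 = estimate_gpu_mib(11680807, 11681223, 679062899)
for name, est in (("c158", c158), ("c170", c170)):
    print(name, {key: round(value, 1) for key, value in est.items()})
```

For the c158 matrix this agrees with the logged values to within rounding. For the c170 matrix the sparse part alone is estimated at over 5 GiB, and the total is well above the 5700 MiB that nvidia-smi reports. The c158 run failed even though its logged total is below 5700 MiB, so the estimate is best treated as a lower bound.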
2022-03-27, 15:59   #88
chris2be8

Sep 2009

2426₁₀ Posts

Quote:
 Originally Posted by EdH The machine only has 8G and it would be very expensive to take it to its max at 16G, which doesn't look sufficient, either.
The OOM killer should put messages into syslog, so check syslog and dmesg output before buying memory or spending a lot of time checking other things. I should have said to do that first in my previous post, sorry.

8 GB should be enough to solve the matrix on the CPU. I've done a GNFS c178 in 16 GB (the system has 32 GB of swap space as well, but wasn't obviously paging).

Last fiddled with by chris2be8 on 2022-03-27 at 16:02 Reason: clarify wording
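The syslog and dmesg check suggested above can be scripted by searching the kernel log for the "Killed process" message that the OOM killer writes. This is a minimal Python sketch; find_oom_kills is a hypothetical helper, and the rest of the kernel's message format varies between versions, so only the process ID and name are extracted.

```python
import re

# The kernel reports each victim as "Killed process <pid> (<name>)".
_KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(log_text):
    """Return a list of (pid, process_name) for every OOM kill in the text."""
    return [(int(m.group(1)), m.group(2)) for m in _KILLED.finditer(log_text)]
```

The input can be the saved output of `dmesg` or the contents of the syslog file. Reading either may need root privileges on some systems.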

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.