mersenneforum.org > Factoring Projects > Msieve
2022-02-24, 06:17   #78
frmky (Jul 2003, So Cal)

Quote:
Originally Posted by EdH
Sorry if you're tired of these reports
Not at all! I look forward to seeing how you have gotten it to work in Colab!
2022-02-26, 00:50   #79
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Quote:
Originally Posted by EdH
. . .
I hope to do the same test with a different GPU, to compare.
Well, I spent quite a bit of time today with a T4, but I didn't let it finish because I was (unsuccessfully) trying to get a file copy of the checkpoint file to work, so that it could be saved past the end of a session. However, the T4 consistently gave estimates of 2:33 for completion, for the same matrix that took 4:19 to finish on the K80.
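For reference, a minimal sketch of one way the checkpoint might be copied out before a session ends, assuming Google Drive is already mounted at /content/drive and the checkpoint file is named msieve.dat.chk (both of those are assumptions, not details from this run):
Code:
# Hedged sketch: copy the LA checkpoint to the mounted Google Drive so it
# survives the end of the Colab session (path and file name are assumptions).
cp /content/msieve.dat.chk /content/drive/MyDrive/msieve.dat.chk
# In a later session, copy it back into the working directory before
# restarting the linear algebra.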
2022-02-28, 03:39   #80
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Disappointing update. Although Colab successfully completed LA on the test set, the returned msieve.dat.dep file is corrupt according to Msieve on the local machine.
2022-03-02, 23:58   #81
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

I have not been playing with Colab for the last few days, because I've been trying to get a Tesla K20Xm working locally. I had it working with GMP-ECM, but couldn't get frmky's Msieve to run. I battled with all kinds of CUDA versions (9, 10.2, 11.x, etc.). All of them resisted, including the standalone CUDA 10.2 .run installer. For some time, I lost GMP-ECM, too.

But I'm happy to mention that I finally have all three (GMP-ECM, Msieve and frmky's Msieve) running. I'm using CUDA 11.4 and NVIDIA driver 470.103.71, and I had to install a shared object file from CUDA 9 (that may have been for GMP-ECM, for which I also had to disable some code in the Makefile). In any case, they are all running on the K20Xm!
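For anyone retracing this, a minimal sketch of the sanity checks involved, plus one way to point the loader at a library from an older toolkit (the path is only an example, not necessarily where the CUDA 9 file ended up):
Code:
nvcc --version      # confirm which CUDA toolkit is on the PATH (11.4 here)
nvidia-smi          # confirm the driver version and that the K20Xm is visible
# If a binary still wants a shared object from an older toolkit, one option
# is to add its directory to the loader path (example path only):
export LD_LIBRARY_PATH=/usr/local/cuda-9.0/lib64:$LD_LIBRARY_PATH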

As to performance, limited testing suggests the K20Xm nearly halves the time taken on my 24-thread machine, but the 40-thread machines still have an edge on it. In effect, though, it represents an extra machine, since it can free up the others.

The good part is that now that I have this local card running, I can get back to my Colab session work and have a local card to compare and help figure things out.

Thank you to everyone for all the help in this and other threads!
2022-03-05, 22:58   #82
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

The Colab "How I. . ." is complete. I have tested it directly from the thread and it worked as designed. The latest session was assigned a K80, which was detected correctly, and its compute capability was used during the compilation of Msieve.
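For reference, a minimal sketch of the kind of detection involved (not the actual notebook cell); the mapping from card name to compute capability here is done by hand:
Code:
# Ask the assigned card for its name, then pick the matching CUDA arch for
# the Msieve build, e.g. Tesla K80 -> 3.7, Tesla T4 -> 7.5, Tesla P100 -> 6.0.
nvidia-smi --query-gpu=name --format=csv,noheader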

It can be reviewed at:

How I Use a Colab GPU to Perform Msieve Linear Algebra (-nc2)

Thanks everyone for all the help!
2022-03-25, 14:17   #83
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

I've hit a snag playing with my GPU and wonder why:

The machine is a Core2 Duo with 8GB RAM and the GPU is a K20Xm with 6GB RAM.
The composite is 170 digits and the matrix was built on a separate machine, with msieve.dat.mat, msieve.fb and worktodo.ini copied over from the original, alternately named files.
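For anyone following along, a minimal sketch of the transfer and the invocation used below; the remote host and the original file names are placeholders, and only the target names and the msieve command come from this run:
Code:
# Copy the three files from the machine that built the matrix (remote names
# are placeholders) and give them the default names msieve expects:
scp otherhost:~/c170/c170.dat.mat ./msieve.dat.mat
scp otherhost:~/c170/c170.fb      ./msieve.fb
scp otherhost:~/c170/c170.ini     ./worktodo.ini
# Run only the linear algebra, skipping the matrix build, on GPU 0:
./msieve -nc2 skip_matbuild=1 -g 0 -v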

I tried this twice. Here is the terminal display for the last try:
Code:
$ ./msieve -nc2 skip_matbuild=1 -g 0 -v

Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022
random seeds: 6dc60c6a 05868252
factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
no P-1/P+1/ECM available, skipping
commencing number field sieve (170-digit input)
R0: -513476789674487020805844014359613
R1: 4613148128511433126577
A0: -638650427125602136382789058618425254350
A1: 413978338424926800646481002860017
A2: 268129428386547641102884323
A3: -15312382615381572243
A4: -8137373995372
A5: 295890
skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5

commencing linear algebra
using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
sparse part has weight 794456977 (68.01/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
sparse part has weight 679062899 (58.13/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
Killed
And, here is the log:
Code:
Fri Mar 25 09:45:54 2022  Msieve v. 1.54 (SVN Unversioned directory)
Fri Mar 25 09:45:54 2022  random seeds: 6dc60c6a 05868252
Fri Mar 25 09:45:54 2022  factoring 10559103707847604096214709430530773995264391543587654452108598611359547436885517060868607845904851346765842831319837349071427368916165620453753530586945871555707605156809 (170 digits)
Fri Mar 25 09:45:55 2022  no P-1/P+1/ECM available, skipping
Fri Mar 25 09:45:55 2022  commencing number field sieve (170-digit input)
Fri Mar 25 09:45:55 2022  R0: -513476789674487020805844014359613
Fri Mar 25 09:45:55 2022  R1: 4613148128511433126577
Fri Mar 25 09:45:55 2022  A0: -638650427125602136382789058618425254350
Fri Mar 25 09:45:55 2022  A1: 413978338424926800646481002860017
Fri Mar 25 09:45:55 2022  A2: 268129428386547641102884323
Fri Mar 25 09:45:55 2022  A3: -15312382615381572243
Fri Mar 25 09:45:55 2022  A4: -8137373995372
Fri Mar 25 09:45:55 2022  A5: 295890
Fri Mar 25 09:45:55 2022  skew 1.00, size 1.799e-16, alpha -5.336, combined = 2.653e-15 rroots = 5
Fri Mar 25 09:45:55 2022  
Fri Mar 25 09:45:55 2022  commencing linear algebra
Fri Mar 25 09:45:55 2022  using VBITS=256
Fri Mar 25 09:45:55 2022  skipping matrix build
Fri Mar 25 09:46:24 2022  matrix starts at (0, 0)
Fri Mar 25 09:46:26 2022  matrix is 11681047 x 11681223 (3520.8 MB) with weight 1098647874 (94.05/col)
Fri Mar 25 09:46:26 2022  sparse part has weight 794456977 (68.01/col)
Fri Mar 25 09:46:26 2022  saving the first 240 matrix rows for later
Fri Mar 25 09:46:30 2022  matrix includes 256 packed rows
Fri Mar 25 09:46:35 2022  matrix is 11680807 x 11681223 (3303.4 MB) with weight 723296676 (61.92/col)
Fri Mar 25 09:46:35 2022  sparse part has weight 679062899 (58.13/col)
Fri Mar 25 09:46:35 2022  using GPU 0 (Tesla K20Xm)
Fri Mar 25 09:46:35 2022  selected card has CUDA arch 3.5
Is it possible the CSR conversion is overrunning memory?
2022-03-25, 23:46   #84
frmky (Jul 2003, So Cal)

That looks like the Linux OOM killer, which would mean it has run out of available system (not GPU) memory.
2022-03-26, 00:18   #85
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Quote:
Originally Posted by frmky
That looks like the Linux OOM killer, which would mean it has run out of available system (not GPU) memory.
Thanks! I wondered, since it seemed the Msieve-reported matrix size was similar to the nvidia-smi-reported size, but that isn't the case with the run I just checked. Msieve says 545 MB and nvidia-smi says 1491 MiB.

I'll play a bit more with some sizes in between and see what may be the limit.

Do you think a large swap file would be of any use?
2022-03-26, 16:45   #86
chris2be8 (Sep 2009)

Quote:
Originally Posted by EdH
Do you think a large swap file would be of any use?
Yes. I'd add 16-32 GB of swap space, which should stop the OOM killer from killing jobs when they ask for a lot of memory.

But the system could start page thrashing if they actively use more memory than you have RAM. SSDs are faster than spinning disks but more prone to wearing out if heavily used.

Adding more RAM would be the best option, if the system can take it. But that costs money unless you have some spare RAM to install.
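In case it helps, a minimal sketch of adding a swap file on Linux (the size and path are just examples):
Code:
sudo fallocate -l 32G /swapfile   # create a 32 GB file (or use dd)
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
swapon --show                     # confirm the new swap space is active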
2022-03-27, 12:31   #87
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Well, more study suggests I might not be able to get there with a 32G swap,* although I might see what happens. I tried a matrix that was built with t_d=70 for a c158, to compare times with a 40-thread machine, and I got a little more info. Here's what top says about Msieve:
Code:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND    
 21349 math55    20   0   33.7g   7.3g  47128 D   3.7  93.3   1:21.89 msieve
The machine only has 8G and it would be very expensive to take it to its max at 16G, which doesn't look sufficient, either.

Here's what Msieve had to say:
Code:
commencing linear algebra
 using VBITS=256
skipping matrix build
matrix starts at (0, 0)
matrix is 7793237 x 7793427 (2367.4 MB) with weight 742866434 (95.32/col)
sparse part has weight 534863189 (68.63/col)
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 7792997 x 7793427 (2195.0 MB) with weight 483246339 (62.01/col)
sparse part has weight 450716902 (57.83/col)
using GPU 0 (Tesla K20Xm)
selected card has CUDA arch 3.5
Nonzeros per block: 1750000000
converting matrix to CSR and copying it onto the GPU
450716902 7792997 7793427
450716902 7793427 7792997
commencing Lanczos iteration
vector memory use: 1664.9 MB
dense rows memory use: 237.8 MB
sparse matrix memory use: 3498.2 MB
memory use: 5400.9 MB
 error (spmv_engine.cu:78): out of memory
This looks to me like the card ran out of memory, too. The K20Xm has 6GB (displayed as 5700 MiB by nvidia-smi).
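For what it's worth, a minimal sketch of watching the card's memory while the job runs, using standard nvidia-smi options, would help confirm that:
Code:
# Print the card's total and used memory once per second during the run:
nvidia-smi --query-gpu=memory.total,memory.used --format=csv -l 1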

* The machine currently has an 8G swap partition and I have a 32G microSD handy that I might try to add to the system, to both test the concept of using such a card as swap and to add the swap space if it works.
2022-03-27, 15:59   #88
chris2be8 (Sep 2009)

Quote:
Originally Posted by EdH
The machine only has 8G and it would be very expensive to take it to its max at 16G, which doesn't look sufficient, either.
The OOM killer should put messages into syslog, so check syslog and dmesg output before buying memory or spending a lot of time checking other things. I should have said to do that first in my previous post, sorry.
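For example, something along these lines should show whether the OOM killer fired (exact log locations vary by distribution):
Code:
dmesg | grep -i -E "out of memory|oom-killer"
# or, on systemd-based systems:
journalctl -k | grep -i oom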

8 GB should be enough to solve the matrix on the CPU. I've done a GNFS c178 in 16 GB (the system has 32 GB swap space as well but wasn't obviously paging).
