mersenneforum.org > Factoring Projects > NFS@Home
2022-06-08, 17:48   #1
bur

5p2_950M doable on 24 cores?

Somewhat embarrassingly, I underestimated how different a 172-digit composite that I could comfortably handle on a 10-core i9 (including sieving) is from just the post-processing step of the 192-digit 5p2_950M.


The server that is supposed to do it is a 24-core Epyc 7401P with 128 GB RAM. Is it idiotic to try, or not? I wouldn't mind having it busy for 3 weeks or so, but if it takes months I start worrying about instability, or about the hosting provider (Hetzner) deciding they don't like that kind of usage.
2022-06-08, 17:59   #2
kruoli

This should be okay. The c204 from the last team sieve took a month on 16 cores, and that was much more difficult.
2022-06-08, 18:12   #3
charybdis

Yeah, you'll be fine. As a rule of thumb, matrix solving time roughly multiplies by 4.5 for each doubling of the matrix dimensions.
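Put differently, solve time scales roughly as dimensions^log2(4.5) ≈ dimensions^2.2, so as a rough illustration: going from a 30M to a 32M matrix should only cost about 15% more time, while a 60M matrix would take roughly 4.5 times as long as a 30M one.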
2022-06-08, 19:49   #4
bur

Ok, thanks. I panicked a bit... ;)
2022-06-08, 21:06   #5
VBCurtis

Your machine is plenty. A C192 should produce a matrix of around 30M dimensions, which ought to solve on a machine with 24 GB of RAM. My 16-core Ryzen would take 10 days or so; your machine might complete it in a week, give or take a day or two. I'm not sure how much the extra RAM channels help speed, so you might be as fast as 5 days if you use the whole machine?
2022-06-13, 08:46   #6
bur

So, the matrix was built successfully using TD=130:

Code:
Mon Jun 13 09:13:59 2022  matrix includes 64 packed rows
Mon Jun 13 09:14:03 2022  matrix is 31962806 x 31963031 (15913.7 MB) with weight 4274411069 (133.73/col)
Mon Jun 13 09:14:03 2022  sparse part has weight 3852042710 (120.52/col)
Mon Jun 13 09:14:03 2022  using block size 8192 and superblock size 6291456 for processor cache size 65536 kB
Mon Jun 13 09:16:54 2022  commencing Lanczos iteration (20 threads)
Mon Jun 13 09:16:54 2022  memory use: 15106.0 MB

linear algebra completed 56625 of 31963031 dimensions (0.2%, ETA 805h36m)
The ETA might decrease somewhat, but it's in the range of 4-5 weeks (using 20 threads). The 7401P is not that fast; I would compare 20 of its cores to 10 cores of an i9-10900K, if anything a bit slower. So is the ETA OK, or should the matrix be smaller? Would it make sense to filter again with TD=140?

It's using about 20 GB RAM btw.

2022-06-13, 09:00   #7
kruoli

What does found 21680656 cycles, need 21619192 say for you?

I would suggest trying different ways of running msieve, like in the 3,748+ team sieve, maybe including MPI. What I found (a rough sketch of the commands follows the list):
  • VBITS=256 is usually faster.
  • Do not use hyperthreads.
  • Manually attach the threads of msieve to physical CPUs.
  • If using MPI on EPYC/Zen, use --map-by l3cache. For me, MPI was slower than running msieve without it, but this might be different for you. It's worth testing.
  • Building GMP yourself and/or tuning it is usually not worth it and might even slow things down. I guess this does not apply if you are an expert with GMP; I am not.
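As a rough sketch of what I mean (the core numbering, the default file names and the MPI grid argument are assumptions here; check lscpu -e for which CPU IDs are the physical cores and msieve's readme for the exact grid syntax after -nc2):

Code:
# multi-threaded LA pinned to 20 physical cores, assuming cores 0-19 are
# physical and the SMT siblings live at 24-47 (verify with lscpu -e), and
# that the job uses msieve's default file names in the current directory
taskset -c 0-19 ./msieve -v -t 20 -nc2

# MPI build: one rank per L3 cache (the 7401P has 8 CCXs of 3 cores each),
# 3 threads per rank; the rows,cols grid after -nc2 should multiply to -np
mpirun -np 8 --map-by l3cache --bind-to l3cache ./msieve -v -t 3 -nc2 2,4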
2022-06-13, 09:12   #8
bur

Quote:
Originally Posted by kruoli
What does found 21680656 cycles, need 21619192 say for you?
found 32078384 cycles, need 31964190.

Quote:
Manually attach the threads of msieve to physical CPUs.
I'll try that, but it's usually said that, unlike the Windows scheduler, the Linux CPU scheduler does a good job and should just be left to its own devices. HT shouldn't come into play anyway, since I'm running 20 threads on 24 cores.

I don't have the hardware to use MPI, just this one server.

I'm mainly wondering if the dimensions are OK; they're more than the 30M VBCurtis mentioned. Decreasing that would have the biggest impact, I guess.


edit: Browsing through old logs, it seems 800 h for a 30M matrix is very long. Does LA time increase linearly with dimensions?

I also noticed that the threads are not always running at 100%. The average according to htop is 18.5 cores, i.e. 92.5%. They also regularly switch from running to sleeping. It's not a general problem; yafu runs ECM at 100%+ utilization. Is that normal behavior for msieve?

2022-06-13, 10:03   #9
kruoli

You can use MPI even on a single machine to make better use of the CPU's separate "apartments" (its split L3-cache domains).

I cannot really comment on the matrix size since I am not experienced enough with this. Have you had a look at the corresponding NFS@Home post-processing thread to get a glimpse of what others got with numbers of similar size and similar TD?

As pointed out above, matrix solving time roughly multiplies by 4.5 for each doubling of the matrix dimensions, and since I do not think you will be able to decrease the dimensions much further, I would leave it running. It may well be that the time you could save is smaller than the time it would take to optimize the matrix.

Please double-check your msieve filtering invocation; more than once I made the mistake of having the TD ignored because of wrong parameter order on the command line. But msieve will log the TD if it detects it correctly.

Regarding the Linux scheduler: yes, it is often better than the Windows one, but by no means perfect! Especially with CPUs that have split L3 caches or chiplets, or if you have multiple CPUs, manually setting the affinity usually helps immensely. For "basic" CPUs I usually see only a few percent improvement, but that changes drastically again if you run multiple things in parallel on the same machine.
2022-06-13, 10:07   #10
kruoli

Quote:
Originally Posted by bur
I also noticed that the threads are not always running at 100%. The average according to htop is 18.5 cores, i.e. 92.5%. They also regularly switch from running to sleeping. It's not a general problem; yafu runs ECM at 100%+ utilization. Is that normal behavior for msieve?
Yes, that seems really slow. Is this with VBITS=128? What happens if you try 24 threads? Is the memory configured as octa-channel? Please give MPI a try on that machine! You would need one process per L3 cache.
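If you want to check the memory configuration, and in case you need to rebuild, something along these lines should work (the dmidecode output format and the msieve Makefile variables are from memory, so treat it as a sketch):

Code:
# count the populated DIMM slots (needs root); eight modules on this
# single-socket board would normally mean all eight channels are in use
sudo dmidecode -t memory | grep "Size:" | grep -vc "No Module Installed"

# rebuild a recent msieve trunk with wider vector words and MPI support
# (VBITS and MPI should be Makefile variables there)
make clean
make all VBITS=256 MPI=1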
2022-06-13, 11:04   #11
bur

I didn't know MPI even helps on a single CPU; I would have expected the overhead alone to make it slower. VBITS is a compiler setting? No idea about octa-channel. Isn't that a hardware thing that I can't change anyway?

I will leave it running for now; I don't really have time to get into MPI at the moment, and I suspect it simply is that slow, now that I remember the faster-than-linear increase in LA time you mentioned. Swellman even estimated much more than 30 days.

Could someone confirm whether it's normal for msieve threads to sleep a lot? That's the only thing I find really weird.

Quote:
Please double-check your msieve filtering invocation; more than once I made the mistake of having the TD ignored because of wrong parameter order on the command line. But msieve will log the TD if it detects it correctly.
Yes, I've made that mistake before. These arguments have to come immediately after the -nc option.
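For example (from memory, assuming msieve's default file names for the job):

Code:
# the quoted filtering argument has to directly follow -nc1 (or -nc);
# if another switch sits between them, the target density is ignored -
# the log shows which density the filtering run actually used
./msieve -v -t 20 -nc1 "target_density=130"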