mersenneforum.org  

Old 2022-06-13, 11:09   #12
kruoli
 
"Oliver"
Sep 2017
Porta Westfalica, DE

1,087 Posts

Yes, VBITS is set in msieve's make command. MPI sometimes helps with data locality, but it can also introduce overhead, so it can go either way. And yes, octa-channel is a hardware feature, so it depends on whether this is your own machine. If you own it, you can install eight DIMMs in the appropriate slots, which enables octa-channel memory.

What do you mean by msieve sleeping a lot? Going with your estimation of 92.5 % efficiency, I would say that this is normal for more than 16 threads.
Old 2022-06-13, 11:36   #13
bur
 
Aug 2020
79*6581e-4;3*2539e-3

7×83 Posts

OK, having looked at old logs I can say this is below-average speed but nothing strange. A 15M matrix took 140 hours, a 13M matrix 130 hours. So the 800 h is fine; as I said, the 7401P is not that fast. As long as the matrix size is OK, I don't think any further optimization makes sense.

"Sleeping a lot" as in every few seconds they all go to state "S" and then switch back. The htop average is 18.5 of 20 cores utilized.
Old 2022-06-13, 11:46   #14
swellman
 
Jun 2012

2²×3²×101 Posts

Quote:
Originally Posted by bur

Yes, I made that mistake before. These arguments have to be immediately after the -nc option.
See this post.
Old 2022-06-13, 17:10   #15
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2×2,689 Posts

32M is close enough to my guess of 30M to consider the matrix a typical size / normal for this size of job.

800 hours sounds high; I bet if you tried the list of ideas in the last few posts you'd find 10-30% more speed. The good news is that when you do find the time to test ideas, the fastest invocation will be fastest on all future jobs.

For (poor) comparison: I'm running my first job on a new 5950X (16 cores, 2-channel RAM). A 35M matrix is taking 420 hr, so I would do a 32M matrix in around 360 hr on an otherwise-idle CPU. My cores are much faster than that EPYC's, but I would still expect the EPYC to do relatively better than the clock-speed comparison suggests because of all its memory bandwidth.
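
The 35M-to-32M estimate above can be reproduced with a simple power-law scaling. This is only a sketch: the function name `scale_la_hours` and the default exponent of 2 are assumptions for illustration, not something msieve reports. With these inputs, linear scaling gives 384 hr and quadratic scaling about 351 hr, so the quoted 360 hr falls between the two.

```python
def scale_la_hours(known_hours: float, known_size_m: float,
                   target_size_m: float, exponent: float = 2.0) -> float:
    """Estimate linear-algebra hours for another matrix size on the same machine.

    exponent=2.0 is an assumed rule of thumb, not a measured value;
    pass exponent=1.0 to see the linear estimate instead.
    """
    if known_hours <= 0 or known_size_m <= 0 or target_size_m <= 0:
        raise ValueError("hours and sizes must be positive")
    return known_hours * (target_size_m / known_size_m) ** exponent


if __name__ == "__main__":
    # Figures from the post: a 35M matrix at 420 hours, rescaled to 32M.
    print("quadratic:", round(scale_la_hours(420, 35, 32), 1))
    print("linear:   ", round(scale_la_hours(420, 35, 32, exponent=1.0), 1))
```

Such an estimate only makes sense for the same machine and thread count; it says nothing about how a different CPU or memory configuration would compare.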
Old 2022-06-13, 17:40   #16
pinhodecarlos
 
"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2⁶×79 Posts

I recall msieve not fully utilising a thread or a core; maybe the sleeping comes from this. Some threads/cores are waiting for others to finish.
Old 2022-06-14, 07:01   #17
bur
 
Aug 2020
79*6581e-4;3*2539e-3

1001000101₂ Posts

Quote:
Originally Posted by VBCurtis
800 hours sounds high; I bet if you tried the list of ideas in the last few posts you'd find 10-30% more speed. The good news is that when you do find the time to test ideas, the fastest invocation will be fastest on all future jobs.
Other than MPI that would be:
  • recompiling with VBITS=256
  • assigning to physical cores
  • not using HT
?

Regarding HT, if the threads aren't fully utilizing the physical cores, wouldn't that be an ideal situation for HT?

If I try something, can I just close msieve with Ctrl+C and restart with -ncr? Will I lose everything not checkpointed, i.e. would it make sense to wait for a checkpoint?
Old 2022-06-14, 14:21   #18
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

2×2,689 Posts

The more threads you assign to msieve, the more waiting there is. Break a task into 20 pieces instead of 10, and everyone has to finish their slice before anyone can proceed; each slice is half as big, but there is a lot more waiting. You have to experiment to find ways to reduce this waiting (such as assigning cores with taskset). Hyperthreads aren't always bad for msieve, just usually: they introduce more variability into how long each slice takes, so more waiting happens.
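
The waiting effect can be illustrated with a toy simulation. This is a sketch under loud assumptions and not a model of msieve's actual threading: every thread gets an equal slice, each slice's duration varies uniformly by ±`jitter` around 1.0, and all threads wait at a barrier for the slowest one. The name `barrier_efficiency` and all parameters are hypothetical.

```python
import random


def barrier_efficiency(n_threads: int, jitter: float,
                       rounds: int = 5000, seed: int = 1) -> float:
    """Average fraction of time threads spend working rather than waiting.

    Toy model (assumption): per-slice time is uniform in [1-jitter, 1+jitter],
    and a round ends when the slowest thread finishes.
    """
    if n_threads < 1 or not 0.0 <= jitter < 1.0:
        raise ValueError("need n_threads >= 1 and 0 <= jitter < 1")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        times = [rng.uniform(1.0 - jitter, 1.0 + jitter) for _ in range(n_threads)]
        # Useful work is the mean slice time; wall time is the slowest slice.
        total += (sum(times) / n_threads) / max(times)
    return total / rounds


if __name__ == "__main__":
    for n in (10, 20):
        for j in (0.1, 0.3):
            print(f"threads={n:2d} jitter={j:.1f} "
                  f"efficiency={barrier_efficiency(n, j):.3f}")
```

In this model, both more threads and more timing variability (the hyperthreading case described above) lower the efficiency, because the chance that at least one slice runs long goes up.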

Yes to using -ncr. You don't lose work when closing via ctrl-c; msieve writes a checkpoint upon closure.
Old 2022-06-14, 18:31   #19
frmky
 
Jul 2003
So Cal

2·23·53 Posts

Note, though, that you can't restart if you change VBITS. You would have to start the linear algebra from the beginning.
Old 2022-06-15, 06:52   #20
bur
 
Aug 2020
79*6581e-4;3*2539e-3

7×83 Posts

Thanks, that's what I figured, so I didn't do that but just assigned cores. ETA went down for a while and now it's up again. I also tried assigning the range of physical cores, but that didn't do anything noticeable.
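
For anyone wanting to work out which logical CPUs belong to distinct physical cores, here is a minimal sketch. It assumes a Linux system, where each `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list` file lists the hyperthread siblings of CPU N. The helper names and the 4-core example topology are hypothetical; the real numbering on a 7401P has to be read from the machine.

```python
def parse_cpu_list(text: str) -> list[int]:
    """Parse a Linux CPU list such as "0,24" or "0-1" into sorted CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings: dict[int, str]) -> list[int]:
    """Keep the lowest-numbered logical CPU of each sibling group."""
    return sorted({parse_cpu_list(text)[0]
                   for text in siblings.values() if text.strip()})


def taskset_cpu_arg(cpus: list[int]) -> str:
    """Format CPU numbers as a comma-separated list, as accepted by `taskset -c`."""
    return ",".join(str(c) for c in cpus)


if __name__ == "__main__":
    # Hypothetical 4-core / 8-thread layout where CPU i and i+4 share a core.
    example = {i: f"{i % 4},{i % 4 + 4}" for i in range(8)}
    print(taskset_cpu_arg(one_cpu_per_core(example)))
```

The resulting list is what would go after `taskset -c` when launching the job, so that each thread lands on its own physical core.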

But actually, I'm not sure there's really that much to gain. 24 cores vs 10 cores, the 7401P is still 10-15% slower than an i9-10900K - and I'm only using 20/24. And a recent LA by swellman with 8 threads had 400+ hours for a 17M matrix. That would correspond to nearly 1600 hours for 31M, if I'm not mistaken.

I can later on offer the matrix for benchmarking, if someone's interested.
Old 2022-06-15, 09:34   #21
swellman
 
Jun 2012

3636₁₀ Posts

Quote:
Originally Posted by bur
But actually, I'm not sure there's really that much to gain. 24 cores vs 10 cores, the 7401P is still 10-15% slower than an i9-10900K - and I'm only using 20/24. And a recent LA by swellman with 8 threads had 400+ hours for a 17M matrix. That would correspond to nearly 1600 hours for 31M, if I'm not mistaken.
Keep in mind that the recent result was performed on an 8-year-old laptop with an i7-3520M CPU @ 2.9 GHz and 16 GB of RAM. I'm not sure it has enough memory to even load a 31M matrix into LA, never mind solve it. So a direct comparison between your machine and my old box likely doesn’t work. But I agree with your scaling estimate if we ignore the RAM limitations.

Not sure you can gain much more performance out of your machine, though I do sincerely hope I’m wrong.
Old 2022-07-19, 06:44   #22
bur
 
Aug 2020
79*6581e-4;3*2539e-3

7×83 Posts

It's finally done, LA took 800 hours.

I noticed the machine is generally not very efficient if you use a lot of threads for one task. As an example, LLR:

1 thread = 4.5 ms / iter
2 threads = 2.9 ms / iter
20 threads = 1.2 ms / iter

The cores start to see only 70-90 % utilization once a lot of threads are involved.
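
To put numbers on that, the LLR timings can be converted to speedup and parallel efficiency. This is just arithmetic on the three figures given; the function name is made up for the example. It works out to a speedup of roughly 1.55 (about 78 % efficiency) on 2 threads and 3.75 (about 19 % efficiency) on 20 threads. Note that this is a different measure from the core utilization that htop shows.

```python
def speedup_and_efficiency(t_single_ms: float, t_parallel_ms: float,
                           n_threads: int) -> tuple[float, float]:
    """Return (speedup, efficiency) relative to the single-threaded time."""
    if t_single_ms <= 0 or t_parallel_ms <= 0 or n_threads < 1:
        raise ValueError("times must be positive and n_threads >= 1")
    speedup = t_single_ms / t_parallel_ms
    return speedup, speedup / n_threads


if __name__ == "__main__":
    # ms per iteration from the post: 1 thread = 4.5, 2 threads = 2.9, 20 threads = 1.2
    for threads, ms in ((2, 2.9), (20, 1.2)):
        s, e = speedup_and_efficiency(4.5, ms, threads)
        print(f"{threads:2d} threads: speedup {s:.2f}, efficiency {e:.1%}")
```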

I still have it until end of August, so I guess I'll use it for sieving.


Quote:
Sun Jul 17 03:04:52 2022 commencing square root phase
Sun Jul 17 03:04:52 2022 Sqrt: Handling dependencies 1 to 8
I saw this in one of frmky's logs. Is it a result of using MPI? It looks like parallel processing of more than one dependency.

Last fiddled with by bur on 2022-07-19 at 06:50 Reason: what's it with the line breaks added all the time...

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
