mersenneforum.org 5p2_950M doable on 24 cores?
 2022-06-08, 17:48 #1 bur     Aug 2020 79*6581e-4;3*2539e-3 580₁₀ Posts 5p2_950M doable on 24 cores? Somewhat embarrassingly, I underestimated how different the 172-digit composite I could comfortably do on a 10-core i9, sieving included, is from just the post-processing step of the 192-digit 5p2_950M. The server that is supposed to do it is a 24-core Epyc 7401P with 128 GB RAM. Is it idiotic to try that or not? I wouldn't mind having it busy for 3 weeks or so, but if it takes months I'd start worrying about instability or the hoster (Hetzner) deciding they don't like that kind of usage.
 2022-06-08, 17:59 #2 kruoli     "Oliver" Sep 2017 Porta Westfalica, DE 2×3×181 Posts This should be okay. The c204 from the last team sieve took a month on 16 cores, and that was much more difficult.
 2022-06-08, 18:12 #3 charybdis     Apr 2020 797 Posts Yeah, you'll be fine. As a rule of thumb, matrix solving time roughly multiplies by 4.5 for each doubling of the matrix dimensions.
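The rule of thumb above can be turned into a rough scaling estimate. The sketch below assumes the 4.5x-per-doubling figure holds as a simple power law in the matrix dimension, which is an approximation rather than anything msieve guarantees; the function names and the two matrix sizes in the example are illustrative only.

```python
import math

# Rule of thumb quoted in the thread: doubling the matrix dimension
# multiplies the solving time by roughly 4.5.
GROWTH_PER_DOUBLING = 4.5


def scaling_exponent(growth_per_doubling: float = GROWTH_PER_DOUBLING) -> float:
    """Return k such that time ~ n**k, given the time factor per doubling of n."""
    return math.log2(growth_per_doubling)


def relative_time(n_new: float, n_ref: float,
                  growth_per_doubling: float = GROWTH_PER_DOUBLING) -> float:
    """Estimated solving time for dimension n_new relative to dimension n_ref."""
    return (n_new / n_ref) ** scaling_exponent(growth_per_doubling)


if __name__ == "__main__":
    # The exponent is a little above 2, so growth is faster than quadratic
    # but still polynomial.
    print(f"exponent: {scaling_exponent():.2f}")
    # Illustrative sizes: a matrix about 7% larger than a reference one.
    print(f"32M vs 30M dimensions: {relative_time(32e6, 30e6):.2f}x")
```

Under this assumption, shaving a few percent off the dimensions only buys a modest reduction in solving time.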
 2022-06-08, 19:49 #4 bur     Aug 2020 79*6581e-4;3*2539e-3 2²·5·29 Posts Ok, thanks. I panicked a bit... ;)
 2022-06-08, 21:06 #5 VBCurtis     "Curtis" Feb 2005 Riverside, CA 2²×17×79 Posts Your machine is plenty. A C192 should produce a matrix around 30M dimensions, which ought to solve on a 24 GB RAM machine. My 16-core Ryzen would take 10 days or so; your machine might complete it in a week, give or take a day or two. I'm not sure how much more RAM channels help speed, so you might be as fast as 5 days if you use the whole machine?
 2022-06-13, 08:46 #6 bur     Aug 2020 79*6581e-4;3*2539e-3 2²×5×29 Posts So, the matrix was built successfully using TD=130:

Code:
Mon Jun 13 09:13:59 2022 matrix includes 64 packed rows
Mon Jun 13 09:14:03 2022 matrix is 31962806 x 31963031 (15913.7 MB) with weight 4274411069 (133.73/col)
Mon Jun 13 09:14:03 2022 sparse part has weight 3852042710 (120.52/col)
Mon Jun 13 09:14:03 2022 using block size 8192 and superblock size 6291456 for processor cache size 65536 kB
Mon Jun 13 09:16:54 2022 commencing Lanczos iteration (20 threads)
Mon Jun 13 09:16:54 2022 memory use: 15106.0 MB
linear algebra completed 56625 of 31963031 dimensions (0.2%, ETA 805h36m)

The ETA might decrease somewhat, but it's in the range of 4-5 weeks (using 20 threads). The 7401P is not that fast; I would compare 20 cores of it to 10 cores of an i9-10900K, or rather a bit slower. So is the ETA ok, or should the matrix be smaller? Would it make sense to filter again with TD=140? It's using about 20 GB RAM btw.

Last fiddled with by bur on 2022-06-13 at 08:49
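To read an ETA like the one above in days rather than hours, the progress line can be parsed mechanically. This is only a sketch: the regular expression is modelled on the single progress line quoted in this post, and other msieve versions or settings may print it differently. The function name `parse_progress` is made up for illustration.

```python
import re

# Pattern modelled on the one progress line quoted in the post above;
# treat the exact format as an assumption.
PROGRESS_RE = re.compile(
    r"linear algebra completed (\d+) of (\d+) dimensions "
    r"\(([\d.]+)%, ETA (\d+)h\s*(\d+)m\)"
)


def parse_progress(line: str) -> dict:
    """Extract progress counters and the ETA (in hours and days) from one line."""
    match = PROGRESS_RE.search(line)
    if match is None:
        raise ValueError("not a recognised progress line")
    done, total, percent, hours, minutes = match.groups()
    eta_hours = int(hours) + int(minutes) / 60
    return {
        "done": int(done),
        "total": int(total),
        "percent": float(percent),
        "eta_hours": eta_hours,
        "eta_days": eta_hours / 24,
    }


if __name__ == "__main__":
    line = ("linear algebra completed 56625 of 31963031 dimensions "
            "(0.2%, ETA 805h36m)")
    info = parse_progress(line)
    print(f"ETA: {info['eta_hours']:.1f} h = {info['eta_days']:.1f} days")
```

For the line quoted above this works out to a bit under 34 days, consistent with the "4-5 weeks" estimate.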
 2022-06-13, 09:00 #7 kruoli     "Oliver" Sep 2017 Porta Westfalica, DE 2×3×181 Posts What does found 21680656 cycles, need 21619192 say for you? I would suggest trying different ways to execute msieve like in the 3,748+ team sieve, maybe trying MPI. What I found:
- VBITS=256 is usually faster.
- Do not use hyperthreads.
- Manually attach the threads of msieve to physical CPUs.
- If using MPI on EPYC/Zen, use --map-by l3cache.
- For me, MPI was slower than running msieve without it, but this might be different for you. It's worth testing.
- Building GMP yourself and/or tuning it is usually not worth it or might even slow things down. I guess this does not apply if you are an expert with GMP; I am not.
2022-06-13, 09:12   #8
bur

Aug 2020
79*6581e-4;3*2539e-3

2²·5·29 Posts

Quote:
 Originally Posted by kruoli What does found 21680656 cycles, need 21619192 say for you?
found 32078384 cycles, need 31964190.

Quote:
 Manually attach the threads of msieve to physical CPUs.
I'll try that, but it's usually said that, unlike the Windows scheduler, the Linux CPU scheduler does a good job and is best left to its own devices. HT shouldn't come into play, since I'm running 20 threads on 24 cores.

I don't have the hardware to use MPI, just this one server.

I'm mainly wondering if the dimensions are ok; it's more than the 30M VBCurtis mentioned. Decreasing that would have the biggest impact, I guess.

edit: Browsing through old logs, it seems 800 h for a 30M matrix is very long. Does LA time increase linearly with dimensions?

I also noticed that the threads are not always running at 100%. Average according to htop is 18.5 cores, i.e. 92.5%. They even regularly switch from running to sleeping. It's not a general problem, yafu runs ECM with 100%+ utilization. Is that normal behavior for msieve?

Last fiddled with by bur on 2022-06-13 at 09:53

 2022-06-13, 10:03 #9 kruoli     "Oliver" Sep 2017 Porta Westfalica, DE 2·3·181 Posts You can use MPI on a single machine to optimize the usage of the single CPU's "apartments".

I cannot really comment on the matrix size since I am not experienced enough with this. Have you had a look at the corresponding NFS@home post-processing thread to get a glimpse of what others got with numbers of similar size and similar TD? As Robert pointed out, matrix solving time roughly multiplies by 4.5 for each doubling of the matrix dimensions, and since I do not think you will be able to decrease the dimensions much further, I would leave it running. The time you could save may well be smaller than the time it would take to optimize the matrix.

Please double check your msieve filtering invocation; I had made an error more than once where the TD was ignored on the command line because of wrong parameter order. But msieve will log the TD if it detects it correctly.

Regarding the Linux scheduler: yes, it is often better than the Windows one, but by no means perfect! Especially when working with CPUs that have divided L3 caches or chiplets, or if you have multiple CPUs, manually setting the affinity will usually help immensely. For "basic" CPUs, I usually see only a few percent improvement. But this changes drastically again if you run multiple things in parallel on a single machine.
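To see which logical CPUs share an L3 cache before setting affinities by hand, the grouping can be read from Linux sysfs. The sketch below assumes that `cache/index3` under each CPU directory is the L3 cache, which is common but not guaranteed on every system; the helper names are made up for illustration, and nothing here is specific to msieve.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-2,24-26' into [0, 1, 2, 24, 25, 26]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def group_by_l3(shared_lists: dict[int, str]) -> list[list[int]]:
    """Return one sorted list of logical CPU ids per distinct L3 sharing set."""
    groups = {tuple(parse_cpu_list(text)) for text in shared_lists.values()}
    return sorted(list(group) for group in groups)


def read_l3_lists(root: str = "/sys/devices/system/cpu") -> dict[int, str]:
    """Read each CPU's L3 sharing list from sysfs (Linux only)."""
    result: dict[int, str] = {}
    for path in Path(root).glob("cpu[0-9]*/cache/index3/shared_cpu_list"):
        cpu_id = int(path.parts[-4][3:])  # directory name 'cpu12' -> 12
        result[cpu_id] = path.read_text()
    return result


if __name__ == "__main__":
    for index, cpus in enumerate(group_by_l3(read_l3_lists())):
        print(f"L3 group {index}: {cpus}")
```

Each printed group is a candidate CPU set for one process or one batch of threads, for example via `taskset -c` or Python's `os.sched_setaffinity`. Note that the groups include hyperthread siblings where SMT is enabled, so picking one logical CPU per physical core is still up to you.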
2022-06-13, 10:07   #10
kruoli

"Oliver"
Sep 2017
Porta Westfalica, DE

10000111110₂ Posts

Quote:
 Originally Posted by bur I also noticed that the threads are not always running at 100%. Average according to htop is 18.5 cores, i.e. 92.5%. They even regularly switch from running to sleeping. It's not a general problem, yafu runs ECM with 100%+ utilization. Is that normal behavior for msieve?
Yes, it seems really slow. Is this with VBITS=128? What happens if you try 24 threads? Is memory configured as octa channel? Please give MPI a try on that machine! You would need one process per L3 cache.

2022-06-13, 11:04   #11
bur

Aug 2020
79*6581e-4;3*2539e-3

1001000100₂ Posts

I didn't know MPI even helps on one CPU, if only due to overhead. VBITS is a compiler setting? No idea about octa channel. Isn't that a hardware thing that I can't change anyway?

I will leave it running for now, don't really have time to get into MPI at the moment and I suspect it just is that slow, now that I remembered the exponential increase in time for LA you mentioned. Swellman even estimated much more than 30 days.

Could someone confirm whether it's normal for msieve tasks to sleep a lot? That's the only thing I find really weird.

Quote:
 Please double check your msieve filtering invocation; I had made an error more than once where the TD was ignored on the command line because of wrong parameter order. But msieve will log the TD if it detects it correctly.
Yes, I made that mistake before. These arguments have to be immediately after the -nc option.
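The ordering point can be made concrete by building the argument list programmatically, so the parameter string always lands directly after the `-nc1` (filtering) option. This is a sketch under assumptions: the binary name, the helper `filtering_argv`, and the choice to pass only `target_density` are illustrative, and any other options your run needs would go in `extra_flags`.

```python
from __future__ import annotations

from typing import Sequence


def filtering_argv(target_density: int,
                   binary: str = "msieve",
                   extra_flags: Sequence[str] = ()) -> list[str]:
    """Build an argument list with the density string right after -nc1.

    The binary name and extra_flags are placeholders; adjust them to your setup.
    """
    # Keep the parameter string immediately after -nc1 so it is not ignored.
    return [binary, "-nc1", f"target_density={target_density}", *extra_flags]


if __name__ == "__main__":
    argv = filtering_argv(130, extra_flags=["-v"])
    print(" ".join(argv))
    # To actually run it: subprocess.run(argv, check=True)
```

Whatever way the command is built, it is still worth checking the log afterwards to confirm the TD was picked up.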

All times are UTC.