mersenneforum.org  

Old 2008-12-08, 00:26   #12
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

41×193 Posts

Quote:
Originally Posted by petrw1
...but isn't it documented that a 64-bit OS along with 64-bit Prime95 is faster for factoring? Or is that just for TF and not for P-1?
Just TF
Old 2008-12-08, 02:55   #13
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2×3²×17² Posts

OK. My plan for my Q9550, before year end, is:
- upgrade to 25.8.2 (very soon)
- Configure 2 CPUs as a Worker/Helper pair to do LL (according to Phantomas either 1,2 or 3,0)
- 1 CPU for TF
- 1 CPU for P-1
Old 2008-12-08, 20:51   #14
Phantomas
 
Oct 2008
Germany, Hamburg

41₁₆ Posts

Quote:
Originally Posted by petrw1
- Configure 2 CPUs as a Worker/Helper pair to do LL (according to Phantomas
As I noted in the thread, it depends on the FSB speed used (in my PC).
And I had to find out that, after a power-down cycle, my optimal affinity switched back to [0,1] [2,3] at 400 MHz.

I now really don't know what's going on there... So you will have to find the optimal affinity yourself.
Old 2008-12-09, 04:47   #15
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2×3²×17² Posts

Quote:
Originally Posted by Phantomas
As I noted in the thread, it depends on the FSB speed used (in my PC).
And I had to find out that, after a power-down cycle, my optimal affinity switched back to [0,1] [2,3] at 400 MHz.

I now really don't know what's going on there... So you will have to find the optimal affinity yourself.
Well, unless I am jinxed or doing something wrong, it makes no difference.

I added to prime.txt the line:
Affinityscramble=xxxx
and in every single case I got the same results:
Quote:
Affinityscramble=0123
Or any of the following: 0123; 1230; 3102; 0213; 2031; 3210
And without the line
Quote:
Q9550
No overclock
4GB DDR2-1066
2 workers X 2 CPUs
Assignments were LL=47M
Timings = 0.037 secs per iteration for each
But back to 4 workers my timings were:
Quote:
LL 47M = 0.065
LL 47M = 0.064
LL 47M = 0.065
LL 29M = 0.037
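
A quick way to compare the two setups above is aggregate throughput, that is, iterations finished per second across all workers. The sketch below uses the timings from this post and assumes they are seconds per iteration, and that all four single-CPU workers would run at about 0.065 on 47M exponents (one of the four listed was actually a 29M test). The function name is made up for illustration.

```python
# Timings from the post above, assumed to be seconds per iteration at LL = 47M.
PAIRED_SECS = 0.037   # each of 2 workers, each worker using 2 CPUs
SINGLE_SECS = 0.065   # each of 4 workers, each worker using 1 CPU (assumed for all four)


def iterations_per_second(workers: int, secs_per_iter: float) -> float:
    """Aggregate LL iterations completed per second across all workers."""
    return workers / secs_per_iter


paired = iterations_per_second(2, PAIRED_SECS)
single = iterations_per_second(4, SINGLE_SECS)

print(f"2 workers x 2 CPUs: {paired:.1f} iter/s")
print(f"4 workers x 1 CPU:  {single:.1f} iter/s")
print(f"single-CPU workers ahead by {100 * (single / paired - 1):.0f}%")
```

On these numbers the four independent workers come out ahead, even though each individual test takes longer to finish.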
Old 2008-12-09, 05:51   #16
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

2·3²·17² Posts
Work reworked....

So until I figure out why I can't get as good performance under 25.8.2 with 2 CPUs paired as I can with each running alone, my workload now looks like:

Quote:
All 1 CPU each
Worker #1: LL - 47.7M = 0.062 secs
Worker #2: LL - 47.7M = 0.061 secs
Worker #3: TF
Worker #4: P-1
NOTE: With 4 LL workers, each iteration for 47.7M took 0.065 secs
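
To put those per-iteration times in perspective: a Lucas-Lehmer test of exponent p takes p − 2 squaring iterations, so the time per test can be estimated directly. This is a rough sketch that assumes a constant iteration time and takes "47.7M" as exactly 47,700,000; the function name is invented for the example.

```python
SECONDS_PER_DAY = 86_400


def ll_days(exponent: int, secs_per_iter: float) -> float:
    """Approximate days for one LL test: (exponent - 2) iterations at a fixed rate."""
    return (exponent - 2) * secs_per_iter / SECONDS_PER_DAY


EXPONENT = 47_700_000  # assumed round figure for "47.7M"

for label, secs in [("2 LL workers", 0.062), ("4 LL workers", 0.065)]:
    print(f"{label}: {ll_days(EXPONENT, secs):.1f} days per test")
```

That works out to roughly 34 days per test at 0.062 secs and roughly 36 days at 0.065 secs, so the per-test cost of running four LL workers instead of two is under two days.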
Old 2008-12-09, 14:28   #17
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1111011101001₂ Posts

Quote:
Originally Posted by petrw1
So until I figure out why I can't get as good performance under 25.8.2 with 2 CPUs paired as I can with each running alone, my workload now looks like:
For most people, it makes sense to run 2 LLs rather than 1 LL with a helper thread.

Your machine seems to have better bandwidth than similar machines. With that small a drop going from 3 LL to 4 LL, feel free to change your TF worker thread to LL. Of course, it's your machine - do what you find most enjoyable.
Old 2008-12-09, 16:05   #18
db597
 
Jan 2003

CE₁₆ Posts

OK, I have a dual-core 3.6 GHz Xeon that has now been switched from LL double-checking to P-1 factoring. It will start the P-1 factoring tomorrow, once the exponent it is currently double-checking is done.

For P-1 factoring, will I get better results working on one P-1 assignment with a helper thread, or should I have it work on two P-1 assignments (one on each core)?
Old 2008-12-09, 16:21   #19
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7913₁₀ Posts

Quote:
Originally Posted by db597
For P-1 factoring, will I get better results working on one P-1 assignment with a helper thread, or should I have it work on two P-1 assignments (one on each core)?
Run 2 worker threads.
Old 2008-12-10, 01:30   #20
penguin66
 
Oct 2008
Left of Albuquerque

7 Posts
Running P-1 factoring

I've got one core doing P-1 factoring.....
Old 2008-12-10, 21:26   #21
S00113
 
Dec 2003

2³×3³ Posts

Quote:
Originally Posted by Prime95
If not enough people sign up for P-1, I'll have to tweak "do what makes the most sense" to assign some P-1 factoring.
Why not do it anyway?

I have signed up a lot of CPUs for P-1. In the beginning I only signed up one CPU on machines with more than 2 GiB of RAM available for mprime, with the other doing TF to avoid high pressure on the memory bus slowing calculations down. Recently I also switched some machines with 2 GiB of RAM (about 1.5 GiB reserved for mprime at night) to P-1 after seeing that my LL-NF CPUs just got factoring work, assuming that there were no fully factored exponents available.

I like to put my CPUs to work on tasks where I can get the most useful work done per cycle compared to others. Knowing that P-1 is faster or has a higher success rate with more RAM, I use the CPUs with the most RAM for P-1. Also, since high pressure on the memory bus will slow calculations down and may noticeably slow other tasks on the machines, and since these are 64-bit AMD CPUs, which are extremely fast per clock at TF, I have selected TF for the second core. (I was far down the TF ranks until mprime came out in a 64-bit version, and now I'm almost at the top!)

There are a lot of considerations to take into account when choosing work: caches, CPU type, memory and memory speed, SSE2 or not, etc. E.g., PIIIs have an edge at LL testing in a few small windows where the SSE2 version needs a larger FFT size. I don't think the server ever considered that when a PIII asked for LL work. IMHO it should.

What we need are simple guidelines to help us choose. How important is cache for P-1? Is my assumption about memory bus pressure true, or will the entire stage 1 fit easily in a 2 MiB cache? Is more than 2 GiB of RAM helpful at all, or could I do just as well with 512 MiB? At all exponent sizes? Which CPUs are the fastest per clock at LL for different FFT sizes? I guess some CPUs have a large enough cache to fit the entire working set these days. Is that considered by the server when it hands out work? How about TF? Are the CPU type and bitness considered when handing out TF work, or is it just "first in line"?

I believe we could speed up GIMPS by a two-figure percentage by considering all this when handing out any type of work. The perfect solution would be to run combined benchmarks and tests for an hour or so to find the optimal combination of work types for a particular machine, and use that information to select the best work adjusted to GIMPS' needs. Perhaps with some user input to tune the selection, asking about memory and memory bus pressure. (A saturated memory bus is very noticeable to an interactive user.)

And another thing: Some machines are probably faster per cycle at stage 1, and other machines are better suited to stage 2. Handing over an intermediate result from stage 1 could be very useful in some circumstances. (Also if you want to continue with gmp-ecm to a higher limit than mprime can handle.) I assume this isn't done now because of bandwidth limitations at the server, but what could be done to make stage 1 and stage 2 separate work types, and just let selected machines do stage 2?
Old 2008-12-10, 22:00   #22
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

1EE9₁₆ Posts

Quote:
Originally Posted by S00113
Why not do it anyway?
I will if I have to. In an ideal world, enough people would voluntarily enroll for P-1 to keep ahead of the LL testers. That is what happens with TF. Since P-1 is a new work type, we'll need to see how many choose that work preference. Hopefully, the fact that P-1 has its own top producers chart will be enough encouragement.

Quote:
I have signed up a lot of CPUs for P-1. In the beginning I only signed up one CPU on machines with more than 2 GiB of RAM available for mprime, with the other doing TF to avoid high pressure on the memory bus slowing calculations down. Recently I also switched some machines with 2 GiB of RAM (about 1.5 GiB reserved for mprime at night) to P-1 after seeing that my LL-NF CPUs just got factoring work, assuming that there were no fully factored exponents available.
LL-NF should be fixed. Thanks for choosing P-1. I've noticed a big uptick in P-1 reservations.

Quote:
I like to put my CPUs to work at tasks where I can get the most useful work done per cycle compared to others...
Good idea.

As you noted, the biggest difference between architectures is for TF. The P-III and early Athlon being better at certain LL ranges is true. I tried to make the server handle this but couldn't come up with an efficient SQL query. Fortunately, these chips are now rare and probably better suited to TF (due to long double-check times).

Cache size is not relevant. No chip's cache can hold the working set of a double-check, LL, or P-1 assignment.

Saving ECM stage 1 results to feed to gmp-ecm is not useful for big exponents. As I understand it, GMP-ECM's stage 2 is not faster than prime95 for these 100,000 to 2,500,000 bit numbers.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
