#12 |
P90 years forever!
Aug 2002
Yeehaw, FL
41×193 Posts |
#13 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2×3²×17² Posts |
OK. My plan for my Q9550, before year end, is:
- Upgrade to 25.8.2 (very soon)
- Configure 2 CPUs as a Worker/Helper pair to do LL (according to Phantomas either 1,2 or 3,0)
- 1 CPU for TF
- 1 CPU for P-1 |
#14 | |
Oct 2008
Germany, Hamburg
4116 Posts |
Quote:
And I found that, after a power-down cycle, my optimal affinity switched back to [0,1] [2,3] at 400MHz. I now really don't know what's going on there. So you have to find the optimal affinity yourself. |
|
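For anyone who wants to test this systematically: a quad-core only has three ways to split into two Worker/Helper pairs, so trying them all is cheap. Below is a minimal Python sketch that lists the candidates. The function name `core_pairings` is my own and has nothing to do with Prime95 itself; you would still have to set each pairing in the client and compare iteration times by hand.

```python
def core_pairings(cores):
    """Return every way to split an even-sized list of cores into unordered pairs."""
    cores = list(cores)
    if not cores:
        return [[]]
    first, rest = cores[0], cores[1:]
    result = []
    for partner in rest:
        # Pair the first core with each possible partner, then pair up the remainder.
        remaining = [c for c in rest if c != partner]
        for tail in core_pairings(remaining):
            result.append([(first, partner)] + tail)
    return result


for pairing in core_pairings([0, 1, 2, 3]):
    print(pairing)
# [(0, 1), (2, 3)]
# [(0, 2), (1, 3)]
# [(0, 3), (1, 2)]
```

Since the best pairing apparently changed after a power cycle, it is probably worth re-running the comparison after a reboot instead of trusting a single measurement.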
#15 | ||||
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2×3²×17² Posts |
![]() Quote:
I added to prime.txt the line: Affinityscramble=xxxx and in every single case I got the same results: Quote:
Quote:
Quote:
|
||||
#16 | |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
2·3²·17² Posts |
So until I figure out why I can't get as good a performance under 25.8.2 with 2 CPUs paired as I can alone, my workload now looks like:
Quote:
|
|
#17 | |
P90 years forever!
Aug 2002
Yeehaw, FL
1111011101001₂ Posts |
![]() Quote:
Your machine seems to have better bandwidth than similar machines. With that small a drop going from 3 LL to 4 LL, feel free to change your TF worker thread to LL. Of course, it's your machine - do what you find most enjoyable. |
|
#18 |
Jan 2003
CE₁₆ Posts |
OK, I have a dual-core Xeon 3.6GHz that has now been transferred from LL double-checking to P-1 factoring. It will start P-1 factoring tomorrow, once the double-check of the current exponent is done.
For P-1 factoring, will I get better results working on one P-1 assignment using a helper thread, or should I have it work on two P-1 assignments (one on each core)? |
#19 |
P90 years forever!
Aug 2002
Yeehaw, FL
7913₁₀ Posts |
#20 |
Oct 2008
Left of Albuquerque
7 Posts |
I've got one core doing P-1 factoring.....
|
#21 | |
Dec 2003
2³×3³ Posts |
Quote:
I have signed up a lot of CPUs for P-1. In the beginning I only signed up one CPU on machines with more than 2 GiB of RAM available for mprime, with the other doing TF to avoid high pressure on the memory bus slowing calculations down. Recently I also switched some machines with 2 GiB RAM (about 1.5 GiB reserved for mprime at night) to P-1 after seeing that my LL-NF CPUs just got factoring work, assuming that there were no fully factored exponents available.
I like to put my CPUs to work on tasks where I can get the most useful work done per cycle compared to others. Knowing that P-1 is faster or has a higher success rate with more RAM, I use the CPUs with the most RAM for P-1. Also, since high pressure on the memory bus will slow calculations down and may noticeably slow other tasks on the machines, and since these are 64-bit AMD CPUs, which are extremely fast per clock at TF, I have selected TF for the second core. (I was far down the TF ranks until mprime came out in a 64-bit version, and now I'm almost at the top!)
There are a lot of considerations to weigh when choosing work: caches, CPU type, memory and memory speed, SSE2 or not, etc. For example, PIIIs have an edge at LL testing in a few small windows where the SSE2 version needs a higher FFT size. I don't think the server ever considered that when a PIII asked for LL work. IMHO it should.
What we need is simple guidelines to help us choose. How important is cache for P-1? Is my assumption about memory bus pressure true, or will the entire stage 1 fit easily in a 2 MiB cache? Is more than 2 GiB RAM helpful at all, or could I do just as well with 512 MiB? At all exponent sizes? Which CPUs are the fastest per clock at LL for different FFT sizes? I guess some CPUs have a large enough cache to fit the entire working set these days. Is that considered by the server when it hands out work? How about TF? Are the CPU type and bitness considered when handing out TF work, or is it just "first in line"?
I believe we could speed up GIMPS by a two-figure percentage by considering all this when handing out any type of work. The perfect solution would be to run combined benchmarks and tests for an hour or so to find the optimal combination of work types for a particular machine, and use that information to select the best work adjusted to GIMPS' needs, perhaps with some user input to tune the selection, asking about memory and memory bus pressure. (A saturated memory bus is very noticeable to an interactive user.)
And another thing: some machines are probably faster per cycle at stage 1, and other machines are better suited to stage 2. Handing over an intermediate result from stage 1 could be very useful in some circumstances (also if you want to continue with gmp-ecm to a higher limit than mprime can handle). I assume this isn't done now because of bandwidth limitations at the server, but what could be done to make stage 1 and stage 2 separate work types, and just let selected machines do stage 2? |
|
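To make the "benchmark every combination" idea a bit more concrete, here is a rough Python sketch of the search itself. Everything in it is an assumption for illustration: the names `candidate_mixes` and `best_mix` are made up, the list of work types is just the three discussed in this thread, and the `score` function stands in for whatever an hour of real benchmarking on the machine would measure. Nothing here reflects how the server or mprime actually assigns work.

```python
from itertools import combinations_with_replacement

# Work types discussed in this thread; the tuple itself is an illustrative assumption.
WORK_TYPES = ("LL", "TF", "P-1")


def candidate_mixes(n_cores, work_types=WORK_TYPES):
    """All unordered ways to give each of n_cores one work type."""
    return list(combinations_with_replacement(work_types, n_cores))


def best_mix(n_cores, score, work_types=WORK_TYPES):
    """Return the mix with the highest score.

    `score` is a caller-supplied function (hypothetical here) that would
    benchmark a mix on the real machine and return a throughput figure.
    """
    return max(candidate_mixes(n_cores, work_types), key=score)


mixes = candidate_mixes(4)
print(len(mixes))  # 15 candidate mixes for a quad-core
for mix in mixes:
    print(mix)
```

With only 15 candidate mixes on a quad-core, the expensive part is the benchmarking behind `score`, not the enumeration.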
#22 | ||
P90 years forever!
Aug 2002
Yeehaw, FL
1EE9₁₆ Posts |
I will if I have to. In an ideal world, enough people would voluntarily enroll for P-1 to keep ahead of the LL testers. That is what happens with TF. Since P-1 is a new work type, we'll need to see how many choose that work preference. Hopefully, the fact that P-1 has its own top producers chart will be enough encouragement.
Quote:
Quote:
As you noted, the biggest difference between architectures is for TF. The P-III and early Athlon being better at certain LL ranges is true. I tried to make the server handle this but couldn't come up with an efficient SQL query. Fortunately, these chips are now rare and probably better suited to TF (due to long double-check times). Cache size is not relevant. No chip's cache can hold the working set of a double-check, LL, or P-1 assignment. Saving ECM stage 1 results to feed to gmp-ecm is not useful for big exponents. As I understand it, GMP-ECM's stage 2 is not faster than prime95 for these 100,000 to 2,500,000 bit numbers. |
||