mersenneforum.org Oh Brother, What betid to mine Haswell 4770?

2014-09-27, 04:20   #1
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2³×3×199 Posts

Oh Brother, What betid to mine Haswell 4770?

Here are the specs from the Computer Properties screen.

Code:
Software Version  Windows64,Prime95,v28.5,build 2
Model             Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Features          4 core, hyperthreaded, Prefetch,SSE,SSE2,SSE4,AVX,AVX2,FMA,
Speed             3.435 GHz (21.359 GHz P4 effective equivalent)
L1/L2 Cache       32 / 256 KB
Computer Memory   4008 MB configured usage 800 MB day / 800 MB night

And here is a summary of the work and timings of the 4 workers over the last week or so. I noted the times when one or more workers' iteration times changed a lot.

Code:
Date/Time          #1 TF and DC   #2 TF        #3 DC         #4 LL
18/09/2014  8:19   TF 490 sec     TF Unknown   36.2M 17 ms   67.8M 33 ms
19/09/2014  0:12   TF 480 sec     TF Unknown   33.2M 16 ms   67.8M 33 ms
19/09/2014 17:54   TF 475 sec     TF Unknown   33.2M 16 ms   67.8M 52 ms
21/09/2014 20:21   35.7M 26 ms    TF 470 sec   33.2M 24 ms   67.8M 80 ms
22/09/2014 17:55   35.7M 24 ms    TF 500 sec   33.2M 23 ms   67.8M 48 ms
23/09/2014 17:56   35.7M 25 ms    TF 490 sec   33.2M 23 ms   67.8M 50 ms
25/09/2014 17:57   35.7M 24 ms    TF 495 sec   33.2M 30 ms   67.8M 48 ms

The questions that come to mind, in no specific order of importance:

1. Almost every time, all the workers stop before they send new end dates... well, almost. Not on the 24th. Why are they stopping?

2. What would cause such a drastic increase in iteration times in Workers #3 and #4 when Worker #1 changes from TF to DC (Sept 21 17:55)? I thought Haswell (as with Ivy and Sandy and all i-series) was much better at channel capacity and worker independence.

3. When worker #4 changed from 33 to 52 there were NO changes in work on the other 3 workers. I might just chalk that one up to external forces on the PC, though it seemed to increase to an iteration time close to where it consistently is now.

4. Granted, benchmarks are "perfect" situations... that being said, my times are WAY WAY above the benchmark I ran only a few weeks ago: about 10 ms for the 35M DC and 20 ms for the 68M LL.

5. Could slower RAM make SUCH a big difference? The TF times changed very little, which suggests to me RAM is NOT the issue... I may be wrong.

6. In a few weeks worker #2 will also be doing LL... are they all going to get SLOWER yet?

Or simply give me some hints of where to start looking....

Last fiddled with by petrw1 on 2014-09-27 at 04:22

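To put question 4 in numbers, here is a minimal sketch that divides observed iteration times by the benchmark times quoted in the post. Pairing the 25/09/2014 row of worker #1 (35.7M) and worker #4 (67.8M) with the "35M DC" and "68M LL" benchmarks is an assumption made for illustration; the names `benchmark_ms`, `observed_ms` and `slowdown` are hypothetical.

```python
# Benchmark iteration times quoted in the post, in ms per iteration.
benchmark_ms = {"35M DC": 10, "68M LL": 20}

# Observed times from the 25/09/2014 row of the table (workers #1 and #4).
# Matching these rows to the benchmarks is an assumption for illustration.
observed_ms = {"35M DC": 24, "68M LL": 48}


def slowdown(observed, benchmark):
    """Return observed/benchmark ratio for each work type."""
    return {name: observed[name] / benchmark[name] for name in benchmark}


ratios = slowdown(observed_ms, benchmark_ms)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the benchmark time")
```

Both ratios come out at about 2.4x. That is only a description of the gap; on its own it does not say whether the cause is core sharing, memory, or something else.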
2014-09-27, 04:37   #2
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

16726₈ Posts

Have you turned off hyperthreading in the BIOS? I suspect two LL tests are getting assigned to the same physical core.

2014-09-27, 04:42   #3
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

1001010101000₂ Posts

So setting 1 core per worker is not enough for Haswell?

Not sure I can change the BIOS. It is a "borg". Any other ways around it?

Thanks

2014-09-27, 04:51   #4
sdbardwick

Aug 2002
North San Diego County

13×53 Posts

With HT and Windows, I always end up playing around with AffinityScramble2 to make sure threads don't share physical cores. For example, I had to set my 2600K running 4 workers to

Code:
AffinityScramble2=02461357

2014-09-27, 04:51   #5
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1110111010110₂ Posts

Quote:
 Originally Posted by petrw1 So setting 1 core per worker is not enough for Haswell?
It should be, but prime95's hyperthread detection does not always work. Can you run task manager and see if two workers are running on one core?

2014-09-27, 16:10   #6
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

37×79 Posts

Quote:
 Originally Posted by sdbardwick With HT and Windows, I always end up playing around with AffinityScramble2 to make sure threads don't share physical cores. For example, I had to set my 2600K running 4 workers to Code: AffinityScramble2=02461357
You may wish to use 13570246 instead. The first CPU core in x86 usually handles more interrupts, so having it free to handle those is an advantage.

2014-09-29, 22:48   #7
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

2³·3·199 Posts

Quote:
 Originally Posted by Mark Rose You may wish to use 13570246 instead. The first CPU core in x86 usually handles more interrupts, so having it free to handle those is an advantage.
So this didn't make a difference...I suspect I still have 2 tests running on the same physical core. I still need to verify this.

Could it be their Cores (Physical and HT) are numbered differently?
For example (completely made up guess) ... maybe Physical Core 0's HT partner is 7 ( 1 is 6, etc) ...
How could I find out?

Or could it even be that Haswell has in a way randomized how it numbers them based on the work load?

2014-09-29, 23:23   #8
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·3·19·67 Posts

Quote:
 Originally Posted by petrw1 Could it be their Cores (Physical and HT) are numbered differently? For example (completely made up guess) ... maybe Physical Core 0's HT partner is 7 ( 1 is 6, etc) ... How could I find out?
Set DebugAffinityScramble=1 in prime.txt. At startup, prime95 will output its calculations trying to determine logical/physical CPUs.

Prime95 does this by running some code it thinks should take 100K clock cycles. It then puts a logical CPU in a busy loop and times this 100K code on the other 7 logical CPUs. The theory is that 6 logical CPUs will time at 100K and one will time at 200K. Then the busy loop logical CPU and the 200K logical CPU are on one physical core.
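The pairing logic described above can be sketched roughly as follows. This is not prime95's actual code: the function names, the `slow_factor` threshold, and the timing numbers are all made up for illustration, and the sketch simply assumes a table of clock counts has already been measured.

```python
def pair_logical_cpus(timings, slow_factor=1.5):
    """Guess hyperthread siblings from timing data.

    timings[busy][other] is the clock count measured on logical CPU `other`
    while logical CPU `busy` sits in a busy loop. The CPU that runs clearly
    slower than the rest is taken to share a physical core with `busy`.
    `slow_factor` is an arbitrary threshold chosen for this sketch.
    """
    pairs = []
    paired = set()
    for busy in sorted(timings):
        if busy in paired:
            continue
        baseline = min(timings[busy].values())  # fastest unaffected CPU
        candidates = {cpu: clocks for cpu, clocks in timings[busy].items()
                      if cpu not in paired}
        if not candidates:
            continue
        slowest = max(candidates, key=candidates.get)
        if candidates[slowest] >= slow_factor * baseline:
            pairs.append((busy, slowest))
            paired.update((busy, slowest))
    return pairs


def fake_timings(siblings, n=8):
    """Build idealized data: 200K clocks on the sibling, 100K elsewhere."""
    sibling_of = dict(siblings)
    sibling_of.update({b: a for a, b in siblings.items()})
    return {busy: {other: 200_000 if sibling_of[busy] == other else 100_000
                   for other in range(n) if other != busy}
            for busy in range(n)}


# Hypothetical topology in which logical CPU 0 shares a core with 4, etc.
print(pair_logical_cpus(fake_timings({0: 4, 1: 5, 2: 6, 3: 7})))
```

Real measurements are noisier than the idealized 100K/200K split, which may be one reason the detection does not always work.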

2014-09-30, 04:36   #9
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006

12A8₁₆ Posts

Quote:
 Originally Posted by Prime95 Set DebugAffinityScramble=1 in prime.txt. At startup, prime95 will output its calculations trying to determine logical/physical CPUs. Prime95 does this by running some code it thinks should take 100K clock cycles. It then puts a logical CPU in a busy loop and times this 100K code on the other 7 logical CPUs. The theory is that 6 logical CPUs will time at 100K and one will time at 200K. Then the busy loop logical CPU and the 200K logical CPU are on one physical core.
ok will do ... I need to get someone else to do this .... not as geeky.

Will it simply output this to results.txt or do I need to look at the actual window that runs it?

AND....

Just so I get it right: once I know the pairs, which is the proper way to record AffinityScramble2=
A). In Physical/Logical pairs
B). All the Physical then all the Logical

E.g. if Physical 0 is paired with Logical 4, 1 with 5, 2 with 6, and 3 with 7, do I code
AffinityScramble2=04152637 (This is my guess)
OR
AffinityScramble2=01234567

Last fiddled with by petrw1 on 2014-09-30 at 04:52

2014-09-30, 17:29   #10
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2³·3·199 Posts

Code:
[Main thread Sep 30 11:23] Test clocks on logical CPU #1: 214592
[Main thread Sep 30 11:23] Logical CPU 2 clocks: 407000
[Main thread Sep 30 11:23] Logical CPU 3 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 4 clocks: 214720
[Main thread Sep 30 11:23] Logical CPU 5 clocks: 214576
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 214712
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 214608
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 214856
[Main thread Sep 30 11:23] Test clocks on logical CPU #3: 214576
[Main thread Sep 30 11:23] Logical CPU 4 clocks: 201806
[Main thread Sep 30 11:23] Logical CPU 5 clocks: 113962
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 114196
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 113964
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 114040
[Main thread Sep 30 11:23] Test clocks on logical CPU #5: 114028
[Main thread Sep 30 11:23] Logical CPU 6 clocks: 177253
[Main thread Sep 30 11:23] Logical CPU 7 clocks: 93538
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 93586
[Main thread Sep 30 11:23] Test clocks on logical CPU #7: 93583
[Main thread Sep 30 11:23] Logical CPU 8 clocks: 177235
[Main thread Sep 30 11:23] Logical CPUs 1,2 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 3,4 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 5,6 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 7,8 form one physical CPU.
[Main thread Sep 30 11:23] Starting workers.

So this tells me I want AffinityScramble2=02461357 (or 13570246). Correct???

Turns out the program wasn't completely stopped/started yesterday so I don't believe the above changes actually took effect... stay tuned...

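As a rough illustration of the step from this log to the two strings mentioned in the thread, the sketch below parses the "form one physical CPU" lines and builds both orderings. It assumes the log numbers logical CPUs from 1 while the AffinityScramble2 digits count from 0, which is how the thread's examples read but is not confirmed here; whether either string is what prime95 actually needs is exactly the open question. `parse_pairs` and `candidate_orderings` are hypothetical helper names.

```python
import re

# Matches lines such as "Logical CPUs 1,2 form one physical CPU."
PAIR_RE = re.compile(r"Logical CPUs (\d+),(\d+) form one physical CPU\.")


def parse_pairs(log_text):
    """Extract (first, second) logical CPU pairs as 1-based integers."""
    return [(int(a), int(b)) for a, b in PAIR_RE.findall(log_text)]


def candidate_orderings(pairs):
    """Build the two digit strings discussed in the thread.

    Assumption: the log is 1-based and the setting is 0-based, so each
    logical CPU number is reduced by one before being written as a digit.
    """
    firsts = "".join(str(a - 1) for a, _ in pairs)
    seconds = "".join(str(b - 1) for _, b in pairs)
    return firsts + seconds, seconds + firsts


LOG = """\
[Main thread Sep 30 11:23] Logical CPUs 1,2 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 3,4 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 5,6 form one physical CPU.
[Main thread Sep 30 11:23] Logical CPUs 7,8 form one physical CPU.
"""

pairs = parse_pairs(LOG)
print(pairs)
print(candidate_orderings(pairs))
```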
2014-10-01, 01:06   #11
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

2³×3×199 Posts

So here is where I am at....

My worker doing 67M LL is still getting iteration times of 50 ms. It was at 33 ms when only 1 other worker was doing DC and the rest were doing TF. So I suspect I still don't have it right....

1. DebugAffinityScramble determined the following CPU pairings.

Code:
Logical CPUs 1,2 form one physical CPU.
Logical CPUs 3,4 form one physical CPU.
Logical CPUs 5,6 form one physical CPU.
Logical CPUs 7,8 form one physical CPU.

Do I correctly assume that once it runs it will use that knowledge to assign the correct CPUs to each worker so that each gets a separate physical core? Or is it strictly informational, and I use that knowledge as I see fit to set AffinityScramble2? What if I also have AffinityScramble2 set? Which setting takes precedence?

2. I tried to set AffinityScramble2 but I think I screwed up. But is that discussion even relevant if DebugAffinityScramble forced the correct worker/CPU settings?

3. Turns out the AffinityScramble2 I had the person enter likely did NOT take effect because Prime95 was not exited and restarted to grab the new settings. It was only a stop all/start all workers. Am I correct that it did not take effect?

4. Furthermore, I suspect it was placed in the wrong place in local.txt. I incorrectly said it could go "anywhere" in that file. It was placed at the very end, within the [Worker #4] section. Can I assume it would have been ignored even if Prime95 had been completely exited/restarted?

5. I had it set as 13570246. Is this a correct setting based on what DebugAffinityScramble determined? Or is it not? By putting the HT cores all first, will that cause Prime95 to assign the work to the HT cores instead of the Physical cores?

BOTTOM LINE: Should I simply use the output from DebugAffinityScramble to set AffinityScramble2 correctly? What is correct? 02461357? 13570246? Something else?

OR should I leave in DebugAffinityScramble and remove AffinityScramble2?
