mersenneforum.org  

2008-11-01, 07:05   #1
lidocorc
Nov 2008
Rosenheim, Germany
2³×3 Posts
LL test with V25 much slower than with V24

LL test with v25.7 much slower than with v24.14

Since yesterday I've been using the new version 25.7 instead of 24.14 on my AMD X64 dual-core machine. Until yesterday I ran Prime95 on only one core, and one iteration took about 0.100 sec. Now that I'm using both cores with Prime95 v25, the core that continues the LL test of the old Mersenne exponent is much slower. Time per iteration is now 0.146 sec, which means only about 68% of the previous speed. Any suggestions as to the reason? I thought both cores worked separately without influencing each other.

lidocorc
2008-11-01, 15:40   #2
starrynte
Oct 2008
California
354₈ Posts

If I understand correctly, although there are two separate cores, both instances share the same memory, so each individual assignment will run somewhat slower, but overall throughput will be higher.
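The trade-off described here (each test slower, combined output higher) can be checked with a little arithmetic. Below is a minimal sketch using the iteration times reported in the first post. It assumes the second worker also runs at 0.146 sec per iteration and that both exponents are of similar size, neither of which is stated in the thread; the function name is made up for illustration.

```python
def iterations_per_second(seconds_per_iteration, workers=1):
    """Combined throughput of `workers` independent LL tests,
    each running at the given time per iteration."""
    return workers / seconds_per_iteration


# v24.14: one worker at 0.100 sec per iteration (figure from the first post)
one_core = iterations_per_second(0.100)

# v25.7: two workers; ASSUMPTION: the second worker also needs 0.146 sec
two_cores = iterations_per_second(0.146, workers=2)

# Relative change in combined throughput
gain = two_cores / one_core - 1.0

print(f"one worker:  {one_core:.1f} iterations/sec")
print(f"two workers: {two_cores:.1f} iterations/sec")
print(f"combined gain: {gain:.0%}")
```

Under those assumptions the machine as a whole completes roughly a third more iterations per second, even though each individual test advances more slowly.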
2008-11-01, 21:01   #3
Phantomas
Oct 2008
Germany, Hamburg
5×13 Posts

You can also let both cores work on the same exponent, but in general that is slower overall.
2008-11-01, 22:09   #4
Freightyard
Nov 2008
San Luis Obispo CA
27 Posts

Intel Nehalem with Quickpath.
2008-11-01, 23:46   #5
dan3ny
Oct 2008
22 Posts

You may also be experiencing what I am: p95 not using much of your CPU.
2008-11-02, 07:52   #6
lidocorc
Nov 2008
Rosenheim, Germany
24₁₀ Posts

@starrynte
It's as you write. But is shared cache memory such a bottleneck that it reduces performance that much?

@dan3ny
No. I'm experiencing 100% CPU load.

I wonder what would happen if I shut down only one of the workers and let the other one continue. But I can't find a menu command to switch off a single worker independently of the other one.

lidocorc
2008-11-02, 08:34   #7
Kevin
Aug 2002
Ann Arbor, MI
110110001₂ Posts

Quote:
Originally Posted by lidocorc
@starrynte
It's as you write. But is shared cache memory such a bottleneck that it reduces performance that much?

@dan3ny
No. I'm experiencing 100% CPU load.

I wonder what would happen if I shut down only one of the workers and let the other one continue. But I can't find a menu command to switch off a single worker independently of the other one.

lidocorc
The shared cache memory is only a bottleneck if both cores are running LL tests. There's a menu command called "worker windows" under Test. Each "worker" corresponds to a core. Set worker #1 to do LL tests like you did before, and set worker #2 to do trial factoring. The core working on the LL test will go as fast as it did before, and the second core is still contributing.

You might not notice an immediate change because both cores will still be working on the LL tests they've been assigned. What you can do is shut down the client, go to worktodo.txt, and move the line corresponding to the second LL test from the heading under "worker #2" to the one under "worker #1" (assuming worker #1 is the one with your current LL test that you have set to do LL tests). Then when you restart the client, your first worker will be testing your old exponent at full speed, and it will continue the test the other core started when the current one finishes. The second worker will reserve trial factoring assignments and begin work on those.
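The worktodo.txt edit described above could be scripted roughly as follows. This is only a sketch: it assumes section headers of the form `[Worker #1]` and `[Worker #2]`, and the two work lines in the sample are placeholders, not real assignments. The function name is hypothetical. Anything like this should only be run while the client is shut down, on a backed-up copy of the file.

```python
def move_work(text, src="[Worker #2]", dst="[Worker #1]"):
    """Return worktodo text with every work line under `src` appended to `dst`.

    ASSUMPTION: sections are introduced by bracketed header lines such as
    "[Worker #1]"; the thread itself only speaks of "headings".
    """
    preamble = []   # lines that appear before the first section header
    sections = {}   # header -> work lines (dicts preserve insertion order)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # drop blank lines
        if line.startswith("[") and line.endswith("]"):
            current = line
            sections.setdefault(current, [])
        elif current is None:
            preamble.append(line)
        else:
            sections[current].append(line)

    if src not in sections or dst not in sections:
        raise ValueError("both worker sections must exist in the file")

    # Queue the moved work behind whatever the destination worker already has.
    sections[dst].extend(sections[src])
    sections[src] = []

    out = list(preamble)
    for header, lines in sections.items():
        out.append(header)
        out.extend(lines)
    return "\n".join(out) + "\n"


# Placeholder work lines; real entries come from your own worktodo.txt.
sample = "[Worker #1]\nFIRST_LL_TEST_LINE\n[Worker #2]\nSECOND_LL_TEST_LINE\n"
print(move_work(sample), end="")
```

The moved line is appended after the first worker's existing entry, which matches the behaviour described in the post: the first worker finishes its current test and then continues the one the other core had started.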
2008-11-02, 10:25   #8
S485122
"Jacob"
Sep 2006
Brussels, Belgium
1756₁₀ Posts

Quote:
Originally Posted by Kevin
The shared cache memory is only a bottleneck if both cores are running LL tests.
The shared memory cache is not the problem: the code supports a maximum of 1024 KB of cache per worker (cf. Prime95 and L2 Cache). If you have less than 1024 KB of L2 cache per core, for instance 1024 KB of shared cache for two cores, you can add a line "CpuL2CacheSize=N" to local.txt, where N is 128, 256 or 512.

The problem lies in access to main memory on multi-core systems. This is especially true with some of the NVIDIA chipsets.

Jacob
2008-11-02, 10:28   #9
Oleg V.Cat
Oct 2008
Riga, Latvia
1011₂ Posts

Quote:
Originally Posted by lidocorc
LL test with v25.7 much slower than with v24.14

Since yesterday I've been using the new version 25.7 instead of 24.14 on my AMD X64 dual-core machine.

Having 2 worker threads on a dual-core CPU is a bad idea, especially on a desktop machine. You can put 2 CPU threads on one worker thread and get ~50%-70% greater performance.

To switch from 2 worker threads to 1, go to Test -> Worker windows and set "Number of worker windows to run" to 1. Then (the 100% reliable way) stop P95 and exit, manually move all work in the worktodo.txt file under the single remaining worker, and run P95 again.
2008-11-02, 15:25   #10
S485122
"Jacob"
Sep 2006
Brussels, Belgium
2²·439 Posts

Quote:
Originally Posted by Oleg V.Cat
Having 2 worker threads on a dual-core CPU is a bad idea, especially on a desktop machine. You can put 2 CPU threads on one worker thread and get ~50%-70% greater performance.
According to all the other testers who have posted on the forum and, more importantly, according to the author of the program, this is not true. It may be true in special cases, but I doubt it, and I especially doubt your alleged 50% to 70% increase in throughput.

Could it be that you did not take into account that testing a different number on each core might take longer to produce each result, but that you end up with more than one result?

Jacob
2008-11-02, 17:44   #11
Oleg V.Cat
Oct 2008
Riga, Latvia
11 Posts

Quote:
Originally Posted by S485122
According to all the other testers who have posted on the forum and, more importantly, according to the author of the program, this is not true. It may be true in special cases, but I doubt it, and I especially doubt your alleged 50% to 70% increase in throughput.
Strange, but it's real for my cheap desktop E2160 at 1.8 GHz. With one worker and one CPU thread I get an iteration time of about 0.090 sec; with two threads, about 0.055 when the CPU is otherwise idle and about 0.090 under heavy load. With two workers I get about 0.140 for each worker thread...
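The disagreement above is about throughput (results per unit time for the whole machine) versus time per result. A minimal sketch that converts the three iteration times quoted in this post into combined iterations per second; it uses the idle-CPU figure for the two-thread case and assumes all three measurements were taken on exponents of similar size, which the post does not state. The names are made up for illustration.

```python
# (seconds per iteration, number of tests running at once), from the post above
configs = {
    "1 worker, 1 thread": (0.090, 1),
    "1 worker, 2 threads": (0.055, 1),       # idle-CPU figure
    "2 workers, 1 thread each": (0.140, 2),
}


def throughput(seconds_per_iteration, concurrent_tests):
    """Combined iterations per second across all concurrently running tests."""
    return concurrent_tests / seconds_per_iteration


rates = {name: throughput(*cfg) for name, cfg in configs.items()}
baseline = rates["1 worker, 1 thread"]

for name, rate in rates.items():
    change = rate / baseline - 1.0
    print(f"{name}: {rate:.1f} iterations/sec ({change:+.0%} vs. one thread)")
```

On these particular numbers the two-thread single worker comes out at roughly 18.2 iterations per second against roughly 14.3 for two separate workers, so for this machine the reported timings do favour one multithreaded worker. That is a statement about the figures as posted, not a general result for other CPUs or chipsets.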

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.