mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing

2011-10-14, 16:51   #1
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·5²·67 Posts
GPU Noob-Experiences/Questions

Hi,
I'm picking up from
Best 4XX series GPU
where I've taken the thread far afield from the original topic. All of this is running on a GTX460 and a 1090T. At this point I am running mfaktc under 32-bit XP, and CUDALucas under 64-bit Win7.

1) Huzzah! An mfaktc run found my first factor ever last night.

2) I did a trial of running a second instance of mfaktc. It was a short run: 2^68 to 2^69. For the duration, I shut down P95 and gave each instance of mfaktc its own core affinity. This did get GPU usage maxed out, but at the cost of considerable display lag... no surprise there. From the README I see that reducing NumStreams from 3 to 2 could improve this.

The question is, which is a more effective use of overall processing capacity?

Currently, I have P95 set to run on 5 cores, with the sixth given over to feeding mfaktc. Until recently I had all of the P95 workers set to "Whatever makes the most sense". After reading of a need for more P-1 factoring, I set two workers for that. I notice that two workers just started new assignments: one is running P-1 factoring, and the other just finished its P-1 and moved on to primality testing. The two workers I set for P-1 are still running LL assignments from the previous settings.

I let P95/64 have lots of memory (1370MB) with no day/night difference. As I understand things, this is good for P-1, Stage 2.

All that said, would PrimeNet benefit more from 2 instances of mfaktc and 4 P95/64 workers? (Assuming that I could reduce display lag to a comfortable level.) Or is 1-mfaktc, 5-P95 a better balance? I have to say that I'm more inclined to the 1 and 5 scenario because I run cudalucas in Win7. While I know that cudalucas does not really need the 6th CPU core, keeping P64 limited to 5 cores avoids the conflict with Photoshop. It is also easier for me to keep the same number of workers in P95 and P64, since they both work on the same assignments, depending on which OS I boot.
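For reference, the 5+1 and 4+2 core splits being weighed here can be written as CPU affinity bitmasks, one bit per logical core. The sketch below is a minimal Python illustration assuming the six cores of the 1090T are numbered 0-5; the helper names and layouts are hypothetical. Windows 7's `start /affinity` switch takes such a mask in hex, and Task Manager's "Set Affinity" dialog sets the same thing with checkboxes.

```python
def affinity_mask(cores):
    """Build a bitmask with bit i set for each logical core i."""
    mask = 0
    for core in cores:
        mask |= 1 << core
    return mask


def hex_mask(cores):
    """Uppercase hex form of the mask, e.g. for `start /affinity <mask>`."""
    return format(affinity_mask(cores), "X")


# Hypothetical layouts for a six-core CPU (cores 0-5).
LAYOUTS = {
    "1 mfaktc + 5 P95 workers": {"P95": range(0, 5), "mfaktc #1": [5]},
    "2 mfaktc + 4 P95 workers": {"P95": range(0, 4), "mfaktc #1": [4], "mfaktc #2": [5]},
}

if __name__ == "__main__":
    for name, layout in LAYOUTS.items():
        print(name)
        for program, cores in layout.items():
            print(f"  {program:10s} cores {list(cores)} -> mask {hex_mask(cores)}")
```

Giving each mfaktc instance its own single-bit mask, and leaving those bits out of P95's mask, keeps the sieving threads from competing with the LL/P-1 workers for a core.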

Side note to the above: with the current settings, on a 2^70 to 2^71 run, mfaktc reduces SievePrimes to 5000 (this is always the case). Avg.Rate runs from the upper 70s to the mid 80s. Avg.Wait runs in the 60s, with occasional higher or lower numbers.
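For readers new to trial factoring, here is a minimal pure-Python sketch of the idea behind a "2^70 to 2^71" run and the SievePrimes setting, shrunk to toy sizes. It relies on two standard facts: any factor q of 2^p - 1 (p an odd prime) has the form q = 2kp + 1, and q ≡ ±1 (mod 8). Candidates are first sieved with a handful of small odd primes, roughly the CPU-side job SievePrimes controls in mfaktc, and the survivors get a modular exponentiation, the part mfaktc hands to the GPU. This is an illustration only, not mfaktc's code; here `sieve_primes` is simply the number of small primes used.

```python
def first_odd_primes(count):
    """Return the first `count` odd primes (3, 5, 7, ...) by trial division."""
    primes = []
    n = 3
    while len(primes) < count:
        if all(n % r for r in primes if r * r <= n):
            primes.append(n)
        n += 2
    return primes


def trial_factor(p, lo, hi, sieve_primes=15):
    """Return sieve-surviving factors q of 2^p - 1 with lo <= q < hi."""
    small = first_odd_primes(sieve_primes)
    k = max(1, (lo - 1 + 2 * p - 1) // (2 * p))  # smallest k with 2kp + 1 >= lo
    found = []
    while True:
        q = 2 * k * p + 1
        if q >= hi:
            break
        k += 1
        # Factors of a Mersenne number are +/-1 mod 8.
        if q % 8 not in (1, 7):
            continue
        # CPU-side sieve: drop candidates that have a small prime factor.
        if any(q % r == 0 and q != r for r in small):
            continue
        # Expensive test (done on the GPU by mfaktc): does q divide 2^p - 1?
        if pow(2, p, q) == 1:
            found.append(q)
    return found


if __name__ == "__main__":
    print(trial_factor(11, 1, 3000))  # [23, 89]; 2047 = 23*89 is sieved out
    print(trial_factor(29, 1, 3000))  # [233, 1103, 2089]
```

Using more sieve primes removes more candidates before the expensive test, but costs more CPU time per candidate; that trade-off is presumably why SievePrimes gets pushed down when the CPU cannot keep the GPU fed.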

I have gathered that the SievePrimes and Avg.Wait numbers are not optimal, but I'm not sure what to do about it. If it would be helpful I will post my mfaktc.ini and any other system details requested.

I've probably bored everyone enough for one post. I would greatly appreciate any comments or suggestions on any of the elements of all this rambling.

2011-10-14, 21:08   #2
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3×2,399 Posts

What about avg. rate and sieve primes with two instances going? If you can't get rid of the display lag (or don't like 4 cores P95) then I'm out of ideas for getting those numbers up with one instance.

2011-10-14, 21:53   #3
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

23502₈ Posts

Quote:
Originally Posted by Dubslow
What about avg. rate and sieve primes with two instances going?
I'll have another go at it and get back to you. My rough impression is that there wasn't that much difference from a single instance.

Regarding finding a first factor, I actually discovered that this box had found one previously doing P-1. Still, getting that from mfaktc gave me a sense of accomplishment.

2011-10-14, 22:08   #4
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·5²·67 Posts

Here is a quick and dirty side-by-side. I haven't tried tweaking to reduce lag yet. This was just a couple of minutes after SievePrimes settled to 5000. Avg.Rate did go up noticeably. Avg.Wait bounces around.

EDIT: This was with NumStreams=10. I had gone there because the README says Windows systems need more than Linux. The following post will show results for NumStreams=2.
[Attached thumbnail: 2instance01.jpg]

Last fiddled with by kladner on 2011-10-14 at 22:36

2011-10-14, 22:41   #5
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2742₁₆ Posts

These are run with NumStreams=2. Lag is still noticeable, but somewhat better. NumStreams=3 gave similar results, but the lag was still pretty bad.

I don't find the lag surprising, given GPU usage of 97-99%. I guess I'll have to decide if it is too intrusive.

The other issue is still whether cutting P95 to 4 cores to run mfaktc x2 is the most productive for GIMPS.
[Attached thumbnail: 2instance02.jpg]

2011-10-14, 23:45   #6
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

16035₈ Posts

Quote:
Originally Posted by kladner
The other issue is still whether cutting P95 to 4 cores to run mfaktc x2 is the most productive for GIMPS.
I can't really answer that; as far as GHz-days, the answer is obvious, but you weren't asking that.

It does seem that two cores do saturate your GPU, though with the obvious lag problem. Does each instance go as fast, or almost as fast, as a single instance would? I tried looking through the old thread, but avg. rate isn't the best measure (time per class is better), and the older assignments were at different bit levels.
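Since time per class is the suggested yardstick, here is a short sketch of what a "class" is, assuming the commonly described mfaktc layout of 4620 classes (4620 = 4·3·5·7·11). Every candidate q = 2kp + 1 with the same k mod 4620 has the same residue mod 8, 3, 5, 7 and 11, so whole classes can be discarded up front. By the Chinese remainder theorem, for a prime exponent p > 11 exactly 2·2·4·6·10 = 960 classes survive. The function name is hypothetical; this is not mfaktc's code.

```python
NUM_CLASSES = 4 * 3 * 5 * 7 * 11  # 4620; assumed mfaktc class count


def surviving_classes(p, num_classes=NUM_CLASSES):
    """Classes c (candidates with k ≡ c mod num_classes) that can hold a factor of 2^p - 1."""
    survivors = []
    for c in range(num_classes):
        # Every q = 2kp + 1 in class c shares this value's residues mod 8, 3, 5, 7, 11.
        q = 2 * c * p + 1
        if q % 8 not in (1, 7):
            continue  # Mersenne factors are +/-1 mod 8
        if any(q % r == 0 for r in (3, 5, 7, 11)):
            continue  # every candidate in this class has a tiny factor
        survivors.append(c)
    return survivors


if __name__ == "__main__":
    print(len(surviving_classes(59288869)))  # 960
```

If each class takes t seconds, one bit level takes roughly 960·t seconds, which is why time per class compares runs at the same bit level more directly than avg. rate does.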

2011-10-15, 02:35   #7
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·5²·67 Posts

I've been experimenting with NumStreams, and with the effect on mfaktc of running P95. The difference between NumStreams 2 and 3 is pretty marked once SievePrimes drops to 5000. I didn't capture that in the NS3 set, but you can see it with NS2.

Next, I'll show what starting P95 workers doing primality testing did to things.
[Attached thumbnail: 2instance03.jpg]

2011-10-15, 02:55   #8
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×5²×67 Posts
Adding P95 to mfaktc, 2 instances

The red lines mark approximately where each worker was started. All of this has been with NumStreams=2. That seems to be the most effective setting, at least with this rig.

Granted, these are not totally pristine tests. There are 3 Firefox windows with 30 tabs between them, and Thunderbird email and Photoshop 6 are also running. They (and Norton Internet Security) might be consuming 8-10% CPU time.

But it's kind of dramatic what happens next, when P95 with 4 workers running is stopped.
[Attached thumbnail: 1wrkr-2wrkr.jpg]

Last fiddled with by kladner on 2011-10-15 at 03:24

2011-10-15, 03:22   #9
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2×3×5²×67 Posts

Here, P95 is stopped from 4 workers, then a minute or so later the 2 mfaktc's are shut down. It seems that 2 mfaktc's actually want more than two CPU cores. There's activity still on the first four cores when P95 stops. That quiets down considerably when mfaktc is stopped. Interestingly, though, when mfaktc x2 is running, each instance is always showing 16-17% CPU. But the overall CPU usage drops a lot more than 32-34% when they stop. I'm guessing they may be stimulating some System activity that doesn't show up on their usage tabs.
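The 16-17% figure has a simple explanation: Task Manager reports CPU usage as a share of all logical cores, so on the six-core 1090T one fully busy core shows up as 100/6 ≈ 16.7%. A tiny Python sketch of the conversion (function names are hypothetical):

```python
def busy_cores(percent, logical_cores=6):
    """Convert a Task Manager CPU percentage into cores' worth of work."""
    return percent / 100 * logical_cores


def percent_for_cores(cores, logical_cores=6):
    """Inverse: the Task Manager percentage that `cores` fully busy cores would show."""
    return 100 * cores / logical_cores


if __name__ == "__main__":
    print(round(percent_for_cores(1), 1))  # 16.7: one saturated core
    print(round(busy_cores(2 * 16.5), 2))  # 1.98: two mfaktc instances
```

So each instance at 16-17% is pinning one core, and a system-wide drop of much more than about 33% when both stop is consistent with the guess that they trigger extra system activity (drivers and the like) that is not charged to the mfaktc processes themselves.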

Dubslow - I do understand about total processed output--the GHz-days. I guess I'm just looking to pick up slack if one part or another of the whole GIMPS process is getting less attention. And the display does get pretty herky-jerky with mfaktc hitting 99% on the GPU.

I haven't gotten back to try 1 instance vs 2. I guess to be really scientific, both should be running the same exponent. But
59288869,70,71 and
59288753,70,71 seem pretty close together.

I'll try that while I've still got some run-time on those two.
[Attached thumbnail: P95-mfaktc_stopped.jpg]

Last fiddled with by kladner on 2011-10-15 at 03:27

2011-10-15, 03:42   #10
kladner
"Kieren"
Jul 2011
In My Own Galaxy!

2·3·5²·67 Posts

Crap. 59288869,70,71 finished too soon after its SievePrimes bottomed out; there wasn't time for 59288753,70,71 to get to the same stage. However, #1's Avg.Wait jumped into the 100-200 range when instance 2 came online. The screenshot is of number 2, showing the same kind of Avg.Wait that number 1 had while running by itself.

mfaktc is happier with company?
[Attached thumbnail: Single_Instance01.jpg]
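One way to read these Avg.Wait numbers: as I understand mfaktc's output, it is how long the CPU side waits for the GPU (assumed here to be in microseconds). A low wait suggests the sieving CPU is the bottleneck; a high wait suggests the GPU is, leaving spare CPU time for more sieving. The toy controller below illustrates that feedback in Python. It is not mfaktc's own adjustment logic: the thresholds, step size and upper cap are hypothetical, and only the 5000 floor comes from this thread.

```python
SIEVE_MIN = 5000    # floor observed in this thread
SIEVE_MAX = 200000  # hypothetical upper cap for this sketch


def adjust_sieve_primes(sieve_primes, avg_wait_us, low_us=100, high_us=500, step_pct=10):
    """Toy feedback rule: sieve more when the CPU idles, less when it is the bottleneck."""
    if avg_wait_us > high_us:
        # GPU is the bottleneck: spend spare CPU time removing more candidates.
        sieve_primes = sieve_primes * (100 + step_pct) // 100
    elif avg_wait_us < low_us:
        # CPU is the bottleneck: sieve less so the GPU is fed sooner.
        sieve_primes = sieve_primes * (100 - step_pct) // 100
    return max(SIEVE_MIN, min(SIEVE_MAX, sieve_primes))


if __name__ == "__main__":
    print(adjust_sieve_primes(5000, 60))     # 5000: CPU-bound, already at the floor
    print(adjust_sieve_primes(20000, 60))    # 18000
    print(adjust_sieve_primes(20000, 1000))  # 22000
```

Under this reading, a single instance pinned at SievePrimes=5000 with a low wait is CPU-limited, and a second instance competing for the GPU would be expected to raise the wait, much as seen above.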

2011-10-15, 06:33   #11
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88

3·2,399 Posts

I'd say you can write off the extra CPU usage without P95 running as just system stuff, not necessarily related to mfaktc. It does seem happier with two, but we already knew that it was a CPU-limited process. As for what you should work on, I'd say the optimal solution is one mfaktc instance, 3 P95 workers, and CUDALucas running simultaneously to use the rest of the GPU, though from what I understand you haven't gotten CL and mfaktc working in the same OS yet. In my experience, with either CUDALucas or mfaktc keeping my GPU (also a 460) at ~90%, I am still able to play TF2 without any lag, so hopefully, if you get them working at the same time, your screen won't be crapped out.

(Also from the one post I would have said NS3. The time per class is slightly lower.)
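Comparing setups by time per class is easy to turn into work per day. The sketch below assumes 960 tested classes per bit level (the commonly described mfaktc layout) and that every instance works at the same bit level on exponents of similar size; the timings in the example are made-up placeholders, not measurements from this thread.

```python
SECONDS_PER_DAY = 86400
CLASSES_PER_BIT_LEVEL = 960  # assumed: 960 of mfaktc's 4620 classes are tested


def bit_levels_per_day(seconds_per_class, instances=1):
    """Aggregate bit levels finished per day by `instances` identical mfaktc runs."""
    seconds_per_level = seconds_per_class * CLASSES_PER_BIT_LEVEL
    return instances * SECONDS_PER_DAY / seconds_per_level


if __name__ == "__main__":
    # Hypothetical timings: one instance at 4.0 s/class vs. two at 6.0 s/class each.
    one = bit_levels_per_day(4.0, instances=1)
    two = bit_levels_per_day(6.0, instances=2)
    print(f"1 instance: {one:.1f}/day, 2 instances: {two:.1f}/day")
```

The same function compares NumStreams settings (instances=1, different time per class). For the 1-vs-2 instance question, the second instance only pays off if the combined rate gain is worth the P95 core it takes away.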

Last fiddled with by Dubslow on 2011-10-15 at 06:33

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.