mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
Old 2020-05-05, 04:16   #2146
paulunderwood
 
Different FFT sizes

I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller run goes faster and the larger one runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
Old 2020-05-05, 04:41   #2147
Prime95

Quote:
Originally Posted by paulunderwood
I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller run goes faster and the larger one runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
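
That limit is essentially a bits-per-word calculation: each FFT word carries about exponent/words bits, and 106.6M over 5.5M words works out to roughly 18.5 bits/word. A minimal sketch of the arithmetic -- the 18.49 bits/word ceiling here is inferred from the figure above, not taken from gpuowl's actual limit tables:

Code:
# Rough bits-per-word arithmetic behind per-FFT-length exponent limits.
# The ~18.49 bits/word ceiling is inferred from "106.6M in the 5.5M FFT";
# gpuowl's real limits come from its own tables, not this formula.
FFT_LENGTHS = {"5M": 5 * 2**20, "5.5M": int(5.5 * 2**20), "6M": 6 * 2**20}
BITS_PER_WORD = 106.6e6 / (5.5 * 2**20)  # ~18.49

for name, words in FFT_LENGTHS.items():
    print(f"{name} FFT ({words} words): max exponent ~{BITS_PER_WORD * words / 1e6:.1f}M")
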
Old 2020-05-05, 06:37   #2148
paulunderwood
 

Quote:
Originally Posted by Prime95
One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
Code:
./gpuowl 
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?
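
One way to diagnose this is to list which GLIBCXX symbol versions the system's libstdc++ actually provides; GLIBCXX_3.4.26 is the one gcc 9 binaries need. A small illustrative check, equivalent to running strings on the library and grepping for GLIBCXX, with the path taken from the error message:

Code:
# List the GLIBCXX version strings baked into libstdc++ to see whether
# GLIBCXX_3.4.26 (needed by gcc 9 builds) is present on this system.
import re
from pathlib import Path

LIB = Path("/usr/lib/x86_64-linux-gnu/libstdc++.so.6")

for v in sorted(set(re.findall(rb"GLIBCXX_[0-9.]+", LIB.read_bytes()))):
    print(v.decode())
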
Old 2020-05-05, 07:36   #2149
paulunderwood
 

Quote:
Originally Posted by paulunderwood
Code:
./gpuowl 
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?
I have two compilers: I think I installed gcc 9 manually to build from source, while gcc 8 is the system's native compiler. Anyway, for my purposes I hard-wired g++-8 into the makefile and all is hunky-dory now.

Last fiddled with by paulunderwood on 2020-05-05 at 13:57
Old 2020-05-05, 18:50   #2150
kriesel
 

Quote:
Originally Posted by kriesel
Quick update / recap on that
For what it's worth, this matching LL DC with shift 0 was performed with gpuowl v6.11-264 on the RX 480 in the same system that was previously having frequent GEC errors with RX 550s.
https://www.mersenne.org/report_expo...5487503&full=1
Old 2020-05-05, 20:24   #2151
ewmayer

Quote:
Originally Posted by paulunderwood
I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller run goes faster and the larger one runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timings for the first two ... you could temporarily move one of your queued-up 6144K assignments to the top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: two jobs at the same FFT length have different run data (e.g. DWT weights) but the same memory profile and GPU resource usage.

Edit: I tried the above three 2-job scenarios on my own Radeon VII, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
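
As a sanity check on that scaling, a short sketch predicting the 6144K timing from the 5632K measurement above under a c*N*log(N) cost model:

Code:
# Predict the 6144K per-iteration time from the measured 5632K one,
# assuming the cost per squaring scales as N*log(N).
from math import log

N1, N2 = 5632 * 1024, 6144 * 1024  # FFT lengths in words
t1 = 1470.0  # measured us/iter at 5632K

t2 = t1 * (N2 * log(N2)) / (N1 * log(N1))
print(f"predicted at 6144K: {t2:.0f} us/iter (measured: 1615)")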

Last fiddled with by ewmayer on 2020-05-05 at 21:33
Old 2020-05-05, 21:59   #2152
paulunderwood
 

Quote:
Originally Posted by ewmayer
Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timings for the first two ... you could temporarily move one of your queued-up 6144K assignments to the top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: two jobs at the same FFT length have different run data (e.g. DWT weights) but the same memory profile and GPU resource usage.

Edit: I tried the above three 2-job scenarios on my own Radeon VII, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
1. Both @5632K ---> 1489us/it each
2. One each @5632K and @6144K ---> the latter ~2300us/it (very slow); the former 1125us/it

At the moment (with the latest commit) they are running at ~1200us/it (103.9M) and ~1800us/it (104.9M). They were both running at about the average earlier, until I restarted them.

It is my last 103.9M exponent.

Last fiddled with by paulunderwood on 2020-05-05 at 22:13
Old 2020-05-07, 05:49   #2153
paulunderwood
 

Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.
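
Expressed as aggregate throughput (iterations per second summed over both instances), the imbalance and its resolution are easy to see. A quick sketch using the figures reported in the posts above:

Code:
# Aggregate throughput over both instances for each scenario, from the
# per-iteration timings (in us) reported above.
scenarios = {
    "both @5632K": (1489, 1489),
    "mixed @5632K + @6144K": (1125, 2300),
    "both @6144K": (1440, 1440),
}

for name, times_us in scenarios.items():
    total = sum(1e6 / t for t in times_us)  # iter/s summed over jobs
    print(f"{name}: {total:.0f} iter/s total")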
Old 2020-05-07, 16:56   #2154
xx005fs
PM1 result not understood

It seems that PrimeNet isn't able to understand factored PM1 results out of GPUOWL:

Code:
{"status":"F", "exponent":"98141611", "worktype":"PM1", "B1":"750000", "B2":"15000000", "fft-length":"5767168", "factors":"["****"]", "program":{"name":"gpuowl", "version":"v6.11-258-gb92cdfd"}, "computer":"TITAN V-0", "aid":"******", "timestamp":"2020-05-06 07:29:29 UTC"}
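
For what it's worth, that line is not valid JSON as printed: the array value of the "factors" field is wrapped in an extra pair of quotes, which is plausibly what the server chokes on. A minimal demonstration, with a stand-in factor value since the real one is redacted above:

Code:
# Show why the result line fails to parse: the "factors" array is wrapped
# in quotes, making the JSON invalid. The factor value is a stand-in.
import json

as_posted = '{"status":"F", "factors":"["12345"]"}'
corrected = '{"status":"F", "factors":["12345"]}'

for label, line in (("as posted", as_posted), ("corrected", corrected)):
    try:
        json.loads(line)
        print(f"{label}: parses")
    except json.JSONDecodeError as err:
        print(f"{label}: invalid JSON ({err})")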
Old 2020-05-07, 18:23   #2155
S485122
 

Quote:
Originally Posted by paulunderwood
Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.
"us"? Usually it is capitalised as "US", but it is not a unit (AFAIK). Or do you (and the preceding posters) mean µs?

Jacob
Old 2020-05-07, 18:35   #2156
paulunderwood
 

Quote:
Originally Posted by S485122
"us"? Usually it is capitalised as "US", but it is not a unit (AFAIK). Or do you (and the preceding posters) mean µs?

Jacob
Yes, I meant µs. But how do I generate µ with the keyboard easily?

nvm: I found a how-to for doing it in GNOME without having to remember and type Unicode code points. Thanks for prompting me!

Last fiddled with by paulunderwood on 2020-05-07 at 18:47