mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl

Old 2020-05-05, 04:16   #2146
paulunderwood
 
Sep 2002
Database er0rr
111001011011₂ Posts

different fft sizes

I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
Old 2020-05-05, 04:41   #2147
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL
2²·1,873 Posts

Quote:
Originally Posted by paulunderwood View Post
I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
Old 2020-05-05, 06:37   #2148
paulunderwood
 
Sep 2002
Database er0rr
3·5²·7² Posts

Quote:
Originally Posted by Prime95 View Post
One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
Code:
./gpuowl 
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?
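One way to see which symbol versions the system libstdc++ actually provides is to scan the shared object directly; a diagnostic sketch (the library path is taken from the error message above and may differ on other distros):

```shell
# List the GLIBCXX symbol versions present in the system libstdc++
# (grep -a treats the shared object as text). GLIBCXX_3.4.26 first
# shipped with the GCC 9 runtime, so an older libstdc++ lacks it.
grep -ao 'GLIBCXX_3\.4\.[0-9]*' /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | sort -uV | tail -n 5
```

If 3.4.26 is absent, the binary was built against a newer libstdc++ than the one installed.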
Old 2020-05-05, 07:36   #2149
paulunderwood
 
Sep 2002
Database er0rr
3·5²·7² Posts

Quote:
Originally Posted by paulunderwood View Post
Code:
./gpuowl 
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?
I have two compilers: I think I installed GCC 9 manually to build the source, and GCC 8 is the distribution's native one. Anyway, for my purposes I hard-wired g++-8 into the makefile and all is hunky-dory now.
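A minimal sketch of that kind of pin-down, assuming a GNU-make makefile that honours the conventional CXX variable (gpuowl's actual makefile layout may differ):

```make
# Pin the compiler to the distribution's g++-8 so the resulting binary
# links against the system libstdc++ (no GLIBCXX_3.4.26 requirement).
CXX = g++-8
```

Equivalently, `make CXX=g++-8` overrides the variable on the command line without editing the file, provided the makefile compiles via $(CXX).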

Last fiddled with by paulunderwood on 2020-05-05 at 13:57
Old 2020-05-05, 18:50   #2150
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3·29·59 Posts

Quote:
Originally Posted by kriesel View Post
Quick update / recap on that
For what it's worth, this matching LL DC with shift 0 was performed in gpuowl v6.11-264 on the RX480, in the same system that was previously having frequent GEC errors on RX 550s.
https://www.mersenne.org/report_expo...5487503&full=1
Old 2020-05-05, 20:24   #2151
ewmayer
2ω=0
 
Sep 2002
República de California
3×3,877 Posts

Quote:
Originally Posted by paulunderwood View Post
I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it has something to do with context-switching on the GPU between tasks that have different memory mappings: two jobs at the same FFT length have different run data and e.g. DWT weights, but have the same memory profile and GPU resource usage.

Edit: I tried the above three 2-job scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
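The throughput totals and the scaling claim can be reproduced directly from the per-iteration timings; a small sketch (numbers copied from the list above, Python purely for illustration):

```python
import math

# Per-iteration timings in microseconds for the three two-job
# scenarios quoted above (Radeon7, 5632K and 6144K FFTs).
scenarios = {
    "both @5632K":   [1470, 1470],
    "5632K + 6144K": [1530, 1546],
    "both @6144K":   [1615, 1615],
}

# Total throughput is the sum of each job's iterations per second.
for name, times in scenarios.items():
    total = sum(1e6 / t for t in times)
    print(f"{name}: {total:.0f} iter/sec")

# Expected 5632K -> 6144K slowdown under n*log(n) squaring cost:
n1, n2 = 5632 * 1024, 6144 * 1024
predicted = (n2 * math.log(n2)) / (n1 * math.log(n1))
print(f"predicted ratio {predicted:.3f}, observed {1615 / 1470:.3f}")
```

The predicted ~1.10 ratio between the two FFT lengths matches the measured 1615/1470 closely, consistent with the "no anomalous slowdown" conclusion.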

Last fiddled with by ewmayer on 2020-05-05 at 21:33
Old 2020-05-05, 21:59   #2152
paulunderwood
 
Sep 2002
Database er0rr
3×5²×7² Posts

Quote:
Originally Posted by ewmayer View Post
Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: 2 jobs at same FFT length have different run data and e.g. DWT weights but have the same memory profile and GPU resources usage.

Edit: I tried the above three 2-jobs scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
1. Both @5632K; ---> 1489us/it each
2. One each @5632K and @6144K ----> the latter was ~2300us/it (very slow); the former 1125us/it

At the moment (with the latest commit) they are running at ~1200us/it (103.9M) and ~1800us/it (104.9M). They were running at the average earlier, until I restarted them.

It is my last 103.9M exponent.

Last fiddled with by paulunderwood on 2020-05-05 at 22:13
Old 2020-05-07, 05:49   #2153
paulunderwood
 
Sep 2002
Database er0rr
3675₁₀ Posts

Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.
Old 2020-05-07, 16:56   #2154
xx005fs
 
"Eric"
Jan 2018
USA

2²×5³ Posts

PM1 Result not understood

It seems that PrimeNet isn't able to understand factored PM1 results out of GPUOWL.

Code:
{"status":"F", "exponent":"98141611", "worktype":"PM1", "B1":"750000", "B2":"15000000", "fft-length":"5767168", "factors":"["****"]", "program":{"name":"gpuowl", "version":"v6.11-258-gb92cdfd"}, "computer":"TITAN V-0", "aid":"******", "timestamp":"2020-05-06 07:29:29 UTC"}
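The quoted-string value for "factors" would make that line invalid JSON, which may be why the server rejects it; a quick check (trimmed to the relevant fields, factor elided as in the post):

```python
import json

# The reported result line, trimmed to the relevant fields.
reported = '{"status":"F", "exponent":"98141611", "factors":"["****"]"}'

# The factors value is a quoted string containing a bracket, so the
# line as a whole does not parse as JSON:
try:
    json.loads(reported)
except json.JSONDecodeError as e:
    print("rejected:", e)

# With factors as a real JSON array the line parses fine:
fixed = '{"status":"F", "exponent":"98141611", "factors":["****"]}'
print(json.loads(fixed)["factors"])
```

This is only a guess at the failure mode from the syntax shown; whether PrimeNet's parser chokes on exactly this is for the gpuowl/PrimeNet developers to confirm.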
Old 2020-05-07, 18:23   #2155
S485122
 
Sep 2006
Brussels, Belgium
2×5×167 Posts

Quote:
Originally Posted by paulunderwood View Post
Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.
"us"? Usually it is capitalised as "US", but that is not a unit (AFAIK). Or do you (and the preceding posters) mean µs?

Jacob
Old 2020-05-07, 18:35   #2156
paulunderwood
 
Sep 2002
Database er0rr
3×5²×7² Posts

Quote:
Originally Posted by S485122 View Post
"us" ? Usually it is capitalised as "US", but it is not a unit (AFAIK.) Or do you (and preceding posters) mean µs ?

Jacob
Yes, I meant µs. But how do I easily generate mu with the keyboard?

nvm: I found this on how to do it in Gnome without having to remember and use Unicode code points. Thanks for prompting me!
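For reference, the character usually wanted here is MICRO SIGN, U+00B5, which is distinct from GREEK SMALL LETTER MU, U+03BC; a tiny sketch:

```python
# MICRO SIGN (the SI prefix character) vs GREEK SMALL LETTER MU:
micro, mu = "\u00b5", "\u03bc"
print(micro + "s", hex(ord(micro)))  # the unit string and its code point
print(mu, hex(ord(mu)))
# In GNOME (via IBus), Ctrl+Shift+U followed by the hex digits (b5)
# is one common way to enter it directly.
```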

Last fiddled with by paulunderwood on 2020-05-07 at 18:47


Similar Threads (Thread; Starter; Forum; Replies; Last Post):
- mfakto: an OpenCL program for Mersenne prefactoring; Bdot; GPU Computing; 1668; 2020-12-22 15:38
- GPUOWL AMD Windows OpenCL issues; xx005fs; GpuOwl; 0; 2019-07-26 21:37
- Testing an expression for primality; 1260; Software; 17; 2015-08-28 01:35
- Testing Mersenne cofactors for primality?; CRGreathouse; Computer Science & Computational Number Theory; 18; 2013-06-08 19:12
- Primality-testing program with multiple types of moduli (PFGW-related); Unregistered; Information & Answers; 4; 2006-10-04 22:38
