mersenneforum.org gpuOwL: an OpenCL program for Mersenne primality testing

 2020-05-05, 04:16 #2146 paulunderwood     Sep 2002 Database er0rr 111001011011₂ Posts different fft sizes I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
2020-05-05, 04:41   #2147
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2²·1,873 Posts

Quote:
 Originally Posted by paulunderwood I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
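As a rough cross-check of that limit (a back-of-the-envelope sketch, not gpuowl's actual fit criterion, which depends on the implementation's rounding-error margins):

```python
# In an irrational-base DWT, each FFT word carries roughly
# exponent / fft_length bits; 106.6M at the 5.5M (5632K) FFT
# works out to a bit density near the usual double-precision limit.
fft_length = 5632 * 1024      # the "5.5M" FFT
exponent = 106_600_000
print(exponent / fft_length)  # ≈ 18.48 bits per word
```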

2020-05-05, 06:37   #2148
paulunderwood

Sep 2002
Database er0rr

3·5²·7² Posts

Quote:
 Originally Posted by Prime95 One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
Code:
./gpuowl
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?

2020-05-05, 07:36   #2149
paulunderwood

Sep 2002
Database er0rr

3·5²·7² Posts

Quote:
 Originally Posted by paulunderwood Code: ./gpuowl ./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl) I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?
I have two compilers: I think I installed gcc 9 manually for building from source, and gcc 8 is the native one. Anyway, for my purposes I hard-wired g++-8 into the makefile and all is hunky-dory now.
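A sketch of that change (hypothetical; the variable name and layout in gpuowl's actual Makefile may differ):

```make
# Pin the compiler to the distribution's native g++-8 so the binary
# links against the system libstdc++ (no GLIBCXX_3.4.26 symbol needed).
CXX = g++-8
```

Equivalently, `make CXX=g++-8` on the command line avoids editing the file, assuming the Makefile uses the conventional `CXX` variable.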

Last fiddled with by paulunderwood on 2020-05-05 at 13:57

2020-05-05, 18:50   #2150
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·29·59 Posts

Quote:
 Originally Posted by kriesel Quick update / recap on that
For what it's worth, this matching LL DC with shift 0 was performed with gpuowl v6.11-264 on the RX 480 in the same system that was previously having frequent GEC errors on RX 550s.
https://www.mersenne.org/report_expo...5487503&full=1

2020-05-05, 20:24   #2151
ewmayer
2ω=0

Sep 2002
República de California

3×3,877 Posts

Quote:
 Originally Posted by paulunderwood I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?
Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: 2 jobs at the same FFT length have different run data and e.g. DWT weights, but have the same memory profile and GPU resource usage.

Edit: I tried the above three 2-jobs scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
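That scaling claim is easy to sanity-check (a quick sketch using the Radeon7 timings quoted above):

```python
import math

# Predicted slowdown going from the 5632K to the 6144K FFT, assuming
# per-iteration cost of an autosquaring at FFT length n scales as n*log(n).
n1 = 5632 * 1024
n2 = 6144 * 1024
predicted = (n2 * math.log(n2)) / (n1 * math.log(n1))

# Measured ratio from the timings above: 1615 vs 1470 us/iter.
measured = 1615 / 1470

print(round(predicted, 3), round(measured, 3))  # 1.097 vs 1.099
```

The predicted and measured ratios agree to within about 0.2%, consistent with "no anomalous slowdown".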

Last fiddled with by ewmayer on 2020-05-05 at 21:33

2020-05-05, 21:59   #2152
paulunderwood

Sep 2002
Database er0rr

3×5²×7² Posts

Quote:
 Originally Posted by ewmayer Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups: 1. Both @5632K; 2. One each @5632K and @6144K; 3. Both @6144K. Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K. If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: 2 jobs at same FFT length have different run data and e.g. DWT weights but have the same memory profile and GPU resources usage. Edit: I tried the above three 2-jobs scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings: 1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec; 2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec; 3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec. So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
1. Both @5632K; ---> 1489us/it each
2. One each @5632K and @6144K ----> the latter was ~2300us/it (very slow); the former 1125us/it

At the moment (with latest commit) it is running ~1200us/it (103.9M) and ~1800us/it (104.9M). They were running at the average earlier until I restarted them.

It is my last 103.9M exponent.

Last fiddled with by paulunderwood on 2020-05-05 at 22:13

 2020-05-07, 05:49 #2153 paulunderwood     Sep 2002 Database er0rr 3675₁₀ Posts Was ~1200us/it (103.9M) and ~1800us/it (104.9M). Now 1440us/it each -- both at 104.9M.
 2020-05-07, 16:56 #2154 xx005fs   "Eric" Jan 2018 USA 2²×53 Posts PM1 Result not understood It seems that primenet can't understand factored P-1 (PM1) results out of gpuowl. Code: {"status":"F", "exponent":"98141611", "worktype":"PM1", "B1":"750000", "B2":"15000000", "fft-length":"5767168", "factors":"["****"]", "program":{"name":"gpuowl", "version":"v6.11-258-gb92cdfd"}, "computer":"TITAN V-0", "aid":"******", "timestamp":"2020-05-06 07:29:29 UTC"}
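One plausible culprit: the `factors` field as printed isn't valid JSON, since the array is wrapped in an extra pair of quotes. A trimmed check (fields cut down and the factor left redacted as in the post):

```python
import json

# The reported form: "factors":"["****"]" -- the quotes around the
# brackets break the JSON, so a strict parser rejects the whole line.
bad = '{"status":"F", "exponent":"98141611", "factors":"["****"]"}'
try:
    json.loads(bad)
    print("parsed")
except json.JSONDecodeError:
    print("invalid JSON")

# With those quotes removed, "factors" is a proper JSON array of strings:
good = '{"status":"F", "exponent":"98141611", "factors":["****"]}'
print(json.loads(good)["factors"])  # ['****']
```

Whether primenet's parser fails for exactly this reason is an assumption; but any strict JSON consumer would choke on the quoted array.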
2020-05-07, 18:23   #2155
S485122

Sep 2006
Brussels, Belgium

2×5×167 Posts

Quote:
 Originally Posted by paulunderwood Was ~1200us/it (103.9M) and ~1800us/it (104.9M). Now 1440us/it each -- both at 104.9M.
"us" ? Usually it is capitalised as "US", but it is not a unit (AFAIK.) Or do you (and preceding posters) mean µs ?

Jacob

2020-05-07, 18:35   #2156
paulunderwood

Sep 2002
Database er0rr

3×5²×7² Posts

Quote:
 Originally Posted by S485122 "us" ? Usually it is capitalised as "US", but it is not a unit (AFAIK.) Or do you (and preceding posters) mean µs ? Jacob
Yes, I meant µs. But how do I easily generate mu with the keyboard?

nvm: I found this on how to do it in Gnome without having to remember and use Unicode code points. Thanks for prompting me!
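For reference, there are actually two code points in play here, and either reads fine as the SI prefix (a minimal snippet, just to show the values):

```python
# The micro sign and the Greek small letter mu are distinct code points,
# though most fonts render them near-identically.
micro = "\u00b5"   # µ MICRO SIGN (U+00B5)
mu = "\u03bc"      # μ GREEK SMALL LETTER MU (U+03BC)
print(micro + "s", mu + "s")  # µs μs
```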

Last fiddled with by paulunderwood on 2020-05-07 at 18:47

