mersenneforum.org > Great Internet Mersenne Prime Search > Hardware > GPU Computing > GpuOwl
Old 2018-03-24, 23:11   #1
M344587487
 
 
"Composite as Heck"
Oct 2017

23×3×29 Posts
Default gpuowl tuning

I'm trying to tune my Vega 56 for gpuowl at 5M FFT. I'm a novice at overclocking, especially on Linux, so please let me know if I'm doing things wrong. I'm aiming for two profiles: one for efficiency, the other for throughput (without going crazy either way). All I've done so far is mess around with the default p-states using "rocm-smi --setsclk LEVEL". I couldn't get states 6 or 7 to stick (they are sometimes briefly entered before the card settles at level 5 for 99% of the time). I set "rocm-smi --setfan 120" to be able to compare temps.

Software: Ubuntu 16.04, latest 4.13 kernel, ROCm 1.7.1, gcc 5.4.0, gpuowl v2.0-dbc5a01
Code:
P-state core_clk mem_clk temp watts ms/it mJ/it
5         1474     800    59   165  2.68  442.2
4         1312     800    49   132  2.87  378.84
3         1269     800    46   120  2.9   348
2         1138     800    43   109  3.16  344.44
1          991     800    42    97  3.5   339.5
0          852     800    40    87  3.93  341.91
P3 looks pretty good for a balanced profile; I'd be happy if undervolting could shave 10-20 watts off at the same performance. It doesn't look like rocm-smi can undervolt, so what's the best tool for that? I found this but don't want to mess with it without knowing if it's any good: https://github.com/OhGodACompany/OhGodATool
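For anyone wanting to reproduce the mJ/it column: it's just watts × ms/it, since 1 W sustained for 1 ms is 1 mJ. A quick sanity check of the figures in the table above:

```python
# Sanity-check the mJ/it column: energy per iteration equals
# average power (W) times time per iteration (ms), as 1 W = 1 mJ/ms.
# Figures are the rocm-smi readings from the table above.
states = {  # p-state: (watts, ms_per_it)
    5: (165, 2.68),
    4: (132, 2.87),
    3: (120, 2.9),
    2: (109, 3.16),
    1: (97, 3.5),
    0: (87, 3.93),
}

def mj_per_it(watts, ms_per_it):
    """Energy per iteration in millijoules."""
    return watts * ms_per_it

for p, (w, ms) in sorted(states.items(), reverse=True):
    print(f"P{p}: {mj_per_it(w, ms):.2f} mJ/it")
```

The printed values match the table, so the column is internally consistent with the watt and ms/it readings.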

If you want to chime in with your GPU's gpuowl 5M stats and how you got them, feel free; who doesn't love benchmarks?
Old 2018-10-28, 14:36   #2
M344587487
 
 
"Composite as Heck"
Oct 2017

23×3×29 Posts
Default

ROCm supports Ubuntu 18.04 now, so I migrated. The gpuowl version used above was consistently slower on this setup by 0.1ms/it at level 5, so I updated gpuowl and retested. It's not an apples-to-apples comparison as the new testing uses a 4608K kernel, but it's still interesting.

Software: Ubuntu 18.04, kernel 4.15.0-38-generic, gcc 8.2.0, gpuowl 4.7-5b01b65
Code:
P-state core_clk mem_clk temp watts ms/it mJ/it
5         1474     800    59   164  2.38  390.32
4         1312     800    50   130  2.50  325   
3         1269     800    47   120  2.52  302.4 
2         1138     800    40    91  2.66  242.06
1          991     700    37    75  3.04  228   
0          852     167    29    40  8.84  353.6
This is a big difference, partly due to using a better-suited FFT, other optimisations by preda, and updated P-state profiles in the driver with better efficiency. It's interesting how much more efficient states 1 and 2 are compared to before, and how aggressive the lower P-states now are in terms of power consumption. State 2 is now my preferred state for efficiency without sacrificing much throughput, bearing in mind that these figures are as reported by rocm-smi, not at the wall.
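To put numbers on "a big difference", here's a quick per-state comparison of the mJ/it columns from the two tables in this thread (the deltas are my own arithmetic, and not a strict comparison given the FFT change):

```python
# mJ/it per p-state, copied from the two benchmark tables in this thread:
# old = gpuowl v2.0 on ROCm 1.7.1 / Ubuntu 16.04
# new = gpuowl 4.7 on ROCm / Ubuntu 18.04 (different FFT kernel)
old = {5: 442.2, 4: 378.84, 3: 348.0, 2: 344.44, 1: 339.5, 0: 341.91}
new = {5: 390.32, 4: 325.0, 3: 302.4, 2: 242.06, 1: 228.0, 0: 353.6}

def pct_change(before, after):
    """Percentage change from before to after (negative = improvement)."""
    return 100.0 * (after - before) / before

for p in sorted(old, reverse=True):
    print(f"P{p}: {pct_change(old[p], new[p]):+.1f}% mJ/it")
```

States 1 and 2 come out roughly 30% more efficient than before, while state 0 actually got slightly worse, which matches the observation above about the aggressive lower P-states.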

I'm looking forward to the day ROCm exposes voltage control, memory clocks beyond 800MHz, and finer control of core clocks. It looks like it may be possible to do this manually now by pushing PPT tables in binary form instead of the currently non-working text form; has anyone tried this? https://github.com/RadeonOpenCompute...ment-418597555
Old 2018-10-28, 15:42   #3
SELROC
 

23A16 Posts
Default

Quote:
Originally Posted by M344587487 View Post
ROCm supports Ubuntu 18.04 now so I migrated. [...] I'm looking forward to the day ROCm exposes voltage control, going beyond 800MHz memory clock and finer control of core clocks.

I have always used the amdgpu driver. I run my GPUs at nominal clock, and I let the GPU do automatic voltage and fan control with factory settings.
At nominal clock the GPU goes up to 77C and 144W, as reported by the "sensors" command.
Old 2018-10-28, 16:33   #4
preda
 
 
"Mihai Preda"
Apr 2015

52F16 Posts
Default

I use ROCm 1.9.1, Ubuntu 18.04 with Linux kernel 4.18.8.
With dual Vega64 (air with the standard "blower" cooler).
Here are my observations:

1. ROCm is in general faster than amdgpu-pro (better compiler, producing better ISA code).
2. My sweet spot is p-state 5 (rocm-smi --setsclk 5), which results in 1401MHz, GPU fan at 2300 RPM (automatic), 150W power, 75degC temperature.

If I set the frequency higher (p-state 6, 7, or automatic (the default)), the GPU quickly reaches 82-84degC and thermal-throttles. The throttling results in worse performance than p-state 5, so it's a lose-lose: higher temperature, higher power use, lower performance.

I do not set the fan speed manually; I leave it on automatic, which provides enough cooling to hold 150W at 75degC.
Old 2018-10-29, 07:25   #5
SELROC
 

22·1,601 Posts
Default

Quote:
Originally Posted by preda View Post
[...] I do not set the fan speed manually, I leave it on automatic, which is enough cooling for 150W with 75C.

I would suggest leaving the fan in automatic mode until you have a stable temperature and know for sure what the maximum temperature is; then you can tune the fan speed if you want.
Old 2018-10-29, 16:46   #6
M344587487
 
 
"Composite as Heck"
Oct 2017

12708 Posts
Default

Just installed the latest modded kernel, and I appear to be able to alter P-state voltage and clock settings via PPT. I'm in the process of checking things, and I have a watt-meter, so I'll soon be able to see whether voltage control actually works.

All previous testing appears to be invalid, as it may not have been on the stock config. I've never been able to get states 6 or 7 to stick, or to go over the 75C temperature target, quite possibly due to a holdover from mining crypto on Windows. I was under the impression that using the blockchain drivers and setting powerplay tables didn't write anything to the card (at least nothing that survives a reboot), but maybe it remembers the last good PPT setting or something. Am I wrong to think that firmware is the only thing stored on the card, and that it hasn't been updated?

Quote:
Originally Posted by preda View Post
2. My sweet-spot is p-state 5 (rocm-smi --setsclk 5), which results in 1401MHz, GPU fan at 2300 RPM (automatic), 150W power, 75degC temperature.

If I set the frequency higher (p-state 6, or 7, or automatic (default)), the GPU quickly reaches 82-84 decC and there does thermal throttling. This thermal throttling results in worse performance then p-state 5, so it's a lose-lose: higher temperature, higher power use, lower performance.
Lowering the core voltage may allow state 6 without throttling, though probably not, as Vega at higher clocks is pretty far from optimal on the voltage/frequency curve.

Quote:
Originally Posted by preda View Post
I do not set the fan speed manually, I leave it on automatic, which is enough cooling for 150W with 75C.
The reported wattage of a given state changes by up to 5W depending on temperature, so it was pretty dumb of me to fix the fan speed; I'll do all testing at 75C from now on.

Quote:
Originally Posted by SELROC View Post
I would suggest to leave the fan in automatic mode, until you have a stable temperature and know for sure which is the max temperature, then you can tune your fan speed if you want.
75C is the temperature target; I think it will only be exceeded in states 6 and 7, before throttling kicks in. Until now I haven't had states 6 and 7 to play with, so I don't know for sure. It looks like the temperature target can be changed via PPT, which is nice.
Old 2018-10-29, 17:19   #7
SELROC
 

17×29 Posts
Default

Quote:
Originally Posted by preda View Post
I use ROCm 1.9.1, Ubuntu 18.04 with Linux kernel 4.18.8. [...]

Today I opened an issue on the ROCm GitHub requesting Debian support. I was not the first to open an issue for Debian support, so maybe something is moving in that direction.
Old 2018-10-29, 20:05   #8
xx005fs
 
"Eric"
Jan 2018
USA

211 Posts
Default Vega Tweaking

I have done quite a bit of tweaking on both Windows and Linux. On Linux, due to my lack of experience, I didn't use the ROCm driver but instead opted for amdgpu-pro with a utility called amdcovc to overclock and tweak my GPU.

To maximize performance, push the HBM2 clock as high as possible, especially if you have Samsung HBM rather than Hynix. Flashing a Vega 56 with a 64 BIOS (ONLY with Samsung HBM2, and I don't know how to check that on Linux) increases the HBM voltage and thus overclocks better. On Vega 56 the max limit for Samsung is about 1020-1050MHz AFAIK, and for Hynix it's usually under 1000MHz. Undervolting also helps greatly: reducing the voltage to 0.95V greatly reduces power and heat, and improves stability on the memory, since the memory becomes less stable as the core heats up.

My personal finding is that even with the core overclocked to massive speeds like 1750+MHz, if the HBM is at stock it barely improves over pure stock performance, at about 2.2ms/it on a Vega 56 flashed to a 64 BIOS. With a memory overclock, however, I can easily push it to 2.06ms/it while drawing half the power, from 300W down to less than 150W. If electricity is not a concern, push the core as high as it will go while keeping the temperature below 70C with a reasonable fan speed, which will probably improve the speed to about 1.9ms/it on Samsung HBM.
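Converting those numbers to energy per iteration makes the efficiency argument explicit; a back-of-envelope sketch using the figures quoted above:

```python
# Back-of-envelope energy per iteration from the figures above:
# 1 W sustained for 1 ms = 1 mJ, so mJ/it = watts * ms/it.
def mj_per_it(watts, ms_per_it):
    return watts * ms_per_it

stock_mem = mj_per_it(300, 2.2)   # core OC'd hard, HBM at stock
hbm_oc = mj_per_it(150, 2.06)     # undervolted core + HBM overclock
print(round(stock_mem, 1), round(hbm_oc, 1))  # 660.0 309.0
```

The HBM-overclocked configuration does each iteration in less than half the energy, which is why memory clock matters so much more than core clock here.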
Old 2018-10-29, 21:13   #9
M344587487
 
 
"Composite as Heck"
Oct 2017

23·3·29 Posts
Default

Thanks. I'm pretty sure it's Samsung HBM2, as I believe none of the early Vega 56 cards used Hynix, but we'll see. Undervolting the core and overclocking the memory is the idea. If the spreadsheet used to generate the powerplay tables is to be believed, the memory voltage of P-state 4 can be altered because it uses the core voltage of P-state 4 instead (which may not be useful, as we won't be able to undervolt the core and overclock the memory at the same time). If I can avoid a BIOS flash I will, but it's worth keeping in the back pocket.

I'm aiming for efficiency without sacrificing too much throughput. The best so far is 0.825V, 1269MHz core, 900MHz memory, for 2.47ms/it at 100W at the low end. If it's stable I might end up using that (or a slightly bumped voltage for stability); it depends how good a clock of ~1400MHz ends up being.
Old 2018-10-29, 22:41   #10
M344587487
 
 
"Composite as Heck"
Oct 2017

23×3×29 Posts
Default Editing Vega voltages and clocks with ROCm

Prep:


Create and use powerplay table:
  • Edit column AE of the spreadsheet with the values you want (use state 4, as it lets you alter memory clocks; voltages and clocks of state n need to be >= those of state n-1)
  • Click on hex string and CTRL-C (mouse copy does not work)
  • cd VegaToolsNConfigs/config/PPTDIR
  • rm curr.*
  • cat {paste_string} > curr.hex
  • java -jar SoftPPT-1.0.0.jar curr.hex curr.
  • sudo ./setPPT.sh 0 curr.1
  • sudo /opt/rocm/bin/rocm-smi --setsclk 4
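The >= constraint in the first step is easy to violate when editing state 4 in isolation, so here's a small checker for eyeballing an edited table before pasting the hex string. This is a hypothetical helper of my own, not part of VegaToolsNConfigs or the spreadsheet; the sample values are illustrative, not recommendations:

```python
# Hypothetical sanity check for edited p-state tables: clocks and
# voltages must be non-decreasing with the state index, so validate
# before generating and pushing the PPT.
def monotonic_states(states):
    """states: list of (clock_mhz, voltage_mv) tuples ordered P0..Pn.
    Returns True if each state's clock and voltage are >= the previous."""
    return all(
        c2 >= c1 and v2 >= v1
        for (c1, v1), (c2, v2) in zip(states, states[1:])
    )

ok = [(852, 800), (991, 900), (1138, 950), (1269, 1000), (1312, 1050)]
bad = [(852, 800), (991, 900), (1138, 850)]  # voltage drops at P2
print(monotonic_states(ok), monotonic_states(bad))  # True False
```

If this prints False for your edited table, the card is likely to reject the PPT or misbehave, so fix the offending state first.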


Tested with ROCm 1.9.1, Ubuntu 18.04, custom kernel 4.19, and the latest gpuowl as of 2018-10-28, on a Vega 56 with stock BIOS. YMMV, don't blow up your card ;)
Old 2018-10-30, 19:39   #11
xx005fs
 
"Eric"
Jan 2018
USA

211 Posts
Default

Quote:
Originally Posted by M344587487 View Post
[...]
I'm aiming for efficiency without sacrificing too much throughput. The best so far is 0.825V 1269MHz core, 900MHz memory for 2.47 ms/it at 100W for the low end. If it's stable I might end up using that (or a slightly bumped voltage for stability), depends how good a clock of ~1400 ends up being.
Try not to go below 150W, as the efficiency curve diminishes significantly: you get way less work done while only lowering power by about 20-30W. I tested this on my own card; it decreased from 2.05ms/it to around 2.2, which is quite significant. One thing to note is that pushing the HBM above 1100MHz also increases the SoC voltage, which raises the minimum core voltage. I suggest aiming for around 1040-1090MHz on your HBM (depending on your luck), lowering Vcore as much as possible, and using GPU-Z to check power draw. Since Vega cards have dual BIOS you can always flash to a Vega 64 BIOS to get into the 1040-1080MHz range; otherwise it might not even hit 1000MHz.