mersenneforum.org gpuowl tuning

 2018-03-24, 23:11   #1
M344587487

"Composite as Heck"
Oct 2017

1011100100₂ Posts

gpuowl tuning

I'm trying to tune my Vega 56 for gpuowl at a 5M FFT. I'm a novice at overclocking, especially on Linux, so please let me know if I'm doing things wrong. I'm aiming for two profiles: one for efficiency, the other for throughput (without going crazy either way). All I've done so far is mess around with the default P-states using "rocm-smi --setsclk LEVEL". I couldn't get states 6 or 7 to stick (they are sometimes briefly entered, but the card stays at level 5 99% of the time). I set "rocm-smi --setfan 120" to be able to compare temps.

Software: Ubuntu 16.04, latest 4.13 kernel, ROCm 1.7.1, gcc 5.4.0, gpuowl v2.0-dbc5a01

Code:
P-state  core_clk  mem_clk  temp  watts  ms/it  mJ/it
5        1474      800      59    165    2.68   442.2
4        1312      800      49    132    2.87   378.84
3        1269      800      46    120    2.9    348
2        1138      800      43    109    3.16   344.44
1        991       800      42    97     3.5    339.5
0        852       800      40    87     3.93   341.91

P3 looks pretty good for a balanced profile; I'd be happy if undervolting could shave 10-20 watts off at the same performance. It doesn't look like rocm-smi can undervolt, so what's the best tool for that? I found this but don't want to mess with it without knowing if it's any good: https://github.com/OhGodACompany/OhGodATool

If you want to chime in with your GPU's gpuowl 5M stats and how you got them, feel free. Who doesn't love benchmarks?
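As a sanity check, the mJ/it column is just watts × ms/it; a few rows of the table can be recomputed with a quick awk one-liner (values below are taken from the table):

```shell
#!/bin/sh
# Energy per iteration: mJ/it = watts * ms/it.
# Input lines: "pstate watts ms_per_it" (rows from the table above).
awk '{ printf "P%s  %.2f mJ/it\n", $1, $2 * $3 }' <<'EOF'
5 165 2.68
4 132 2.87
3 120 2.9
EOF
```

For P5 this gives 165 × 2.68 = 442.20 mJ/it, matching the table; it makes comparing profiles across different clock/power combinations easy.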
 2018-10-28, 14:36   #2
M344587487

"Composite as Heck"
Oct 2017

1344₈ Posts

ROCm supports Ubuntu 18.04 now, so I migrated. The gpuowl version used above was consistently slower on this setup by 0.1ms/it at level 5, so I updated gpuowl and retested. It's not an apples-to-apples comparison, as the new testing uses a 4608K kernel, but it's still interesting.

Software: Ubuntu 18.04, kernel 4.15.0-38-generic, gcc 8.2.0, gpuowl 4.7-5b01b65

Code:
P-state  core_clk  mem_clk  temp  watts  ms/it  mJ/it
5        1474      800      59    164    2.38   390.32
4        1312      800      50    130    2.50   325
3        1269      800      47    120    2.52   302.4
2        1138      800      40    91     2.66   242.06
1        991       700      37    75     3.04   228
0        852       167      29    40     8.84   353.6

This is a big difference, partly due to using a better-suited FFT, other potential optimisations by preda, and updated P-state profiles in the driver with better efficiency. It's interesting how much more efficient states 1 and 2 are compared to before, and how aggressive the lower P-states now are in terms of power consumption. State 2 is now my preferred state for efficiency without sacrificing much throughput, bearing in mind that these figures are as reported by rocm-smi, not at the wall.

I'm looking forward to the day ROCm exposes voltage control, memory clocks beyond 800MHz, and finer control of core clocks. It looks like it may be possible to do this manually now by pushing PPT tables in binary form instead of the currently non-working text form. Has anyone tried this? https://github.com/RadeonOpenCompute...ment-418597555
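A sweep like the tables above can be scripted. This is only a sketch: the settle time is arbitrary, and rocm-smi's reporting flags have changed between ROCm releases, so check `rocm-smi --help` on your version; the SMI variable exists so the loop can be dry-run on a machine without a GPU.

```shell
#!/bin/sh
# Sketch of a P-state sweep: set each core P-state, let clocks and
# temperature settle, then append rocm-smi readings to a log.
# SMI and SETTLE are overridable for dry runs without a GPU.
SMI=${SMI:-/opt/rocm/bin/rocm-smi}
SETTLE=${SETTLE:-120}              # seconds to stabilise before sampling
: > sweep.log
for level in 0 1 2 3 4 5; do
    "$SMI" --setsclk "$level"
    sleep "$SETTLE"
    echo "=== P-state $level ===" >> sweep.log
    "$SMI" >> sweep.log            # default output shows clocks, temp, power
done
```

Pair the log with gpuowl's reported ms/it at each level to fill in the table.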
2018-10-28, 15:42   #3
SELROC

7,823 Posts

Quote:
 Originally Posted by M344587487
 ROCm supports Ubuntu 18.04 now, so I migrated. [...] It looks like it may be possible to do this manually now by pushing PPT tables in binary form instead of the currently non-working text form. Has anyone tried this?

I have always used the amdgpu driver. I run my GPUs at nominal clock and let the GPU handle voltage and fan control automatically with factory settings.
At nominal clock the GPU goes up to 77C and 144W, as reported by the "sensors" command.

 2018-10-28, 16:33   #4
preda

"Mihai Preda"
Apr 2015

2·23·29 Posts

I use ROCm 1.9.1 on Ubuntu 18.04 with Linux kernel 4.18.8, with dual Vega 64s (air-cooled with the standard "blower" cooler). Here are my observations:

1. ROCm is in general faster than amdgpu-pro (better compiler, producing better ISA code).

2. My sweet spot is P-state 5 (rocm-smi --setsclk 5), which results in 1401MHz, the GPU fan at 2300 RPM (automatic), 150W power, and 75degC temperature. If I set the frequency higher (P-state 6 or 7, or automatic, the default), the GPU quickly reaches 82-84 degC and then thermally throttles. The throttling results in worse performance than P-state 5, so it's a lose-lose: higher temperature, higher power use, lower performance. I do not set the fan speed manually; I leave it on automatic, which is enough cooling for 150W at 75C.
2018-10-29, 07:25   #5
SELROC

7,309 Posts

Quote:
 Originally Posted by preda
 [...] I do not set the fan speed manually; I leave it on automatic, which is enough cooling for 150W at 75C.

I would suggest leaving the fan in automatic mode until you have a stable temperature and know for sure what the maximum temperature is; then you can tune your fan speed if you want.

2018-10-29, 16:46   #6
M344587487

"Composite as Heck"
Oct 2017

2²×5×37 Posts

I just installed the latest modded kernel and appear to be able to alter P-state voltage and clock settings via PPT. I'm in the process of checking things, and I have a watt meter, so I'll soon be able to see whether voltage control actually works.

All previous testing appears to be invalid, as it may not have been on the stock config. I've never been able to get states 6 or 7 to stick, or to break the 75C temperature target, quite possibly as a holdover from mining crypto on Windows. I was under the impression that using the blockchain drivers and setting powerplay tables didn't write anything to the card (at least nothing that survives a reboot), but maybe it remembers the last good PPT setting or something. Am I wrong to think that firmware is the only thing stored on the card, and that it hasn't been updated?

Quote:
 Originally Posted by preda
 2. My sweet spot is P-state 5 (rocm-smi --setsclk 5), which results in 1401MHz, the GPU fan at 2300 RPM (automatic), 150W power, and 75degC temperature. If I set the frequency higher (P-state 6 or 7, or automatic, the default), the GPU quickly reaches 82-84 degC and then thermally throttles. The throttling results in worse performance than P-state 5, so it's a lose-lose: higher temperature, higher power use, lower performance.
Lowering the core voltage may allow state 6 without throttling, though probably not, as Vega at higher clocks is pretty far from the optimal point on the voltage/frequency curve.

Quote:
 Originally Posted by preda I do not set the fan speed manually, I leave it on automatic, which is enough cooling for 150W with 75C.
The reported wattage of a given state changes by up to 5W depending on temperature, so it was pretty dumb of me to fix the fan speed. I'll do all testing at 75C from now on.

Quote:
 Originally Posted by SELROC I would suggest to leave the fan in automatic mode, until you have a stable temperature and know for sure which is the max temperature, then you can tune your fan speed if you want.
75C is the temperature target; I think it will only be exceeded in states 6 and 7, until throttling kicks in. Until now I haven't had states 6 and 7 to play with, so I don't know for sure. It looks like the temperature target can be changed via PPT, which is nice.

2018-10-29, 17:19   #7
SELROC

6376₁₀ Posts

Quote:
 Originally Posted by preda
 I use ROCm 1.9.1, Ubuntu 18.04 with Linux kernel 4.18.8. [...]

Today I opened an issue on the ROCm GitHub requesting Debian support. I was not the first to open an issue for Debian support, so maybe something is moving in that direction.

 2018-10-29, 20:05   #8
xx005fs

"Eric"
Jan 2018
USA

11010100₂ Posts

Vega Tweaking

I have done quite a bit of tweaking on both Windows and Linux. Due to my lack of knowledge on Linux, I didn't use the ROCm driver but instead opted for the amdgpu-pro driver and a utility called amdcovc to overclock and tweak my GPU. To maximize performance, try to push the HBM2 clock as high as possible, especially if you have Samsung HBM rather than Hynix. What I found is that flashing a Vega 56 with a 64 BIOS (ONLY with Samsung HBM2, and I don't know how to check that on Linux) increases the HBM voltage and thus overclocks better. On a Vega 56 the limit for Samsung is about 1020-1050MHz AFAIK, and for Hynix it's usually under 1000MHz. Undervolting also helps greatly: reducing the voltage to 0.95V greatly reduces power and heat and improves memory stability, since the memory becomes less stable as the core heats up. My personal finding is that even with the core overclocked to massive speeds like 1750+MHz, if the HBM is at stock it barely improves over pure stock performance, at about 2.2ms/it on a Vega 56 flashed to the 64 BIOS. However, with a memory overclock I can easily push it to 2.06ms/it while drawing half the power, from 300W down to less than 150W. If electricity is not a concern, push the core as high as it will go while keeping the temperature below 70C with a reasonable fan speed, which will probably improve the speed to about 1.9ms/it on Samsung HBM.
 2018-10-29, 21:13   #9
M344587487

"Composite as Heck"
Oct 2017

2²·5·37 Posts

Thanks. I'm pretty sure it's Samsung HBM2, as I believe none of the early Vega 56 cards used Hynix, but we'll see. Undervolting the core and overclocking the memory is the idea. If the spreadsheet used to generate the powerplay tables is to be believed, the memory voltage can be altered via P-state 4, because it uses the core voltage of P-state 4 instead (which may not be useful, as we won't be able to undervolt the core and overclock the memory at the same time). If I can avoid a BIOS flash I will, but it's worth keeping in the back pocket.

I'm aiming for efficiency without sacrificing too much throughput. The best so far is 0.825V, 1269MHz core, 900MHz memory for 2.47ms/it at 100W on the low end. If it's stable I might end up using that (or a slightly bumped voltage for stability); it depends how good a clock of ~1400MHz ends up being.
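For a rough idea of what an undervolt like that buys, dynamic power scales roughly with f·V². The stock voltage below is a placeholder (I don't know the actual state-5 voltage), so treat the ratio as illustrative only:

```shell
#!/bin/sh
# Rule of thumb: dynamic power scales ~ f * V^2.
# v1 is a HYPOTHETICAL stock voltage, not a measured value.
awk 'BEGIN {
    f1 = 1474; v1 = 1.000   # assumed stock state 5
    f2 = 1269; v2 = 0.825   # the 0.825V/1269MHz profile above
    printf "predicted power ratio: %.2f\n", (f2/f1) * (v2/v1)^2
}'
```

That predicts roughly 60% of stock power, in the same ballpark as the measured drop from ~165W to 100W; static leakage and the memory subsystem are why it's not exact.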
 2018-10-29, 22:41   #10
M344587487

"Composite as Heck"
Oct 2017

2²×5×37 Posts

Editing Vega voltages and clocks with ROCm

Prep:
- Install and use the custom kernel (not tested with a normal kernel, it might work): https://github.com/M-Bab/linux-kernel-amdgpu-binaries
- git clone https://github.com/xmrminer01102018/VegaToolsNConfigs
- Copy setPPT.sh from ./tools into ./config/PPTDIR
- Copy this powerplay table generator into your own Google sheet: https://docs.google.com/spreadsheets...#gid=964538665
- Instead of generating a registry entry, edit the sheet to generate a single hex string with no formatting

Create and use the powerplay table:
- Edit column AE of the spreadsheet with the values you want (use state 4, as you can alter memory clocks there; voltages and clocks of state n need to be >= those of state n-1)
- Click on the hex string and press CTRL-C (mouse copy does not work)

Code:
cd VegaToolsNConfigs/config/PPTDIR
rm curr.*
cat {paste_string} > curr.hex
java -jar SoftPPT-1.0.0.jar curr.hex curr.
sudo ./setPPT.sh 0 curr.1
sudo /opt/rocm/bin/rocm-smi --setsclk 4

Tested with ROCm 1.9.1, Ubuntu 18.04, custom kernel 4.19, and the latest gpuowl as of 2018-10-28 on a Vega 56 with the stock BIOS. YMMV, don't blow up your card ;)
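I haven't verified what SoftPPT-1.0.0.jar does beyond turning the pasted hex string into raw bytes. If that is all it does here, xxd can perform the same conversion, which is handy on a box without Java; this is an assumption, not a documented equivalence, so diff against the jar's output before pushing anything to the card:

```shell
#!/bin/sh
# Convert a pasted hex string (curr.hex) into a raw binary PPT (curr.bin).
# ASSUMPTION: this matches what the SoftPPT jar produces here; verify by
# diffing against the jar's output before using it with setPPT.sh.
tr -d ' \n\r' < curr.hex | xxd -r -p > curr.bin
# Round-trip check: re-hexing the binary should give back the original string.
xxd -p curr.bin | tr -d '\n'
```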
2018-10-30, 19:39   #11
xx005fs

"Eric"
Jan 2018
USA

2²×53 Posts

Quote:
 Originally Posted by M344587487
 [...] The best so far is 0.825V, 1269MHz core, 900MHz memory for 2.47ms/it at 100W on the low end. [...]
Try not to go below 150W, as the efficiency curve diminishes significantly: you get much less work done while only lowering power by about 20-30W. On my own card (though not a 30% drop in perf), it decreased from 2.05ms/it to around 2.2ms/it, which is quite significant. Also note that raising the HBM clock above 1100MHz raises the SoC voltage, which raises the minimum core voltage. I suggest targeting around 1040-1090MHz on your HBM (depending on your luck), lowering Vcore as much as possible, and using GPU-Z to check power draw. Since Vega cards have dual BIOS, you can always flash the Vega 64 BIOS to reach the 1040-1080MHz range; otherwise it might not even hit 1000MHz.

