mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuowl: runtime error (https://www.mersenneforum.org/showthread.php?t=23117)

preda 2018-03-12 00:03

Did you consider trying ROCm?
[url]https://github.com/RadeonOpenCompute/ROCm[/url]

(in my opinion it's at least on par with amdgpu-pro performance-wise)

It may be interesting to see if it encounters the problem in the same way.
(OTOH this may be too much trouble just to debug this issue).

[QUOTE=SELROC;482063]I am doing another test and waiting for more time before shutting down the system, to see if any other messages are generated in dmesg.[/QUOTE]

SELROC 2018-03-12 09:16

[QUOTE=preda;482110]Did you consider trying ROCm?
[URL]https://github.com/RadeonOpenCompute/ROCm[/URL]

(in my opinion it's at least on par with amdgpu-pro performance-wise)

It may be interesting to see if it encounters the problem in the same way.
(OTOH this may be too much trouble just to debug this issue).[/QUOTE]

I am doing a test with amdgpu-pro (which includes ROCm in its packages). The problem seems to be in the BIOS settings of the motherboard, so I apologize for raising an issue that may not be a software problem after all...

Later on I will post my findings

SELROC 2018-03-13 09:40

[QUOTE=SELROC;482137]I am doing a test with amdgpu-pro (which includes ROCm in its packages). The problem seems to be in the BIOS settings of the motherboard, so I apologize for raising an issue that may not be a software problem after all...

Later on I will post my findings[/QUOTE]

I went ahead and started another test from scratch, this time with ROCm from the official repo...

SELROC 2018-03-13 17:32

[QUOTE=SELROC;482221]I went ahead and started another test from scratch, this time with ROCm from the official repo...[/QUOTE]

with ROCm I get an error -30 and gpuowl does not run

the clinfo utility gives a message: NULL platform behavior

still trying to figure out which opencl package is good to install with ROCm

..I'm back to trying with amdgpu-pro
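
For reference, OpenCL status codes are plain negative integers defined in the Khronos `cl.h` header, and -30 corresponds to `CL_INVALID_VALUE`. Which call returned it here is not stated. Below is a minimal lookup sketch covering only a handful of codes; the helper name `cl_error_name` is made up for illustration and is not part of gpuowl or of any OpenCL binding.

```python
# Partial table of OpenCL status codes from the Khronos headers.
# Deliberately incomplete: only a few common codes are listed.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -5: "CL_OUT_OF_RESOURCES",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
    -32: "CL_INVALID_PLATFORM",
    -33: "CL_INVALID_DEVICE",
    -48: "CL_INVALID_KERNEL",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",  # from the ICD extension header
}


def cl_error_name(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status {code}")


print(cl_error_name(-30))
```

A code missing from the table falls through to the "unknown" string instead of raising, which is convenient when scanning logs.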

SELROC 2018-03-14 16:17

[QUOTE=SELROC;482257]with ROCm I get an error -30 and gpuowl does not run

the clinfo utility gives a message: NULL platform behavior

still trying to figure out which opencl package is good to install with ROCm

..I'm back to trying with amdgpu-pro[/QUOTE]


I have resolved it. It was a hardware issue.

SELROC 2018-09-22 07:23

[QUOTE=SELROC;481451]One thing I notice with two instances of gpuowl running: one instance gets stuck and the only way to stop it is reboot[/QUOTE]


Two issues have been isolated here:


1) the GPU riser cards must occupy contiguous PCIe slots, with no empty slots in between,


2) the processor C-States setting in the BIOS must be disabled, so that the CPU does not enter low-power states.
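
The C-States fix above was made in the BIOS. As a rough cross-check from the running system, Linux normally exposes the idle states it knows about under the `cpuidle` sysfs directory. The sketch below only reads that directory and assumes the usual `state<N>/name` and `state<N>/disable` layout; `list_cstates` is a made-up helper name, and an empty result only means the kernel exposes no idle states at that path.

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state_dir, name, disabled) tuples for one CPU's idle states."""
    base = Path(cpuidle_dir)
    if not base.is_dir():
        return []  # no cpuidle information exposed at this path
    states = []
    # Sort numerically so that state10 comes after state9.
    for state in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state / "name").read_text().strip()
        disable_file = state / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state.name, name, disabled))
    return states


if __name__ == "__main__":
    for entry in list_cstates():
        print(entry)
```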

preda 2018-09-22 10:18

[QUOTE=SELROC;496555]Two issues have been isolated here:

1) the GPU riser cards must occupy contiguous PCIe slots, with no empty slots in between,

2) the processor C-States setting in the BIOS must be disabled, so that the CPU does not enter low-power states.[/QUOTE]

You may want to report these. If they only happen with ROCm and not with amdgpu-pro, consider reporting them under "Issues" on the ROCm GitHub repository, so that the developers know about them.

SELROC 2018-09-22 16:09

[QUOTE=preda;496559]You may want to report these. If they only happen with ROCm and not with amdgpu-pro, consider reporting them under "Issues" on the ROCm GitHub repository, so that the developers know about them.[/QUOTE]


I was expecting more collaboration from the ROCm team. They just said that mine is "unsupported hardware" and closed the issue.

moebius 2020-08-30 23:25

I have now finally managed to create an executable binary for colab-Ubuntu from the preda repository. Unfortunately, this error occurs while running. What is wrong?

[B][SIZE="1"]2020-08-30 23:04:28 gpuowl v6.11-380-g79ea0cc-dirty
2020-08-30 23:04:28 Note: not found 'config.txt'
2020-08-30 23:04:28 device 0, unique id ''
2020-08-30 23:04:28 Tesla P100-PCIE-16GB-0 104930401 FFT: 5.50M 1K:11:256 (18.19 bpw)
2020-08-30 23:04:28 Tesla P100-PCIE-16GB-0 Expected maximum carry32: 50950000
2020-08-30 23:04:29 Tesla P100-PCIE-16GB-0 OpenCL args "-DEXP=104930401u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=11u -DPM1=0 -DMM2_CHAIN=1u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x1.7ee28e7ec46ep-1 -DIWEIGHT_STEP_MINUS_1=-0x1.b620c8c81195dp-2 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2020-08-30 23:04:29 Tesla P100-PCIE-16GB-0

2020-08-30 23:04:29 Tesla P100-PCIE-16GB-0 OpenCL compilation in 0.00 s
2020-08-30 23:04:29 Tesla P100-PCIE-16GB-0 Exception gpu_error: INVALID_KERNEL clSetKernelArg(k, pos, sizeof(value), &value) at clwrap.h:77 setArg
2020-08-30 23:04:29 Tesla P100-PCIE-16GB-0 Bye[/SIZE][/B]
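
This does not explain the exception, but the header line of the log can be sanity-checked with a little arithmetic. The sketch below assumes that the FFT length in words is 2 × WIDTH × MIDDLE × SMALL_HEIGHT, using the values from the "OpenCL args" line. That formula is an assumption that happens to reproduce the "5.50M" and "18.19 bpw" figures in the log; it is not taken from the gpuowl source.

```python
# Values copied from the log above.
exponent = 104930401
width, middle, small_height = 1024, 11, 256  # -DWIDTH, -DMIDDLE, -DSMALL_HEIGHT

# Assumption: FFT length in words = 2 * WIDTH * MIDDLE * SMALL_HEIGHT.
fft_words = 2 * width * middle * small_height

fft_m = fft_words / 2**20            # size in units of "M" (2^20 words)
bits_per_word = exponent / fft_words  # average bits stored per FFT word

print(f"{fft_m:.2f}M, {bits_per_word:.2f} bpw")  # prints "5.50M, 18.19 bpw"
```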

moebius 2020-08-31 09:04

[QUOTE=preda;496559.].....[/QUOTE]
Maybe preda can say something about it. I used the notebook from Kriesel to build it.

ATH 2020-08-31 10:12

Are your worktodo.txt lines in this format?
PRP=<AID>,1,2,<exponent>,-1,75,0
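
To check a worktodo.txt line against that shape, a small parser along these lines may help. It is only a sketch: the names `parse_prp_line` and `is_mersenne_prp` are made up, the all-zero assignment ID is a placeholder, and reading the four numbers after the AID as k, b, n, c in k·b^n+c is an assumption based on the example above (1, 2, exponent, -1). The remaining fields are kept as-is without interpretation.

```python
def parse_prp_line(line):
    """Parse 'PRP=<AID>,k,b,n,c,...' into a dict; raise ValueError if malformed."""
    line = line.strip()
    if not line.startswith("PRP="):
        raise ValueError("line does not start with 'PRP='")
    fields = line[len("PRP="):].split(",")
    if len(fields) < 5:
        raise ValueError("expected at least AID,k,b,n,c")
    aid = fields[0]
    # int() raises ValueError on non-numeric fields, which is what we want here.
    k, b, n, c = (int(x) for x in fields[1:5])
    return {"aid": aid, "k": k, "b": b, "n": n, "c": c, "extra": fields[5:]}


def is_mersenne_prp(entry):
    """True if the entry has the 1*2^n-1 shape shown in the post."""
    return entry["k"] == 1 and entry["b"] == 2 and entry["c"] == -1


# Placeholder AID (32 zeros); the exponent is the one from the log in this thread.
example = "PRP=" + "0" * 32 + ",1,2,104930401,-1,75,0"
entry = parse_prp_line(example)
print(entry["n"], is_mersenne_prp(entry))
```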


You can also try this notebook, which compiles gmp-6.2.0 and gpuowl with gcc 9 in the /root folder and copies the "gpuowl" executable to the root of your Google Drive. I just tested it:


[CODE]
import subprocess
import os

from google.colab import drive

if not os.path.exists('/content/drive/My Drive'):
    drive.mount('/content/drive')

%cd ~
!sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
!sudo apt install -y gcc-9
!sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 800
!sudo apt install -y g++-9
!sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 800
!sudo apt-get -y install lzip
!sudo apt-get -y install m4
!sudo apt-get -y install libtool
!sudo apt-get -y install subversion
!sudo apt-get -y install make
!sudo apt-get -y install autoconf
!sudo apt-get -y install automake
!sudo wget https://gmplib.org/download/gmp/gmp-6.2.0.tar.lz
!sudo tar --lzip -xvf gmp-6.2.0.tar.lz
%cd gmp-6.2.0
!./configure ABI=64 CC=gcc CFLAGS="-O3 -m64 -mavx -mavx2" --build=x86_64-pc-linux-gnu --enable-cxx --enable-static --disable-shared
!make
!sudo make install

%cd ..
!git clone https://github.com/preda/gpuowl
%cd gpuowl
!make gpuowl
!cp gpuowl '/content/drive/My Drive/'

!cp /usr/lib/x86_64-linux-gnu/libstdc* '/content/drive/My Drive/'

[/CODE]

The last line copies the 2 libstdc* files to your Google Drive.
Each time you want to run gpuowl without recompiling it, run this line first (after connecting to Google Drive) to copy them back to the correct folder:

[CODE]!cp /content/drive/My\ Drive/libstdc* /usr/lib/x86_64-linux-gnu/[/CODE]
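
If you would rather do the copy-back in plain Python, so that the cell can report when nothing was copied, something like the following should be roughly equivalent. This is only a sketch: `copy_matching` is a made-up helper, and the paths in the usage comment are simply the ones from the `!cp` line above.

```python
import glob
import os
import shutil


def copy_matching(src_dir, pattern, dst_dir):
    """Copy every file in src_dir matching pattern into dst_dir; return the names copied."""
    copied = []
    # glob.escape protects special characters in the directory name itself.
    for path in sorted(glob.glob(os.path.join(glob.escape(src_dir), pattern))):
        if os.path.isfile(path):
            shutil.copy2(path, dst_dir)
            copied.append(os.path.basename(path))
    return copied


# Usage in the notebook (paths taken from the !cp line above):
# names = copy_matching('/content/drive/My Drive', 'libstdc*', '/usr/lib/x86_64-linux-gnu')
# if not names:
#     print('no libstdc* files found on Google Drive')
```

Returning the list of copied names makes it easy to notice an empty Drive folder before gpuowl fails to start.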


All times are UTC.