How to set up for running gpuOwl under Ubuntu (and other Linux) with OpenCL

Moderator Note: Post #1 of this thread is intended to provide, step by step, everything needed by a user wanting to do what the thread title states, starting with the procedure for creating a Ubuntu boot-image USB. Later comments are tasked with noting specific (and hopefully small) differences needed for e.g. other Linux distros and specific GPU models, and may be folded into the OP as warranted. Post #1 will be continually maintained and updated so as to stay current.

Thanks to Xyzzy for handholding me through the boot-image procedure, M344587487 for the original version of the gpuowl-setup recipe and the Radeon VII settings-tweak shell script, and all the various forumites (preda, paulunderwood, Prime95, etc.) who helped the OP with this stuff when he purchased his first GPU, a Radeon VII, early in 2020. I have only tried the recipe on one other GPU model, a Radeon 540 in an Intel NUC, where I successfully built gpuowl but was unable to run it because OpenCL did not recognize that GPU model. So feedback regarding whether it works - or how to make it work - with other GPUs and Linux distros is needed, and welcome.

Creating a Ubuntu boot-image USB: If you already have such a boot-image USB, you can skip to the next section. Note that in the following, all mount/umount/fdisk commands except the purely informational ones must be done as root or via the 'sudo' prefix.

Technical note: Both cp and dd make a faithful byte-for-byte copy of a file, so e.g. md5/sha1 checksums will agree between original and copy. But dd writes starting at address-offset 0 of the target device, because that is where bootloaders expect a boot image to start, and it writes the file as a single contiguous block; cp instead copies to wherever the filesystem finds a good spot, and uses filesystem metadata to link noncontiguous fragments into what looks like a single file to the outside world.
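As an optional sanity check after the dd-copy in step 3 below, one can read the same number of bytes back from the raw device and compare checksums. This is just a hedged sketch, not part of the original recipe - substitute your own ISO filename and device letter:
Code:
# Optional verification sketch: the first <ISO-size> bytes of the device
# should checksum identically to the ISO file itself.
md5sum ubuntu-19.10-desktop-amd64.iso
sudo head -c $(stat -c%s ubuntu-19.10-desktop-amd64.iso) /dev/sdX | md5sum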

0. Go to the list of currently-supported Ubuntu releases and download the .iso file of the one you want. In my most-recent case I grabbed the 19.10 "64-bit PC (AMD64) desktop image" .iso file, and my notes will use that as an example;

1. Insert a USB stick into an existing Linux or macOS system. Many Linux distros will auto-mount USB storage media, but for boot-disk creation we must make sure it is *not* mounted. To see the mount point, use the Linux lsblk command. E.g. on my 2015-vintage Intel NUC the USB was auto-mounted as /dev/sdb1, with mount point /media/ewmayer, an ls of which showed a files-containing directory; 'umount /dev/sdb /media/ewmayer' left 'ls -l /media/ewmayer' showing just . and .., with no more directory entry. You need to be careful to specify both the block device (/dev/sd*) and the specific mount point of the USB, since it is common to have multiple filesystems sharing the same block device. I'll replace my 'sdb' with a generic 'sdX' and let users fill in the proper 'X', as in the sketch below.
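A minimal sketch of that check-and-unmount step (the device letter and partition number are placeholders; yours will differ):
Code:
lsblk                      # identify the USB stick, e.g. sdb, and any auto-mounted partitions
sudo umount /dev/sdX1      # unmount the auto-mounted partition before writing the boot image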

2. Clear the USB stick - note this is slow, and linear-time in the size of the storage medium, so it pays to use the smallest USB that will hold the ISO file. The trailing bs= option overrides the default 512-byte blocksize with a much larger 1MB, which should speed things up significantly:

sudo dd if=/dev/zero of=/dev/sdX bs=1M

The completion message looks scary but is simply the expected 'hit end of device' message (note: if your system hangs for, say, more than a minute after printing the "No space left on device" message, you may need to ctrl-c it). Your numbers will be different, but in my case I saw this:

Code:
failed to open 'dev/sdb': No space left on device
31116289+0 records in
31116289+0 records out
15931539456 bytes (16 GB) copied, 3842.03 s, 4.1 MB/s [using newer 16GB USB, needed just 1566 s, 10.2 MB/s]

3. Use dd to copy the .iso file. As dd is a low-level utility, no re-mount of the stick's filesystem is needed or wanted; my example again assumes the USB is the block device /dev/sdX, with the user supplying the 'X':

sudo dd if=[Full path to ISO file, no wildcarding permitted] of=/dev/sdX bs=1M oflag=sync

On completion, 'sudo fdisk -l /dev/sdX' shows /dev/sdX1 as bootable (the * under 'Boot') and of type 'Empty'. In my case it also showed a nonbootable partition at /dev/sdb2, which we can ignore:
Code:
	Device     Boot    Start     End Sectors  Size Id Type
	/dev/sdb1  *           0 4812191 4812192  2.3G  0 Empty
	/dev/sdb2        4073124 4081059    7936  3.9M ef EFI (FAT-12/16/32)
Oddly, in the above the start of sdb2 lies inside the sdb1 range, but that appears to be ignorable. I've used the same boot-USB to install Ubuntu on multiple devices, without any problems.

In my resulting files-view window the previous contents of the USB had vanished and been replaced by 'Ubuntu 19.10 amd64', which showed 10 dirs - [boot,casper,dists,EFI,install,isolinux,pics,pool,preseed,ubuntu] - and 2 files, md5sum.txt [34.8 kB] and README.diskdefines [225 bytes].

4. After copying the .iso, the USB may or may not (this is OS-dependent) end up mounted at /dev/sdX1. To be sure, unmount the filesystem with 'sudo umount /dev/sdX1'. (If it was not left mounted, you'll simply get a "umount: /dev/sdX1: not mounted" error message.) Remove the stick from the system used to burn the .iso and, after doing any needed file backups of the target system, insert the stick into that system, reboot and, at the appropriate prompt, press <F1> to enter the Boot Options menu.

5. Fiddle the boot order in the target system's BIOS to put the USB at #1 (note this may not be needed, so feel free to first try starting from here), then shut down, insert the bootable USB, power up, and use the up/down-arrow keys to scroll through the resulting boot-options menu, which includes items like "try without installing" and "install alongside existing OS installation". I chose "install now". Next the installer detected an existing Debian install and asked if I wanted to keep it - this was on a mere 30GB SSD, too cramped for 2 installs, so I chose Ubuntu-only. 5 mins later: done, restarting ... "Please remove the installation medium, then press ENTER:".

6. If you fiddled the boot order in the BIOS in the preceding step, the next time you reboot, use the BIOS to move the hard drive back to #1 boot option.

Installing and setting up for gpuowl running:

o 'sudo passwd root' to set root pwd [I make same as user-pwd on my I-am-sole-user systems]
o sudo apt update
o sudo apt install -y build-essential clinfo git libgmp-dev libncurses5 libnuma-dev python ssh openssh-server
[build-essential is a meta-package that installs gcc/g++/make and a few other packages commonly used in a standard libc toolchain; optional nice-to-haves include the 'multitail' and 'screen' packages]
o Edit /etc/default/grub to add amdgpu.ppfeaturemask=0xffffffff to the GRUB_CMDLINE_LINUX_DEFAULT line (see the example just after this list)
o sudo update-grub
o wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
o echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
o sudo apt update && sudo apt install rocm-dev
o Add yourself to the video group. There are 2 options for doing this:
1. The AMD rocm installation guide suggests using the command 'sudo usermod -a -G video $LOGNAME'
2. Should [1] fail for some reason, add yourself manually:
echo 'SUBSYSTEM=="kfd", KERNEL=="kfd", TAG+="uaccess", GROUP="video"' | sudo tee /etc/udev/rules.d/70-kfd.rules
o reboot
o git clone https://github.com/preda/gpuowl && cd gpuowl && make
['clone' only on initial setup - subsequent updates can use 'git pull' from within the existing gpuowl-dir:
'cd ~/gpuowl && git pull https://github.com/preda/gpuowl && make']
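For reference, a sketch of what the edited GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub might look like after the above addition - the 'quiet splash' options shown are just the stock Ubuntu defaults; keep whatever options your file already has:
Code:
# /etc/default/grub, after the edit above and before running 'sudo update-grub':
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amdgpu.ppfeaturemask=0xffffffff"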

Queueing up work and reporting results:

o Read through the README.md file for basic background on running the code and its various command-line options. To queue up GIMPS work, from within the gpuowl executable directory run './tools/primenet.py -u [primenet uid] -p [primenet pwd] -w [your preferred worktype] --tasks [number of assignments to fetch] &'. This periodically runs an automated Python work-management script which grabs new work and reports any results generated since its last run.

On my R7 I generally choose '-w PRP --tasks 10', since --tasks does not differentiate based on task type: e.g. if my current worktodo has, say, 5 P-1 jobs queued up, the work-fetch will only grab 5 new PRP assignments. I do weekly results-checkins/new-work-fetches, and even running 2 jobs per card as suggested below for the R7, each PRP assignment completes in under 40 hours, so I want at least 5 PRP assignments queued up at all times. Note that for PRP and LL-test assignments needing some prior P-1 factoring, the program will automatically split the original PRP or LL assignment into 2, inserting a P-1 entry ("PFactor=...") before the PRP/LL one. Thus an original worktodo.txt file consisting of 10 new PRP assignments might end up with as many as 20 entries, consisting of 10 such PFactor/PRP pairs; an illustrative sketch of one such pair appears below.
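To illustrate the splitting just described, here is a rough sketch of one PFactor/PRP pair in worktodo.txt. The assignment ID, exponent and trailing fields are made-up placeholders - the real lines come from the PrimeNet server via primenet.py - so treat this as a format illustration only:
Code:
PFactor=0123456789ABCDEF0123456789ABCDEF,1,2,104857601,-1,76,2
PRP=0123456789ABCDEF0123456789ABCDEF,1,2,104857601,-1,76,2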

o Once the worktodo.txt file has been created and populated with 1 or more assignments, start the program: 'sudo ./gpuowl' should be all that is needed for most users. That will run 1 instance in the terminal in "live progress display" mode; to run in silent background mode or to manage more than one instance from a terminal session, prepend 'nohup' (this diverts all the ensuing screen output to the nohup.out file) and append ' &' to the command. To target a specific device on a multi-GPU system, use the '-d [device id]' flag, with the numeric device id taken from the output of the /opt/rocm/bin/rocm-smi command.
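A minimal background-mode sketch of the above, assuming a second GPU with device id 1 (the device id is a placeholder; sudo is omitted here to match the per-run-directory invocations in the Radeon VII section below, add it if your setup needs it):
Code:
nohup ./gpuowl -d 1 &     # silent background mode; all progress output goes to nohup.out
tail -f nohup.out         # watch the live output; ctrl-c stops the watching, not the run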

Use -maxAlloc to avoid out-of-memory with multi-jobs per card:

If you run multiple gpuowl instances per card - as suggested in general, for both performance and should-one-job-crash reasons - you need to take care to add '-maxAlloc [(0.9)*(card memory in MB)/(#instances)]' to your program-invocation command line. That limits the program instances to using at most 90% of the card's HBM in total. Without it, if your multiple jobs happen to find themselves in the memory-hungry stage 2 of P-1 factoring at the same time, they will combine to allocate more memory than is on the card - OpenCL provides no reliable "how much HBM remains available" functionality - causing them to swap out and slow to a crawl.
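A quick worked example of that formula for a 16 GB Radeon VII running 2 instances; the helper function is just a convenience sketch of my own, not part of gpuowl:
Code:
# Hypothetical helper: per-instance -maxAlloc value in MB = 90% of card memory / #instances
maxalloc() { echo $(( $1 * 9 / 10 / $2 )); }
maxalloc 16384 2    # 16 GB Radeon VII, 2 instances -> 7372
That is in the same ballpark as the '-maxAlloc 7500' used in the Radeon VII section below.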

The default amount gpuowl uses per job (around 90% of what is available on the card in question) is well into the "diminishing returns" part of the stage 2 memory-vs-speed curve for typical modern cards with multiple gigabytes of HBM, so limiting the mem-alloc this way should not incur a noticeable performance penalty - especially compared to the nearly-infinite performance penalty resulting from the above-described out-of-memory state.

Another good reason to run 2 instances per card - even on cards where this does not give a total-throughput boost - is fault insurance. For example, shortly after midnight last night one of the 2 jobs I had running on the R7 in my Haswell system coredumped with this obscure internal-fault message:

double free or corruption (!prev)
Aborted (core dumped)

No problem - Run #2 continued merrily on its way, the only hit to total-throughput was the single-digit-percentage one resulting from switching from 2-job to 1-job mode on this card. As soon as I saw what had happened on checking the status of my runs this morning, I restarted the aborted job with no problems. Had I been running just 1 job, a whole night's computing would have been lost.

Radeon VII specific:

o On R7, to maximize throughput you want to run 2 instances per card - in my experience this gives a roughly 6-8% total-throughput boost. I find the easiest way to do this is to create 2 subdirs under the gpuowl-dir, say run0 and run1 for card 0, cd into each and use '../tools/primenet.py [above options] &' to queue up work, and '../gpuowl [-flags] -maxAlloc 7500 &' to start a run.

If managing work remotely, I precede each of the executable invocations with 'nohup', and use 'multitail -N 2 ~/gpuowl/run*/*log' to view the latest progress of my various runs.
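Pulling the above together, a minimal sketch of the run0/run1 layout for card 0 - the primenet.py options are the placeholders from earlier, and '-d 0' pins both instances to card 0:
Code:
cd ~/gpuowl && mkdir -p run0 run1
cd ~/gpuowl/run0
nohup ../tools/primenet.py -u [primenet uid] -p [primenet pwd] -w PRP --tasks 10 &
# (give primenet.py a moment to populate worktodo.txt before starting gpuowl)
nohup ../gpuowl -d 0 -maxAlloc 7500 &
cd ~/gpuowl/run1
nohup ../tools/primenet.py -u [primenet uid] -p [primenet pwd] -w PRP --tasks 10 &
nohup ../gpuowl -d 0 -maxAlloc 7500 &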

o To maximize throughput-per-watt and keep card temperatures reasonable, you'll want to manually adjust each card's SCLK and MCLK settings. On my single-card system I get the best FLOPS/Watt - while avoiding huge-slowdown levels of downclocking - via the following bash script, which must be executed as root:
Code:
#!/bin/bash
# EWM: This is a basic single-GPU setup script ... customize to suit:

if [ "$EUID" -ne 0 ]; then echo "Radeon VII init script needs to be executed as root" && exit; fi

#Allow manual control
echo "manual" >/sys/class/drm/card0/device/power_dpm_force_performance_level
#Undervolt by setting max voltage
#               V Set this to 50mV less than the max stock voltage of your card (which varies from card to card), then optionally tune it down
echo "vc 2 1801 1010" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Overclock mclk to 1200
echo "m 1 1200" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push a dummy sclk change for the undervolt to stick
echo "s 1 1801" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Push everything to the card
echo "c" >/sys/class/drm/card0/device/pp_od_clk_voltage
#Put card into desired performance level
/opt/rocm/bin/rocm-smi --setsclk 3 --setfan 120
Setting SCLK = 3 rather than 4 saves ~50W with a modest ~6% timing hit; going to SCLK = 2 saves another 50W but incurs a further ~15% timing hit. If you find overclocking MCLK to 1200 is unstable (it gives 'EE' error-line outputs and may cause the run to halt), try a lower 1150 - I've found that to be the maximum safe setting based on what works on all 4 of my R7s. You'll want to use rocm-smi to monitor the temperature of your various cards and adjust the settings as needed, e.g. as sketched below.
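A simple way to keep an eye on temperatures, fan speed, clocks and power draw is to refresh rocm-smi's default summary every few seconds - the 10-second interval here is arbitrary, just a monitoring convenience and not part of the setup script:
Code:
watch -n 10 /opt/rocm/bin/rocm-smi    # refresh the per-card temp/fan/clock/power summary every 10s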

o On my 3-R7 system, I use the elaborated setup script attached to this post (radeon_setup_3card.sh.bz2). Note the inline comments re. sclk and fan settings, and the actual job-start commands at the end of the file, which put 2 jobs on each card. After a system reboot, I need only a single 'sudo bash *sh' to be up and running.

o Mihai Preda comments on running on multiple R7s:
"I find running gpuowl with -uid <16-hex-char id> much more useful than running with -d <position> .
This way the identity of the card is preserved even when swapping it around the PCIe slots.
And the script tools/device.py can be used to convert the UID to -d "position" for rocm-smi ."

Troubleshooting:

o If, after successfully building gpuowl, 'clinfo' does not find the GPU, try sudo apt install rocm-dkms, and assuming that package-install succeeds, reboot.

o If you installed an OpenCL-supporting GPU in a system which previously had an nVidia one, you may need to remove the nVidia drivers like so... . Such a previous-card install may also have left one or more /sys/class/drm/card* entries; if those exist, the card-settings-init script above needs to have its 'card0' entries fiddled to replace the 0 with the most-recently-added (= largest) index in the list of /sys/class/drm/card* entries.
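To see which card* entries are present (and thus which index the settings script should use), a quick listing suffices - just an illustrative check, with the highest-numbered entry normally being the most recently added card:
Code:
ls /sys/class/drm/ | grep -E '^card[0-9]+$'    # e.g. card0, card1; use the largest index in the script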

o If you get a files-owned-by-root error on an attempted work-fetch using primenet.py, do 'sudo chown -R [uid]:[gid] *' and manually append the downloaded (but not yet written to worktodo.txt) assignments to worktodo.txt.

Setting PRP-proof Power to Restrict Disk Usage:

The PRP-proof mechanism in place since 2021 tends to use a lot of disk space. On systems with restricted disk space, such as cloud accounts or ones with small SSDs, reining in disk usage is important. Here is the basic math: for a PRP-with-proof test of M(p) at a given integer proof power, gpuowl will create 2^power interim proof-related files, each of roughly p/8 bytes. Example: my old Haswell quad hosts 2 Radeon VIIs, on each of which I run 2 gpuowl jobs. Said system has just a 40GB SSD, with perhaps 10GB available for gpuowl runtime data. At the current GIMPS PRP wavefront, each of the aforementioned interim files needs ~15MB. With the default power = 8 setting, each PRP test would generate 256 such files for a total of nearly 4GB; times 4 jobs that means up to ~15GB, more disk space than I have available. So I run with '-proof 7', which cuts the usage in half, to something manageable.
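The same back-of-envelope arithmetic as a one-off shell calculation - the exponent and power values here are placeholders; plug in your own wavefront exponent and proof power:
Code:
p=110000000; power=7       # hypothetical wavefront exponent, running with '-proof 7'
echo "$(( (1 << power) * (p / 8) / 1000000 )) MB of proof files per PRP test"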

Even with that expedient, though, I find myself having to manually remove leftover proof-related files from recently completed jobs roughly every 2 days, which corresponds to each fresh batch of 4 PRP tests from the 4 gpuowl runs. I use 'df' to check disk usage daily, and when it gets over 90% I use the 'sudo chown' command sequence below to clean out the leftovers. On my Ubuntu system a single sudo-with-password 'lasts' ~15 minutes (in the sense that subsequent sudo-prefixed commands no longer ask for a password within that timeframe), so in order to string together multiple sudo-prefixed commands I first do a 'sudo date' just to "prime the pump", as it were. I run each gpuowl instance from a separate run[0-3]-numbered directory under the master gpuowl-dir rather than using the 'pool' option, whence the 'run*' wildcarded bit:
Code:
df|grep sda
4:/dev/sda2       37688900 33023496   2721188  93% /
28:/dev/sda1         523248     7932    515316   2% /boot/efi

sudo date
[enter password]
sudo chown -R ewmayer:ewmayer ~/gpuowl/run*/* && sudo rm -f gpuowl/run*/uploaded/*oof && rm -fr gpuowl/run*/trashbin/*

df|grep sda
4:/dev/sda2       37688900 26583552   9161132  75% /
28:/dev/sda1         523248     7932    515316   2% /boot/efi
Advanced Usage:

o To cleanly kill all gpuowl instances running on a system (say, prior to a system shutdown): Mihai explains that gpuowl expects SIGINT for a clean shutdown. Simply doing e.g. "shutdown -h now" sends a SIGTERM followed - after an unspecified (and unspecifiable) delay - by a SIGKILL, which can lead to corrupted gpuowl savefiles. The proper sequence is to precede the system shutdown with:

sudo kill -INT $(pgrep gpuowl)
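Putting that together into a tiny pre-shutdown snippet - the 30-second pause is my own assumption, just meant to give each instance time to write its savefile and exit before the actual shutdown:
Code:
sudo kill -INT $(pgrep gpuowl)    # ask all gpuowl instances to checkpoint and exit cleanly
sleep 30                          # (assumed) grace period for the savefiles to be written
sudo shutdown -h now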

o If using 'screen' and working remotely, once gpuowl is up and running, detach screen (ctrl-a --> d) prior to logout.

o For subsequent ROCm-updates, use the following sequence: sudo apt autoremove rocm-dev
[George adds: "I don't think these 2 are required, but I don't see how they'd hurt:
wget -qO - https://repo.radeon.com/rocm/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
]
sudo apt update
sudo apt install libncurses5 clinfo rocm-dev
[reboot]
Attached Files: radeon_setup_3card.sh.bz2 (750 Bytes)
