mersenneforum.org Gpuowl v6 issue on free-tier Colab

2023-03-07, 18:16   #34
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·643 Posts

Quote:
 Originally Posted by mrh One of my favorite paradoxes! ... you can estimate the number of hashes you can do before having a 50% chance of collision, by sqrt(2*2^N * ln(2)) or close enough by sqrt(2^N). If the short hash is 28 bits, then we could expect about 19k commits before a collision.
With N = 28: sqrt(2*2^N * ln(2)) = 19290.685808...

From the perl shorty:
Probability of no collision 0.500034307479006;
Probability of at least one collision among 19290 hashes of hex length 7 = 49.9965692520994%.

Probability of no collision 0.49999837458871;
Probability of at least one collision among 19291 hashes of hex length 7 = 50.000162541129%.

50% seems rather too high to allow in this usage; even 0.1% to 1% seems to me on the high side.
Six hex chars would have been much too short:
Probability of no collision 0.887670717319739;
Probability of at least one collision among 2000 hashes of hex length 6 = 11.2329282680261%.

tdulcet has provided additional commit hashes and counts by PM. Gpuowl has more than 1500 commits, even without including those below v6.5.
Probability of no collision 0.995781563400284;
Probability of at least one collision among 1507 hashes of hex length 7 = 0.421843659971566%.
That's good odds of no collision, but not what I consider extremely low, especially for various probabilities that come up in GIMPS. (Consider the odds of random errors landing on the same wrong res64...)
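The "perl shorty" figures above can be reproduced with the exact birthday-problem product. Here is a sketch in Python (the original Perl was not posted, so the function name and structure are my own):

```python
import math

def p_no_collision(n: int, hex_len: int) -> float:
    """Exact birthday-problem product: probability that n random
    hashes of hex_len hex digits are all distinct."""
    space = 16 ** hex_len
    # Sum logs instead of multiplying directly, for numerical stability.
    log_p = sum(math.log1p(-i / space) for i in range(n))
    return math.exp(log_p)

print(p_no_collision(19290, 7))  # ~0.500034, just above 50%
print(p_no_collision(19291, 7))  # ~0.499998, just below 50%
print(p_no_collision(1507, 7))   # ~0.995782, i.e. ~0.42% collision odds
```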

Last fiddled with by kriesel on 2023-03-07 at 18:30

2023-03-08, 11:23   #35
tdulcet

"Teal Dulcet"
Jun 2018

107₁₀ Posts

Quote:
 Originally Posted by kriesel That's good odds of no collision, but not what I consider extremely low, especially for various probabilities that come up in GIMPS.
OK, I was wrong about the odds in general being "extremely low", as I was only considering a single commit. Thanks for correcting me. The odds are still low though, as of course very few public git repositories have anywhere near 19,290 commits.

However, in this case the odds of a collision are actually 0%. From looking at the official documentation for the git describe command that GpuOwl uses to generate these version strings, it will automatically increase the number of characters in the hash prefix if a collision occurs, so the hashes it outputs should always be unique (at least until it is using all 40 characters of the full SHA-1 hash). For reference, here is the pertinent part from the documentation:
Quote:
 The hash suffix is "-g" + an unambiguous abbreviation for the tip commit of parent. The length of the abbreviation scales as the repository grows, using the approximate number of objects in the repository and a bit of math around the birthday paradox, and defaults to a minimum of 7.
Note that git is planning to transition to longer and much more secure SHA-256 hashes, which will provide 64 hexadecimal characters.

Quote:
 Originally Posted by kriesel Gpuowl has more than 1500, even without including below v6.5.
The GpuOwl master branch has exactly 1,507 commits. The v6 branch has 1,259 commits, but only two of those are unique (the rest are also in the master branch). All 38 branches have a total of 1,615 unique commits.
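Counts like these come straight out of git rev-list. A sketch of the technique, demonstrated on a throwaway repository (cloning gpuowl itself would need network access; against the real checkout the same two rev-list commands apply):

```shell
# Demonstrate counting total and branch-unique commits with git rev-list.
set -e
d=$(mktemp -d)
cd "$d"
git init -q -b master repo
cd repo
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m one
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m two
git checkout -q -b v6
git -c user.email=a@b -c user.name=a commit -q --allow-empty -m three
git rev-list --count master      # commits on master -> 2
git rev-list --count v6 ^master  # commits unique to v6 -> 1
```

In a gpuowl checkout, `git rev-list --count master` and `git rev-list --count v6 ^master` give the 1,507 and 2 figures quoted above.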

2023-03-08, 15:26   #36
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²×3×643 Posts

Quote:
 Originally Posted by tdulcet the odds of a collision are actually 0%. From looking at the official documentation for the git describe command that GpuOwl uses to generate these version strings, it will automatically increase the number of characters in the hash prefix if a collision occurs, so the hashes it outputs should always be unique (at least until it is using all 40 characters of the full SHA-1 hash).
Quote:
 The hash suffix is "-g" + an unambiguous abbreviation for the tip commit of parent. The length of the abbreviation scales as the repository grows, using the approximate number of objects in the repository and a bit of math around the birthday paradox, and defaults to a minimum of 7.
Well of course it does. We should have known. Thanks for checking and posting that.
Given that it was created by Linus in 10 days, I'd like to see what he could do in certain other technical areas in a month or two.
https://www.linuxfoundation.org/blog...linus-torvalds

Now, I wonder whether gpuowl gracefully handles hashes longer than 7. But it probably won't come up for a very long time.

Separately, I wrote a program to check the minimum difference in gpuowl commit hashes among a sorted list of 890 hashes, treating them as hex constants. Output was:
Max delta 2577757 at line 83, min delta 302 at line 656, in 890 sorted hashes, average value 134633427.446067 /268435456 = 0.501548601113511.
That min delta is 0x12E. That's about 1/1000 of what it would be if they were equally spaced across 2^28: 268435456/890 = 301612.8719... (890 not 889, because the last must also be separated from the first, after considering overflow/underflow.)

The nearest pair are
v7.2 82 gbe3d396
v7.2 52 gbe3d4c4

Second, difference 0x34d (845):
v7.2 92 g5fb55ca
v7.2 56 g5fb5917
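The delta computation can be sketched as follows (my own function name; the full 890-hash list is not reproduced here, so the example feeds in just the two nearest pairs above):

```python
def hash_deltas(hashes):
    """Treat 7-hex-digit hashes as integers in [0, 2^28), sort them, and
    return (min, max) gap, including the wraparound gap from last to first."""
    vals = sorted(int(h, 16) for h in hashes)
    space = 1 << 28  # 7 hex digits
    deltas = [b - a for a, b in zip(vals, vals[1:])]
    deltas.append(space - vals[-1] + vals[0])  # wraparound (overflow/underflow)
    return min(deltas), max(deltas)

pairs = ["be3d396", "be3d4c4", "5fb55ca", "5fb5917"]
print(hash_deltas(pairs))  # min delta 302 (= 0x12E), for the be3d... pair
```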

2023-03-11, 13:25   #37
tdulcet

"Teal Dulcet"
Jun 2018

153₈ Posts
Help wanted

Quote:
 Originally Posted by kriesel Now, I wonder whether gpuowl gracefully handles hashes longer than 7.
GpuOwl does not do any processing of the version strings, so that should not be an issue. I would be more concerned about the PrimeNet server, especially if they were to add support for normalizing the GpuOwl version numbers as you requested...

---

Anyway, back to the topic of this thread... As some of the people here know, we did hear back from @preda and he requested that I create a pull request (PR). I then attempted to backport his commit that fixed the issue to the v6 branch, but unfortunately this was unsuccessful. While I was able to apply the commit to the v6 branch, it still does not work correctly on Colab. I am not an OpenCL programmer, so I obviously did not resolve the merge conflicts correctly.

As requested, I did create a draft PR with the changes: https://github.com/preda/gpuowl/pull/267. If anyone here has any OpenCL experience, any help to finish this PR and fix the v6 branch would be greatly appreciated. Any ideas of what I did wrong?

Here is a minimal notebook to test the changes from the draft PR:
Code:
import os
%cd "/content/drive/My Drive"
os.makedirs('GIMPS/temp', exist_ok=True)
%cd "GIMPS/temp"

print("Installing the GNU Multiple Precision (GMP) library")
!sudo apt update -y
!sudo apt install libgmp3-dev -y

!git clone https://github.com/tdulcet/gpuowl.git
%cd gpuowl
!git checkout v6 -f
!make -j "\$(nproc)"

!./gpuowl -prp 106928347 -iters 100000 -device 0 -cleanup -log 10000 -maxAlloc 13590M
!./gpuowl -ll 106928347 -iters 100000 -device 0 -cleanup -log 10000 -maxAlloc 13590M
If anyone wants to try applying the commit that fixed the issue to the v6 branch themselves, run these commands:
Code:
git clone https://github.com/preda/gpuowl.git
cd gpuowl
git checkout v6
git cherry-pick 677f43a
and then follow the instructions to resolve the resulting merge conflicts.

2023-04-07, 21:53   #38
moebius

Jul 2009
Germany

1010110101₂ Posts

Quote:
 Originally Posted by moebius Here it is in 7-zip format
Seems like the gpuOwl 7.2-131 binary is broken on Colab again!

Code:
GPU 0: Tesla T4 (UUID: GPU-e8e7a262-0b4f-f390-28e3-7125f9506c61)

Linux Distribution:		Ubuntu 20.04.5 LTS
Linux Kernel:			5.10.147+
Processor (CPU):		Intel(R) Xeon(R) CPU @ 2.00GHz
Architecture:			x86_64 (64-bit)
Total memory (RAM):		12,985 MiB (13GiB) (13,616 MB (14GB))
Total swap space:		0 MiB (0 MB)
Disk space:			sda: 83,968 MiB (82GiB) (88,046 MB (89GB))
Computer name:			89978b6b72d7
Hostname:			89978b6b72d7
Computer ID:			e28958e8eafc4b0ebddcf68912dd30f9
Time zone:
Language:			en_US.UTF-8
Virtualization container:	docker
Virtual Machine (VM) hypervisor:kvm
Bash Version:			5.0.17(1)-release
bash: line 171: /dev/tty: No such device or address
bash: line 172: /dev/tty: No such device or address
bash: line 173: /dev/tty: No such device or address
Terminal:			xterm-color

20230407 21:41:50  GpuOwl VERSION v7.2-131-gca22dce-dirty
20230407 21:41:50  GpuOwl VERSION v7.2-131-gca22dce-dirty
20230407 21:41:50  config:  -maxAlloc 1000
20230407 21:41:50  config:  -proof 6
20230407 21:41:50  config:  -user geschwen
20230407 21:41:50  config:
20230407 21:41:50  config:
20230407 21:41:50  device 0, unique id ''
20230407 21:41:50  Exception gpu_error:  clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs
20230407 21:41:50  Bye

Last fiddled with by moebius on 2023-04-07 at 21:54

2023-04-07, 22:30   #39
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·643 Posts

Quote:
 Originally Posted by moebius Code: 20230407 21:41:50 GpuOwl VERSION v7.2-131-gca22dce-dirty 20230407 21:41:50 GpuOwl VERSION v7.2-131-gca22dce-dirty 20230407 21:41:50 config: -maxAlloc 1000 20230407 21:41:50 config: -proof 6 20230407 21:41:50 config: -user geschwen 20230407 21:41:50 config: 20230407 21:41:50 config: 20230407 21:41:50 device 0, unique id '' 20230407 21:41:50 Exception gpu_error: clGetPlatformIDs(16, platforms, (unsigned *) &nPlatforms) at clwrap.cpp:71 getDeviceIDs 20230407 21:41:50 Bye
-maxAlloc 1000 seems rather skimpy. The Tesla T4 has 16 GB onboard RAM, and typically 15 GB free before Colab app launch. https://www.mersenneforum.org/showpo...5&postcount=15
From the v7.2-129 help output:
Code:
-maxAlloc <size>   : limit GPU memory usage to size, which is a value with suffix M for MB and G for GB.
e.g. -maxAlloc 2048M or -maxAlloc 3.5G
I think if both M and G are missing it will default to M, per this fragment of args.cpp:
Code:
else if (key == "-maxAlloc" || key == "-maxalloc") {
  assert(!s.empty());
  u32 multiple = (s.back() == 'G') ? (1u << 30) : (1u << 20);
  maxAlloc = size_t(stod(s) * multiple + .5);
}
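Read that way, a bare -maxAlloc 1000 means 1000 MiB. A quick sketch of the same parsing in Python (an illustrative mirror of the C++ fragment, not the actual gpuowl code):

```python
def max_alloc_bytes(s: str) -> int:
    # Mirrors the args.cpp fragment: a 'G' suffix means GiB,
    # anything else (including a bare number) means MiB.
    multiple = (1 << 30) if s.endswith('G') else (1 << 20)
    number = s.rstrip('MG')  # stod() in C++ stops at the first non-numeric char
    return int(float(number) * multiple + 0.5)

print(max_alloc_bytes("1000"))   # 1048576000 bytes, i.e. 1000 MiB
print(max_alloc_bytes("3.5G"))   # 3758096384 bytes, i.e. 3.5 GiB
```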
Also, a T4 is more effective at TF (mfaktc or mmff) than PRP etc. due to low DP performance relative to SP (1/32).

Last fiddled with by kriesel on 2023-04-07 at 22:36

2023-04-07, 22:50   #40
moebius

Jul 2009
Germany

3²·7·11 Posts

Quote:
 Originally Posted by kriesel -maxAlloc 1000 seems rather skimpy. The Tesla T4 has 16 GB onboard ram, & typically 15GB free before Colab app launch.
-maxAlloc 1000 is sufficient for PRP.
The problem is that Colab seems broken again, and I don't know why.

2023-04-07, 23:05   #41
paulunderwood

Sep 2002
Database er0rr

7·23·29 Posts

Perhaps there is no "device 0" and the box has a video card in another slot.
2023-04-07, 23:17   #42
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2²·3·643 Posts

If there is a GPU attached to the Colab Free VM, and the OpenCL driver is working, gpuowl should see it as device 0. Gpuowl is smart enough to deal with multiple OpenCL devices and multiple platforms (NVIDIA, AMD, Intel) on the same system, enumerating them with consecutive integers beginning from zero, rather than by OpenCL platform x, device(s) y on that platform, or hardware slot number. For example, on a quite-mixed-hardware system in front of me, the following is included in gpuowl -h output, preceding the fft lengths table:
Code:
-device : select a specific device:
 0 : NVIDIA GeForce GTX 1650 SUPER - not-AMD
 1 : gfx803-Radeon RX550/550 Series - AMD
 2 : gfx906-AMD Radeon VII - AMD
 3 : gfx906-AMD Radeon VII - AMD
 4 : Intel(R) HD Graphics 4600 - not-AMD
 5 : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz - not-AMD
This system is a little quirky; sometimes the NVIDIA leads the parade, as listed above, and sometimes it follows the AMD devices, AND IT OCCASIONALLY CHANGES WHILE THE SYSTEM IS RUNNING.

Last fiddled with by kriesel on 2023-04-07 at 23:23
2023-04-08, 00:23   #43
Xyzzy

Aug 2002

2⁶×3³×5 Posts

Quote:
 Originally Posted by kriesel AND IT OCCASIONALLY CHANGES WHILE THE SYSTEM IS RUNNING.

2023-04-08, 09:26   #44
tdulcet

"Teal Dulcet"
Jun 2018

107 Posts

Quote:
 Originally Posted by moebius the problem is that colab seems broken again and I don't know why.
I have had this problem as well for the last three days. Luckily this time it is not a GpuOwl issue, as the OpenCL driver on the Colab VM is just not working. This can be seen from the clinfo command, which just outputs:
Code:
Number of platforms                               0
It does not look like anyone has reported the issue to Google yet: https://github.com/googlecolab/colabtools/issues, but hopefully they will be able to fix this soon...

BTW, as everyone probably saw from @preda's post on Monday, I have been working with him to find a solution to the GpuOwl v6 issue with the latest Nvidia driver on Colab, so hopefully we will have a fix for that soon as well...

