mersenneforum.org New Google Colab Notebooks For Primality Testing

2021-02-23, 19:55   #23
danc2

Dec 2019

5×7 Posts
~1 Month Results

Quote:
 Note that with Colab Pro, first time primality tests on the GPU will only take around 2-3 days.
To add to Teal's point about how quickly results are returned with the Pro version, attached is a picture of the results from approximately one month and 8 days of testing using Colab Pro. I mostly ran one Colab Pro machine, and later added a second Colab Pro machine I purchased. I was using the Colab extension (though I had some issues, which slowed me down; so think: "even more results are possible").

The results from Oracle*, pdxEmail, and Windows CPUs can be ignored as they are not from Colab, but everything else is. Please also note that I was without power for 5 days and thus missed out on roughly 6 days of results. In total, the number of primality results returned in this timeframe of January 16 to February 23rd is 46. This is a significant number because most of these are not small DCs or CERTs.
Attached Thumbnails

2021-02-23, 21:20   #24
chalsall
If I May

"Chris Halsall"
Sep 2002

2×5,333 Posts

Quote:
 Originally Posted by danc2 This is a significant number due to the fact that most of these are not small DC or CERTS.
If I may please share, I'm really enjoying this experiment.

1. A reasonable amount of compute can be "harvested" from Colab.

2. There seem to be quite a few "dimensions" to the compute allotments.

3. While those running the GPU72 Notebook were shut out, others were reporting 12 hours or so of GPU.

3.1. The Google Gods (which may simply be Humans directing machines) act in mysterious ways.

4. Recently, those running the GPU72 Notebook have been getting a bit of compute each day.

4.1. My thirteen (13) instances (spread across five machines in three countries) have to be interacted with, but they always get at least CPU compute for 20 minutes or more.

To be honest, I've been as fascinated watching the experimenters experiment with the Subjects as with anything else.

(I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).

2021-02-23, 22:26   #25
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

24763₈ Posts

Quote:
 Originally Posted by chalsall To be honest, I've been as fascinated with watching the experimenters experiment with the Subjects as much as anything else. (I'm reminded of Douglas Adams, and the Mice and the Dolphins (or was it the whales)).
Not Milgram?

2021-02-23, 22:48   #26
chalsall
If I May

"Chris Halsall"
Sep 2002

10666₁₀ Posts

Quote:
 Originally Posted by Uncwilly Not Milgram?
While seminal, in my opinion "lightweight".

That study didn't bring the profit driver function into the equation (although it might have identified psychopaths as interesting subjects).

2021-02-24, 15:54   #27
tdulcet

"Teal Dulcet"
Jun 2018

47₁₆ Posts

Quote:
 Originally Posted by kriesel I'm always happy to see someone chip in and contribute to development.
No problem, we are happy to help.

Quote:
 Originally Posted by kriesel Google Colaboratory has resorted at times to requiring ostensibly human image analysis before authorizing a Colab session. Three-by-three arrays of little and sometimes unclear images, with a requirement to select each image that contains bicycles, or palm trees, or hills, or buses, etc. (One object category per challenge session.) Sometimes selected images are replaced with additional ones until no qualifying images remain; sometimes it's only the initial set of 9. And there have sometimes been child windows specifying it is for human interactive use, not bots, and requiring click confirmation that yes, it's a human at the keyboard.
I have only seen this once, and only with the free Colab. However, even if a notebook disconnects, our extension will just automatically reconnect it. I added a new optional feature to our extension which will automatically rotate through the user's Colab tabs when their system is idle or locked (similar to a screen saver, but the screen does not need to be on). This should help prevent the notebooks from being perceived as inactive, particularly for users who run their notebooks on a dedicated device such as a Raspberry Pi.

Quote:
 Originally Posted by kriesel Gpuowl reportedly is faster than CUDALucas on the same gpu model and exponent task.
Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab, and I am not sure exactly what procedure they followed to come to that conclusion. Our GPU notebook (which uses my CUDALucas install script) makes several changes to the Makefile before building CUDALucas, which likely affects the resulting performance, including enabling the -O3 optimization and correctly setting the --generate-code flag for every GPU available on Colab. We also did the cufftbench and threadbench tuning in advance for all five GPUs, up to the 32768K FFT length, which covers exponents up to 580,225,813. You can see the resulting *fft.txt and *threads.txt files in our repository here, which list the ms/iter speeds at every FFT length.
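As a rough, hedged sanity check on that 580M figure: the largest exponent an FFT length can handle is approximately the FFT length in words times the usable bits per word. The ~17.29 bits/word value below is an assumption back-fitted to the quoted limit, not a figure taken from the notebook:

```python
# Sketch: max testable exponent ~= FFT length (in words) * usable bits per word.
# BITS_PER_WORD is an assumed value chosen to roughly reproduce the quoted
# 580,225,813 limit; real programs derive it from round-off error criteria.
FFT_WORDS = 32768 * 1024      # the 32768K FFT length
BITS_PER_WORD = 17.29         # assumption, not from CUDALucas itself

max_exponent = int(FFT_WORDS * BITS_PER_WORD)
print(f"~{max_exponent:,} max exponent at 32768K")
```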

2021-02-24, 17:50   #28
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1ABA₁₆ Posts

Quote:
 Originally Posted by tdulcet I have only seen this once and only with the free Colab.
In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.

re gpuowl faster than cudalucas:
Quote:
 Yeah, I have seen a few posts that claim this, but I do not think anyone has tested yet with all five GPUs currently available on Colab and I am not sure exactly what procedure they followed to come to that conclusion
I don't have the time now to respond thoroughly to that. But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from already-established CUDALucas. I had thoroughly tested and tuned for numerous gpu models from Quadro2000 to GTX1080Ti in CUDALucas before that. Here's a recent quick compare on GTX1080.

Compare LL on CUDALucas to PRP on gpuowl. Same exponent, same host, same gpu, same hour, same environmental and clocking conditions, a GTX1080 for this quick benchmark.

CUDALucas v2.06 May 5 2017 version compiled by flashjh; Windows 10 run environment
Code:
Starting M240110503 fft length = 13824K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Feb 23  16:14:28  | M240110503     10000  0x5b6b7cbec1bdc015  | 13824K  0.08594  13.4883  134.88s  |  37:11:35:46   0.00%  |
|  Feb 23  16:16:43  | M240110503     20000  0xde34ff2ddb2080a4  | 13824K  0.08789  13.5358  135.35s  |  37:13:08:45   0.00%  |
|  Feb 23  16:18:58  | M240110503     30000  0x14e2c4cd92c29164  | 13824K  0.09180  13.5395  135.39s  |  37:13:43:10   0.01%  |
|  Feb 23  16:21:14  | M240110503     40000  0x5256dd82035447c4  | 13824K  0.08594  13.5488  135.48s  |  37:14:08:29   0.01%  |
|  Feb 23  16:23:29  | M240110503     50000  0xe89ddd5520561b21  | 13824K  0.08594  13.5361  135.36s  |  37:14:12:38   0.02%  |
average ms/it 13.5297
ETA: 240110503 * .0135297 sec / 3600 / 24 =~ 37.600 days

Gpuowl v6.11-380 excerpt mid-run of PRP/GEC/proof, 13M fft (1k:13:512):
Code:
2021-02-23 15:37:13 asr3/gtx1080 240110503 OK 131700000  54.85%; 11875 us/it; ETA 14d 21:36; a5f295da6eddc0a1 (check 5.17s)
2021-02-23 15:47:13 asr3/gtx1080 240110503 OK 131750000  54.87%; 11877 us/it; ETA 14d 21:30; f20a694bd0c842de (check 5.71s)
2021-02-23 15:57:12 asr3/gtx1080 240110503 OK 131800000  54.89%; 11883 us/it; ETA 14d 21:31; 7ddaab01bbd26fcd (check 5.20s)
2021-02-23 16:07:11 asr3/gtx1080 240110503 OK 131850000  54.91%; 11866 us/it; ETA 14d 20:50; 38b6acb7773f3896 (check 5.28s)
average ms/it 11.875
ETA start to finish: 240110503 * .011875 sec / 3600 / 24 =~ 33.001 days

Raw iteration speed ratio gpuowl PRP / CUDALucas LL = 37.6/33.001 =~ 1.1394

The fft length difference (13.5M CUDALucas vs 13M gpuowl) only accounts for ~4% out of the observed 14% difference favoring gpuowl (like getting 8 days per week!)
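The ETA arithmetic above is simple enough to script; a minimal sketch using the measured averages from the two logs:

```python
# Reproduce the back-of-envelope ETAs and speed ratio from the logs above.
EXPONENT = 240110503  # a primality test needs ~one iteration per bit

def eta_days(ms_per_iter):
    # iterations * seconds/iteration, converted to days
    return EXPONENT * ms_per_iter / 1000 / 3600 / 24

cudalucas_days = eta_days(13.5297)  # CUDALucas LL average, 13824K FFT
gpuowl_days = eta_days(11.875)      # gpuowl PRP average, 13312K FFT

print(f"CUDALucas ETA: {cudalucas_days:.3f} days")            # ~37.600
print(f"gpuowl ETA:    {gpuowl_days:.3f} days")               # ~33.001
print(f"speed ratio:   {cudalucas_days / gpuowl_days:.3f}")   # ~1.139
```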

What's omitted above is the slightly-more-than-2:1 overall project speed advantage of PRP/GEC/proof over LL, LL DC, and the typically ~4% LL TC, which is lost by using CUDALucas. Also omitted is the loss of error checking: CUDALucas has not even the relatively weaker Jacobi symbol check, unless you've added it in your builds. The higher the exponent, the longer the run, and the less likely a run will complete correctly without GEC.

In P-1, you could perhaps compare my CUDAPm1 fft and threads file timings and estimate P-1 run times. If you try running P-1 tests on Colab I'd be interested in learning how to resolve the zero-residue issue I ran into. https://www.mersenneforum.org/showpo...28&postcount=5

Gpuowl P-1 run time scaling for various gpus including 2 Colab models can be found here. Benchmarking on V100 has been a nonissue since I don't recall ever encountering one. Lately it's almost entirely T4s, more suitable for TF.
Attached Thumbnails

2021-02-24, 19:31   #29
danc2

Dec 2019

35₁₀ Posts

Quote:
 [image interpretation task] doesn't happen often now, but it still comes up.
I would be curious whether Teal has seen this while using the extension. The extension can check (click the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.

GPUOwl stuff:
Yes, it would be great if we could use GPUOwl instead of CUDALucas as it sounds like there is more that can be done, as great as CUDALucas is.

Last fiddled with by danc2 on 2021-02-24 at 19:31

2021-02-25, 12:46   #30
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2×11×311 Posts

GTX1060 gpuowl vs. CUDALucas ~58M LL DC

Executive summary: Gpuowl 5.8 ms/iter with Jacobi check; CUDALucas 6.25-6.5 ms/iter (no Jacobi check).

Gpuowl v6.11-380 on GTX1060, ~5.806 ms/iter in 58.75M LL DC with Jacobi check:
Code:
2021-02-22 21:04:36 condor/gtx1060 58755607 FFT: 3M 1K:6:256 (18.68 bpw)
2021-02-22 21:04:36 condor/gtx1060 Expected maximum carry32: 50550000
2021-02-22 21:04:36 condor/gtx1060 OpenCL args "-DEXP=58755607u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=6u -DPM1=0 -DMM_CHAIN=1u -DMM2_CHAIN=2u -DMAX_ACCURACY=1 -DWEIGHT_STEP_MINUS_1=0x8.01304be8dc228p-5 -DIWEIGHT_STEP_MINUS_1=-0xc.ce52411c70cep-6 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2021-02-22 21:04:39 condor/gtx1060 OpenCL compilation in 2.52 s
2021-02-22 21:04:39 condor/gtx1060 58755607 LL 0 loaded: 0000000000000004
2021-02-22 21:06:00 condor/gtx1060 102714151 P2 GCD: no factor
2021-02-22 21:06:00 condor/gtx1060 {"status":"NF", "exponent":"102714151", "worktype":"PM1", "B1":"1000000", "B2":"30000000", "fft-length":"5767168", "program":{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"condor/gtx1060", "aid":"7DAA6CA7DFF308D0DF638276AF9B5028", "timestamp":"2021-02-23 03:06:00 UTC"}
2021-02-22 21:14:20 condor/gtx1060 58755607 LL 100000 0.17%; 5807 us/it; ETA 3d 22:37; 39c251c47f602a3d
2021-02-22 21:24:01 condor/gtx1060 58755607 LL 200000 0.34%; 5807 us/it; ETA 3d 22:28; eb46c0fb8d0e94f8
2021-02-22 21:33:41 condor/gtx1060 58755607 LL 300000 0.51%; 5807 us/it; ETA 3d 22:18; ed993c4bb040ddef
2021-02-22 21:43:22 condor/gtx1060 58755607 LL 400000 0.68%; 5807 us/it; ETA 3d 22:07; 54e2c2904288419d
2021-02-22 21:53:03 condor/gtx1060 58755607 LL 500000 0.85%; 5808 us/it; ETA 3d 21:59; 16657e0fba393f7f
2021-02-22 22:02:43 condor/gtx1060 58755607 LL 600000 1.02%; 5808 us/it; ETA 3d 21:49; 7ca0fe4b4db9c724
2021-02-22 22:02:43 condor/gtx1060 58755607 OK 500000 (jacobi == -1)
2021-02-22 22:12:24 condor/gtx1060 58755607 LL 700000 1.19%; 5808 us/it; ETA 3d 21:40; 22aa1cb83c55294c
...
2021-02-25 05:41:26 condor/gtx1060 58755607 LL 35100000 59.74%; 5805 us/it; ETA 1d 14:09; 7810938d88993295
2021-02-25 05:41:26 condor/gtx1060 58755607 OK 35000000 (jacobi == -1)
2021-02-25 05:51:06 condor/gtx1060 58755607 LL 35200000 59.91%; 5804 us/it; ETA 1d 13:59; 5d55d69ab7ca60a9
2021-02-25 06:00:46 condor/gtx1060 58755607 LL 35300000 60.08%; 5804 us/it; ETA 1d 13:49; 5635fb50dc776ab9
2021-02-25 06:10:27 condor/gtx1060 58755607 LL 35400000 60.25%; 5804 us/it; ETA 1d 13:39; 2ef462f9a00916b2
2021-02-25 06:14:25 condor/gtx1060 Stopping, please wait..
2021-02-25 06:14:25 condor/gtx1060 58755607 LL 35441000 60.32%; 5813 us/it; ETA 1d 13:39; bdb95405e8027916
2021-02-25 06:14:25 condor/gtx1060 waiting for the Jacobi check to finish..
2021-02-25 06:15:12 condor/gtx1060 58755607 OK 35441000 (jacobi == -1)

CUDALucas v2.06 May 5 2017, same everything else, nominally 6.248 ms/iter, but actually higher because of oscillation between 3136K and 3200K fft length; 10:51 / 100k iterations = 6.51 msec/iter, 12% longer than Gpuowl, and no Jacobi check:
Code:
Using threads: square 512, splice 128.
Starting M58755607 fft length = 3200K
|   Date     Time    |   Test Num     Iter        Residue        |    FFT   Error     ms/It     Time  |       ETA      Done   |
|  Feb 25  06:20:15  | M58755607      50000  0x6b790995614a3aa2  |  3200K  0.19189   6.2483  312.41s  |   4:05:53:31   0.08%  |
Resettng fft. Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done
Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.  Restarting from last checkpoint to see if the error is repeatable.
Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 50001 with fft length 3136K, 0.09% done
Round off error at iteration = 51500, err = 0.35938 > 0.35, fft = 3136K.  The error persists.  Trying a larger fft until the next checkpoint.
Using threads: square 512, splice 128.
Continuing M58755607 @ iteration 50001 with fft length 3200K, 0.09% done
|  Feb 25  06:25:45  | M58755607     100000  0x39c251c47f602a3d  |  3200K  0.18750   6.2484  312.41s  |   4:05:48:26   0.17%  |
Resettng fft. Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done
Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.  Restarting from last checkpoint to see if the error is repeatable.
Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 100001 with fft length 3136K, 0.17% done
Round off error at iteration = 100700, err = 0.35156 > 0.35, fft = 3136K.  The error persists.  Trying a larger fft until the next checkpoint.
Using threads: square 512, splice 128.
Continuing M58755607 @ iteration 100001 with fft length 3200K, 0.17% done
|  Feb 25  06:31:06  | M58755607     150000  0x71a49982b1d8c05d  |  3200K  0.17969   6.2493  312.46s  |   4:05:44:06   0.25%  |
Resettng fft. Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done
Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.  Restarting from last checkpoint to see if the error is repeatable.
Using threads: square 512, splice 32.
Continuing M58755607 @ iteration 150001 with fft length 3136K, 0.26% done
Round off error at iteration = 158700, err = 0.375 > 0.35, fft = 3136K.  The error persists.  Trying a larger fft until the next checkpoint.

CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.

Last fiddled with by kriesel on 2021-02-25 at 12:47
2021-02-25, 14:44   #31
tdulcet

"Teal Dulcet"
Jun 2018

71 Posts

Quote:
 Originally Posted by kriesel In support of chalsall's statement that Google offers Colab for interactive use, not bot use, the image interpretation task used to occur at least daily on one of my several Colab free accounts; same account every time. It doesn't happen often now, but it still comes up.
Our extension is not designed to act like a bot, and I would actually consider that an abuse of it. It is only meant to assist users with the otherwise tedious task of checking whether their notebooks will connect/reconnect, to help them maximize their runtime. It is also not designed to be used noninteractively. By default, it will display a desktop notification whenever a notebook connects, reconnects, or disconnects due to usage limits. Clicking these notifications opens the tab/window with the notebook so the user can easily monitor the progress, and after it connects they can check which GPU/CPU they got. Even with our extension installed, I still manually check my Colab tabs at least hourly to monitor the progress and check our notebooks for errors, as often as I would without the extension.

Note that there are existing add-ons that claim to be able to automatically solve these reCAPTCHAs (I have never tried any of them), such as Buster: Captcha Solver for Humans, which could potentially be used if this ever becomes problematic in Colab.

Quote:
 Originally Posted by kriesel But I did enough testing to decide that all my local gpus that could run gpuowl would completely transition from already-established CUDALucas. I had thoroughly tested and tuned for numerous gpu models from Quadro2000 to GTX1080Ti in CUDALucas before that.
OK, I have no doubt that GpuOwl is faster on some Nvidia GPUs than CUDALucas, and your results show that for your GTX 1080 and GTX 1060 GPUs. However, I was specifically referring to the Tesla V100, P100, K80, T4 and P4 GPUs available on Colab, and to using my install script to build CUDALucas. I do not think anyone has tested yet with all of those.

For a wavefront first time primality test (with an exponent up to 115,080,019), here are the ms/iter speeds with CUDALucas on Colab using our GPU notebook (all 6272K FFT length):
• Tesla V100: 1.14 ms/iter
• Tesla P100: 1.74 ms/iter
• Tesla K80: 6.66 - 7.36 ms/iter
• Tesla T4: 7.95 - 8.48 ms/iter
• Tesla P4: 10.24 ms/iter
We would be interested if someone had these ms/iter speeds with GpuOwl on Colab.
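To put those numbers in perspective, here is a rough sketch converting them into days per test (assuming the quoted 115,080,019 wavefront exponent, one iteration per bit, and midpoints where a range was reported):

```python
# Rough projection: days per first-time LL test from the ms/iter figures above.
EXPONENT = 115_080_019   # wavefront exponent quoted above

MS_PER_ITER = {          # midpoints used for the K80 and T4 ranges
    "Tesla V100": 1.14,
    "Tesla P100": 1.74,
    "Tesla K80": 7.01,
    "Tesla T4": 8.22,
    "Tesla P4": 10.24,
}

# iterations * seconds/iteration / seconds per day
days_per_test = {gpu: EXPONENT * ms / 1000 / 86400 for gpu, ms in MS_PER_ITER.items()}
for gpu, days in days_per_test.items():
    print(f"{gpu}: {days:.1f} days")   # V100 ~1.5 days ... P4 ~13.6 days
```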

Quote:
 Originally Posted by kriesel And the loss of error checking; not even the relatively weaker Jacobi symbol check in CUDALucas, unless you've added it in your builds. The higher the exponent, the longer the run, and the less likely a run will complete correctly without GEC.
All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed. You can see this from the ECC Support? line near the top of the CUDALucas output. Adding Jacobi error checking to CUDALucas is listed in the Contributing section of the main README, but it would have no effect on Colab.

Quote:
 Originally Posted by kriesel Gpuowl v6.11-380 on GTX1060 ~5.806 ms/iter in 58.75M LL DC with Jacobi check
Note that the latest version of GpuOwl is v7.2, although it no longer supports LL tests or the Jacobi error check. Supporting GpuOwl would therefore add a lot of complexity to our GPU notebook: it would have to download and build both v6 and v7 (for LL DC and PRP tests respectively), and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.
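For what it's worth, such a wrapper could be quite small. A hypothetical sketch (the binary paths and the rule that only DoubleCheck= lines need v6 are my assumptions, not part of the notebook): dispatch on the work type of the next worktodo.txt entry.

```python
# Hypothetical wrapper: pick a gpuowl build from the work type of the next
# worktodo entry. Binary paths are made-up placeholders for this sketch.
def pick_binary(worktodo_line: str) -> str:
    work_type = worktodo_line.split("=", 1)[0].strip()
    if work_type == "DoubleCheck":       # LL DC: v6 is the last version with LL
        return "./gpuowl-v6.11/gpuowl"
    if work_type in ("PRP", "PRPDC"):    # PRP (with proof): handled by v7
        return "./gpuowl-v7.2/gpuowl"
    raise ValueError(f"unhandled work type: {work_type}")

# Example with a made-up PRP assignment line:
print(pick_binary("PRP=0123456789ABCDEF0123456789ABCDEF,1,2,115080019,-1,76,0"))
```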

Quote:
 Originally Posted by kriesel CUDALucas was a great program. Had a lot of fun with it. It has been surpassed and is not being actively maintained.
As Daniel said in post #17, pull requests are welcome!

Quote:
 Originally Posted by danc2 I would be curious if Teal has seen this when using the extension or not. The extension can check (clicks on the play button of the first cell) every 5 seconds IIRC (customizable by the user). With this setup, I've never seen the interpretation task.
Yeah, I am not sure if the reason I have only seen this once is because of our extension. It dismisses all other popups, so it is possible that our extension just dismisses this popup, which would explain why Daniel and I never see it. I would need to see it again to know for sure, so that I can inspect it.

When our extension is set to automatically run the first cell of the notebook (disabled by default), it will check if the cell is running every minute by default. This is configurable, but I would not recommend that users use a value less than one minute to prevent Google from thinking they/we are DoSing their servers.

2021-02-25, 16:59   #32
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

29·277 Posts

Quote:
 Originally Posted by tdulcet All the Tesla GPUs on Colab have ECC memory enabled, so Jacobi and Gerbicz error checking is not needed.
There are other sources of hardware error than memory. Thus, Gerbicz error checking is still beneficial.

Quote:
 GpuOwl is v7.2, no longer supports any LL tests. This would add a lot of complexity to our GPU notebook, if it were to support GpuOwl, as it would have to download and build both v6 and v7 to support both LL DC and PRP tests respectively and then someone would have to write a wrapper to run the correct version based on the next assignment in the worktodo file.
The PrimeNet server will happily accept a PRP test with proof for LL-DC work. So, you only need to download one gpuowl version.
Another gpuowl advantage is it will run P-1 if necessary, potentially saving a lengthy PRP test altogether.

Also, in prime95 you can cut the amount of disk space required in half. I'll bet gpuowl has a similar option.

Last fiddled with by Prime95 on 2021-02-25 at 17:01

2021-02-25, 19:04   #33
PhilF

"6800 descendent"
Feb 2005

5²×29 Posts

Quote:
 Originally Posted by Prime95 The PrimeNet server will happily accept a PRP test with proof for LL-DC work.
I didn't know that! So, would one just manually reserve a LL-DC exponent, PRP test it, and then manually submit the result?

