mersenneforum.org  

Old 2021-04-12, 12:33   #78
tdulcet
 
"Teal Dulcet"
Jun 2018

71 Posts

Quote:
Originally Posted by LaurV
Wow! it works! You (two) are my heroes for this weekend!
Great! We are glad it works for you.

Quote:
Originally Posted by LaurV
Albeit a little bit too complicated. At first it didn't work, as I had the "CPU and GPU" output (sure! I want to see what BOTH of them are doing!); then I looked in the code and saw that you use the "-k" switch only when the output is "GPU Only"
Yes, sorry, I guess I should have mentioned that. I did not realize anyone was using the "GPU and CPU" output type, as it is very verbose. I added it shortly before we officially announced the notebooks, as I saw it was requested a few times on the main Colab thread and it was easy to implement. When using that option, both CUDALucas and MPrime are run in the background, while the tail -f command runs in the foreground, so there is no easy way to pass input to CUDALucas.
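
For illustration, the "GPU and CPU" output type boils down to the following shell pattern (a minimal sketch, not the notebooks' actual code; the file names and program arguments are placeholders):
Code:
# Both programs run in the background with output redirected to log files:
./CUDALucas exponent.txt > cudalucas.log 2>&1 &
./mprime -d > mprime.log 2>&1 &
# Only tail runs in the foreground, merely following the logs, so no stdin
# is connected to CUDALucas for interactive (-k) input:
tail -f cudalucas.log mprime.log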

I updated our PrimeNet script on Saturday to still support getting first-time LL tests, using the method described by @Prime95 above, so that users can keep using CUDALucas while we work on upgrading our GPU notebook to use GpuOwl. (@LaurV - You will no longer have to do this manually.) Anyone who wants to continue doing first-time LL tests on the GPU will need to re-set up their GPU notebook after finishing any current assignments. I also included many of the changes needed for our PrimeNet script to support GpuOwl, including support for reporting LL/PRP and P-1 results.

Going forward, we decided to recommend that users do PRP tests, which will be the default, although we will still provide the option of doing LL tests on the GPU for users with very limited Drive space, as explained above. Prime95/MPrime of course has its PrimeNet functionality built in, so unfortunately there is not much we can do about the CPU for users with limited Drive space. Those users will need to do LL DC tests on the CPU, although as George said, there is "a chance that a new Mersenne prime is hidden in all those double-checks".
Old 2021-05-07, 21:36   #79
moebius
 
Jul 2009
Germany

54 Posts

Quote:
Originally Posted by danc2
I realize we did not post any output or pictures, just links.

Since we have this dedicated thread, here is example output from a GPU notebook running the Tesla V100-SXM2-16GB (a $6,195.00 GPU according to Amazon).
The LL test runs much slower than with gpuowl -LL for the same exponent on the same Tesla V100 GPU.
Old 2021-07-07, 11:04   #80
mognuts
 
Sep 2008
Bromley, England

3²×5 Posts
Colab now using AMD CPUs

This is the first time I've ever had an AMD!!

Quote:
Previous CPU counts:
 15  Intel(R) Xeon(R) CPU @ 2.30GHz  (model 63)
  9  Intel(R) Xeon(R) CPU @ 2.00GHz  (model 85)
  8  Intel(R) Xeon(R) CPU @ 2.20GHz  (model 79)
  1  AMD EPYC 7B12                   (model 49)
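
If you want to check which CPU a particular session landed on, one way (a sketch; these are notebook-cell commands, where ! is Colab's shell escape) is to read /proc/cpuinfo:
Code:
# The human-readable CPU name:
!grep -m1 "model name" /proc/cpuinfo
# The numeric model (the 63/79/85/49 shown in the counts above):
!grep -m1 -E "^model\s+:" /proc/cpuinfo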
Old 2021-07-07, 18:34   #81
danc2
 
Dec 2019

35₁₀ Posts

@mognuts
Yeah, I was pretty surprised when I first saw that on my machines also!

Quote:
Previous CPU counts:
111  Intel(R) Xeon(R) CPU @ 2.30GHz  (model 63)
 97  Intel(R) Xeon(R) CPU @ 2.20GHz  (model 79)
 29  Intel(R) Xeon(R) CPU @ 2.00GHz  (model 85)
 15  AMD EPYC 7B12                   (model 49)
Old 2021-07-07, 20:05   #82
PhilF
 
"6800 descendent"
Feb 2005
Colorado

2D5₁₆ Posts

Quote:
Originally Posted by mognuts
This is the first time I've ever had an AMD!!
I was told that if you snag one of those, you should throw it back, because the performance is lower than the others. But that was a while back; that advice might have been referring to a different AMD model.
Old 2021-07-07, 21:02   #83
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

29B8₁₆ Posts

Quote:
Originally Posted by PhilF
I was told that if you snag one of those, you should throw it back, because the performance is lower than the others. But that was a while back; that advice might have been referring to a different AMD model.
Busy, but quickly...

The AMD CPUs have been given out for quite a while now. And, at least for P-1'ing, they're faster than all the Intel instances (~20% or so).
Old 2021-07-07, 21:50   #84
Flaukrotist
 
Sep 2020
Germany

2²·11 Posts

Quote:
Originally Posted by chalsall
And, at least for P-1'ing, they're faster than all the Intel instances (~20% or so).
I cannot confirm that. Using Prime95 v30.4 and exponents in the 104M range with bounds determined by Prime95, I get the following ranking for the total time needed for P-1 stages 1 and 2:

Code:
Model 63, Intel(R) Xeon(R) CPU @ 2.30GHz: 36.09 h 
Model 79, Intel(R) Xeon(R) CPU @ 2.20GHz: 31.58 h
Model 49, AMD EPYC 7B12:                  31.36 h
Model 85, Intel(R) Xeon(R) CPU @ 2.00GHz: 25.27 h
So, the Intel Model 85 is clearly fastest.
Old 2021-07-07, 22:18   #85
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2³×3×5×89 Posts

Quote:
Originally Posted by Flaukrotist
I cannot confirm that. ...snip... So, the Intel Model 85 is clearly fastest.
I could very well be wrong; my observations were subjective. It would be worth collecting hard data on this.
Old 2021-07-08, 22:06   #86
slandrum
 
Jan 2021
California

2²·5·23 Posts

There are three versions of the Intel chipset on Colab (that I've received on free accounts). The 2.30 GHz model 63 is the worst, followed by the 2.20 GHz model 79; the 2.00 GHz model 85 with AVX-512 is by far the best. The AMD chipset's times overlap with the times I get with the 2.00 GHz Intel: the worst times for the 2.00 GHz model 85 Intel are slightly worse than the worst times with the AMD, but the best times with the 2.00 GHz model 85 Intel are much better than the best times with the AMD. This is for running tests with mprime (LL, PRP, PM1, CERT).
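
For reference, you can confirm which instruction sets a session's CPU exposes from its feature flags (a sketch for a notebook cell; that mprime's AVX-512 FFTs are what give the model 85 its edge is my inference from the above):
Code:
# List any AVX-512 feature flags; the other Colab models print nothing:
!grep -m1 "^flags" /proc/cpuinfo | tr ' ' '\n' | grep "^avx512" | sort -u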

For around 110M PRP tests, iteration times on the 2.30 and 2.20 GHz Intels are around 40 ms, ranging from the mid 30s to the mid 40s; the timings on the two overlap, but the 2.30 GHz model 63 averages the worst. For the 2.00 GHz model 85 Intel I've seen from 21 ms to 32 ms. For the AMD I see 26 to 31 ms. The iteration times can vary through a 6-12 hour session, sometimes by a lot, but most instances seem to stay pretty close to the same ms/iteration throughout. The average times on the model 85 are better than the average times on the AMD model 49.

There are far more 2.30 and 2.20 GHz Intels available to me at any given time than either the 2.00 GHz Intel or the AMD.

Old 2022-05-25, 13:54   #87
tdulcet
 
"Teal Dulcet"
Jun 2018

71 Posts

Quote:
Originally Posted by tdulcet
Quote:
Originally Posted by LaurV
Attached is a digest of the FFT sizes, with times per iteration, for all five cards that Colab offers.
Thanks, your spreadsheet does make it easier to compare the ms/iter times. It looks like you created it from the *fft.txt and *threads.txt files in our repository.
I regenerated these *fft.txt and *threads.txt files for FFT lengths 1K to 32768K using twice as many iterations for better accuracy and also added them for the A100 GPU. Anyone using our GPU notebook should consider upgrading, as you would likely get a performance improvement.
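
For anyone who wants to regenerate such files themselves, CUDALucas has built-in benchmarks that write them; roughly like this (a sketch from memory of CUDALucas's usage text; the numeric arguments are illustrative, so check ./CUDALucas -h for the exact usage on your build):
Code:
# Benchmark FFT lengths 1K-32768K (writes the "<GPU name> fft.txt" file):
./CUDALucas -cufftbench 1 32768 20
# Benchmark threads per FFT length (writes the "<GPU name> threads.txt" file):
./CUDALucas -threadbench 1 32768 20 0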

Since my last update over a year ago, here are the notable changes I have made to the notebooks:
  • June 3, 2021
    • Updated CPU notebook to default to the 150 worktype (first time PRP tests).
    • Added a warning when Google Drive is not mounted.
  • August 1, 2021 - Updated MPrime install script to use 80% of available memory for stage 2 (see the sketch after this list). The notebooks will thus use up to around 10.1 GiB of RAM instead of just 6 GiB.
  • August 31, 2021
    • Added support for computer_numbers greater than 9.
    • Updated notebooks to configure MPrime to not preallocate disk space for the proof interim residues files to reduce Google Drive storage use.
  • November 1, 2021 - Updated the CPU on the GPU notebook to default to the 150 worktype.
  • December 1, 2021 - Updated MPrime install script to use the latest Prime95/MPrime v30.7.
  • January 2 - Added support for the 154 (first time PRP tests that need P-1 factoring) and 155 (double-check tests using PRP with proof) worktypes on the CPU.
  • May 5 - Updated GPU notebook to also compile CUDALucas for the A100 GPU.
  • Today - Regenerated the optimization files using twice as many iterations and also added them for the A100 GPU.
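
The August 1 memory change amounts to something like this (a minimal sketch, not the actual install script; MPrime's Memory= option in local.txt takes a value in MB):
Code:
# Write 80% of the VM's total RAM (in MB) into MPrime's local.txt:
total_kb=$(awk '/^MemTotal/ {print $2}' /proc/meminfo)
echo "Memory=$(( total_kb * 4 / 5 / 1024 ))" >> local.txt
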
For the improvements to our PrimeNet script, please see here and the below post in the dedicated thread. Feedback is welcome!

I know it has been over a year now, but we are still patiently waiting for Colab to upgrade to Ubuntu 20.04 (probably now 22.04) so we can finally switch to GpuOwl...

Quote:
Originally Posted by tdulcet
We of course still need to test with the other Tesla GPUs available on Colab and with the latest version of GpuOwl.
Quote:
Originally Posted by moebius
The LL test runs much slower than with gpuowl -LL for the same exponent on the same Tesla V100 GPU.
We had some Google Cloud credits that were expiring, so I was able to confirm that GpuOwl is indeed faster than CUDALucas on all six GPUs available on Colab. However, I also found that the GpuOwl performance on these GPUs has slowly degraded by up to 15% across all FFT lengths over the last few years. While the v6 branch is faster than the master branch, it is not the fastest version. For example, for a wavefront first-time exponent on the Tesla V100 GPU, the master branch runs at 654 us/iter, the v6 branch at 641 us/iter and the fastest version at 599 us/iter. See the issue I created on the GpuOwl repository for more information and several graphs. Hopefully someone will be able to fix these performance regressions before we are able to switch...
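
To put those numbers in perspective, the difference over a full test works out as follows (a back-of-the-envelope sketch assuming roughly 110M iterations for a wavefront exponent, which is an approximation):
Code:
# Total runtime in hours at each us/iter figure, for ~110M iterations:
awk 'BEGIN {
    iters = 110e6
    printf "master: %.1f h, v6: %.1f h, fastest: %.1f h\n",
        iters*654e-6/3600, iters*641e-6/3600, iters*599e-6/3600
}'
# -> master: 20.0 h, v6: 19.6 h, fastest: 18.3 h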

Quote:
Originally Posted by LaurV
The issue will remain with gpuOwl. Moreover, gpuOwl doesn't provide a way to switch to another FFT size on the fly.
I tested all FFT lengths from 1M to 32M in GpuOwl for both the master and v6 branches on all six GPUs, and the smallest FFT length selected by default always seemed to be the fastest, so that should not be an issue. However, for the FFT lengths that support multiple variants, the variant selected by default is not always the fastest/optimal one, which is obviously another issue. I would be happy to share this data if anyone is interested.
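
For completeness, a suboptimal default can be worked around at startup with GpuOwl's -fft option, which overrides the FFT selection. A hypothetical invocation (the exponent, the -prp usage, and the size spec are all illustrative, and the spec syntax differs between branches):
Code:
# Force a specific FFT length instead of the default selection:
./gpuowl -prp 115000003 -fft 6M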