2021-03-22, 15:08   #67
tdulcet ("Teal Dulcet")

Quote:
Originally Posted by LaurV
cudaLucas taking one full CPU core and reducing the CPU performance to half
The Colab VMs have one CPU core with two CPU threads, and CUDALucas only uses one of those threads. As an experiment, you can always try commenting out one of the lines in our GPU notebook that start CUDALucas and seeing whether the performance improves. If you have the output_type set to "CPU (Prime95)", then temporarily comment out this line and rerun the cell.
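For context, a minimal sketch of what such a launch cell might look like; this is illustrative only, and the actual notebook code, file names, and CUDALucas arguments differ:

Code:
# Hypothetical notebook cell: start CUDALucas in the background so that
# MPrime can keep the CPU. The binary path and log name are assumptions
# for illustration, not our notebook's actual code.
import subprocess

output_type = "CPU (Prime95)"  # notebook form field, per the post

# Commenting out the Popen call below frees the second CPU thread, so you
# can compare MPrime's throughput with and without CUDALucas running.
log = open("cudalucas.out", "w")
gpu_proc = subprocess.Popen(["./CUDALucas"], stdout=log,
                            stderr=subprocess.STDOUT)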

Quote:
Originally Posted by LaurV
that is what I experience, regardless of what you, being in the US and using a Pro account, say
We have tested with both the free Colab and Colab Pro. Colab Pro seems to make no difference in the CPUs assigned; I get the AVX-512 CPUs about as often either way.
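If you want to check which CPU your VM was assigned, a quick cell like this works (standard Linux interfaces, nothing specific to our notebooks); empty output from the second command means the CPU has no AVX-512 support:

Code:
# Show the CPU model and list any AVX-512 feature flags the VM reports.
!grep -m1 "model name" /proc/cpuinfo
!grep -o "avx512[a-z]*" /proc/cpuinfo | sort -u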

Quote:
Originally Posted by LaurV
The part about using both notebooks at the same time does not apply to me; I can't do that unless I use multiple accounts, and that is what I was referring to when I said "headache".
Users can run both notebooks with the free Colab and a single account. I am actually running both notebooks right now with the free Colab. Colab Pro just allows users to consistently run more than one copy of each notebook, usually up to four copies of each.

Quote:
Originally Posted by LaurV
Also, "a GPU will complete whatever work in two weeks" only holds if you get it. If you get two hours today and two more after 3 days, then that work will never complete and (due to the separate pools of assignments) will bottleneck the CPU work. That is what I was referring to as "combine them together" in one of my first posts in this thread: if I get a GPU, do that; if not, do this. But use a common pool. Then the GPU work comes as an addition, the 101st mile, not as a showstopper.
Oh, OK, I understand what you are requesting now. We will consider it for the next version of our notebooks. It would be trivial to implement, but it would likely be confusing for users, as they would not be able to run both notebooks at the same time without selecting different computer_number values for each. BTW, I am not sure if you saw, but I did implement your last requested change to support all the CPU worktypes that MPrime currently supports.

Quote:
Originally Posted by LaurV
Anyhow, thanks a lot for the notebooks, and for the answers. Good job.
Thanks for the feedback! No problem, happy to clear up any confusion.
2021-03-22, 15:16   #68
LaurV (Romulan Interpreter)

Quote:
Originally Posted by tdulcet
BTW, I am not sure if you saw, but I did implement your last requested change to support all the CPU worktypes that MPrime currently supports.
Saw, saw... Big thumbs up!

I have already reported different types of completed work (including a GPU LLDC done in the 60M range). Things are not as bad as I describe them, but if I paint them as minor, you will never care; and if I paint them black, I will make you angry and you will try to prove me wrong...
But we like the toys, otherwise we would just ignore them and not use them. We have also learned a thing or two from them.

So,
2021-03-22, 21:11   #69
chalsall ("Chris Halsall")

Quote:
Originally Posted by danc2
I personally gain no benefit from advocating a free extension. Users are busy, including myself, and I was hoping to avoid reading 6+ pages of forum data, in which non-related topics are discussed.
To put on the table...

I, like most, am very busy. But I have to read hundreds of pages of language (some human, some deterministic) every single day.

One /possible/ motivation for promoting your free extension, which many of us have argued is against the "spirit" of the Colab Terms of Service, is that it /might/ assist in getting your Notebook to find the next MP by someone who is using both your Notebook and your extension.

I could, of course, be entirely incorrect in that assessment. I'm simply posting based on my own position, and what I observe.

Personally, I tend to err on the side of caution in situations like this.

2021-04-05, 07:12   #70
LaurV (Romulan Interpreter)

Hey Teal, Dan,

How do I pass a keystroke from my keyboard to your Colab CUDALucas copy? Besides taking pliers, pulling the key out of the keyboard, and throwing it hard around the globe to reach Google HQ.

Why do I bother? Well...

Attached is a digest of the FFT sizes, with times per iteration, for all five cards that Colab offers. The cards are quite different, and the optimum FFT sizes for them differ too. If you start an LL test with one card but are later offered another card, you may lose up to 50% of the speed, because the FFT chosen for the first card is not the optimum value for the second one, and there is no (easy) way to change it.

For example (see the Excel file inside the zip): your K80 just finished a test and starts the next one, which happens to be an exponent in, say, the 112M range. The K80 will start with FFT=6144, as that is the best choice for a K80 at this exponent size, at about 7.25 ms/iter (line 61 in the Excel file). Then your time expires, and next time you are extremely lucky and get a P100. The P100 will continue the test with FFT=6144, which is a terribly unlucky choice of size for it, getting about 2.1 ms/iter, when a larger FFT could be used: FFT=6272 at 1.7 ms/iter. If you continue the test on the P100, you take a huge penalty.

This happens the other way around too. If you start a 65M test with a P100, it will choose size 3584, but after a few minutes you are out, and next time you get a K80, which will continue at this size, at about 4.2 ms/iter, when a smaller FFT could be used on this card for only 3.8 ms/iter.

Another example: say you pay your money to Gugu and get only good cards, and you decide to do a current 100M-digit assignment. You get a P100, which will choose FFT=19683 (line 111 in the table), the smallest and fastest it can use for a 332M exponent, at about 5.8 ms/iter. Next time you get a V100, which will continue testing at this size, getting about 4.6 ms/iter for the next 20 days (line 286 in the "Threads" table in the Excel file, second sheet), when you could use a larger FFT=20736 at 3.73 ms/iter and finish the job in 16 days instead of 20. For the reverse case I could find a much worse example, but you get the idea.
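As a back-of-the-envelope check of those numbers (my arithmetic, not from the spreadsheet): an LL test on exponent p needs about p squarings, so the total time is roughly p times the ms/iter figure:

Code:
# Rough runtime estimate for the 100M-digit example above. An LL test of
# exponent p needs about p squarings, so time ~= p * (ms/iter). The small
# gap versus the ~20/16 days quoted presumably comes from checkpointing,
# session limits, and other overhead.
def days(p, ms_per_iter):
    return p * ms_per_iter / 1000 / 86400  # ms -> s, then s -> days

p = 332_000_000  # a 100M-digit exponent
print(f"{days(p, 4.60):.1f} days")  # V100 stuck on FFT=19683: ~17.7 days
print(f"{days(p, 3.73):.1f} days")  # V100 on its best FFT=20736: ~14.3 days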

Now, CUDALucas is very clever: when it runs locally with the "-k" command-line switch, we can use the keyboard to increase/decrease the FFT size (and other parameters, like how often to print screen output, how often to save checkpoints, etc.), so we can always choose the best FFT on the fly by pressing a few keys (uppercase F, lowercase f, ...). In fact, in the past, before the GpuOwl era, I used it intensively like that, always trying to push the FFT as low as possible to get the fastest times, and backing off when the rounding error got into the dangerous area. Most tests can be run with a lower/faster FFT if you know what you are doing; the limits are there "for safety" and to cover strange cases, but in real life, strange cases are few.

So.

Can you implement a similar feature? For example, I could write a text file directly into the Drive folders, which CUDALucas would read periodically (as it can't read my keyboard) to adjust its parameters. Or offer a way to pass the text I type to it (yes, I can click in the window and type some commands in the square box that appears, but I 'ave no idea where those commands go; if that's actually possible, please enlighten me/us).
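Something like the following could implement the file-based variant: poll a command file on Drive and forward whatever I type into it to CUDALucas. This assumes the notebook starts CUDALucas itself with -k, and that -k accepts keystrokes from a pipe rather than only from a real terminal (an untested assumption); the path and the name cmd.txt are made up:

Code:
# Hypothetical relay: poll a command file on Google Drive and forward its
# contents to CUDALucas' stdin. Requires CUDALucas to be started with -k,
# and assumes (untested) that -k reads keystrokes from a pipe.
import os, time, subprocess

proc = subprocess.Popen(["./CUDALucas", "-k"],
                        stdin=subprocess.PIPE, text=True)
CMD = "/content/drive/MyDrive/cudalucas/cmd.txt"  # made-up path

while proc.poll() is None:
    if os.path.exists(CMD):
        keys = open(CMD).read()
        os.remove(CMD)          # consume the file so keys are sent once
        proc.stdin.write(keys)  # e.g. "F" to step the FFT length up
        proc.stdin.flush()
    time.sleep(30)              # check for new commands twice a minute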
Attached Files
colab FFT.zip (38.1 KB)

2021-04-05, 07:26   #71
Prime95 (P90 years forever!)

Warning: the server will soon refuse to give out first-time LL tests. I haven't thought through all the details; most likely a double-check will be assigned instead.

The server will still accept first-time LL results.
2021-04-05, 13:57   #72
tdulcet ("Teal Dulcet")

Quote:
Originally Posted by LaurV
Attached is a digest of the FFT sizes, with times per iteration, for all five cards that Colab offers.
Thanks, your spreadsheet does make it easier to compare the ms/iter times. It looks like you created it from the *fft.txt and *threads.txt files in our repository.

Quote:
Originally Posted by LaurV
The cards are quite different, and the optimum FFT sizes for them differ too. If you start an LL test with one card but are later offered another card, you may lose up to 50% of the speed, because the FFT chosen for the first card is not the optimum value for the second one, and there is no (easy) way to change it.
Yeah, this is a bug in CUDALucas: it does not redetermine the fastest FFT length when the GPU changes. It is actually the only bug we are aware of that is affecting our notebooks. I was going to try to find a solution, but that was around the time Daniel officially announced the notebooks in this thread and people said that GpuOwl was potentially faster, so I decided my limited time was better spent updating our GPU notebook to use GpuOwl.

Your examples provide another good reason to switch to GpuOwl. We did not initially notice this issue with CUDALucas, since when doing wavefront first-time primality tests on Colab Pro, both the P100 and V100 GPUs happen to be optimal at the 6272K FFT length.
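Until we have a proper fix, one possible workaround sketch: record which GPU the last session used and, when it changes, delete CUDALucas' cached timing files so the FFT length gets redetermined. That removing those files actually triggers a re-benchmark is an assumption we have not verified, and the paths are made up:

Code:
# Hypothetical workaround for the FFT-length issue: if Colab assigned a
# different GPU than last session, delete the cached *fft.txt/*threads.txt
# files so the optimal FFT length is redetermined. The re-benchmark
# behavior is an unverified assumption; all paths are illustrative.
import glob, os, subprocess

gpu = subprocess.run(
    ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
    capture_output=True, text=True).stdout.strip()
MARKER = "/content/drive/MyDrive/cudalucas/last_gpu.txt"

last = open(MARKER).read().strip() if os.path.exists(MARKER) else ""
if gpu and gpu != last:
    for f in glob.glob("*fft.txt") + glob.glob("*threads.txt"):
        os.remove(f)  # force re-timing of FFT lengths on the new GPU
    with open(MARKER, "w") as f:
        f.write(gpu)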

Quote:
Originally Posted by LaurV
Can you implement a similar feature? For example, I could write a text file directly into the Drive folders, which CUDALucas would read periodically (as it can't read my keyboard) to adjust its parameters. Or offer a way to pass the text I type to it (yes, I can click in the window and type some commands in the square box that appears, but I 'ave no idea where those commands go; if that's actually possible, please enlighten me/us).
Yes, you can trivially update the GPU notebook to pass the -k flag to CUDALucas and then type any keys into that box and press Enter. We will include this change in the next version of our notebooks, as it is a good workaround for the CUDALucas issue for advanced users. Thanks for the feedback!

Quote:
Originally Posted by Prime95
Warning: the server will soon refuse to give out first-time LL tests. I haven't thought through all the details; most likely a double-check will be assigned instead.
Thanks for the warning. This would be extremely unfortunate, especially for Colab Pro users...

Quote:
Originally Posted by Prime95
The server will still accept first-time LL results.
I am assuming you are referring to already assigned first-time LL tests, or do you mean that our PrimeNet script could rewrite new first-time PRP assignments into LL tests and the server would still accept the results? We completely understand that this is not what you want users to do, but as I explained, unfortunately many Colab Pro users and people doing 100-million-digit tests do not have much other choice. Our only other option would be to allow users to set the proof power, as you suggested. However, that would obviously be very unfair to whoever has to do the proof certifications, since these users would need to use proof powers of 5 or 6, which is why our notebooks currently do not support it.
2021-04-05, 16:43   #73
Prime95 (P90 years forever!)

Quote:
Originally Posted by tdulcet
or do you mean that our PrimeNet script could rewrite new first-time PRP assignments into LL tests and the server would still accept the results? We completely understand that this is not what you want users to do,
...
Our only other option would be to allow users to set the proof power, as you suggested. However, that would obviously be very unfair to whoever has to do the proof certifications, since these users would need to use proof powers of 5 or 6, which is why our notebooks currently do not support it.
You read between the lines well. The server cannot prevent someone from taking a PRP assignment and turning it into an LL test.

I'd prefer you do double-checks instead -- first-time LL requests will get turned into LL double-check assignments.

Proof power 5 or 6 is still an excellent option for the disk-constrained. A certification at 1/32nd or 1/64th the cost of a first-time test is still a huge savings.
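To put rough numbers on that trade-off (assuming the usual relations: certification work is about p/2^power squarings, and proof generation temporarily stores about 2^power residues of p bits each; actual disk figures may differ somewhat):

Code:
# Rough proof-power trade-off for a 100M-digit test (p ~= 332M), under the
# assumptions stated above.
p = 332_000_000
for power in (5, 6, 8):
    disk_gb = 2**power * p / 8 / 1e9  # 2^power residues of p/8 bytes each
    print(f"power {power}: cert ~= 1/{2**power} of a full test, "
          f"~{disk_gb:.1f} GB temporary disk")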
2021-04-06, 04:01   #74
LaurV (Romulan Interpreter)

Quote:
Originally Posted by tdulcet
Thanks, your spreadsheet does make it easier to compare the ms/iter times. It looks like you created it from the *fft.txt and *threads.txt files in our repository.
Sure, it was just parsing the text in the files you provided; there was no re-run of -cufftbench on my side. What for? I trust your runs.
Quote:
Yeah, this is a bug in CUDALucas.
Nope. No bug. It works as intended: it should keep the FFT I tell it to use and not change it on the fly unless necessary (keyboard command, rounding error, etc.). Where did you get the idea that I am complaining about a bug in CUDALucas?

Quote:
Your examples provide another good reason to switch to GpuOwl. We did not initially notice this issue with CUDALucas, since when doing wavefront first-time primality tests on Colab Pro, both the P100 and V100 GPUs happen to be optimal at the 6272K FFT length.
The issue will remain with GpuOwl. Moreover, GpuOwl doesn't provide a way to switch to another FFT size on the fly.

Quote:
Yes, you can trivially update the GPU notebook to pass the -k flag to CUDALucas and then type any keys into that box and press Enter. We will include this change in the next version of our notebooks, as it is a good workaround for the CUDALucas issue for advanced users. Thanks for the feedback!
Thanks! Waiting for it. I don't know how to do that myself; my skill there is null.

Quote:
Originally Posted by Prime95
A certification at 1/32nd or 1/64th the cost of a first-time test is still a huge savings.
Only if it is credited accordingly, per time spent; otherwise, if CERTs take too long, people will prefer to do PRP instead. BTW, how are CERTs credited right now? As PRP-DCs?
And what about PRP-CF CERTs? PRP-CF-DC CERTs?

2021-04-06, 04:44   #75
Prime95 (P90 years forever!)

Quote:
Originally Posted by LaurV
Only if it is credited accordingly, per time spent; otherwise, if CERTs take too long, people will prefer to do PRP instead. BTW, how are CERTs credited right now? As PRP-DCs?
And what about PRP-CF CERTs? PRP-CF-DC CERTs?
It is credited as PRP-DC based on the time spent.
2021-04-06, 15:43   #76
tdulcet ("Teal Dulcet")

Quote:
Originally Posted by Prime95
You read between the lines well. The server cannot prevent someone from taking a PRP assignment and turning it into an LL test.
...
Proof power 5 or 6 is still an excellent option for the disk-constrained. A certification at 1/32nd or 1/64th the cost of a first-time test is still a huge savings.
Thanks for the info. Daniel and I will have to decide what approach to take if/when you make the change...

Quote:
Originally Posted by LaurV
Nope. No bug. It works as intended: it should keep the FFT I tell it to use and not change it on the fly unless necessary (keyboard command, rounding error, etc.). Where did you get the idea that I am complaining about a bug in CUDALucas?
All the other GIMPS programs that I have used automatically redetermine the optimal FFT length when you switch devices, including Prime95/MPrime...

Quote:
Originally Posted by LaurV
The issue will remain with GpuOwl. Moreover, GpuOwl doesn't provide a way to switch to another FFT size on the fly.
Interesting, we have not yet been able to do any testing of GpuOwl on Colab... Hopefully it will be less of an issue with GpuOwl, since there are significantly fewer available FFT lengths.

Quote:
Originally Posted by LaurV
Thanks! Waiting for it. I don't know how to do that myself; my skill there is null.
No problem. I updated our GPU notebook with your requested change. As suggested by @Prime95, I also added an option to both notebooks so users can select the PRP proof power. Feedback is welcome.
2021-04-11, 17:03   #77
LaurV (Romulan Interpreter)

Wow! It works! You (two) are my heroes for this weekend!

Albeit it is a little bit too complicated. First it didn't work, as I had the "CPU and GPU" output (sure! I want to see what BOTH of them are doing!). Then I looked in the code and saw that you use the "-k" switch only when the output is "GPU Only". So, OK: stop the test, switch to "GPU Only" mode, restart the test, press "f/F/t/T/etc." until the OCD is satisfied, then let it run for 20 minutes to confirm the output and speed are indeed what I want, stop, switch back to "CPU and GPU" output, restart the test. It works a marvel, as Dave would say! Now the tests will be on average ~10% to ~15% faster, if I am clever enough to tune the FFT every time the GPU changes. I didn't want to modify the code, as I don't understand the implications; it may be an omission on your side, or you may have a very good reason why "-k" is active only for the "GPU Only" output, but I didn't have the time (or skill) to look deeper into it.

It works. Full stop.

Thanks.
