mersenneforum.org  

Go Back   mersenneforum.org > Great Internet Mersenne Prime Search > PrimeNet > GPU to 72

Old 2019-11-11, 22:10   #34
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×7×163 Posts

Quote:
Originally Posted by storm5510 View Post
It might actually take much longer to run a DC this way instead of on a local GPU with CUDALucas.
It's not instead of, it's in addition to.
Old 2019-11-11, 23:57   #35
storm5510
Random Account
 
Aug 2009
U.S.A.

2·19·29 Posts

Quote:
Originally Posted by kriesel View Post
It's not instead of, it's in addition to.
This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?
Old 2019-11-12, 03:11   #36
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

83·101 Posts

Quote:
Originally Posted by storm5510 View Post
This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?
Yes. Checkpoints are compatible, assuming you run an mfaktc version newer than 1.18 or so (when the format was changed) and you do not mix "special" builds (such as the fewer-classes ones). You can freely move assignments and checkpoint files between computers, Colab included. Keep in mind, however, that moving the checkpoint alone accomplishes nothing unless the assignment is in worktodo too. This is how mfaktX works: it takes the work from the worktodo file and then looks for a checkpoint. The checkpoint only stores the last class that was completed for an exponent, and classes already done are not repeated. Each class is sieved and powmod'ed separately. I have used this method to split huge assignments (like M666666667 to 86 bits or so) across several computers/cards, by creating "fake" checkpoints so that each computer/card does different classes.
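
To make the "classes" idea concrete, here is a minimal Python sketch, not taken from mfaktc itself. It assumes the usual description of the scheme: factor candidates q = 2kp + 1 are grouped by k mod 4620 (420 in "less classes" builds), and a class is skipped when none of its candidates can be a factor. The function names and the contiguous split are illustrative assumptions. The checkpoint file format is not described in this thread, so writing a "fake" checkpoint is deliberately left out.

```python
# Sketch only: which factor-candidate classes survive, and one way to
# divide them between machines. NUM_CLASSES = 4620 is an assumption about
# the default build (420 for "less classes" builds); both are multiples
# of 4, which the mod 8 test below relies on.
NUM_CLASSES = 4620


def classes_to_test(p, num_classes=NUM_CLASSES):
    """List the classes c = k mod num_classes that can contain factors.

    Any factor q of 2^p - 1 has the form q = 2*k*p + 1 with q = 1 or 7
    (mod 8). Within one class, q mod 8 and q mod each small prime that
    divides num_classes are fixed, so one representative decides the
    whole class.
    """
    small_primes = [s for s in (3, 5, 7, 11) if num_classes % s == 0]
    keep = []
    for c in range(num_classes):
        q = 2 * c * p + 1  # representative candidate of class c
        if q % 8 not in (1, 7):
            continue
        if any(q % s == 0 for s in small_primes):
            continue
        keep.append(c)
    return keep


def split_contiguous(items, parts):
    """Split a list into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


if __name__ == "__main__":
    classes = classes_to_test(666666667)
    for n, chunk in enumerate(split_contiguous(classes, 4)):
        print(f"machine {n}: classes {chunk[0]}..{chunk[-1]} ({len(chunk)} classes)")
```

For an odd exponent not divisible by 3, 5, 7 or 11, this keeps 960 of the 4620 classes (96 of 420), so four machines would each get 240 classes. Contiguous ranges are used because the checkpoint, as described above, only records the last class done.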

Last fiddled with by LaurV on 2019-11-12 at 03:18
Old 2019-11-12, 03:25   #37
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×7×163 Posts

Quote:
Originally Posted by storm5510 View Post
This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?
Not necessary. Starting and finishing each exponent on the same machine works: Colab finishes what it starts, and your own GPU finishes what it starts, completely in parallel. If you decide to move an mprime or gpuowl run off Colab, then yes, the same application running on your own PC can finish what was started on Colab and stored on Google Drive, and vice versa (as long as the application versions are compatible). https://www.mersenneforum.org/showpo...7&postcount=12
Old 2019-11-13, 16:27   #38
dcheuk
 
Jan 2019
Iowa, US

2²·53 Posts

Hmm got the following error while running colab on TF.

Code:
Failed to execute cell. Could not send execute message to runtime: TypeError: Cannot read property 'getKernelInfo' of null
Cannot read property 'getKernelInfo' of null
TypeError: Cannot read property 'getKernelInfo' of null
    at d (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3474:140)
    at w8 (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3474:275)
    at za.program_ (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3467:302)
    at Ba (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:12:336)
    at za.next_ (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:10:453)
    at Da.next (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:13:206)
    at b (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:22:43)
Old 2019-11-13, 16:34   #39
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2³×1,103 Posts

Quote:
Originally Posted by dcheuk View Post
Hmm got the following error while running colab on TF.
Hmmm... That is a *deep* error. Never seen it before myself.

Taking a stab in the dark, this looks like the "supervisor" hosting the VM is undergoing maintenance, or having a hardware issue.

Edit: Actually, maybe not really that deep an error. Did you try reconnecting?

Last fiddled with by chalsall on 2019-11-13 at 16:39
Old 2019-11-13, 16:41   #40
dcheuk
 
Jan 2019
Iowa, US

2²×53 Posts

Quote:
Originally Posted by storm5510 View Post
Sometimes, I think maybe Colab sees all we do as crypto-mining because of the high utilization.
While I lived in the university apartments, the university IT department thought I was "mining cryptocurrency" because of comparatively high electricity consumption and "suspicious network activities," and tried to discipline me for such misbehavior.

I had to send them a bunch of friendly emails explaining that I was using it for parallel computation on data for a research project.

Last fiddled with by dcheuk on 2019-11-13 at 16:42
Old 2019-11-13, 16:42   #41
dcheuk
 
Jan 2019
Iowa, US

324₈ Posts

Quote:
Originally Posted by chalsall View Post
Hmmm... That is a *deep* error. Never seen it before myself.

Taking a stab in the dark, this looks like the "supervisor" hosting the VM is undergoing maintenance, or having a hardware issue.

Edit: Actually, maybe not really that deep an error. Did you try reconnecting?
Yes, after reconnecting everything seems to work fine. I only saw this error message once. Error codes are scary.

I noticed that Colab now halts my session every couple of hours instead of after the full 12 hours. I guess they're onto us, hehehe.

Last fiddled with by dcheuk on 2019-11-13 at 16:44
Old 2019-11-19, 03:46   #42
petrw1
1976 Toyota Corona years forever!
 
"Wayne"
Nov 2006
Saskatchewan, Canada

17×251 Posts
Colab Exiting on "Getting Initial Work" Phase.

I tried several times including restarting the tunnels.
Old 2019-11-19, 05:12   #43
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

2³×1,103 Posts

Quote:
Originally Posted by petrw1 View Post
I tried several times including restarting the tunnels.
There appears to have been a change in the underlying VM on Colab.

The mfaktc executable which has worked since the beginning of September is no longer working on Colab (but still is under Kaggle). Absolutely no changes to the bootstrap payload nor server code.

I'm currently seriously handicapped wrt workstation capability. If anyone can build a mfaktc which works in the new environment, please post it here or email it to me.

An exceptionally unhappy day today. Tomorrow (or, actually, now, today) is unlikely to be much more fun...
Old 2019-11-19, 14:56   #44
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3·7·163 Posts

Quote:
Originally Posted by chalsall View Post
There appears to have been a change in the underlying VM on Colab.

The mfaktc executable which has worked since the beginning of September is no longer working on Colab (but still is under Kaggle). Absolutely no changes to the bootstrap payload nor server code.
I can confirm that it is an issue with Colab, not an issue with chalsall's creation. Mfaktc has stopped working for me on Colab, and I don't use chalsall's tunneling approach. It went from
Code:
ERROR: get_next_assignment(): no valid assignment found in "worktodo.txt"
to
Code:
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
sometime between Nov 16 and Nov 18, after I replenished an exhausted worktodo file. Meanwhile gpuowl and mprime continue to work.
https://www.mersenneforum.org/showth...911#post527911
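
The first message above only means that mfaktc found nothing usable in worktodo.txt. A rough way to see what it would have to work with is sketched below. It assumes the commonly used trial-factoring line format `Factor=[assignment key,]exponent,bit_min,bit_max`; the function name and the sample lines are made up for illustration, and mfaktc's own parser may accept or reject lines differently.

```python
import re

# Assumed line format: Factor=[assignment key,]exponent,bit_min,bit_max
FACTOR_RE = re.compile(r"^Factor=(?:([^,]+),)?(\d+),(\d+),(\d+)$")


def parse_worktodo(text):
    """Return (exponent, bit_min, bit_max) for each line that matches."""
    assignments = []
    for line in text.splitlines():
        match = FACTOR_RE.match(line.strip())
        if match is None:
            continue  # blank line, comment, or some other work type
        _key, exponent, bit_min, bit_max = match.groups()
        assignments.append((int(exponent), int(bit_min), int(bit_max)))
    return assignments


if __name__ == "__main__":
    # Hypothetical sample contents, not real assignments.
    sample = "Factor=N/A,66362159,64,68\n\nFactor=3321928241,50,51\n"
    found = parse_worktodo(sample)
    print(found if found else "no Factor= lines found")
```

An empty result here corresponds to the exhausted worktodo file mentioned above.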

Unfortunately, while https://download.mersenne.ca/ has NVIDIA DLLs for Windows, it does not have the corresponding .so files for Linux, perhaps because there are so many flavors. So off to NVIDIA for a download for x86_64 Ubuntu: https://developer.nvidia.com/cuda-do...t_version=1804
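
Before downloading anything, it can help to confirm from inside the notebook that the dynamic loader really cannot find the runtime library. The sketch below uses Python's standard `ctypes` module; the helper name is made up, and the library name is simply the one from the error message above.

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open `name`, else False."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    lib = "libcudart.so.10.0"  # name taken from the mfaktc error message
    state = "loadable" if can_load_shared_library(lib) else "NOT loadable"
    print(f"{lib}: {state}")
```

If this reports the library as not loadable, the problem is in the VM image rather than in mfaktc or the worktodo file. That would fit gpuowl and mprime still running, assuming neither depends on that CUDA runtime.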

Last fiddled with by kriesel on 2019-11-19 at 15:45

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.