mersenneforum.org Colab question

2019-11-11, 22:10   #34
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29·167 Posts

Quote:
 Originally Posted by storm5510 It might actually take much longer to run a DC this way instead of on a local GPU with CUDALucas.
It's not instead of, it's in addition to.

2019-11-11, 23:57   #35
storm5510
Random Account

Aug 2009
U.S.A.

2⁸·7 Posts

Quote:
 Originally Posted by kriesel It's not instead of, it's in addition to.
This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?

2019-11-12, 03:11   #36
LaurV
Romulan Interpreter

Jun 2011
Thailand

2²×2,287 Posts

Quote:
 Originally Posted by storm5510 This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?
Yes. Checkpoints are compatible, assuming you run an mfaktc newer than 1.18 or so, when they were changed, and assuming you do not interchange "special" versions (like the less-classes build). You can freely move assignments and checkpoint files between computers, Colab included. However, keep in mind that moving the checkpoint alone means nothing unless you have the assignment in worktodo too. This is how mfaktX works: it gets the work from the worktodo file and then checks for a checkpoint. The checkpoint only stores the last class that was done for an exponent, so the classes already done are not run again. Each class is sieved and powmodded separately. I used this method to split huge assignments (like M666666667 to 86 bits or so) between several computers/cards, by creating "fake" checkpoints so each computer/card does different classes.
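
To make the "different classes per card" idea concrete, here is a rough Python sketch. It assumes the default mfaktc layout of 4620 classes (k mod 4620 for factor candidates q = 2kp+1) and discards classes in which q can never be a factor of a Mersenne number (q not ±1 mod 8, or q divisible by 3, 5, 7 or 11). The function names are made up for illustration, and the sketch does not read or write real mfaktc checkpoint files; it only shows how the surviving classes could be divided among machines.

```python
NUM_CLASSES = 4620  # assumed default: 4 * 3 * 5 * 7 * 11


def surviving_classes(p, num_classes=NUM_CLASSES):
    """Return the classes c (k = c mod num_classes) that can contain factors.

    A factor q = 2*k*p + 1 of 2^p - 1 must be +-1 mod 8, and candidates
    divisible by 3, 5, 7 or 11 are skipped. Both tests depend only on
    k mod 4620, so testing k = c decides the whole class.
    """
    keep = []
    for c in range(num_classes):
        q = 2 * c * p + 1
        if q % 8 not in (1, 7):
            continue
        if any(q % small == 0 for small in (3, 5, 7, 11)):
            continue
        keep.append(c)
    return keep


def split_classes(classes, workers):
    """Split the class list into contiguous chunks, one per worker."""
    chunk = -(-len(classes) // workers)  # ceiling division
    return [classes[i * chunk:(i + 1) * chunk] for i in range(workers)]


# Hypothetical example: divide the exponent mentioned above among 4 cards.
classes = surviving_classes(666666667)
parts = split_classes(classes, 4)
for card, part in enumerate(parts):
    print(f"card {card}: classes {part[0]}..{part[-1]} ({len(part)} classes)")
```

Each chunk gives the first and last class a given card would be responsible for; how that is turned into a "fake" checkpoint depends on the mfaktc version in use and is not shown here.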

Last fiddled with by LaurV on 2019-11-12 at 03:18

2019-11-12, 03:25   #37
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29·167 Posts

Quote:
 Originally Posted by storm5510 This implies a person could jump from one to another with one, or more, checkpoint files. Is this correct?
Not necessary. Starting and finishing each exponent separately works: Colab finishes what it starts, and an owned GPU finishes what it starts. Completely parallel. If you decide to move an mprime or gpuowl run off Colab, yes, the same app run on your own PC can finish what was started on Colab and stored on Google Drive, and vice versa (as long as the application versions are compatible). https://www.mersenneforum.org/showpo...7&postcount=12

2019-11-13, 16:27   #38
dcheuk

Jan 2019
Pittsburgh, PA

13×19 Posts

Hmm got the following error while running colab on TF.
Code:
Failed to execute cell. Could not send execute message to runtime: TypeError: Cannot read property 'getKernelInfo' of null
Cannot read property 'getKernelInfo' of null
TypeError: Cannot read property 'getKernelInfo' of null
 at d (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3474:140)
 at w8 (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3474:275)
 at za.program_ (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:3467:302)
 at Ba (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:12:336)
 at za.next_ (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:10:453)
 at Da.next (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:13:206)
 at b (https://colab.research.google.com/v2/external/external_polymer_binary.js?vrz=colab-20191111-080000-RC00_279737042:22:43)

2019-11-13, 16:34   #39
chalsall
If I May

"Chris Halsall"
Sep 2002

2·3·1,567 Posts

Quote:
 Originally Posted by dcheuk Hmm got the following error while running colab on TF.
Hmmm... That is a *deep* error. Never seen it before myself.

Taking a stab in the dark, this looks like the "supervisor" hosting the VM is undergoing maintenance, or having a hardware issue.

Edit: Actually, maybe not really that deep an error. Did you try reconnecting?

Last fiddled with by chalsall on 2019-11-13 at 16:39

2019-11-13, 16:41   #40
dcheuk

Jan 2019
Pittsburgh, PA

13×19 Posts

Quote:
 Originally Posted by storm5510 Sometimes, I think maybe Colab sees all we do as crypto-mining because of the high utilization.
While I lived in the university apartments, they (the university IT department) thought I was mining cryptocurrency "due to comparatively larger electricity consumption and suspicious network activities," and tried to discipline me for such misbehavior.

I had to send them a bunch of friendly emails explaining that I was using it for parallel computing on data for a research project.

Last fiddled with by dcheuk on 2019-11-13 at 16:42

2019-11-13, 16:42   #41
dcheuk

Jan 2019
Pittsburgh, PA

367₈ Posts

Quote:
 Originally Posted by chalsall Hmmm... That is a *deep* error. Never seen it before myself. Taking a stab in the dark, this looks like the "supervisor" hosting the VM is undergoing maintenance, or having a hardware issue. Edit: Actually, maybe not really that deep an error. Did you try reconnecting?
Yes, after reconnecting everything seems to work fine. Only saw this error message once. Error codes are scary.

I noticed that Colab now halts my session every couple of hours instead of after the full 12 hours. I guess they're onto us hehehe

Last fiddled with by dcheuk on 2019-11-13 at 16:44

2019-11-19, 03:46   #42
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

10670₈ Posts

Colab Exiting on "Getting Initial Work" Phase.
I tried several times including restarting the tunnels.

2019-11-19, 05:12   #43
chalsall
If I May

"Chris Halsall"
Sep 2002

2·3·1,567 Posts

Quote:
 Originally Posted by petrw1 I tried several times including restarting the tunnels.
There appears to have been a change in the underlying VM on Colab.

The mfaktc executable which has worked since the beginning of September is no longer working on Colab (but still is under Kaggle). Absolutely no changes to the bootstrap payload nor server code.

I'm currently seriously handicapped wrt workstation capability. If anyone can build a mfaktc which works in the new environment, please post it here or email it to me.

An exceptionally unhappy day today. Tomorrow (or, actually, now, today) is unlikely to be much more fun...

2019-11-19, 14:56   #44
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12EB₁₆ Posts

Quote:
 Originally Posted by chalsall There appears to have been a change in the underlying VM on Colab. The mfaktc executable which has worked since the beginning of September is no longer working on Colab (but still is under Kaggle). Absolutely no changes to the bootstrap payload nor server code.
I can confirm that it is an issue with Colab, not an issue with chalsall's creation. Mfaktc has stopped working for me on Colab, and I don't use chalsall's tunneling approach. It went from
Code:
ERROR: get_next_assignment(): no valid assignment found in "worktodo.txt"
to
Code:
 ./mfaktc.exe: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
somewhere in Nov 16 to Nov 18, after I replenished an exhausted worktodo file. Meanwhile gpuowl and mprime continue to work.
https://www.mersenneforum.org/showth...911#post527911
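
For anyone wanting to confirm the same symptom from a notebook cell before launching mfaktc, a small Python check along these lines may help. It is only a sketch: it asks the dynamic loader to open the library named in the error message above and reports whether that succeeds. The helper name `can_load` is made up for this example.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library cannot be found or opened.
        return False
    return True


# Library name taken from the mfaktc error message quoted above.
for name in ("libcudart.so.10.0",):
    status = "found" if can_load(name) else "missing"
    print(f"{name}: {status}")
```

If it prints "missing", the executable will fail the same way until a matching CUDA runtime is visible to the loader.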

Unfortunately, while https://download.mersenne.ca/ has NVIDIA dlls for Windows, it does not have the corresponding .so files for linux, perhaps because there are so many flavors. So off to NVIDIA for a download for x86_64 ubuntu: https://developer.nvidia.com/cuda-do...t_version=1804

Last fiddled with by kriesel on 2019-11-19 at 15:45