mersenneforum.org  

2019-09-12, 23:31   #45
chalsall ("Chris Halsall", Barbados)

Quote:
Originally Posted by De Wandelaar
From now on, GPU usage is limited to 30 hours per week.
It never fails...

On the other hand, we can hardly complain about them ***giving*** each of us 1,500 GHzD of free compute every week!!!

Also, in some ways, this is comforting. It means they're OK with us doing what we're doing.

2019-09-12, 23:45   #46
retina ("The unspeakable one", My evil lair)

Quote:
Originally Posted by chalsall
Also, in some ways, this is comforting. It means they're OK with us doing what we're doing.
Until they discover they can't monetise the work and ban all of your accounts.

You don't really think it is free, do you?

2019-09-13, 03:58   #47
Dylan14 ("Dylan")

I have been running the Colab script for the GPU72 project, and while it runs well, there should be a way for it not to request more assignments than the remaining session time allows. I was thinking something along these lines (in pseudocode):


Code:
* upon running the script, record the current timestamp (in Unix time) as start, and detect which platform we are on (Colab or Kaggle)
* while the current timestamp < start + 12 hours (Colab) or 9 hours (Kaggle), fetch an assignment
* estimate the time it would take to run the assignment, and add it to the current timestamp
* if the estimated completion time would be after the session deadline, drop the assignment and break
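Something like this, as a rough Python sketch (the fetch/estimate/drop helpers and the environment-variable check are placeholders, not the actual GPU72 script's API):

Code:
import os
import time

# Rough session budgets in seconds; Kaggle detection via its environment variable is an assumption.
SESSION_LIMIT = 9 * 3600 if os.environ.get("KAGGLE_KERNEL_RUN_TYPE") else 12 * 3600

def fill_worktodo(fetch_assignment, estimate_runtime, drop_assignment):
    """Fetch assignments only while they can plausibly finish before the session ends."""
    start = time.time()          # Unix timestamp at script start
    projected_end = start        # running estimate of when the queued work will finish
    while time.time() < start + SESSION_LIMIT:
        assignment = fetch_assignment()                  # placeholder: ask GPU72 for one assignment
        projected_end += estimate_runtime(assignment)    # placeholder: estimated seconds to run it
        if projected_end > start + SESSION_LIMIT:
            drop_assignment(assignment)                  # would not finish in time: return it and stop
            break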

2019-09-13, 11:52   #48
chalsall ("Chris Halsall", Barbados)

Quote:
Originally Posted by Dylan14
I have been running the Colab script for the GPU72 project, and while it runs well, there should be a way for it not to request more assignments than the remaining session time allows. I was thinking something along these lines (in pseudocode):
Thanks for the idea; I've been modeling different strategies in my head, trying to come to convergence.

The problem with your suggestion is that with mfaktc you can't know when a factor will be found, so it's important to keep a bit of a buffer of work queued. This is more of an issue with Colab than with Kaggle: with the former, if you run out of work and mfaktc stops, the GPU's availability is wasted.

My current methodology is to keep three candidates in the worktodo file. The checkpoint file for the candidate currently being worked on is uploaded to the server every two minutes.
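
A rough sketch of what that client-side loop could look like (top_up_worktodo() and upload_checkpoint() are stand-ins for whatever the notebook actually calls, and the *.ckp pattern assumes mfaktc's usual checkpoint naming):

Code:
import glob
import time

WORKTODO_DEPTH = 3        # keep three candidates queued, per the above
UPLOAD_INTERVAL = 120     # push the current checkpoint every two minutes

def babysit_mfaktc(top_up_worktodo, upload_checkpoint):
    """Keep the worktodo file topped up and upload checkpoints while mfaktc runs alongside."""
    while True:
        top_up_worktodo(depth=WORKTODO_DEPTH)    # stand-in: fetch assignments from GPU72 as needed
        for ckpt in glob.glob("*.ckp"):          # assumption: mfaktc writes one *.ckp per candidate
            upload_checkpoint(ckpt)              # stand-in: POST the file back to the server
        time.sleep(UPLOAD_INTERVAL)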

If a candidate assigned to a Colab/Kaggle instance is more than 12 hours old and no checkpoint file has been returned, it is recycled.

What I'm currently working on is reissuing unfinished candidates back to the same GPU72 worker, along with the checkpoint file, so they can continue and finish off the work.
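
Sketched out server-side, that recycle/reissue rule might look something like this (the assignment fields and helper functions are purely illustrative):

Code:
import time

RECYCLE_AFTER = 12 * 3600    # seconds: recycle after 12 hours with no checkpoint returned

def triage_assignment(assignment, recycle, reissue_to_worker):
    """Recycle stale assignments; hand partially-done ones back to the original worker."""
    age = time.time() - assignment["issued_at"]       # illustrative field name
    if age < RECYCLE_AFTER:
        return                                        # still within its window, leave it alone
    if assignment.get("checkpoint") is None:
        recycle(assignment)                           # no progress reported: back into the pool
    else:
        # progress exists: reissue to the same worker together with its checkpoint file
        reissue_to_worker(assignment["worker"], assignment, assignment["checkpoint"])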

By the end of the weekend I'll have the UI stuff built on GPU72, to let anyone participate. However, if anyone else would like to give this a whirl, please PM me with your GPU72 account details (UN, Display Name or email).

And thanks to the current beta-testers. Lots of great feedback (and factors found!).

2019-09-13, 13:33   #49
Chuck (Orange Park, FL)

I was assigned a Tesla T4 (1720 GHzD/D) for my first two 12-hour sessions (it disconnects automatically after that time), but this morning I am running on a much slower K80 (410 GHzD/D).

The checkpoint capability is much more important with this much slower GPU. The estimated time to complete a 69M 74->75 TF is a little over three hours, so there is the potential to lose three hours of computing time. Maybe I should consider stopping the run and reconnecting after three exponents have been processed (assuming no factor found) until the checkpoint code is in place.
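
For what it's worth, the implied numbers work out roughly as follows (a back-of-the-envelope sketch; the ~51 GHzD per assignment is simply inferred from the three hours quoted above, not an official figure):

Code:
K80_RATE = 410 / 24      # GHzD per hour, from the 410 GHzD/D above
T4_RATE = 1720 / 24      # GHzD per hour for the Tesla T4

assignment_ghzd = 3.0 * K80_RATE     # ~51 GHzD implied by "a little over three hours" on a K80
session_hours = 12

print(f"Assignments per K80 session: {session_hours * K80_RATE / assignment_ghzd:.1f}")   # ~4.0
print(f"Assignments per T4 session:  {session_hours * T4_RATE / assignment_ghzd:.1f}")    # ~16.8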

2019-09-13, 16:38   #50
chalsall ("Chris Halsall", Barbados)

Quote:
Originally Posted by Chuck
I was assigned a Tesla T4 (1720 GHzD/D) for my first two 12-hour sessions (it disconnects automatically after that time), but this morning I am running on a much slower K80 (410 GHzD/D)
LOL... Yeah, a bit ironic, but... It's now disappointing to only get a K80 for free!

Quote:
Originally Posted by Chuck
Maybe I should consider stopping the run and reconnecting after three exponents have been processed (assuming no factor found) until the checkpoint code is in place.
The most important part of the checkpoint code ***is*** in place; everyone is now running it. The most work that will be lost is two minutes (and thus, on average, only one minute).

By EOD today I'll have the code implemented to hand unfinished assignments back to be completed. Importantly, each assignment will go back to its previous worker.

Edit: BTW, you can see the checkpoint file status by looking at your Assignments page. The percentage completed is calculated from the submitted checkpoint files.
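
For anyone wondering how a percentage can come out of a checkpoint file, here is an illustration only; the field layout below is invented for the example and is not mfaktc's documented format:

Code:
def percent_complete(checkpoint_line: str) -> float:
    """Illustration: assumes the checkpoint records classes done and total classes."""
    # Hypothetical layout: "<exponent> <bit_min> <bit_max> <classes_done> <classes_total> ..."
    fields = checkpoint_line.split()
    classes_done, classes_total = int(fields[3]), int(fields[4])
    return 100.0 * classes_done / classes_total

print(percent_complete("69000077 74 75 2310 4620"))   # hypothetical line -> 50.0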

2019-09-14, 13:06   #51
Chuck (Orange Park, FL)
No backend with GPU available

This morning my notebook disconnected in the middle of a run, and when I attempted to reconnect I got this message:

Code:
Failed to assign a backend
No backend with GPU available. Would you like to use a runtime with no accelerator?
I don't know whether this is due to overuse of GPUs by my account or to a general unavailability of hardware. Is the Colab honeymoon over?
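
In the meantime, one way to avoid burning a CPU-only session is to check for a GPU at the top of the notebook and bail out if none was assigned (a convenience sketch, not something the GPU72 notebook necessarily does):

Code:
import shutil
import subprocess

# Abort early if this runtime has no GPU, rather than running the session without an accelerator.
if shutil.which("nvidia-smi") is None:
    raise SystemExit("No GPU backend assigned; reconnect and try again later.")
probe = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
if probe.returncode != 0 or "GPU" not in probe.stdout:
    raise SystemExit("GPU driver present but no usable GPU reported; try again later.")
print(probe.stdout.strip())    # e.g. "GPU 0: Tesla T4 (UUID: ...)"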

2019-09-14, 14:17   #52
Chuck (Orange Park, FL)
Running again...

An hour later I tried reloading the notebook and it is running again with a T4.

2019-09-14, 15:22   #53
pinhodecarlos ("Carlos Pinho", Milton Keynes, UK)

Can I use two accounts from the same IP address?

2019-09-14, 15:34   #54
De Wandelaar ("Yves", Belgium)

Quote:
Originally Posted by pinhodecarlos
Can I use two accounts from the same IP address?
I'm doing so, and so far no problems ...

2019-09-14, 15:36   #55
chalsall ("Chris Halsall", Barbados)

Quote:
Originally Posted by pinhodecarlos
Can I use two accounts from the same IP address?
Yup. I'm currently running two different accounts concurrently, from the same IP. And, in fact, from the same browser (different tabs).

I've found the GPU backend availability can vary considerably. Sometimes one account can get a GPU, while the other can't. Sometimes neither can, and sometimes both can.

A small sample set suggests the GPUs are in high demand during "working hours" Eastern time, with availability opening up at around 1800 (2200 UTC).
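
If anyone wants to firm up that sample set, logging the outcome of each connection attempt is enough (a throwaway sketch; the log file name is arbitrary):

Code:
import csv
import shutil
import subprocess
import time

# Record whether this session got a GPU, and which model, so the pattern can be charted later.
gpu = "none"
if shutil.which("nvidia-smi"):
    probe = subprocess.run(["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
                           capture_output=True, text=True)
    if probe.returncode == 0 and probe.stdout.strip():
        gpu = probe.stdout.strip()

with open("gpu_availability_log.csv", "a", newline="") as log:
    csv.writer(log).writerow([int(time.time()), gpu])   # Unix time, GPU model or "none"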