mersenneforum.org  

2019-09-18, 03:26   #78
Dylan14

"Dylan"
Mar 2017

2·293 Posts

Quote:
Originally Posted by Dylan14
This error would suggest that perhaps / is read only on Kaggle (as per the replies here: https://www.linuxquestions.org/quest...es-4175619721/). But that doesn't make sense, since we are able to write to the disk to run the bootstrap script.

I figured out the issue. You just have to call

Code:
!chmod 777 /tmp
before calling the apt-get command and then it works fine on Kaggle:


Code:
Ign:1 http://deb.debian.org/debian stretch InRelease 
Get:2 http://security.debian.org/debian-security stretch/updates InRelease [94.3 kB] 
Get:3 http://deb.debian.org/debian stretch-updates InRelease [91.0 kB]          
Get:4 http://deb.debian.org/debian stretch Release [118 kB] 
Get:5 http://packages.cloud.google.com/apt cloud-sdk InRelease [6337 B]         
Get:6 http://deb.debian.org/debian stretch Release.gpg [2365 B]                 
Get:7 http://security.debian.org/debian-security stretch/updates/main amd64 Packages [503 kB] 
Get:8 http://packages.cloud.google.com/apt cloud-sdk/main amd64 Packages [86.7 kB] 
Get:9 http://deb.debian.org/debian stretch/main amd64 Packages [7086 kB] 
Fetched 7678 kB in 2s (3565 kB/s)   
Reading package lists... Done
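
For anyone scripting this step rather than typing it into a notebook cell, here is a minimal Python sketch of the same workaround. It assumes, as in the post above, that the failure comes from /tmp not being writable. The names make_world_writable and apt_update are illustrative only and are not part of the Bootstrap script. The conventional mode for /tmp is 1777 (with the sticky bit); 777 is used here only to mirror the command above.

```python
import os
import stat
import subprocess


def make_world_writable(path):
    """Equivalent of `chmod 777 path`; returns the resulting permission bits."""
    os.chmod(path, 0o777)
    return stat.S_IMODE(os.stat(path).st_mode)


def apt_update(tmp_dir="/tmp", runner=subprocess.run):
    """Open up tmp_dir first, then refresh the package lists.

    `runner` is injectable (an assumption made for this sketch) so the
    sequence can be exercised without actually invoking apt-get.
    """
    make_world_writable(tmp_dir)
    return runner(["apt-get", "update"], check=False)
```

Calling apt_update() with no arguments runs the two steps in the order described above; it still needs whatever privileges apt-get itself requires on the host.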
2019-09-18, 12:23   #79
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

2·2,969 Posts

I believe most people recommend running apt-get update as root with sudo. I don't know whether that is an option on this system. Without root, it might lack permission to write the final files as well as the temporary ones (I believe this is the usual reason root permissions are needed).
2019-09-18, 13:39   #80
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

10011101000001₂ Posts

Quote:
Originally Posted by Chuck
I have just observed what is causing the "wide" Kaggle output. When the uptime goes beyond "23:59", it starts outputting "1 day, 4 min" etc. These additional characters are causing the line wrap.
OK, thanks for bringing that forward. I'm now getting the raw uptime from /proc/, and rendering it as HH:MM.
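
A rough sketch of that approach in Python, for illustration only. The first field of /proc/uptime is the uptime in seconds; the function names are hypothetical, and the choice to let the hour count run past 23 instead of switching to a "1 day, ..." form is an assumption about the intended output.

```python
def format_uptime(proc_uptime_text):
    """Render the first field of /proc/uptime (seconds) as HH:MM.

    Hours are not wrapped at 24, so the width only grows once the
    uptime passes 99 hours.
    """
    seconds = int(float(proc_uptime_text.split()[0]))
    hours, remainder = divmod(seconds, 3600)
    minutes = remainder // 60
    return f"{hours:02d}:{minutes:02d}"


def read_uptime(path="/proc/uptime"):
    """Read the raw uptime from the given file and format it."""
    with open(path) as handle:
        return format_uptime(handle.read())
```

With this, an uptime of one day and four minutes comes out as 24:04, which keeps the column width fixed.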
2019-09-18, 13:40   #81
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

2741₁₆ Posts

Quote:
Originally Posted by Dylan14
I figured the issue out. Just have to call ... before calling the apt-get command and then it works fine on Kaggle:
OK, thanks for this improvement. Applied.
2019-09-18, 15:32   #82
Chuck

May 2011
Orange Park, FL

382₁₆ Posts
Kaggle checkpoints

Since we are only allowed 30 hours of GPU time per week, and 9 hours of connect time per session, if checkpoint restarts are going to work they will have to be saved for about five days.

This assumes I will use my 30 GPU hours the first two days of each week.

And shouldn't the process begin with looking for checkpoint files instead of assigning new work? I am getting a lot of abandoned checkpoints building up. (I noticed this morning that Colab started out with a checkpoint file; perhaps this has already been addressed).
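
To make the suggestion concrete, here is a minimal sketch of a "checkpoints first" start-up step. Everything in it is hypothetical: the ".ckpt" extension, the choose_work name, and the request_new_assignment callback are placeholders, not how the Bootstrap package actually stores or fetches work.

```python
from pathlib import Path


def choose_work(checkpoint_dir, request_new_assignment):
    """Resume the oldest saved checkpoint if one exists, else ask for new work.

    The '*.ckpt' pattern is a placeholder; the real checkpoint naming
    scheme is not given in this thread.
    """
    checkpoints = sorted(
        Path(checkpoint_dir).glob("*.ckpt"),
        key=lambda p: p.stat().st_mtime,
    )
    if checkpoints:
        return ("resume", checkpoints[0].name)
    return ("new", request_new_assignment())
```

Picking the oldest checkpoint first is one way to stop abandoned ones from piling up; picking the newest would be just as easy.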

Last fiddled with by Chuck on 2019-09-18 at 15:47
2019-09-18, 15:47   #83
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

13·773 Posts

Quote:
Originally Posted by Chuck
Since we are only allowed 30 hours of GPU time per week, and 9 hours of connect time per session, if checkpoint restarts are going to work they will have to be saved for about five days. This assumes I will use my 30 GPU hours the first two days of each week.
Assignments with checkpoint data will never be expired. Or, at least, there's no code for that currently -- will probably be needed in the future to deal with abandoned "Anonymous" assignments.

So if you eat your 30-hour allotment quickly, the assignments with work done will stick around for you to pick up whenever you next launch an instance.

Keep in mind also that your Colab worker(s) will be given any assignments not reported on for 12 hours, so old assignments hanging around shouldn't really be an issue.
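
As an illustration of the 12-hour rule, here is a small Python sketch. It is not the server's actual code (that lives in SQL, which is not shown in this thread); the dictionary layout and the reassignable name are assumptions made for the example.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(hours=12)  # the 12-hour figure mentioned above


def reassignable(assignments, now, stale_after=STALE_AFTER):
    """Return ids of assignments whose last report is at least stale_after old.

    `assignments` maps a (hypothetical) assignment id to the datetime
    of its most recent report.
    """
    return sorted(
        assignment_id
        for assignment_id, last_report in assignments.items()
        if now - last_report >= stale_after
    )
```

An assignment that last reported 13 hours ago would be handed to the next worker that asks, while one that reported 6 hours ago would be left alone.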
2019-09-18, 15:50   #84
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

13·773 Posts

Quote:
Originally Posted by Chuck
I am getting a lot of abandoned checkpoints building up. (I noticed this morning that Colab started out with a checkpoint file; perhaps this has already been addressed).
OK... It's entirely possible I've done something stupid.

I'm watching the logs; let me observe what's happening...
2019-09-18, 21:51   #85
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

23501₈ Posts

Quote:
Originally Posted by chalsall
I'm watching the logs; let me observe what's happening...
OK... Something weird is going on with regard to reassigning you your old candidates; I haven't figured out why yet. The work definitely isn't "lost" -- I just need to figure out the stupid mistake I've made in the SQL. Still working on it.

For anyone running an instance (or two...), I have just "pushed" the latest production Bootstrap package. This has been regression tested, and it's sane.

I've tightened up the log output, to be as dense as it can be, while still containing the data. I've moved the "ETA" field to be immediately after "% Done" -- seemed more logical.

The spider is now returning the observed GHzD and ItrTime data to the server. This is to be able to calculate estimated completions (not coded yet on the server).
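
Since the server side isn't coded yet, the following is only a guess at what such an estimate could look like. It assumes ItrTime means seconds per iteration and that the total iteration count and the "% Done" figure are known; those readings, the units, and the function name are all assumptions.

```python
def estimate_remaining_seconds(total_iterations, percent_done, itr_time_seconds):
    """Estimate time left as (iterations still to do) x (seconds per iteration).

    All three inputs are hypothetical stand-ins for whatever the spider
    actually reports.
    """
    if not 0 <= percent_done <= 100:
        raise ValueError("percent_done must be between 0 and 100")
    remaining_iterations = total_iterations * (1 - percent_done / 100)
    return remaining_iterations * itr_time_seconds
```

Dividing the result by 3600 gives hours, which could then be compared against the remaining GPU allowance.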

Anyone launching future instances will pick up this new code. For anyone currently running an instance, it is safe to stop and then relaunch.

This is ***so*** cool!
2019-09-18, 22:48   #86
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003
101×103 Posts

29·349 Posts

So, I am not a Linux or Python person (my bad). I do have access to Colaboratory through a corporate Gmail / G Suite package.
If I want to set this up to run, are there some hand-holding instructions on how to do so? I have been paying some attention, but much of the code is lost on me.
2019-09-18, 23:28   #87
chalsall
If I May

"Chris Halsall"
Sep 2002
Barbados

13×773 Posts

Quote:
Originally Posted by Uncwilly
If I want to set this up to run, are there some hand-holding instructions on how to do so? I have been paying some attention, but much of the code is lost on me.
We've gotten to the point where code isn't really involved, other than one copy-and-paste.

Just Create a new Assignment Key, and then log into Colab and/or Kaggle to paste the code, and then click Run. That's it.

This presumes you already have a GPU72 account. And, of course, a Primenet account to which to submit results (that part isn't automated yet).

Please give it a whirl. I like to see the code paths exercised, to find those corner cases!
2019-09-19, 00:01   #88
petrw1
1976 Toyota Corona years forever!

"Wayne"
Nov 2006
Saskatchewan, Canada

3×1,619 Posts

Last evening I started my weekly 30 hours: two commits and one "Run All". Before lunch today my 30 hours were all gone.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.