mersenneforum.org  

Old 2019-10-13, 14:54   #1
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

Google Colaboratory reference thread

Google Colaboratory provides a web-browser-interfaced, Python-based capability for limited-duration (12-hour) cloud computing on Ubuntu Linux, with a couple of cpu cores and optionally a GPU. Note that there is also a paid Colab Pro tier with longer session durations.
See https://www.mersenneforum.org/showthread.php?t=24646
All the following are drafts in progress, and will remain somewhat so as long as (a) Google makes occasional changes to the VM configuration, (b) application behavior, including total runtime required, is affected by the advancing wavefront, and (c) availability and duration of cpu or gpu sessions fluctuate. All of these are likely to continue indefinitely. This thread is primarily informed by experiment and experience with the free tier, and is somewhat applicable to the paid tier as well.

Please post any responses in the reference discussion thread, not in this thread.
Responses posted here may be moved or removed without notice or recourse.
  1. Intro and table of contents (this post)
  2. How to https://www.mersenneforum.org/showpo...09&postcount=2
  3. Mprime attempt https://www.mersenneforum.org/showpo...10&postcount=3
  4. Mfaktc attempt https://www.mersenneforum.org/showpo...11&postcount=4
  5. CUDAPm1 attempt https://www.mersenneforum.org/showpo...28&postcount=5
  6. CUDALucas attempt https://www.mersenneforum.org/showpo...29&postcount=6
  7. GpuOwL attempt https://www.mersenneforum.org/showpo...30&postcount=7
  8. Combined cpu and gpu usage https://www.mersenneforum.org/showpo...73&postcount=8
  9. Worktodo replenishment and result reporting https://www.mersenneforum.org/showpo...75&postcount=9
  10. Mlucas attempt https://www.mersenneforum.org/showpo...7&postcount=10
  11. Notebook instance reverse ssh and http tunnels https://www.mersenneforum.org/showpo...2&postcount=11
  12. The Google drive access authorization sequence https://www.mersenneforum.org/showpo...4&postcount=12
  13. When a VM or GPU is not available https://www.mersenneforum.org/showpo...5&postcount=13
  14. Issues, questions, support https://www.mersenneforum.org/showpo...9&postcount=14
  15. Gpu models available through Google Colab https://www.mersenneforum.org/showpo...5&postcount=15
  16. Multiple branches for cpu-only, or various gpu models https://www.mersenneforum.org/showpo...5&postcount=16
  17. etc tbd
ClLucas is not advisable: it is an OpenCL program, and if the NVIDIA gpus available through the Colaboratory are capable of running ClLucas at all, they are probably much better employed running GpuOwl. GpuOwl is about twice as fast and has the excellent Gerbicz error check and PRP proof capability, while ClLucas, as an LL program, cannot use the Gerbicz check, lacks even the Jacobi check, and produces no proof output.

Mfakto may be possible. It would be worthwhile only if it runs on NVIDIA OpenCL and outperforms Mfaktc there.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-29 at 16:50
Old 2019-10-13, 15:18   #2
kriesel

How to

(Derived beginning from chalsall's directions at https://www.mersenneforum.org/showpo...&postcount=194 and reading lots of other posts in that thread, followed by a lot of experimentation and head-scratching plus some help from PhilF)

Step by step:
  1. If the application script you plan to use relies on files stored on your Google drive, and does not build the necessary content structure for you (as the mprime script originated by Dylan14 does), put all the files required to run the application on Ubuntu Linux into a suitably named folder on your Google drive:
    application file (Mfaktc, CUDAPm1, Gpuowl, CUDALucas, Mprime, whatever)
    a suitable cudart if applicable (Mfaktc, CUDAPm1, CUDALucas)
    a suitable cufft if applicable (CUDALucas, CUDAPm1)
    ini file if needed
    worktodo.txt
    config.txt (Gpuowl optional), etc.
  2. In a modern web browser, such as the recommended Chrome, Firefox, or Safari, or Microsoft Edge (which worked in a test here), go to https://colab.research.google.com/no.../welcome.ipynb. Internet Explorer does not work; it produces the message "This site may not work in your browser. Please use a supported browser. More info". The web browser's host OS shouldn't matter. I've successfully run from Windows 7 x64 Pro and Windows 10; if I recall correctly, others have run from Linux or a smartphone.
  3. On that same Welcome page, click on the "Connect" button in the upper right-hand corner.
  4. You will need to sign-in with your Google credentials. This connects your browser with a running Virtual Machine (VM) somewhere "in the cloud".
  5. Next, click on the "+ Code" button in the upper left-hand corner. This inserts a new "Code Section" into the "Notebook". You should see a "Play" (">") button (don't click it yet), and then to the immediate right a blinking vertical cursor inviting you to type in code.
  6. Paste in a notebook script for the application(s) you want to run. These may require some alteration. Make sure it includes connection to your google drive, and error detection. (Unless you don't mind everything staying on the VM and being lost after the run ends.)
    See either following posts in this thread, or https://www.mersenneforum.org/showthread.php?t=24646, post numbers by application:
    Mfaktc #2; 19
    perl 11
    misfit 25
    CUDAPm1 26; 158
    Gpuowl 29
    boinc 39
    cado-nfs 131; 137
    CUDALucas 178
    Mprime 208 (or use the build it from scratch script below)
    step by step howto with some system commands 194
    (none yet for mlucas)
    Note: Mfaktc produces a lot of console output, and there are reports of increased browser memory consumption as a result. It may be helpful to redirect its console output to a file on the VM drive or Google drive, or to /dev/null. It has also been suggested to avoid low bit levels, which generate more frequent output.
  7. If it's a gpu application, at the upper left of the web page, select Edit, Notebook Settings, hardware accelerator, GPU, save.
  8. Now, you can click that "Play" button, and either provide information it requests (such as connecting to Google drive and providing the authorization code as in https://www.mersenneforum.org/showpo...4&postcount=12), watch it run, or read error messages and debug.
  9. Keep the browser tab around for the planned duration of the run. If you close the tab, the VM detects the disconnection and shuts down early. It's ok to minimize the whole browser while the notebook script runs on the Colab VM.
  10. Repeat the above with appropriate application scripts, from step 5 onward, to occupy both the GPU and the cpu cores available in the Colab VM, making maximal use of the expected 12 hour run time. This is done by first reconnecting to the Google drive if applicable, then launching the resumption of one application as a background task, and finally launching the other application in its own section, since it appears only one Code block can run on the notebook page at a time; not even !top or !nvidia-smi will run in another. See examples at https://www.mersenneforum.org/showpo...73&postcount=8
  11. Repeat as often as you like and can get resources allocated by Colaboratory. Note, some report being able to run multiple VMs with multiple google credential sets.
  12. To end a session before it times out, try stopping the code by clicking the "Play" button again; then, at the upper left of the web page, select Runtime, Manage Sessions, Terminate, and confirm.
Caveat: this is gleaned from two sources: the forum thread about Google Colaboratory https://www.mersenneforum.org/showthread.php?t=24646, and my own attempts to use it from Windows 7 or 10 systems. So it is only as right as I know how to make it.
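The file checklist in step 1 can be sketched as a small pre-flight check run before launching anything; a minimal sketch, where the folder and file names are illustrations, not requirements of any particular program:

```python
# Pre-flight check (step 1): verify the application folder on the Google
# drive holds everything the run will need, before burning session time.
import os

def missing_files(folder, required):
    """Return the subset of required file names not present in folder."""
    return [name for name in required if not os.path.exists(os.path.join(folder, name))]

# Example checklist for a CUDALucas-style folder (names are illustrative).
required = ["CUDALucas", "CUDALucas.ini", "worktodo.txt"]
# print(missing_files("/content/drive/My Drive/cudalucas", required))
```

Running this as the first code section of a notebook makes a missing worktodo.txt or binary obvious immediately, instead of partway into a 12-hour session.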


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-29 at 17:12 Reason: embedded links to post numbers in step 6
Old 2019-10-13, 15:20   #3
kriesel

Mprime attempt

This was my first foray into colab computing, so it is long, sort of a journal as I stumbled around in the dark.
Quote:
Originally Posted by Dylan14 View Post
Indeed it is possible:

Code:
#Notebook to run mprime on a Colab thing
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

%cd '/content/drive/My Drive//'
if not os.path.exists('/content/drive/My Drive/mprime/'):
  !mkdir mprime
%cd '/content/drive/My Drive/mprime//'
#fetch mprime executable if we don't have it
if not os.path.exists('mprime'):
  !wget http://www.mersenne.org/ftp_root/gimps/p95v298b6.linux64.tar.gz
  !tar -zxvf p95v298b6.linux64.tar.gz
#!ls
#run mprime
#first, create local.txt and prime.txt if they don't already exist
if not os.path.exists('prime.txt'):
  !echo V24OptionsConverted=1 > prime.txt
  !echo WGUID_version=2 >> prime.txt
  !echo StressTester=0 >> prime.txt
  !echo UsePrimenet=1 >> prime.txt
  !echo DialUp=0 >> prime.txt
  #change the user ID to your own or use ANONYMOUS to work anonymously
  !echo V5UserID=Dylan14 >> prime.txt
  !echo Priority=1 >> prime.txt
  #Since Drive is persistent, can set DaysOfWork as desired:
  !echo DaysOfWork=1 >> prime.txt
  #This comes from undoc.txt.
  !echo MaxExponents=1 >> prime.txt
  !echo RunOnBattery=1 >> prime.txt
  #This sets the work preference. In this case it's set to 5, ECM on Mersennes with no known factors
  !echo WorkPreference=5 >> prime.txt
  !echo [PrimeNet] >> prime.txt
  !echo Debug=0 >> prime.txt
  !echo ProxyHost= >> prime.txt
if not os.path.exists('local.txt'):
  !echo WorkerThreads=1 >> local.txt
  !echo CoresPerTest=2 >> local.txt
  !echo ComputerID=colab >> local.txt
  !echo Memory=8192 during 7:30-23:30 else 8192 >> local.txt
#now run
!chmod +x mprime
!cat prime.txt
!cat local.txt
!./mprime
Of course, change the UserID and the work preference as desired. This also allows us to get around the welcome text and start computing right away.
And I can confirm that it works: an ECM assignment to me was assigned to a computer called colab.
I switched a copy of the preceding to PRP first time tests, added comments for all the mprime work types I could find, and riddled it with echo statements to show progress, current command lines, ls -l output, etc.
The following script has also been updated for recent mprime/prime95 changes:
Code:
#Notebook to run mprime on a Colab session
import os.path
from google.colab import drive
import sys
#print (sys.path)
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

!echo marker1 cd /content/drive/My Drive/ next
%cd '/content/drive/My Drive//' 
!echo marker2 past cd to google drive, next is ls -l of current default drive
!ls -l
!chmod +w '/content/drive/My Drive'
!echo 'at mprime exist test & mkdir'
if not os.path.exists('/content/drive/My Drive/mprime'):
  !mkdir '/content/drive/My Drive/mprime'
!echo marker3 md mprime next
#was !cd '/content/drive/My Drive/mprime/'
%cd '/content/drive/My Drive/mprime//' 
!ls -l
#fetch mprime executable if we don't have it
if not os.path.exists('/content/drive/My Drive/mprime/mprime'):
  !wget http://www.mersenne.org/ftp_root/gimps/p95v303b6.linux64.tar.gz
  !tar -zxvf p95v303b6.linux64.tar.gz
  #probably ought to add also, !rm ./p95v303b6.linux64.tar.gz
!echo marker4 past retrieve mprime, ls -l next
!ls -l
!echo ls -l '/content/drive/My Drive/mprime/'
!ls -l '/content/drive/My Drive/mprime/'
#next, create local.txt and prime.txt if they don't already exist
if not os.path.exists('prime.txt'):
  !echo V24OptionsConverted=1 > prime.txt
  !echo WGUID_version=2 >> prime.txt
  !echo StressTester=0 >> prime.txt
  !echo UsePrimenet=1 >> prime.txt
  !echo DialUp=0 >> prime.txt
  #change the user ID to your own or use ANONYMOUS to work anonymously
  !echo V5UserID=Kriesel >> prime.txt
  !echo Priority=1 >> prime.txt
  #Since Drive is persistent, can set DaysOfWork as desired:
  !echo DaysOfWork=1 >> prime.txt
  #This comes from undoc.txt.
  !echo MaxExponents=1 >> prime.txt
  !echo RunOnBattery=1 >> prime.txt
  # (see http://v5.mersenne.org/v5design/v5webAPI_0.97.html, 7.3 GIMPS Work Preferences)
  # see also https://www.mersenneforum.org/showpost.php?p=505770&postcount=1 or prime95/mprime source code (primenet.h file)
  # 0 whatever makes sense (server decides)
  # 1 trial factoring LMH, not recommended for cpus, leave it to the much faster gpus
  # 2 trial factoring LMH, not recommended for cpus, leave it to the much faster gpus
  # 3 P-1 factoring small
  # 4 optimal P-1 factoring, large
  # 5 ECM factoring, smallish Mersennes
  # 6 factoring Fermat ECM
  # 7 factoring Cunningham ECM
  # 8 ECM of Mersenne Cofactor
  #9-99 reserved
  # 100 LL first time test
  # 101 LL Double check
  # 102 LL test world-record
  # 103 LL test 10M digits (no longer relevant since even DC wavefront is ~54M bits, 16M+ digits)
  # 104 LL test 100M digits (~333M bits)
  # 105 LL first time test with no trial or P-1 factoring
  #106-149 reserved
  # 150 PRP first time test, may take too long for free account
  # 151 PRP double check
  # 152 PRP world record
  # 153 PRP 100M digit test
  #154-159 reserved
  # 160 PRP cofactor test
  # 161 PRP cofactor double check
  #162-199
  # 200 PRP CERT (I have NOT tested that mprime accepts this as a specified type!)
  #201-255 reserved
  #Choose a work type that will complete in a reasonable duration and before expiration.
  #The next line sets the work preference. In this case it's set to 101 (LL DC), since 150 (PRP first time test) may expire before completion at current wavefront exponents
  !echo WorkPreference=101 >> prime.txt
  #Fix the HardwareUID; otherwise there is an additional "colab" cpu entry in my mersenne.org cpus page for every launch of a 12 hour mprime session in Colaboratory
  !echo FixedHardwareUID=1 >>prime.txt
  !echo [PrimeNet] >> prime.txt
  !echo Debug=0 >> prime.txt
  !echo ProxyHost= >> prime.txt
if not os.path.exists('local.txt'):
  !echo WorkerThreads=1 >> local.txt
  !echo CoresPerTest=2 >> local.txt
  !echo ComputerID=colab >> local.txt
  !echo Memory=8192 during 7:30-23:30 else 8192 >> local.txt
#now run
!chmod +x mprime
!echo cat prime.txt:
!cat prime.txt
!echo
!echo cat local.txt:
!cat local.txt
!echo run ./mprime 
!./mprime -d >>mprimelog.txt
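The long chains of "!echo key=value >> prime.txt" lines in the script above can instead be generated from a plain Python dict; a minimal sketch covering a few of the keys used above (not an exhaustive mprime settings reference):

```python
# Build mprime-style key=value config file content from a dict.
# Dicts preserve insertion order (Python 3.7+), so line order is kept.
def config_lines(settings):
    """Render key=value lines; a bare section header like '[PrimeNet]'
    passes through unchanged, without an '=' appended."""
    lines = []
    for key, value in settings.items():
        if key.startswith("["):          # section header line
            lines.append(key)
        else:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"

prime_txt = config_lines({
    "V24OptionsConverted": 1,
    "UsePrimenet": 1,
    "V5UserID": "ANONYMOUS",   # change to your own user ID
    "WorkPreference": 101,     # 101 = LL double check, per the table above
    "FixedHardwareUID": 1,
    "[PrimeNet]": "",
    "Debug": 0,
})
# with open("prime.txt", "w") as f: f.write(prime_txt)
```

This keeps the whole configuration visible in one place and avoids quoting surprises in shell echo lines.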
Results:
The Colab computer appears in my Account Info cpus list.
Google drive authorization is prompted for and completed. After some tweaking to arrive at the above, the Google drive is modified by the wget, tar, and ./mprime launches. File times are UTC.

Code:
Go to this URL in a browser:  https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive
marker1 cd /content/drive/My Drive/ next
/content/drive/My Drive
marker2 past cd to google drive, next is ls -l of current default drive
total 14
drwx------ 2 root root 4096 Oct 11 18:13 cudapm1
drwx------ 2 root root 4096 Oct 11 17:21 mfaktc
-rw------- 1 root root 1042 Oct 11 17:34 mfaktc.ipynb
drwx------ 2 root root 4096 Oct 13 16:36 mprime
at mprime exist test
mkdir: missing operand
Try 'mkdir --help' for more information.
marker3 md mprime next
/content/drive/My Drive/mprime
total 44488
-rw------- 1 root root   644436 Aug 18 19:59 libgmp.so.10.3.2
-rw------- 1 root root     2110 Aug 18 19:59 license.txt
-rw------- 1 root root      429 Oct 13 16:56 local.txt
-rw------- 1 root root 37748900 Aug 18 19:59 mprime
-rw------- 1 root root        4 Oct 13 16:56 mprime.pid
-rw------- 1 root root        0 Oct 13 17:07 p87092557.write
-rw------- 1 root root  7032597 Aug 18 20:03 p95v298b6.linux64.tar.gz
-rw------- 1 root root      345 Oct 13 16:56 prime.log
-rw------- 1 root root      196 Oct 13 16:55 prime.txt
-rw------- 1 root root    20019 Aug 18 19:59 readme.txt
-rw------- 1 root root       54 Oct 13 17:07 results.txt
-rw------- 1 root root     9134 Aug 18 19:59 stress.txt
-rw------- 1 root root    37331 Aug 18 19:59 undoc.txt
-rw------- 1 root root    56172 Aug 18 19:59 whatsnew.txt
-rw------- 1 root root       70 Oct 13 16:56 worktodo.txt
marker4 past retrieve mprime, ls -l next
total 44488
-rw------- 1 root root   644436 Aug 18 19:59 libgmp.so.10.3.2
-rw------- 1 root root     2110 Aug 18 19:59 license.txt
-rw------- 1 root root      429 Oct 13 16:56 local.txt
-rw------- 1 root root 37748900 Aug 18 19:59 mprime
-rw------- 1 root root        4 Oct 13 16:56 mprime.pid
-rw------- 1 root root        0 Oct 13 17:07 p87092557.write
-rw------- 1 root root  7032597 Aug 18 20:03 p95v298b6.linux64.tar.gz
-rw------- 1 root root      345 Oct 13 16:56 prime.log
-rw------- 1 root root      196 Oct 13 16:55 prime.txt
-rw------- 1 root root    20019 Aug 18 19:59 readme.txt
-rw------- 1 root root       54 Oct 13 17:07 results.txt
-rw------- 1 root root     9134 Aug 18 19:59 stress.txt
-rw------- 1 root root    37331 Aug 18 19:59 undoc.txt
-rw------- 1 root root    56172 Aug 18 19:59 whatsnew.txt
-rw------- 1 root root       70 Oct 13 16:56 worktodo.txt
ls -l /content/drive/My Drive/mprime/
total 44488
-rw------- 1 root root   644436 Aug 18 19:59 libgmp.so.10.3.2
-rw------- 1 root root     2110 Aug 18 19:59 license.txt
-rw------- 1 root root      429 Oct 13 16:56 local.txt
-rw------- 1 root root 37748900 Aug 18 19:59 mprime
-rw------- 1 root root        4 Oct 13 16:56 mprime.pid
-rw------- 1 root root        0 Oct 13 17:07 p87092557.write
-rw------- 1 root root  7032597 Aug 18 20:03 p95v298b6.linux64.tar.gz
-rw------- 1 root root      345 Oct 13 16:56 prime.log
-rw------- 1 root root      196 Oct 13 16:55 prime.txt
-rw------- 1 root root    20019 Aug 18 19:59 readme.txt
-rw------- 1 root root       54 Oct 13 17:07 results.txt
-rw------- 1 root root     9134 Aug 18 19:59 stress.txt
-rw------- 1 root root    37331 Aug 18 19:59 undoc.txt
-rw------- 1 root root    56172 Aug 18 19:59 whatsnew.txt
-rw------- 1 root root       70 Oct 13 16:56 worktodo.txt
cat prime.txt:
V24OptionsConverted=1
WGUID_version=2
StressTester=0
UsePrimenet=1
DialUp=0
V5UserID=Kriesel
Priority=1
DaysOfWork=1
MaxExponents=1
RunOnBattery=1
WorkPreference=150
[PrimeNet]
Debug=0
ProxyHost=

cat local.txt:
WorkerThreads=1
CoresPerTest=2
ComputerID=colab
Memory=8192 during 7:30-23:30 else 8192
run ./mprime
Since I'm unfamiliar with Python, Linux, and Colab, and it does not show which line of code produced an error message or other output, trying to sort this out is a bit like trying to traverse an unknown maze blindfolded, with earplugs, handcuffed.

So now it seems to sort of work. There was an issue with a zero size savefile. This is repeatable when the running code is stopped.
results.txt contained:
Code:
[Sun Oct 13 17:07:34 2019]
Iteration 21959 / 87092557
[Sun Oct 13 17:20:47 2019]
Trying backup intermediate file: p87092557.write
Error reading intermediate file: p87092557.write
Renaming p87092557.write to p87092557.bad1
All intermediate files bad.  Temporarily abandoning work unit.
Trying backup intermediate file: p87092557.bad1
Error reading intermediate file: p87092557.bad1
All intermediate files bad.  Temporarily abandoning work unit.
After an hour of running, I have a 7MB save file written to the Google drive, so it may finally actually be working.

A tested draft script for resuming an existing google-drive-resident mprime run on Colaboratory follows.
Code:
#Notebook to resume a run of mprime on a Colab session
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod +w '/content/drive/My Drive'
%cd '/content/drive/My Drive/mprime//'
!chmod +x ./mprime
!echo run ./mprime
!./mprime -d | tee -a ./mprimelog.txt
A stop and restart of the long code block creates an awful mess:
Code:
shell-init: error retrieving current directory: getcwd: cannot access parent directories: Transport endpoint is not connected
marker1 cd /content/drive/My Drive/ next

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-6-758b66790d6f>", line 10, in <module>
    get_ipython().magic("cd '/content/drive/My Drive//'")
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2160, in magic
    return self.run_line_magic(magic_name, magic_arg_s)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2081, in run_line_magic
    result = fn(*args,**kwargs)
  File "</usr/local/lib/python3.6/dist-packages/decorator.py:decorator-gen-91>", line 2, in cd
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/magic.py", line 188, in <lambda>
    call = lambda f, *a, **k: f(*a, **k)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/magics/osm.py", line 288, in cd
    oldcwd = py3compat.getcwd()
OSError: [Errno 107] Transport endpoint is not connected

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 1823, in showtraceback
    stb = value._render_traceback_()
AttributeError: 'OSError' object has no attribute '_render_traceback_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 1132, in get_records
    return _fixed_getinnerframes(etb, number_of_lines_of_context, tb_offset)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 313, in wrapped
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/ultratb.py", line 358, in _fixed_getinnerframes
    records = fix_frame_records_filenames(inspect.getinnerframes(etb, context))
  File "/usr/lib/python3.6/inspect.py", line 1490, in getinnerframes
    frameinfo = (tb.tb_frame,) + getframeinfo(tb, context)
  File "/usr/lib/python3.6/inspect.py", line 1448, in getframeinfo
    filename = getsourcefile(frame) or getfile(frame)
  File "/usr/lib/python3.6/inspect.py", line 696, in getsourcefile
    if getattr(getmodule(object, filename), '__loader__', None) is not None:
  File "/usr/lib/python3.6/inspect.py", line 725, in getmodule
    file = getabsfile(object, _filename)
  File "/usr/lib/python3.6/inspect.py", line 709, in getabsfile
    return os.path.normcase(os.path.abspath(_filename))
  File "/usr/lib/python3.6/posixpath.py", line 383, in abspath
    cwd = os.getcwd()
OSError: [Errno 107] Transport endpoint is not connected

---------------------------------------------------------------------------
And after that, even a freshly created and connected session cannot run even !pwd. This can persist after closing the web browser entirely, if the browser is set to reopen the same set of pages. See step 12 of post 2 regarding Manage Sessions for a solution.


See also the post on running a cpu task such as mprime in one background process, a gpu task in another, with monitoring such as periodic top in the foreground, and on branching depending on gpu model and availability.
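The background/foreground pattern just described can be sketched with subprocess rather than notebook "!" shell escapes; a minimal sketch with placeholder commands (sleep stands in for the real cpu and gpu workers):

```python
# Two workers launched in the background, with a periodic monitor loop
# in the foreground that runs until both workers finish.
import subprocess
import time

cpu_job = subprocess.Popen(["sleep", "2"])   # stand-in for e.g. ./mprime -d
gpu_job = subprocess.Popen(["sleep", "2"])   # stand-in for e.g. ./gpuowl

while cpu_job.poll() is None or gpu_job.poll() is None:
    # in a real session this loop would run e.g. nvidia-smi or top -b -n 1
    time.sleep(1)

exit_codes = (cpu_job.returncode, gpu_job.returncode)
```

Because both Popen calls return immediately, a single Code block can keep both the cpu and gpu busy while the foreground loop provides periodic status.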


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-29 at 23:20 Reason: change script to encourage LLDC, discourage first time PRP, update for Cert worktype & mprime 30.3b6
Old 2019-10-13, 15:20   #4
kriesel

Mfaktc attempt

Script:
Code:
#script to run mfaktc on a Colab session; assumes mfaktc, mfaktc.ini, worktodo.txt are set up 
#in folder mfaktc in the Google drive of the account used to run the colab session
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive/mfaktc//'
!chmod 755 '/content/drive/My Drive/mfaktc/mfaktc.exe'
#let's see what gpu we got
!nvidia-smi
#as of about 2019 November 17, we need to install a cuda lib also
!apt-get install -y cuda-cudart-10-0
!cd '.' && /content/drive/My\ Drive/mfaktc/mfaktc.exe | tee -a mfaktc-run.txt
#following will only execute if worktodo empties before the time limit is reached
!cat 'results.txt'
The preceding version of the script, using >> redirection rather than tee, also works, although the console output seems to become available in the destination file on the Google drive only after a considerable delay. The executable apparently had the necessary NVIDIA libraries linked in, or they were already present somewhere in the VM's path; putting them on the Google drive was not required. It completed one exponent's bit level and terminated, since it was then out of work; reload and restart.

Around November 17, 2019, it began failing with the following message:
Code:
./mfaktc.exe: error while loading shared libraries: libcudart.so.10.0: cannot open shared object file: No such file or directory
This should be fixable by putting the required library file in the Google drive folder and perhaps adjusting the path and permissions, which could be more efficient than running the apt-get install every time.
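A quick way to tell which fix is needed is to look for the library before launching; a minimal sketch, where the directory names are illustrations (the apt-get install path and the Drive folder are assumptions, not verified locations):

```python
# Check candidate directories for the CUDA runtime library that mfaktc
# reported missing, before spending a session launch on a failing run.
import os

def find_library(libname, search_dirs):
    """Return the first directory in search_dirs containing libname, else None."""
    for d in search_dirs:
        if d and os.path.exists(os.path.join(d, libname)):
            return d
    return None

candidates = os.environ.get("LD_LIBRARY_PATH", "").split(":") + [
    "/usr/local/cuda-10.0/lib64",        # assumed apt-get install location
    "/content/drive/My Drive/mfaktc",    # assumed Drive-resident copy
]
# if find_library("libcudart.so.10.0", candidates) is None:
#     print("libcudart.so.10.0 not found; run the apt-get install cell first")
```

If the Drive-resident copy is found, prefixing LD_LIBRARY_PATH with that folder in the launch line (as the CUDAPm1 scripts below do) avoids the repeated install.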

Draft code segment to set up the mfaktc folder
Code:
#draft Notebook to set up an mfaktc Google drive folder for a future Colab session
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod +w '/content/drive/My Drive'

if not os.path.exists('/content/drive/My Drive/mfaktc'):
  !mkdir '/content/drive/My Drive/mfaktc'

# that's all for this script draft; 
# drag and drop a suitable copy of the following into the new mfaktc folder
#  mfaktc.exe suitable for linux & Tesla GPUs, CUDA10, preferably 2047M GPUSieveSize compatible
#  mfaktc.ini customized to suit you
#  worktodo.txt
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-12 at 20:30
Old 2019-10-13, 18:49   #5
kriesel

CUDAPm1 attempt

(partly tested draft) An NVIDIA K80 with 11 GB of RAM would be good for CUDAPm1, or for GpuOwL P-1. An NVIDIA P100 or T4 with 16 GB would be even better.

Note, each code section must be compact enough to complete within the available time limit, for example the Colab 12-hour limit. That's a concern with thread benchmarking over wide ranges or high iteration counts, but fortunately that can be split into multiple sessions if needed.
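Splitting such a benchmark across sessions can be sketched as follows; a minimal sketch, where the per-session chunk size is a guess to be adjusted from observed per-length benchmark times:

```python
# Split a long list of fft lengths into chunks small enough that each
# chunk's benchmark run fits inside one session's time limit.
def split_range(lengths, per_session):
    """Yield successive slices of lengths, per_session entries each."""
    for i in range(0, len(lengths), per_session):
        yield lengths[i:i + per_session]

# Example: the 1k to 65536k power-of-two lengths, five per session.
lengths = [2 ** k for k in range(0, 17)]   # 1, 2, 4, ..., 65536
sessions = list(split_range(lengths, 5))
```

Each chunk becomes one Code section (or one session's worth of sections), and a chunk that errors out can be rerun alone without repeating completed lengths.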


(initial setup and fft benchmark section)
This took on the order of an hour of run time on a K80.

Code:
#Notebook to set up CUDAPm1 v0.20 on a Google drive folder for Colab
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod +w '/content/drive/My Drive'
if not os.path.exists('/content/drive/My Drive/cudapm1'):
  !mkdir '/content/drive/My Drive/cudapm1'

%cd '/content/drive/My Drive/cudapm1//'
#fetch cudapm1 executable if we don't have it
if not os.path.exists('/content/drive/My Drive/cudapm1/cudapm1'):
  !wget https://download.mersenne.ca/CUDAPm1/old-experimental/cudapm1-0.20.tar.gz
  !tar -zxvf cudapm1-0.20.tar.gz

#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15

!ls -l '/content/drive/My Drive/cudapm1/'
%cd './cudapm1-0.20//'
#now run some cudapm1 setup, in chunks, not to exceed the VM's time limit
!chmod +x '/content/drive/My Drive/cudapm1/cudapm1-0.20/CUDAPm1'
#for production running at the GIMPS wavefront, 2048k to 8192k would be sufficient
#and would take only a few minutes. I'm running more broadly for test purposes.
!echo run ./CUDAPm1 -cufftbench 1k to 65536k 2 times 
!cd '.' && LD_LIBRARY_PATH=".:lib:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 1 65536 2 >>cudapm1-setup.txt 
!echo cat readme.txt, cudapm1-setup.txt, etc
!cat *.txt
!echo  
!echo cat CUDAPm1.ini
!cat CUDAPm1.ini
!echo  
!echo customize cudapm1.ini manually before continuing
!echo do threadbench later, then verification run
(threads benchmark section)
Two small perl programs are used in the Colab script for this step. First, Chalsall's little time program time.pl:
Code:
#!/usr/bin/perl -w
# from https://www.mersenneforum.org/showpost.php?p=525182&postcount=11
@DT = gmtime(time);
$Now = sprintf("%04d.%02d.%02d %02d:%02d", 1900+$DT[5],$DT[4],$DT[3],$DT[2],$DT[1]);

print "The time is: ${Now}\n\n";
Next, a perl program called threadbenchlist.pl reads the fft benchmark file and writes a Colab script that runs CUDAPm1 threads benchmarking for each fft length found in it:
Code:
#!/usr/bin/perl -w
# reads fftfile, writes corresponding list of thread benchmark program invocations

sub dprint {  #dual print
  print STDOUT @_;
  print FILEOUT @_;
}

$fftfile = "Tesla K80 fft.txt"; # change file name here to suit
$thdfile = "threadbenchscript.txt"; # will include setup header python code and footer
$repeat=2; # number of iterations of threadbench timings, to average together

#following 12 line block is one perl statement
$headerstring= '#Colab script to run CUDAPm1 v0.20 threadbench for each fft file entry
import os.path
from google.colab import drive
import sys
if not os.path.exists(\'/content/drive/My Drive\'):
  drive.mount(\'/content/drive\')
%cd \'/content/drive/My Drive/cudapm1/cudapm1-0.20//\'
!pwd
!chmod +x ./time.pl
!chmod +x ./CUDAPm1
!./time.pl >>cudapm1-threadbench.txt
';

#following 3 line block is one perl statement
$footerstring = '!./time.pl >>cudapm1-threadbench.txt
!echo done
';

if (open(FILEOUT,'>'.$thdfile) == 0 ) {
  print "Failed to open $thdfile for writing. Terminating run.\n";
  die;
}
if (open(FILE,'<'.$fftfile) == 0 ) {
  print "Failed to open file $fftfile for reading.  Terminating run.\n";
  die ;
} else {
  $fftlines = 0;  #counter to skip 6 lines of fft file header to suppress error messages
  dprint $headerstring;
  while( <FILE>  ) { 
    my $line=$_; 
    $fftlines++;
    ($fftl) = $line =~ /^\s*(\d+)\s+/;
    if ( $fftlines > 6 ) { 
      my $string = "!./CUDAPm1 -cufftbench $fftl $fftl $repeat";
      dprint "!echo $string\n";
      dprint "!cd \'.\' \&\& LD_LIBRARY_PATH=\".:\$\{LD_LIBRARY_PATH\}\" ./CUDAPm1 -cufftbench $fftl $fftl $repeat \| tee -a cudapm1threadbench.txt\n";
    } #cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}"
  }
  close(FILE);
  dprint $footerstring;
  close (FILEOUT);
}
The following Colab code section runs the perl program above:
Code:
# Colab script section to run the perl code threadbenchlist.pl,
# which generates a Colab code section from an fft file
# to run CUDAPm1 v0.20 threadbench for each fft file entry.
# all is presumed to reside on a Google drive folder.

import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive/cudapm1/cudapm1-0.20//'
!pwd
#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15
!./threadbenchlist.pl

!echo "done writing threadbenchscript.txt; review it, then"
!echo "copy and paste it into a Colab section and run it."
!echo "If it completes, great."
!echo "If it times out or errors out before completion,"
!echo ".. delete the fft lengths that completed from threadbenchscript.txt"
!echo ".. or debug as applicable and rerun;"
!echo ".. repeat until you get what you want in the threads file."
The generated threadbenchscript.txt looks something like:
Code:
#Colab script to run CUDAPm1 v0.20 threadbench for each fft file entry
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive/cudapm1/cudapm1-0.20//'
!pwd
!chmod +x ./time.pl
!chmod +x ./CUDAPm1
!./time.pl >>cudapm1-threadbench.txt
!echo !./CUDAPm1 -cufftbench 1 1 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 1 1 2 | tee -a >>cudapm1threadbench.txt
!echo !./CUDAPm1 -cufftbench 2 2 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 2 2 2 | tee -a >>cudapm1threadbench.txt
!echo !./CUDAPm1 -cufftbench 4 4 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 4 4 2 | tee -a >>cudapm1threadbench.txt
!echo !./CUDAPm1 -cufftbench 8 8 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 8 8 2 | tee -a >>cudapm1threadbench.txt
!echo !./CUDAPm1 -cufftbench 9 9 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 9 9 2 | tee -a >>cudapm1threadbench.txt
 ...
!echo !./CUDAPm1 -cufftbench 64000 64000 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 64000 64000 2 | tee -a >>cudapm1threadbench.txt
!echo !./CUDAPm1 -cufftbench 65536 65536 2
!cd '.' && LD_LIBRARY_PATH=".:${LD_LIBRARY_PATH}" ./CUDAPm1 -cufftbench 65536 65536 2 | tee -a >>cudapm1threadbench.txt
!./time.pl >>cudapm1-threadbench.txt
!echo done
This can be a large chunk of text. Copy, paste, and run it in a Colab code section. Add the following near the front of it before running it.
Code:
#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15
Such a broad threadbench is very time consuming. In my case the run timed out after completing the 1K-40000K span; another session is running the >40M fft lengths. P-1 on 100M-digit exponents only needs ~19M fft length. It's unlikely to be necessary to threadbench as high as the script goes, since in my experience CUDAPm1 does not run successfully on exponents large enough to require such large fft lengths. But maybe the K80 can beat the highest exponent I've seen CUDAPm1 complete both stages on so far, 432.5M on the GTX1060 model, at 25088K fft length. It's possible to run the fft and threads benchmarks much higher; the NVIDIA fft library implements lengths supporting up to 256M. Some gpu models will go to 128M or even more, but CUDAPm1 is capped at exponent 2^31-1 due to its signed 32-bit integer implementation, which fits in a 128M fft length.
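As a rough sanity check on those limits, a few lines of Python relate exponent to fft length. My assumptions here: an "nK" fft length means n * 1024 transform words, and ~16-18 bits packed per word is typical for this kind of transform code; the authoritative limits are whatever CUDAPm1 itself enforces.

```python
# Rough arithmetic relating exponents to fft lengths for the limits above.
# Assumption: an "nK" fft length means n * 1024 transform words.

EXPONENT_CAP = 2**31 - 1  # CUDAPm1's signed 32-bit exponent cap

def bits_per_word(exponent, fft_k):
    """Average bits of the Mersenne number stored per transform word."""
    return exponent / (fft_k * 1024)

# At the exponent cap with a 128M fft length (131072K), ~16 bits/word:
print(round(bits_per_word(EXPONENT_CAP, 131072), 1))
# The 432.5M two-stage completion cited above, at 25088K, ~16.8 bits/word:
print(round(bits_per_word(432_500_000, 25088), 1))
```

Both figures land in the usual packing range, which is why the 2^31-1 cap and a 128M fft length pair up.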


(selftest)
This little section of Colab code ran properly (but the CUDAPm1 run within it did not).

Code:
#Colab section to selftest CUDAPm1 v0.20 on a Google drive folder already prepared
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

%cd '/content/drive/My Drive/cudapm1/cudapm1-0.20//'
#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15

!cd '.' && LD_LIBRARY_PATH=".:lib:${LD_LIBRARY_PATH}" ./CUDAPm1 -b2 5000000 -f 2688k 50001781 >>cudapm1-run.txt 
!cat ./results.txt
This runs in under an hour. However, it failed to find the expected factor. A look in its log reveals why: it is powering zero, so the interim res64 is 0x0000000000000000 at every report, instead of changing each time as is normal.
Code:
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 0 -------
name                Tesla K80
Compatibility       3.7
clockRate (MHz)     823
memClockRate (MHz)  2505
totalGlobalMem      11996954624
totalConstMem       65536
l2CacheSize         1572864
sharedMemPerBlock   49152
regsPerBlock        65536
warpSize            32
memPitch            2147483647
maxThreadsPerBlock  1024
maxThreadsPerMP     2048
multiProcessorCount 13
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      2147483647,65535,65535
textureAlignment    512
deviceOverlap       1

CUDA reports 11372M of 11441M GPU memory free.
No entry for fft = 2688k found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 2688 2688 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 128, norm2 128.
Using up to 11256M GPU memory.
Starting stage 1 P-1, M50001781, B1 = 435000, B2 = 5000000, fft length = 2688K
Doing 627853 iterations
Iteration 5000 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:17 real, 3.3667 ms/iter, ETA 34:56)
Iteration 10000 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:17 real, 3.3557 ms/iter, ETA 34:33)
Iteration 15000 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:16 real, 3.3553 ms/iter, ETA 34:16)
Iteration 20000 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:17 real, 3.3540 ms/iter, 

...
Transforms:  9522 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:18 real, 1.8551 ms/tran, ETA 0:35)
Transforms:  9608 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:18 real, 1.8551 ms/tran, ETA 0:17)
Transforms:  9618 M50001781, 0x0000000000000000, n = 2688K, CUDAPm1 v0.20 err = 0.00000 (0:17 real, 1.8568 ms/tran, ETA 0:00)

Stage 2 complete, 576717 transforms, estimated total time = 17:47
Starting stage 2 gcd.
M50001781 Stage 2 found no factor (P-1, B1=435000, B2=5000000, e=12, n=2688K CUDAPm1 v0.20)
It should produce something like
Code:
M50001781 has a factor: 4392938042637898431087689 (P-1, B1=435000, B2=5000000, e=6, n=2688K CUDAPm1 v0.20)
The CUDAPm1 selftest failed.
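A quick way to spot this failure mode in a long log (my own sketch; the 16-hex-digit res64 format matches the log lines above) is to check whether every interim res64 is zero:

```python
# Detect the all-zero interim residue failure shown above: if every res64
# field in the log is 0x0000000000000000, the run has been powering zero.
import re

def all_residues_zero(log_text):
    residues = re.findall(r"0x[0-9a-fA-F]{16}", log_text)
    return bool(residues) and set(residues) == {"0x0000000000000000"}

sample = "Iteration 5000 M50001781, 0x0000000000000000, n = 2688K"
print(all_residues_zero(sample))   # True: every residue found is zero
```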


(production)
This little section is what would be routinely used for P-1 factoring. There must be adequate work in the worktodo file.
Code:
#untested Colab section to production run CUDAPm1 v0.20 on a Google drive folder already prepared
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

%cd '/content/drive/My Drive/cudapm1/cudapm1-0.20//'
#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15

!cat ./worktodo.txt
!cd '.' && LD_LIBRARY_PATH=".:lib:${LD_LIBRARY_PATH}" ./CUDAPm1 >>cudapm1-run.txt 
!cat ./results.txt
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-12-20 at 03:20
Old 2019-10-13, 18:49   #6
kriesel

Default CUDALucas attempt

(draft in progress)

Here's a script for starting or resuming a run (after installation in a Google drive folder, benchmarking, cudalucas.ini customization, worktodo creation, etc.), based on a script provided by ATH posted at https://www.mersenneforum.org/showpo...&postcount=397
Code:
import os.path
from google.colab import drive

if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

%cd '/content/drive/My Drive/cudalucas/'
!cp 'CUDALucas' /usr/local/bin/
!chmod 755 '/usr/local/bin/CUDALucas'

#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0
!sleep 15

!cd '.' && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" /usr/local/bin/CUDALucas >> outputcudalucas.txt
I'm inclined to use the Google drive folder as the working directory for permanence and full time accessibility.
Code:
import os.path
from google.colab import drive

if not os.path.exists('/content/drive/My Drive'):
   drive.mount('/content/drive')

#as of about 2019 November 17, we need to install cuda libs also
!apt-get install -y cuda-cudart-10-0
!apt-get install -y cuda-cufft-dev-10-0         
!sleep 15

%cd '/content/drive/My Drive/cudalucas/'
!cd '.' && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" ./CUDALucas >> ./cl.txt
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-20 at 22:03
Old 2019-10-13, 18:50   #7
kriesel

Default GpuOwL attempt

Tested draft for setup of gpuowl (creates the gpuowl folder, then git clones and builds from the latest committed gpuowl source):

Code:
#draft Notebook to set up a gpuowl Google drive folder for a future Colab session
import os.path
from google.colab import drive
import sys
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
%cd '/content/drive/My Drive//'
!chmod +w '/content/drive/My Drive'

if not os.path.exists('/content/drive/My Drive/gpuowl'):
  !mkdir '/content/drive/My Drive/gpuowl'

%cd '/content/drive/My Drive/gpuowl//'
!git clone https://github.com/preda/gpuowl

%cd '/content/drive/My Drive/gpuowl/gpuowl//'
!apt install libgmp-dev
!update-alternatives --remove-all gcc 
!update-alternatives --remove-all g++
!apt-get install gcc-8 g++-8
!update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 10
!update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-8 10
!update-alternatives --config gcc
!update-alternatives --config g++
!g++ --version
!make gpuowl

!echo create config.txt, worktodo.txt before continuing
The resulting executable is attached below in a .7z file. It passed brief tests on PRP3 and P-1, with the following worktodo.txt

Code:
B1=100,B2=300000;PFactor=0,1,2,24000577,-1,70,2
PRP=0,1,2,756839,-1,44,0
From xx00fs at https://www.mersenneforum.org/showpo...6&postcount=29, for production running (set up the Google drive contents first). Note also that it goes to a different directory than the first part above sets up. Executables are also available from Dylan14 at https://www.mersenneforum.org/showpo...&postcount=487, and from Fan Ming at https://www.mersenneforum.org/showpo...&postcount=379 and https://www.mersenneforum.org/showpo...&postcount=670 (containing faster code by prime95). By copying the section below twice into a Colab notebook and modifying the folder string of one section, you can easily choose between multiple gpuowl builds, versions, or worktodo files. This allows such maneuvers as readying one for running, or gathering results, while the other is running.
Code:
from google.colab import drive
drive.mount('/content/drive')
!chmod 777 '/content/drive/My Drive/gpuowl'
!cd '/content/drive/My Drive/gpuowl' && chmod 777 gpuowl && chmod 777 worktodo.txt && LD_LIBRARY_PATH="lib:${LD_LIBRARY_PATH}" ./gpuowl -use ORIG_X2 -block 200 -log 120000 -maxAlloc 10240 -user kriesel -cpu colab/K80
K80s are reportedly ~60 GHzD/day (in LL or PRP3, and presumably also in P-1).
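To put that throughput in session terms, here's a back-of-envelope sketch, assuming the ~60 GHz-days/day figure above and 12-hour free sessions; the 90 GHz-days example credit is hypothetical:

```python
# Back-of-envelope: sessions needed to earn a given GHz-days credit on a
# gpu rated at rate_per_day GHz-days/day, in sessions of session_hours.
def sessions_needed(ghz_days, rate_per_day=60.0, session_hours=12.0):
    per_session = rate_per_day * session_hours / 24.0  # credit per session
    return ghz_days / per_session

# A hypothetical assignment credited 90 GHz-days: 3 full 12-hour sessions.
print(sessions_needed(90))
```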


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached Files
File Type: 7z gpuowl-6-11-11-colab.7z (191.4 KB, 80 views)

Last fiddled with by kriesel on 2019-12-09 at 22:53 Reason: add reference to Fan Ming latest build
Old 2019-10-15, 18:41   #8
kriesel

Default Combined cpu and gpu usage

ATH has an example of Google drive mprime and Colab VM drive mfaktc running together at https://www.mersenneforum.org/showpo...&postcount=345

Here's an example that continues mprime and mfaktc, both on Google drive (tested).
Code:
#section to reconnect to Google drive previously readied
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
  
#section to resume mprime run in background
%cd '/content/drive/My Drive/mprime//'
!chmod +x ./mprime
#otherwise: "/bin/bash: ./mprime: Permission denied"
!./mprime -d >>mprimelog.txt 2>&1 &

#section to continue mfaktc run on a Colab session
%cd '/content/drive/My Drive/mfaktc//'
!nvidia-smi
#as of about 2019 November 17, we need to install a cuda lib also
!apt-get install -y cuda-cudart-10-0
!chmod 755 '/content/drive/My Drive/mfaktc/mfaktc.exe'
!./mfaktc.exe | tee -a mfaktc-run.txt
But it may be better to do it the other way around, since occasionally a gpu is not available; then, I think, only the gpu subprocess would terminate, rather than the whole job.
Code:
#section to reconnect to Google drive previously readied
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')
  
#section to resume mfaktc run in background
%cd '/content/drive/My Drive/mfaktc//'
!nvidia-smi
#as of about 2019 November 17, we need to install a cuda lib also
!apt-get install -y cuda-cudart-10-0
!chmod 755 '/content/drive/My Drive/mfaktc/mfaktc.exe'
!./mfaktc.exe >> mfaktc-run.txt 2>&1 &

#section to continue mprime run
%cd '/content/drive/My Drive/mprime//'
!chmod +x ./mprime
!./mprime -d | tee -a mprimelog.txt
Or perhaps attempt to run them both in the background, and also run top or ps or nvidia-smi to monitor what's happening on the VM during the runs. By default top produces frequent output; it's probably better to dial that back to control the browser memory growth rate. Mostly tested draft:
Code:
#section to show what cpu and gpu were allocated to the VM
!lscpu
!nvidia-smi

#section to reconnect to Google drive previously readied
import os.path
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

#section to continue mfaktc run in background
%cd '/content/drive/My Drive/mfaktc//'
!nvidia-smi
#as of about 2019 November 17, we need to install a cuda lib also
!apt-get install -y cuda-cudart-10-0
!chmod 755 '/content/drive/My Drive/mfaktc/mfaktc.exe'
!./mfaktc.exe >> mfaktc-run.txt 2>&1 &
  
#section to resume mprime run as a subprocess
%cd '/content/drive/My Drive/mprime//'
!chmod +x ./mprime
!./mprime -d >>mprimelog.txt 2>&1 &

#if running gpuowl, which doesn't accept background and output redirection, put it here;
#the following section including top will go into effect if/when gpuowl halts (such as from
#an emptied worktodo file or a lost gpu), and keep the session going for mprime to progress

#section to watch activity; sleep 12 seems normally adequate to start gpu program and show gpu usage
!sleep 12
!nvidia-smi
!top -d 120
If the foreground uses ps or nvidia-smi without any looping, it terminates early; top -d (sizable delay in seconds) keeps the session running for the duration nicely, so that mprime or whatever can run the full session duration.
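A related variation I haven't run on Colab, but which uses standard procps options: batch mode (top -b) writes the snapshots to a file instead of the notebook output, which also avoids browser memory growth. In a Colab cell each line would be prefixed with `!`:

```shell
# Demo: two batch-mode top snapshots 1 second apart, logged to a file.
# For a full 12-hour session, use -d 120 -n 360 instead, and point the
# log at the mounted Google drive so it survives the VM.
top -b -d 1 -n 2 > top-session.log 2>&1
head -1 top-session.log
```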

Here's the top output from a session that had run for several hours and was still going:
Code:
top - 15:55:47 up  3:14,  0 users,  load average: 1.02, 1.02, 1.00

%Cpu(s):  1.0 us,  1.2 sy, 49.9 ni, 47.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 13335192 total,  9709212 free,   923880 used,  2702100 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12119796 avail Mem 


    326 root      30  10  353720 224528   6708 S  99.5  1.7 191:43.87 mprime    
    337 root      20   0 36.335g  93912  82720 S   0.7  0.7   1:25.07 mfaktc.e+ 
    123 root      20   0  640184 150284  61684 S   0.5  1.1   1:09.91 python3   
     19 root      20   0  402900  99204  26220 S   0.1  0.7   0:06.73 jupyter-+ 
      9 root      20   0  686736  55332  24728 S   0.0  0.4   0:03.96 node      
    274 root      20   0 1390368  79120  22480 S   0.0  0.6   0:09.06 drive     
    114 root      20   0   35884   4920   3816 S   0.0  0.0   0:00.49 tail      
    321 root      20   0    4568    768    708 S   0.0  0.0   0:00.44 tail
For those as unfamiliar as I was with top options and the redirection syntax, these may be useful:
https://stackoverflow.com/questions/...output-to-file
https://www.lifewire.com/linux-top-command-2201163

Here's another top output, from after a session terminated, showing uptime of 12hr 4 minutes at the last top output of the session.
Code:
top - 14:50:32 up 12:04,  0 users,  load average: 1.00, 1.00, 1.00


KiB Mem : 13335184 total,  9637892 free,   837768 used,  2859524 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 12434060 avail Mem 


    369 root      30  10  364348 232632   7892 S  99.8  1.7 719:07.10 mprime    
    120 root      20   0  640324 151676  62192 S   0.7  1.1   5:16.35 python3   
     20 root      20   0  403932 100360  26032 S   0.0  0.8   0:21.79 jupyter-+ 
    314 root      20   0 1373584  59400  22280 S   0.0  0.4   0:30.43 drive     
     10 root      20   0  689300  59904  24928 S   0.0  0.4   0:11.72 node      
    111 root      20   0   35888   4800   3688 S   0.0  0.0   0:02.09 tail      

    172 root      20   0   18376   1504   1204 S   0.0  0.0   0:00.00 bash      
    173 root      20   0 1117648  14400  12808 S   0.0  0.1   0:00.02 drive     
    174 root      20   0   11596   2224   1956 S   0.0  0.0   0:00.00 grep      
    325 root      20   0       0      0      0 Z   0.0  0.0   0:00.62 fusermou+ 
    363 root      20   0   18376   3008   2764 S   0.0  0.0   0:00.00 bash      
    364 root      20   0    4568    852    792 S   0.0  0.0   0:01.98 tail
We can substitute mlucas on the cpu, or other apps on the gpu, according to taste. Mprime and mfaktc, or mprime and gpuowl work.


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2020-09-03 at 13:47
Old 2019-10-15, 18:49   #9
kriesel

Default Worktodo replenishment and result reporting

Mprime uses PrimeNet.

For Mfaktc, we can use the worktodo.add functionality.
Just drag and drop a worktodo.add file onto the Google drive folder at any time. Or use chalsall's code section in https://www.mersenneforum.org/showpo...&postcount=100
For manual results reporting, drag the results file from the Google drive to your computer. It can then be reported, edited to indicate what's been reported, and copied back during a time when the application is not going to write a new result; or the results.txt on the Google drive can be deleted or renamed. Or set up Misfit to do it all.
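The rename step can be scripted so each session's results get a dated name. My untested sketch, demonstrated on a scratch file; in practice, run it in the application folder on the mounted Google drive, while the application is stopped:

```shell
# Rotate results.txt to a dated name so a restarted run begins a fresh
# file; run only while the application is stopped, so no result line is
# lost mid-write. Demonstrated here on a scratch copy.
echo "demo result line" > results.txt
stamp=$(date +%Y%m%d-%H%M)
mv results.txt "results-$stamp.txt"
ls "results-$stamp.txt"
```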

For other gpu applications, that do not implement worktodo.add functionality, there are a few choices.
  1. We can manually update the worktodo.txt on the Google drive, perhaps between Colab run intervals. One of the easier ways: in the Colab web page, click Files at left, navigate the folder/file hierarchy until you see worktodo.txt, double-click it, wait for it to open and display its contents in the page, paste in new work, click the diskette icon at its upper right to save, and click the X at its upper right to close it.
  2. We can use an appropriate client management software package or primenet.py script. These tend to be very application-specific. Some are packaged with the application (gpuowl or mlucas for example), while some are from a separate source. See the separate post about those: http://www.mersenneforum.org/showpos...92&postcount=3.
  3. Dylan14's worktodo handling section of his later cudapm1 script. See https://www.mersenneforum.org/showpo...&postcount=158
  4. Apparently, in some cases, GPUto72.
  5. Chalsall's reverse tunnel approach.
  6. Are there more?
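For option 1, the worktodo.add convention itself can be imitated for apps that lack it, with a few lines of Python run between sessions. My own sketch; the file names are the conventional ones, and the folder argument is whatever your app uses:

```python
# Minimal worktodo.add merge, mimicking the mfaktc/mprime behavior for
# apps that lack it: append pending lines to worktodo.txt, then remove
# the .add file so work is not queued twice. Run it between sessions,
# while the application is NOT running, to avoid racing its file writes.
import os

def merge_worktodo(folder, todo="worktodo.txt", add="worktodo.add"):
    add_path = os.path.join(folder, add)
    todo_path = os.path.join(folder, todo)
    if not os.path.exists(add_path):
        return 0
    with open(add_path) as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]
    if lines:
        with open(todo_path, "a") as f:
            f.write("\n".join(lines) + "\n")
    os.remove(add_path)
    return len(lines)
```

For example, merge_worktodo('/content/drive/My Drive/gpuowl/gpuowl') would return the number of lines queued.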

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-19 at 06:32
Old 2019-10-17, 13:59   #10
kriesel

Default mlucas attempt

Dylan14's script at https://www.mersenneforum.org/showpo...&postcount=347 is
Code:
#code to compile and run Ernst Mayer's mlucas
import os
from google.colab import drive
if not os.path.exists('/content/drive/My Drive'):
  drive.mount('/content/drive')

!apt-get install gcc-8
%cd '/content/drive/My Drive/'
if not os.path.exists('/content/drive/My Drive/mlucas/'):
  !mkdir mlucas
%cd '/content/drive/My Drive/mlucas//'
if not os.path.exists('mlucas_v18.txz'):
  !wget https://www.mersenneforum.org/mayer/src/C/mlucas_v18.txz
  !tar -xJf mlucas_v18.txz
  
#switch to the mlucas source directory
#check that we have both executables (one for avx2, and one for avx512)
if not os.path.exists('/content/drive/My Drive/mlucas/Mlucasavx512') or not os.path.exists('/content/drive/My Drive/mlucas/Mlucasavx2'):
  %cd '/content/drive/My Drive/mlucas/mlucas_v18/src'
  #we build mlucas twice. Once with avx2, and once with avx512.
  #first, the avx512 build:
  !gcc-8 -c -O3 -DUSE_AVX512 -march=skylake-avx512 -DUSE_THREADS *.c >& build1.log
  !grep error build1.log > erroravx512.log
  if os.stat("erroravx512.log").st_size == 0: #grep came up empty
    !gcc-8 -o Mlucasavx512 *.o -lm -lpthread -lrt
  else: #something went wrong
    !echo "Error in compilation. Check build1.log and tell either Dylan14 (if you think Dylan made a mistake) or ewmayer."
    exit()
  #move Mlucasavx512 up a directory and clean up the src directory
  !mv Mlucasavx512 ..
  !rm *.o
  #now build the avx2 executable
  !gcc-8 -c -O3 -DUSE_AVX2 -mavx2 -DUSE_THREADS *.c >& build2.log
  !grep error build2.log > erroravx2.log
  if os.stat("erroravx2.log").st_size == 0: #grep came up empty
    !gcc-8 -o Mlucasavx2 *.o -lm -lpthread -lrt
  else: #something went wrong
    !echo "Error in compilation. Check build2.log and tell either Dylan14 (if you think Dylan made a mistake) or ewmayer."
    exit()
  #move Mlucasavx2 up a directory and clean up the src directory
  !mv Mlucasavx2 ..
  !rm *.o

#now we check the processor we have
!echo "Checking processor so we can choose the right executable..."
%cd '/content/drive/My Drive/mlucas/mlucas_v18/'
#by default the permissions are not correct to run the mlucas
!chmod 755 Mlucasavx512
!chmod 755 Mlucasavx2
!grep avx512 /proc/cpuinfo > avx512.txt
!grep avx2 /proc/cpuinfo > avx2.txt
if os.stat("avx512.txt").st_size != 0: #avx512 is available
  !echo "AVX512 detected..."
  #test executable
  !./Mlucasavx512 -fftlen 192 -iters 100 -radset 0
  #performance tune with 2 threads (takes about 10 minutes)
  !./Mlucasavx512 -s m -cpu 0:1 >& selftest.log
  #to do: add code for managing worktodo.txt
  #then run Mlucas
  #!./Mlucasavx512
elif os.stat("avx2.txt").st_size != 0: #avx2 is available
  !echo "AVX2 detected..."
  #test executable
  !./Mlucasavx2 -fftlen 192 -iters 100 -radset 0
  #performance tune with 2 threads (takes about 10 minutes)
  !./Mlucasavx2 -s m -cpu 0:1 >& selftest.log
  #to do: add code for managing worktodo.txt
  #then run Mlucas
  #!./Mlucasavx2
else: #we have some other processor, which I think is fairly unlikely
  !echo "Strange. We don't have avx2 or avx512."
  exit()

Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-19 at 06:32
Old 2019-10-17, 19:01   #11
kriesel

Default notebook instance reverse ssh and http tunnels

One of the things that makes it challenging to develop a Colab script is that there's little or no interactivity or monitoring of what's happening on the VM, other than the script application output.

I haven't tried it myself yet, but this looks very interesting: reverse tunnels for interactive use during application script running, and now graphics added.
https://www.mersenneforum.org/showthread.php?t=24840


Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Last fiddled with by kriesel on 2019-11-19 at 06:32