mersenneforum.org


EdH 2019-11-09 15:50

How I Create a Colab Session That Factors factordb Composites with YAFU
 
(Note: I expect to keep the first post of each of these "How I..." threads up-to-date with the latest version. Please read the rest of each thread to see what may have led to the current set of instructions.)

I will take the liberty of expecting readers to already be somewhat familiar with Google's Colaboratory sessions. There are several threads already on Colab and these should be reviewed by interested readers:

[URL="https://mersenneforum.org/showthread.php?t=24646"]Google Colaboratory Notebook?[/URL]
[URL="https://www.mersenneforum.org/showthread.php?t=24818"]GPU72 Notebook Integration...[/URL]
[URL="https://mersenneforum.org/showthread.php?p=527912"]Notebook Instance Reverse SSH and HTTP Tunnels.[/URL]
[URL="https://www.mersenneforum.org/showthread.php?t=24875"]Colab question[/URL]

I do not, as of yet, have a GitHub account, so I have not uploaded this to GitHub. Others may feel free to do so, if desired.

What follows is one way to compile and install a minimal working build of YAFU. A repository version of GMP is installed, then the current versions of GMP-ECM and YAFU are retrieved and compiled. This is not a full build of YAFU, in that it does not include any support for NFS. Since the composites retrieved from factordb are well below 95 digits in length, SIQS is used for any composite not factored by ECM.

When run, this session retrieves composites of a chosen size from factordb, factors them and submits the factors back to the db.

To use Colab, you need a Gmail account and will be required to log into that account to run a session.

On to the specifics:

Open a [URL="https://colab.research.google.com/notebooks/welcome.ipynb"]Google Colaboratory[/URL] session.
Sign in with your Google/Gmail account info.
Choose New Python3 notebook:
[code]
Menu->File->New Python3 notebook (or within popup)
[/code]Click Connect to start a session.
Edit title from Untitled... to whatever you like.
Paste the following into the Codeblock:
[code]
#########################################################
### This Colaboratory session is designed to retrieve ###
### composites from factordb.com and factor them with ###
### YAFU. The factors are then sent to factordb. ###
### ###
### To adjust the number of composites to retrieve as ###
### well as the size to retrieve, change the variables ###
### below this comment block. The size of the random ###
### number to be used to help avoid collisions (1000) ###
### can also be changed, as well as the offset. ###
#########################################################

compNum = 3 # Number of composites to run
compSize = 70 # Size of composites to run
ranNum = 1000 # Number for random count
offset = 10

import fileinput
import os
import random
import subprocess
import time
import urllib.request

#reports factors to factordb
def send2db(composite, factors):
  factorline = str(factors)
  sendline = 'report=' + str(composite) + '%3D' + factorline
  dbcall = sendline.encode('utf-8')
  temp2 = urllib.request.urlopen('http://factordb.com/report.php', dbcall)

#checks to see if yafu already exists
#if it does, this portion is skipped
exists = os.path.isfile('yafu')
if exists < 1:
  print("Installing system packages. . .")
  subprocess.call(["chmod", "777", "/tmp"])
  subprocess.call(["apt", "update"])
  subprocess.call(["apt", "install", "g++", "m4", "make", "subversion", "libgmp-dev", "libtool", "p7zip", "autoconf"])
  #retrieves ecm
  print("Retrieving GMP-ECM. . .")
  subprocess.call(["svn", "co", "svn://scm.gforge.inria.fr/svn/ecm/trunk", "ecm"])
  os.chdir("/content/ecm")
  subprocess.call(["libtoolize"])
  subprocess.call(["autoreconf", "-i"])
  subprocess.call(["./configure", "--with-gmp=/usr/local/"])
  print("Compiling GMP-ECM. . .")
  subprocess.call(["make"])
  subprocess.call(["make", "install"])
  print("Finished installing GMP-ECM. . .")
  os.chdir("/content")
  #retrieves YAFU
  print("Retrieving YAFU. . .")
  subprocess.call(["svn", "co", "https://svn.code.sf.net/p/yafu/code/branches/wip", "/content/yafu"])
  os.chdir("/content/yafu")
  for line in fileinput.input('Makefile', inplace=True):
    print(line.rstrip().replace('CC = gcc-7.3.0', 'CC = gcc'))
  for line in fileinput.input('yafu.ini', inplace=True):
    print(line.rstrip().replace('% threads=1', 'threads=2'))
  for line in fileinput.input('yafu.ini', inplace=True):
    print(line.rstrip().replace('ecm_path=../gmp-ecm/bin/ecm', 'ecm_path=/usr/local/bin/ecm'))
  print("Compiling YAFU. . .")
  subprocess.call(["make", "USE_SSE41=1"])
  print("Finished compiling YAFU. . .")
print("Starting the factoring of", compNum, "composites. . .\n")

#main loop
for x in range(compNum):
  randnum = random.randrange(ranNum) + offset
  #fetch a number from factordb
  dbcall = 'http://factordb.com/listtype.php?t=3&mindig=' + str(compSize) + '&perpage=1&start=' + str(randnum) + '&download=1'
  #some file processing to get the number into a format usable by yafu
  temp0 = urllib.request.urlopen(dbcall)
  temp1 = temp0.read()
  composite = temp1.decode(encoding='UTF-8')
  composite = composite.strip("\n")
  fstart = time.time()
  #print number being worked on
  # print("Composite", x + 1,":", composite, "<",len(composite),">")
  print("Composite {0}: {1} <{2}>".format( x + 1, composite,len(composite)))
  #run yafu
  factorT = subprocess.run(['./yafu', '-silent'], stdout=subprocess.PIPE, input=temp1)
  #find factors from the yafu run in factor.log
  file = open('factor.log', 'r')
  string = (", prp")
  fcheck = 0
  factors = ""
  for line in file:
    found = line.rfind(string)
    if found > 0:
      line = line.rstrip("\n")
      ind = line.rfind(" = ")
      ind += 3
      line = line[ind:]
      if fcheck > 0:
        factors = factors + "*"
      line = line.split(" ", 1)[0]
      factors = factors + line
      fcheck += 1
  os.remove("factor.log")
  runtime = time.time() - fstart
  #print factors found
  # print("Factors:", factors)
  # print("Factors (%d:%02d):" %(int(runtime / 60), int(runtime % 60)), factors, "\n")
  print("Factors ({0:0>1}:{1:0>2}): {2}\n".format(int(runtime / 60), int(runtime % 60), factors))
  # print("Elapsed time:", int(runtime / 60), "minutes and", int(runtime % 60), "seconds.\n")
  # print("Elapsed time:", runtime("%H:%M:%S"))
  #send number and factors to factordb
  send2db(composite, factors)
#all numbers are completed
print("Completed all", compNum, "composites!")
[/code]Click on the Run cell icon or use CTRL-Enter.

The compilations will run for about two and a half minutes. When YAFU finishes its compilation, after a couple message blocks, if all went well, the factoring process will begin.

With the defaults in the code above (compNum = 3, compSize = 70), the session factors three 70-digit composites and stops. The factors are sent to the db automatically, so no other manual intervention is needed. To change the number of composites to work on for each run, edit the compNum variable. To change the size of the composites to work on, edit the compSize variable.
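As a rough sketch of how the two factordb requests in the script are put together, the helpers below rebuild the fetch URL and the report payload as plain strings, without touching the network. The function names `build_fetch_url`, `build_report_payload` and `pick_start` are made up for this illustration; the URL layout, the `%3D` separator and the random start position are copied from the script above.

```python
import random

def build_fetch_url(comp_size, start):
    """Return the listtype.php URL the script uses to download one composite."""
    return ('http://factordb.com/listtype.php?t=3&mindig=' + str(comp_size)
            + '&perpage=1&start=' + str(start) + '&download=1')

def build_report_payload(composite, factors):
    """Return the POST body for report.php: composite, an encoded '=', then the factors."""
    return ('report=' + str(composite) + '%3D' + str(factors)).encode('utf-8')

def pick_start(ran_num, offset, rng=random):
    """Pick a random list position so parallel sessions rarely fetch the same number."""
    return rng.randrange(ran_num) + offset

if __name__ == '__main__':
    start = pick_start(1000, 10)
    print(build_fetch_url(70, start))
    print(build_report_payload(15, '3*5'))
```

Keeping the string building separate from `urllib.request.urlopen` makes it possible to check what would be sent before a session actually contacts factordb.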

Eventually, I hope to add a more detailed description of all the code.

mathwiz 2019-11-09 16:46

Now to automate it...
 
Awesome guide!

Now we just have to figure out how to wire this up to FactorDB.com so composites are factored automatically :smile:

Dylan14 2019-11-09 17:07

I can confirm that the code works. A few things:


1. It suffices to just comment out the lines above the imports after you made the code once.
2. Is there a reason why you use the build option USE_SSE41=1, instead of something that is faster like AVX2? As it appears all of the Colab entities have at least this.
3. I added some more comments to the code below the compilation:


[CODE]import random
import subprocess
import urllib.request

compNum = 2 # Number of composites to run
compSize = 80 # Size of composites to run
ranNum = 1000 # Number for random count

def send2db(composite, lastfactor):
  #reports factors to factordb
  factorline = str(lastfactor)
  sendline = 'report=' + str(composite) + '%3D' + factorline
  dbcall = sendline.encode('utf-8')
  temp2 = urllib.request.urlopen('http://factordb.com/report.php', dbcall)

#main loop
for x in range(compNum): #run compnum composites
  randnum = random.randrange(ranNum) #pick a random number
  #fetch number from factordb
  dbcall = 'http://factordb.com/listtype.php?t=3&mindig=' + str(compSize) + '&perpage=1&start=' + str(randnum) + '&download=1'
  #some file processing to get the number into a format usable by yafu
  temp0 = urllib.request.urlopen(dbcall)
  temp1 = temp0.read()
  composite = temp1.decode(encoding='UTF-8')
  composite = composite.strip("\n")
  #print composite to test
  print("The composite is", composite)
  #run yafu
  factorT = subprocess.run(['./yafu'], stdout=subprocess.PIPE, input=temp1)
  #find factors from a yafu run
  factor = factorT.stdout.decode('utf-8')
  factorloc = factor.index('***factors found***')
  factorloc += 22
  tail = factor[factorloc:]
  factors = tail[:-34]
  facind = factors.rfind('=')
  facind += 2
  lastfactor = factors[facind:]
  #print last factor found
  print("The last factor is", lastfactor)
  #send factors to fdb
  send2db(composite, lastfactor)
#run complete
print("Completed requested number of composites!")
[/CODE]

EdH 2019-11-09 18:11

[QUOTE=mathwiz;530116]Awesome guide!

Now we just have to figure out how to wire this up to FactorDB.com so composites are factored automatically :smile:[/QUOTE]Thanks, but I must not understand your comment.

Once started, the composites are retrieved, factored and uploaded to factordb.com automatically. The only manual part is the session start and choosing how many composites to work. Then, all is fully automated.

EdH 2019-11-09 18:26

[QUOTE=Dylan14;530119]I can confirm that the code works. A few things:


1. It suffices to just comment out the lines above the imports after you made the code once.
2. Is there a reason why you use the build option USE_SSE41=1, instead of something that is faster like AVX2? As it appears all of the Colab entities have at least this.
3. I added some more comments to the code below the compilation:

[/QUOTE]Thanks Dylan,

I tried a direct copy/paste and lost some formatting. I had to go back to my original. I'm being pulled away ATM, but plan to address all else later.

1. I considered a block delete easier than commenting out lines.
2. I have experienced segmentation faults with AVX2 in the past.
3. Thanks! I'll work on those later.
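For anyone wanting to check which instruction sets a given Colab instance reports before choosing a build option, a minimal sketch follows. It assumes a Linux host where `/proc/cpuinfo` carries a `flags` line; the helper names `has_cpu_flag` and `read_cpuinfo` are invented for this example. A reported flag only says the CPU supports the instructions. It says nothing about whether a YAFU build using them will run without the segmentation faults mentioned above.

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Return True if `flag` appears on a 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(':')
        # Compare whole tokens so that 'avx' does not match 'avx2'.
        if sep and name.strip() == 'flags' and flag in value.split():
            return True
    return False

def read_cpuinfo(path='/proc/cpuinfo'):
    """Read the Linux CPU description; return '' where the file is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ''

if __name__ == '__main__':
    info = read_cpuinfo()
    for flag in ('sse4_1', 'avx2'):
        print(flag, has_cpu_flag(info, flag))
```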

EdH 2019-11-09 23:30

I made some changes, but unfortunately, the AVX2 option causes SIQS to return earlier than completion and the zero value for factorloc crashes the run. I'll work on this more later.
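One possible guard for the case where YAFU returns without printing its factor block is sketched below. It is not the fix that went into the first post; it only shows how a missing `***factors found***` marker could be detected instead of letting the slicing run on a bad position. The helper name `factor_block` and the sample strings are hypothetical.

```python
MARKER = '***factors found***'

def factor_block(yafu_output):
    """Return the text after the marker, or None when the marker is absent."""
    pos = yafu_output.find(MARKER)   # find() returns -1 instead of raising
    if pos < 0:
        return None
    return yafu_output[pos + len(MARKER):]

if __name__ == '__main__':
    samples = (
        'div: primes less than 10000\n***factors found***\n\nP3 = 911\n',
        'run ended without a factor block\n',   # hypothetical early return
    )
    for sample in samples:
        block = factor_block(sample)
        print('no factors reported' if block is None else block.strip())
```

A caller could skip the submission to factordb whenever `None` comes back, rather than sending an empty factor string.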

LaurV 2019-11-10 03:52

[QUOTE=EdH;530122]Thanks, but I must not understand your comment.

Once started, the composites are retrieved, factored and uploaded to factordb.com automatically. The only manual part is the session start and choosing how many composites to work. Then, all is fully automated.[/QUOTE]
I think he meant more or less in a serious way, something along the lines that factordb itself could be "wired" to run such script on colab by itself too.

EdH 2019-11-10 04:01

[QUOTE=LaurV;530163]I think he meant more or less in a serious way, something along the lines that factordb itself could be "wired" to run such script on colab by itself too.[/QUOTE]AH! Thank you! I was correct that I must not have understood. Indeed, I did not. But now I do see how it was meant, with your assistance. I fear factordb would overrun Colab if such was the case, though. . .

LaurV 2019-11-10 04:18

[QUOTE=EdH;530164] I fear factordb would overrun Colab if such was the case, though. . .[/QUOTE]That for sure. One cannot compare the 20 or 50 real cores that Syd has with the 1 virtual core that Colab gives you. But 101 miles per hour is better than 100 miles per hour (this I learned on this forum!)

EdH 2019-11-11 16:10

I made some major changes, all reflected in the original post.

All comments welcome. . .

bsquared 2019-11-11 17:20

Are colab sessions single threaded? If not it would be helpful to run multithreaded.

EdH 2019-11-11 17:30

[QUOTE=bsquared;530298]Are colab sessions single threaded? If not it would be helpful to run multithreaded.[/QUOTE]
There are two threads. I thought I was running two:
[code]
yafuini.write("threads=2\n")
[/code]should provide two threads. It used to anyway.:smile: I'll check again in a bit.
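A small sketch of tying the thread count to what the session actually offers, rather than hard-coding 2, is given below. It reuses the `'% threads=1'` replacement from the first post on a text string; the helper name `set_threads` is made up for this example, and whether YAFU benefits from every reported core on Colab is untested here.

```python
import os

def set_threads(ini_text, threads):
    """Swap the commented-out default thread line for an active one."""
    return ini_text.replace('% threads=1', 'threads=' + str(threads))

if __name__ == '__main__':
    available = os.cpu_count() or 1   # cpu_count() may return None
    sample_ini = '% threads=1\n'      # the line the first post looks for in yafu.ini
    print(set_threads(sample_ini, available))
```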

bsquared 2019-11-11 17:44

[QUOTE=EdH;530303]There are two threads. I thought I was running two:
[code]
yafuini.write("threads=2\n")
[/code]should provide two threads. It used to anyway.:smile: I'll check again in a bit.[/QUOTE]

Ah, I missed that line. Thanks.

Harvey563 2021-07-26 12:59

Warnings in runtime log
 
I am getting the following warning in the runtime log:

Jul 26, 2021, 5:43:30 AM WARNING sh: 1: /content/ecm: Permission denied

Is this a problem?

"!chmod -R 777 /content/ecm" doesn't help.

Thanks.:smile:

Harvey563 2021-07-26 17:27

I'm not seeing any errors in results, just wondering.:hello:

EdH 2021-07-27 13:21

I might be misunderstanding something (probably), but I'm not finding a runtime log or an error message. However, the third candidate in my test run failed and I'm not sure why. That composite is made up of a bunch of p3s and p4s and YAFU factored it in the Colab session when I ran it outside of the script.

sample run of script:[code]
Installing system packages. . .
Retrieving GMP-ECM. . .
Compiling GMP-ECM. . .
Finished installing GMP-ECM. . .
Retrieving YAFU. . .
Compiling YAFU. . .
Finished compiling YAFU. . .
Starting the factoring of 3 composites. . .

Composite 1: 8869475002717536050312782604253426183326192888885941092918900891116352039782449522669 <85>
Factors (6:59): 3211610951880144183669785219693807857*2761690359015982301564877430778307467378692976317

Composite 2: 1128164293201891394597728447518614695546967713433260789629082173322486406149926324143 <85>
Factors (6:02): 29751725740768035469240132235509074973407496022009533350217*37919289221465108716321079

Composite 3: 3895909508598394792966931237542439743139398349882487979187550567767896277846662661167 <85>
Factors (0:00):

Completed all 3 composites![/code]factor log of third candidate run in isolation:[code]fac: factoring 3895909508598394792966931237542439743139398349882487979187550567767896277846662661167
fac: using pretesting plan: normal
fac: no tune info: using qs/gnfs crossover of 95 digits
fac: no tune info: using qs/snfs crossover of 75 digits
div: primes less than 10000
Total factoring time = 0.0034 seconds


***factors found***

P3 = 911
P3 = 941
P3 = 953
P3 = 967
P3 = 971
P3 = 977
P3 = 983
P3 = 991
P4 = 1009
P4 = 1019
P4 = 1031
P4 = 1033
P4 = 1039
P4 = 1049
P4 = 1061
P4 = 1063
P4 = 1087
P4 = 1091
P4 = 1093
P4 = 1097
P4 = 1103
P4 = 1109
P4 = 1117
P4 = 1129
P4 = 1151
P4 = 1153
P4 = 1163
P4 = 1171
1[/code]
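Output of this shape can also be read directly, without going through factor.log. The sketch below is not what the script does and is not offered as the explanation for the failed third candidate; it only shows one way to pull the values out of lines such as `P3 = 911` that follow the `***factors found***` marker. The function name `factors_from_output` is hypothetical, and accepting a `PRP` prefix alongside `P` is an assumption, since only `P` lines appear in the log above.

```python
import re

# Matches lines such as "P3 = 911"; the PRP prefix is an assumption.
FACTOR_LINE = re.compile(r'^(?:PRP|P)\d+ = (\d+)$')

def factors_from_output(text):
    """Collect the factor values listed after the '***factors found***' marker."""
    _, marker, tail = text.partition('***factors found***')
    if not marker:
        return []
    factors = []
    for line in tail.splitlines():
        match = FACTOR_LINE.match(line.strip())
        if match:
            factors.append(match.group(1))
    return factors

if __name__ == '__main__':
    sample = ('Total factoring time = 0.0034 seconds\n\n'
              '***factors found***\n\nP3 = 911\nP3 = 941\nP4 = 1009\n1\n')
    print('*'.join(factors_from_output(sample)))
```

Joining the list with `*` gives the same `a*b*c` form the script sends to factordb.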

chris2be8 2021-07-27 15:56

[QUOTE=Harvey563;584009]I am getting the following warning in the runtime log:

Jul 26, 2021, 5:43:30 AM WARNING sh: 1: /content/ecm: Permission denied

Is this a problem?

"!chmod -R 777 /content/ecm" doesn't help.

Thanks.:smile:[/QUOTE]

Run "!ls -l /content/ecm" to see what /content/ecm really is (I hope that's the right syntax). HTH

Harvey563 2021-07-27 22:41

!ls -l /content/ecm results:

total 18512
-rw-r--r-- 1 root root 21167 Jul 27 19:23 acinclude.m4
-rw-r--r-- 1 root root 43125 Jul 27 19:23 aclocal.m4
-rw-r--r-- 1 root root 36281 Jul 27 19:23 addlaws.c
-rw-r--r-- 1 root root 2672 Jul 27 19:23 addlaws.h
-rwxr-xr-x 1 root root 456488 Jul 27 19:23 aprcl
drwxr-xr-x 3 root root 4096 Jul 27 19:23 aprtcle
drwxr-xr-x 2 root root 4096 Jul 27 19:23 arm
drwxr-xr-x 2 root root 4096 Jul 27 19:23 athlon
-rw-r--r-- 1 root root 1758 Jul 27 19:23 AUTHORS
drwxr-xr-x 2 root root 4096 Jul 27 19:23 autom4te.cache
-rw-r--r-- 1 root root 2149 Jul 27 19:23 auxarith.c
-rw-r--r-- 1 root root 8124 Jul 27 19:23 auxi.c
-rw-r--r-- 1 root root 7390 Jul 27 19:23 auxlib.c
...

It appears to be source code.

"View runtime log" is the final option under runtime menu in Colaboratory.

EdH 2021-07-28 19:55

[QUOTE=Harvey563;584009]I am getting the following warning in the runtime log:

Jul 26, 2021, 5:43:30 AM WARNING sh: 1: /content/ecm: Permission denied

Is this a problem?

"!chmod -R 777 /content/ecm" doesn't help.

Thanks.:smile:[/QUOTE]OK, I think I have it fixed. Thanks for pointing it out to me. (I guess I'm going to have to look at all my other Colab threads that use ECM, as well.)

To manually fix, change the line:[code]
    print(line.rstrip().replace('ecm_path=../gmp-ecm/bin/ecm', 'ecm_path=/content/ecm'))
[/code]to[code]
    print(line.rstrip().replace('ecm_path=../gmp-ecm/bin/ecm', 'ecm_path=/usr/local/bin/ecm'))
[/code]Remember to keep the four leading spaces. Also remember that if you've already run the session without the change, you will need to terminate and reconnect the session to clear all the original work.
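A quick check that could catch this kind of mistake before YAFU runs is sketched below. The listing earlier in the thread shows /content/ecm to be the source directory, and trying to execute a directory is a plausible source of the "Permission denied" warning, though that reading is an inference rather than something confirmed in the thread. The helper name `is_runnable_file` is made up for this example.

```python
import os
import sys
import tempfile

def is_runnable_file(path):
    """True only for a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as workdir:
        # A directory, like /content/ecm, is never a runnable file.
        print(workdir, is_runnable_file(workdir))
    # The running interpreter stands in for a real binary such as /usr/local/bin/ecm.
    print(sys.executable, is_runnable_file(sys.executable))
```

In a session, the same function could be pointed at whatever path is written to `ecm_path` in yafu.ini.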

Harvey563 2021-07-29 22:56

This fixed it. Thanks.:smile:

