2016-07-15, 06:17   #11
GP2
 
 
Sep 2003

5×11×47 Posts
Setting up an EFS filesystem: initial setup and configuration

This part will need to be done separately for each AWS region that you use (but for now let's just do one region).

In the previous section, you logged into an instance using your ssh client program.

In this section, some familiarity with the "bash" shell of Linux will be helpful. You will perform initial setup and configuration of the EFS filesystem you created in the "Setting up an EFS filesystem: create the filesystem" section earlier.

In that section, the newly-created filesystem was assigned a File System ID, which you wrote down. The File System ID is of the form fs-xxxxxxxx, where each "x" is a hexadecimal digit.

At the command line prompt, enter a command similar to:

Code:
FILE_SYSTEM_ID=fs-xxxxxxxx    # STOP!! Change the "xxxxxxxx" to the right value
(but instead of xxxxxxxx, use the File System ID from the earlier section as mentioned above)

Enter the following commands:

Code:
availability_zone=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
region=$(echo -n ${availability_zone} | sed 's/[a-z]$//')
sudo mkdir /mnt-efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 ${FILE_SYSTEM_ID}.efs.${region}.amazonaws.com:/ /mnt-efs
(or instead of "mnt-efs", choose some different directory name if you like)
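To see what the two variable assignments above produce, here is the same derivation run against sample values instead of the live metadata service. The availability zone us-east-1a and the File System ID fs-12345678 are made-up examples; on a real instance the zone comes from the curl command and the ID is the one you wrote down.

```shell
#!/bin/bash
# Sample inputs (hypothetical values, for illustration only)
availability_zone=us-east-1a
FILE_SYSTEM_ID=fs-12345678

# Strip the single trailing zone letter: us-east-1a -> us-east-1
region=$(echo -n "${availability_zone}" | sed 's/[a-z]$//')
# Pure-bash equivalent: remove one trailing lowercase letter
region_alt=${availability_zone%[a-z]}

# Hostname that the mount command connects to
efs_host="${FILE_SYSTEM_ID}.efs.${region}.amazonaws.com"
echo "$region"
echo "$efs_host"
```

Running echo "$region" right after the real commands is a quick sanity check before you try the mount.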

If the "mount" command fails (times out), one possibility is that you did not start the instance with the correct IAM role (IAM instance role) or security group. In this case, you must go back to the "Setting up an EFS filesystem: make sure you have an instance running with the right permissions" section and launch a new instance; you can't change the IAM role or security group of an already-running instance.

Another possibility is that you did not configure the "efs-mount-target" security group with the right permissions to allow NFS access. Go back to the "Create a new security group for mounting EFS" section to do this, then try the "mount" command again.
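Before retrying the mount, it can help to check whether the NFS port is reachable at all. EFS is mounted over NFS on TCP port 2049, so a connection that times out there points at the security groups rather than at the mount options. A minimal sketch using bash's built-in /dev/tcp redirection; check_nfs_port is a hypothetical helper name:

```shell
#!/bin/bash
# check_nfs_port HOST: succeed if a TCP connection to HOST:2049 (NFS)
# opens within 5 seconds, fail otherwise. Relies on bash's /dev/tcp support.
check_nfs_port() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/2049" 2>/dev/null
}

# Example, reusing the variables from the commands above:
#   check_nfs_port "${FILE_SYSTEM_ID}.efs.${region}.amazonaws.com" \
#     && echo "port 2049 reachable" || echo "port 2049 blocked or unreachable"
```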

Enter the commands:

Code:
cd /mnt-efs
sudo mkdir mprime
sudo chown ec2-user:ec2-user mprime
cd mprime
(or instead of "mnt-efs", choose the same name you chose in the previous set of commands)

Go to http://www.mersenne.org/download/ to check which version of mprime for 64-bit Linux is the most recent. The following assumes it is p95v294b5 (version 29.4 build 5).

Enter the commands:

Code:
wget https://www.mersenne.org/ftp_root/gimps/p95v294b5.linux64.tar.gz
mkdir p95v294b5
ln -s p95v294b5 prog
cd prog
tar xvzf ../p95v294b5.linux64.tar.gz
cd ..
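
Because the program lives in a versioned directory reached through the "prog" symbolic link, a later upgrade only needs the new version unpacked next to the old one and the link repointed. A minimal sketch of that pattern, run in a throwaway directory; p95vOLD and p95vNEW are placeholders, not real version names:

```shell
#!/bin/bash
# Demonstrates the "prog" symlink pattern in a scratch directory.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir p95vOLD p95vNEW
ln -s p95vOLD prog     # initial install, as in the commands above
# Upgrade: -f replaces the existing link; -n makes ln treat "prog" as the
# link itself instead of following it into p95vOLD.
ln -sfn p95vNEW prog
readlink prog          # prints p95vNEW
```

Without -n, GNU ln would follow the existing link and create a new link inside p95vOLD instead of replacing "prog".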
Enter the commands:

Code:
mkdir instances
cd instances
mkdir c4.large

In general, c4.large will be all you need, because the instances that use it will be the most cost-effective.

Optionally, if you know you also want to run larger instance types, you can enter the following command:

Code:
mkdir c4.xlarge c4.2xlarge c4.4xlarge c4.8xlarge  # enter names exactly!

Now you need to create a prime-init.txt file and a local-init.txt file in each of the c4.* subdirectories you just created. You can use the sample versions provided below.

If you don't know how to use a Linux editor program such as vi, the simplest way to create a file is to copy and paste the contents of a file you prepared on your own computer.

So, to use the sample prime-init.txt file provided below, first edit it on your own computer, then copy the whole text to your clipboard. Then run the commands:

Code:
cd c4.large
cat > prime-init.txt

Now click the right mouse button in the ssh terminal window to paste the text, then enter Ctrl D (press and hold the Control key and then press "D").

Then enter the command:

Code:
cd ..

to return to the parent directory, then repeat the process for all the other c4.* subdirectories (c4.xlarge, c4.2xlarge, c4.4xlarge, c4.8xlarge).
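
Since the sample prime-init.txt is the same for every subdirectory, you could instead paste it only into c4.large and then copy it to whichever of the optional subdirectories exist. A minimal sketch; copy_prime_init is a hypothetical helper name, and it lists the directory names explicitly so that any later "c4.large-something" variants are not overwritten:

```shell
#!/bin/bash
# copy_prime_init: copy c4.large/prime-init.txt into each optional c4.*
# subdirectory that exists. Run it from the "instances" directory.
copy_prime_init() {
  local dir
  for dir in c4.xlarge c4.2xlarge c4.4xlarge c4.8xlarge; do
    if [ -d "$dir" ]; then
      cp c4.large/prime-init.txt "$dir/"
    fi
  done
}
```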

Sample prime-init.txt file (it is the same for all the subdirectories c4.large, c4.xlarge, etc.):

(the V5UserID line is blank, but you can enter a valid user ID as explained below):

Code:
V24OptionsConverted=1
WGUID_version=2
StressTester=0
UsePrimenet=1
DialUp=0
V5UserID=
WorkPreference=0
OutputIterations=10000
ResultsFileIterations=999999999
DiskWriteTime=30
NetworkRetryTime=2
NetworkRetryTime2=70
DaysOfWork=3
UnreserveDays=30
DaysBetweenCheckins=1
NumBackupFiles=3
SilentVictory=1
Priority=1
RunOnBattery=1

[PrimeNet]
Debug=0
ProxyHost=

[Worker #1]

If you already have a PrimeNet user account (you can optionally create one at http://www.mersenne.org/gettingstarted/), enter your user ID on the V5UserID= line. Having a PrimeNet user account makes it easier to keep track of your progress.

You can also change the WorkPreference= line. It can have the following values:

  0 — Whatever makes the most sense
  2 — Trial factoring
100 — First time primality tests
101 — Double-checking
102 — World record primality tests
  4 — P−1 factoring
104 — 100 million digit primality tests
  1 — Trial factoring to low limits
  5 — ECM on small Mersenne numbers
  6 — ECM on Fermat numbers
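
If you prefer to adjust the V5UserID= and WorkPreference= lines after the file is already on the instance, rather than before pasting it, a sed substitution does the job. A sketch; set_ini_key is a hypothetical helper, YourUserID is a placeholder, and it assumes GNU sed (for -i), as found on Amazon Linux:

```shell
#!/bin/bash
# set_ini_key FILE KEY VALUE: replace the value of an existing "KEY=" line
# in FILE in place. Assumes VALUE contains no "/", "&" or "\" characters.
set_ini_key() {
  sed -i "s/^$2=.*/$2=$3/" "$1"
}

# Examples (YourUserID stands for your real PrimeNet user ID):
#   set_ini_key prime-init.txt V5UserID YourUserID
#   set_ini_key prime-init.txt WorkPreference 101
```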

Here are sample local-init.txt files, one for each subdirectory. You can enter them the same way as the prime-init.txt file.

Sample local-init.txt file for c4.large subdirectory:

Note: If you are using mprime version 28 or earlier, use "ThreadsPerTest" instead of "CoresPerTest". But it is best to use the latest version.

Code:
OldCpuSpeed=2900
NewCpuSpeedCount=0
NewCpuSpeed=0
RollingAverage=1000
RollingAverageIsFromV27=1
ComputerID=C4_L
Memory=3072 during 7:30-23:30 else 3072
WorkerThreads=1
CoresPerTest=1

If you also created the optional c4.xlarge, c4.2xlarge, etc. subdirectories in a previous step, create the following local-init.txt files for them:

Sample local-init.txt file for c4.xlarge subdirectory:

Note: if you are using mprime version 28 or earlier, change CoresPerTest to ThreadsPerTest and add the line:
AffinityScramble2=0213

Code:
OldCpuSpeed=2900
NewCpuSpeedCount=0
NewCpuSpeed=0
RollingAverage=1000
RollingAverageIsFromV27=1
ComputerID=C4_XL
Memory=6144 during 7:30-23:30 else 6144
WorkerThreads=1
CoresPerTest=2

Sample local-init.txt file for c4.2xlarge subdirectory:

Note: if you are using mprime version 28 or earlier, change CoresPerTest to ThreadsPerTest and add the line:
AffinityScramble2=04152637

Code:
OldCpuSpeed=2900
NewCpuSpeedCount=0
NewCpuSpeed=0
RollingAverage=1000
RollingAverageIsFromV27=1
ComputerID=C4_2XL
Memory=12288 during 7:30-23:30 else 12288
WorkerThreads=1
CoresPerTest=4

Sample local-init.txt file for c4.4xlarge subdirectory:

Note: if you are using mprime version 28 or earlier, change CoresPerTest to ThreadsPerTest and add the line:
AffinityScramble2=08192A3B4C5D6E7F

Code:
OldCpuSpeed=2900
NewCpuSpeedCount=0
NewCpuSpeed=0
RollingAverage=1000
RollingAverageIsFromV27=1
ComputerID=C4_4XL
Memory=26000 during 7:30-23:30 else 26000
WorkerThreads=1
CoresPerTest=8

Sample local-init.txt file for c4.8xlarge subdirectory:

Note: if you are using mprime version 28 or earlier, change CoresPerTest to ThreadsPerTest and add the line:
AffinityScramble2=0I1J2K3L4M5N6O7P8Q9RASBTCUDVEWFXGYHZ

Code:
OldCpuSpeed=2900
NewCpuSpeedCount=0
NewCpuSpeed=0
RollingAverage=1000
RollingAverageIsFromV27=1
ComputerID=C4_8XL
Memory=56000 during 7:30-23:30 else 56000
WorkerThreads=2
CoresPerTest=9
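
The AffinityScramble2 strings in the notes above follow a visible pattern: each logical CPU i is listed next to CPU i + n/2 (for n logical CPUs), written with the digits 0-9 followed by A-Z. That suggests hyperthread siblings on these instances are numbered n/2 apart; this is an inference from the strings, not something stated here. A sketch that rebuilds the strings under that assumption (affinity_scramble2 is a hypothetical name):

```shell
#!/bin/bash
# affinity_scramble2 N: pair logical CPU i with CPU i + N/2, using the
# digit alphabet 0-9A-Z. Assumes siblings are numbered N/2 apart.
affinity_scramble2() {
  local half=$(( $1 / 2 )) digits=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ out="" i
  for (( i = 0; i < half; i++ )); do
    out+="${digits:i:1}${digits:i+half:1}"
  done
  printf '%s\n' "$out"
}
```

For example, affinity_scramble2 16 prints 08192A3B4C5D6E7F, matching the c4.4xlarge note. On a running instance, cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list shows which logical CPUs share core 0, which is one way to check the assumption.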


Note that the ComputerID= line naming scheme above is just a suggestion. You could use ComputerID=c4.large for instance, to make it literally match the instance type.

Note that the Memory= line is mostly irrelevant unless you do P−1 testing. The c4.large through c4.8xlarge instance types have 3.75 GiB, 7.5 GiB, 15 GiB, 30 GiB and 60 GiB of memory, respectively.
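
The c4.large and c4.xlarge samples above work out to 80% of the nominal memory (3.75 GiB is 3840 MiB, and 80% of that is 3072). If you want a starting value for some other instance type, a sketch that applies the same 80% fraction; the fraction is an assumption taken from those two samples, and suggest_memory_mb is a hypothetical helper:

```shell
#!/bin/bash
# suggest_memory_mb KB: 80% of a memory size given in kB, in whole MiB.
suggest_memory_mb() {
  echo $(( $1 / 1024 * 80 / 100 ))
}

# On an instance, feed it the kernel's MemTotal (reported in kB):
#   suggest_memory_mb "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

MemTotal is usually a little below the nominal size, so the value computed on a live instance will come out somewhat lower than the figures in the samples.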


Don't specify your own ComputerGUID value

This section is intended for more experienced users.

If you choose to copy your own existing files rather than use the above, I recommend deleting any ComputerGUID= line; mprime adds this line automatically when it starts up. Also omit any HardwareGUID= or FixedHardwareUID=1 lines.
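
If you start from your own files, the unwanted lines can be removed in one step. A sketch assuming GNU sed; strip_guids is a hypothetical helper, and it is safe to run on both init files since it only deletes the three kinds of lines named above:

```shell
#!/bin/bash
# strip_guids FILE...: delete ComputerGUID=, HardwareGUID= and
# FixedHardwareUID=1 lines from each FILE in place.
strip_guids() {
  sed -i -e '/^ComputerGUID=/d' -e '/^HardwareGUID=/d' -e '/^FixedHardwareUID=1/d' "$@"
}

# Example:
#   strip_guids prime-init.txt local-init.txt
```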

If multiple instances use the same ComputerGUID or HardwareGUID line, PrimeNet treats them as one and the same computer, and the list of CPUs in your account at mersenne.org will show fewer entries than the number of computers you actually have. However, I think it's OK for multiple instances (of the same instance type) to use the same ComputerID line, and we do so.

Note that DiskWriteTime is set to the default of 30 minutes. If you don't run many instances, you might want to reduce it to 10 minutes. In some circumstances save files do not get written when an instance is terminated, in particular when you terminate the instance yourself from the EC2 console; a setting of 10 minutes ensures that at most about 10 minutes of work is lost in that case.

However, if you run many dozens of LL testing instances simultaneously, or if you do work that creates large save files (work other than LL testing, such as P−1, Fermat or ECM testing with large B2 values), you might want to keep DiskWriteTime higher. This is because the EFS filesystem throttles I/O if you only store a relatively small amount of data on it. If you see save files with names ending in .write being written very slowly over several minutes, or if simple Linux commands in your SSH terminal take a long time to execute, then your I/O is being throttled. In that case you should either do less I/O (use a longer DiskWriteTime) or increase the amount of data stored on your EFS filesystem, which incurs higher charges.
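
To check for the .write symptom described above, you can list any .write files with their sizes and repeat the check a minute or two later; names that are still there and growing only slowly suggest throttled I/O. A sketch assuming GNU find (for -printf); find_write_files is a hypothetical helper:

```shell
#!/bin/bash
# find_write_files DIR: print "size path" for every *.write file under DIR.
find_write_files() {
  find "$1" -type f -name '*.write' -printf '%s %p\n'
}

# Example: compare two snapshots taken a minute apart
#   find_write_files /mnt-efs/mprime/instances; sleep 60
#   find_write_files /mnt-efs/mprime/instances
```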

Multiple subdirectories of the same instance type

This section is intended for more experienced users. If you are going through this procedure for the first time you should skip it.

The above setup is simple and works for most purposes. Directly under the instances directory, we create one subdirectory named c4.large, and optionally others named c4.xlarge, c4.2xlarge, etc. All the instances running under them ask the PrimeNet server for whatever work "makes the most sense" (WorkPreference=0 in the prime-init.txt file). This means faster machines will usually get first-time Lucas-Lehmer tests of larger Mersenne exponents, slower machines may double-check smaller Mersenne exponents, older machines may do ECM testing to find factors, etc.

However, experienced users running multiple instances might want some of them to do one work type and others a different one.

If you wish, in addition to subdirectories named exactly after an instance type (for example "c4.large"), you can have subdirectories that add a hyphenated suffix (for example "c4.large-doublechecking", "c4.large-ecm", etc.). These subdirectories also don't have to be directly underneath the "instances" directory (as in "instances/c4.large"); they can be one or more levels further down (for example "instances/doublechecking/c4.large").

Each of the {instance-type} or "{instance-type} + hyphenated suffix" subdirectories should have a "prime-init.txt" and "local-init.txt" file within it. For instance, your "c4.large-doublechecking" subdirectory could have a prime-init.txt file that changes WorkPreference=0 to WorkPreference=101 (for doublechecking), while simultaneously a "c4.large-LL" subdirectory has WorkPreference=100 for first-time Lucas-Lehmer tests.
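Setting up the two example subdirectories by hand means copying both init files and editing one line. A sketch that does this starting from an existing c4.large directory; make_variant is a hypothetical helper, it assumes GNU sed, and it is meant to be run from the "instances" directory:

```shell
#!/bin/bash
# make_variant SUFFIX WORKPREF: create c4.large-SUFFIX beside c4.large, copy
# both init files into it, and set its WorkPreference= value.
make_variant() {
  local dir="c4.large-$1"
  mkdir -p "$dir"
  cp c4.large/prime-init.txt c4.large/local-init.txt "$dir"/
  sed -i "s/^WorkPreference=.*/WorkPreference=$2/" "$dir/prime-init.txt"
}

# The two examples from the text:
#   make_variant doublechecking 101
#   make_variant LL 100
```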

Typically the prime-init.txt file will vary, as described above, while the local-init.txt file usually won't vary among instances of the same instance type. It's OK (and even recommended) for instances of the same instance type to have the same ComputerID= line in this file; however, local-init.txt should not contain any ComputerGUID=, HardwareGUID= or FixedHardwareUID=1 line at all. A unique ComputerGUID line will be generated automatically by mprime when the script copies and renames local-init.txt to local.txt and then runs mprime.

You could allocate different amounts of work to each work type by creating dummy subdirectories whose names start with i-. For example:

c4.large-doublechecking could contain two empty dummy subdirectories named "i-foo1" and "i-foo2".

c4.large-LL could contain four empty dummy subdirectories named "i-foo1", "i-foo2", "i-foo3" and "i-foo4".

The exact names don't matter, but they must start with i-.
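
Creating the placeholders for the example above can be done with a small loop that takes a count. A sketch; make_slots is a hypothetical helper, and the i-foo names simply follow the example:

```shell
#!/bin/bash
# make_slots PARENT COUNT: create COUNT empty placeholder directories
# PARENT/i-foo1 ... PARENT/i-fooCOUNT (only the "i-" prefix matters).
make_slots() {
  local i
  for (( i = 1; i <= $2; i++ )); do
    mkdir -p "$1/i-foo$i"
  done
}

# The allocation from the example (2 + 4 = 6 c4.large instances):
#   make_slots c4.large-doublechecking 2
#   make_slots c4.large-LL 4
# Same thing with bash brace expansion:
#   mkdir -p c4.large-doublechecking/i-foo{1..2} c4.large-LL/i-foo{1..4}
```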

After this setup, you can launch six instances of instance type "c4.large". Each instance will find one dummy subdirectory and rename it to its own instance-id; mprime will then start up and automatically fetch work from the PrimeNet server. The work type will be the one specified in the corresponding prime-init.txt file.

Recall that at startup, a newly launched instance first tries to locate and take over orphaned subdirectories left behind by spot instances that terminated for whatever reason (usually because the spot price rose above the limit price we set). Those orphaned subdirectories are named after the instance-ids of the instances that were running in them (these names begin with i- followed by either 8 or 17 hexadecimal digits). If a suitable orphaned subdirectory is found, the newly launched instance renames it to its own instance-id; if none is found, the newly launched instance creates a new subdirectory named after its own instance-id, as a child of the parent "instance type" directory. If there is only one "instance type" directory (e.g. c4.large), it will be created there; however, if there are several to choose from (e.g. c4.large-doublechecking, c4.large-LL, ecm/c4.large, etc.), one will be picked, but it might not be the one you want. Creating dummy "i-" subdirectories lets you control how many instances use each "instance type" parent directory, and thus how work is allocated among the different work types.

If you wish, you can even seed the dummy subdirectories with worktodo.txt files containing lines copied and pasted from the http://www.mersenne.org/manual_assignment/ page. The subdirectories only need the worktodo.txt files; all the other files (configuration and executable) will be copied automatically. If the worktodo.txt file is missing, mprime will request assignments from the PrimeNet server and receive whatever exponents it is given.
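
With several dummy subdirectories, the lines copied from the manual assignment page have to be split among them. A sketch that hands them out one line at a time in rotation; assignments.txt and distribute_work are hypothetical names, and each line is copied exactly as pasted:

```shell
#!/bin/bash
# distribute_work FILE SLOT_DIR...: append the non-blank lines of FILE to
# SLOT_DIR/worktodo.txt, rotating through the slot directories in order.
distribute_work() {
  local file=$1; shift
  local slots=("$@") count=$# n=0 line
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    printf '%s\n' "$line" >> "${slots[n % count]}/worktodo.txt"
    n=$((n + 1))
  done < "$file"
}

# Example:
#   distribute_work assignments.txt c4.large-LL/i-foo1 c4.large-LL/i-foo2
```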


Next section: Setting up an EFS filesystem: terminating the instance you created

Last fiddled with by GP2 on 2017-11-18 at 02:00 Reason: symbolic link "prog" instead of "p95"; increase Memory to less conservative amounts; v 29.4b5; AffinityScramble2