mersenneforum.org > Other Stuff > Archived Projects > 15k Search
Old 2005-10-02, 03:51   #34
TTn
 

2·7·17·23 Posts
Default RMA.NET speed

RMA.NET's timings are faster than the other methods'.
I've used Joss's 15k example to compare over a small range of 10,000 n:
k = 210885
nMin = 60000
nMax = 70000

----------------------------
RMA.NET:
5 1/12 hours (5 h 05 m)
----------------------------
Sieve until the removal rate equals the time of an average LLR test, then test the file:
5 7/60 hours (5 h 07 m)
---------------------------
Sieve until the removal rate is double that of an average LLR test, then test the file:
5 43/180 hours (5 h 14 m 20 s)
---------------------------

I fear that your theory will only get worse with a larger file....
RMA.NET does not split off a band; it tests with LLR at 1% CPU while NewPGen sieves at 99% CPU.
Only once a test is complete is that candidate split off individually.

As it stands, RMA.NET is the most efficient, since it continually measures the progress and changes accordingly. Any further optimization will come not from the method, but from the hardware resourcing.
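
A minimal sketch of the stopping rule compared above (not RMA.NET's actual code; both rates are hypothetical, user-measured inputs):

Code:
#include <stdio.h>

/* The classic stopping rule: keep sieving while removing a candidate
   is cheaper than LLR-testing one. */
static int keep_sieving(double secs_per_removal, double secs_per_llr_test)
{
    return secs_per_removal < secs_per_llr_test;
}

int main(void)
{
    double removal = 95.0;   /* measured: secs per candidate removed by NewPGen */
    double test    = 120.0;  /* measured: average secs per LLR test */

    if (keep_sieving(removal, test))
        printf("Keep sieving: removals are still cheaper than tests.\n");
    else
        printf("Stop sieving and LLR the file.\n");
    return 0;
}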

Last fiddled with by TTn on 2005-10-02 at 03:55
 
Old 2005-10-02, 16:04   #35
Thomas11
 
 
Feb 2003

771₁₆ Posts
Default

I have written a few tools for LLR which you might be interested in.
You can use them to get very accurate information on the FFT lengths used and on timings. The source code is attached, and I will provide Windows binaries in a separate post below. Linux users should build from the source code: simply run make on it...

Here is the content of the README file:

Code:
README file for LLRtools by Thomas Ritschel
-------------------------------------------

This is just a collection of routines to get information about FFT lengths used by LLR
and to estimate running times (total and averaged).  Originally made for the 321search
project, it has been modified to work also for general k.  Please feel free to develop
it further.  Perhaps Jean Penne and/or George Woltman could add some of these features
to future versions of LLR/PRP.


General information:
--------------------

There are the following files:

   llrtools.c  - the tools collection
   llrtools.h  - header file for llrtools.c

   fft_len.c   - lists the used FFT lengths for given (nmin,nmax) range (fixed k)
   av_time.c   - estimates the average time per LLR test (fixed k)
   get_time.c  - estimates total and average times for given LLR input file

The following two files are also required:

   maxlen.txt  - table of FFT lengths and associated max. Mersenne exponents (nmers)
   times.txt   - list of timings (msecs) for different FFT lengths

Depending on your CPU (SSE2 or non-SSE2), you will need to copy either "maxlen_P4.txt"
or "maxlen_Athlon.txt" to "maxlen.txt".  (The other two files, "maxlen_P4_proth.txt" and
"maxlen_Athlon_proth.txt", contain information about the FFT lengths and associated
max. Mersenne exponents for Proth tests, i.e. k*2^n+1, but this isn't tested so far...)

To get correct timings you will need to parametrize the file times.txt for your
specific hardware.  Examples are given for two Athlons (1 and 2 GHz) and an Opteron.
There is no need to get timings for every FFT length.  It's sufficient to have
data on those FFT lengths covered by your range of n.

For small and medium sized k (typically k < 2^20), you should get very accurate results
on the FFT lengths and the timings.  In cases where LLR uses zero-padding (which is the
case for very large k), the results may be off by about 10 percent, since the algorithm
used by George Woltman's gwnum is much more sophisticated and needs additional information
which we do not have here.


On the computation of total and average times:
----------------------------------------------

If we assume uniform sampling of our candidates in a given range of n, the total time
can be approximated by integrating the function t(n); dividing by the width of the
range gives the average:

                    1           n_max
  t_average = ------------- * Integral (t(n) dn)
              n_max - n_min    n=n_min

Generally t(n) is neither a parabola nor a linear function.  Rather, it is a
discontinuous function, piecewise linear but with steps at the FFT turnover points.
We can get the area under this function by integrating it piecewise.

For fixed FFT length we'll have:

  t(n) = n * msecs_per_iteration(n) ,

where msecs_per_iteration(n) is a constant value, depending on the FFT length.
We will substitute this by m(i), where i is an index for the FFT length.

By splitting the integral into parts at the FFT turnover points we get the following
(after integration):

                    1
  t_average = ----------------- * [ m(i_max)*n_max^2 - m(i_min)*n_min^2
              2 (n_max - n_min)

                                       i_max-1
                                    -    Sum    [ (m(i+1) - m(i)) * n(i)^2 ] ]
                                       i=i_min

Here, n(i) is the max. n for FFT length i, and i_min and i_max are the indexes of the
min. and max. FFT lengths used.

Therefore we need information at which n the FFT length will change as well as the 
times per iteration for each of the FFT lengths.
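
For illustration, a minimal C sketch of this computation (not part of the package;
the array names are hypothetical, to be filled from times.txt and maxlen.txt):

   /* m[i]    - msecs per iteration for FFT length index i
      nfft[i] - largest n for which FFT length index i is used */
   double average_time(const double *m, const double *nfft,
                       int i_min, int i_max, double nmin, double nmax)
   {
       /* telescoped piecewise integral from the formula above */
       double sum = m[i_max]*nmax*nmax - m[i_min]*nmin*nmin;
       int i;

       for (i = i_min; i < i_max; i++)
           sum -= (m[i+1] - m[i]) * nfft[i] * nfft[i];

       /* average msecs per LLR test over [nmin, nmax] */
       return sum / (2.0 * (nmax - nmin));
   }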


Typical usage:
--------------

- These are command line tools, so run them manually from the command line rather than
  double-clicking on them.  You wouldn't see any screen output in the latter case.

- First run fft_len to get some insight into the FFT lengths for the (nmin,nmax) range
  of interest.

     fft_len

  Sample screen output:

     k = 3
     n(min) = 2000000
     n(max) = 5000000

     The following FFT lengths would be used:

         fftlen       nmax
     -----------------------
         114688    2233110
         131072    2560126
         163840    3180158
         196608    3777190
         229376    4411222
         262144    5056254


- Then check whether all the necessary FFT lengths are covered by the file "times.txt".
  If not, get the msecs/iteration by running LLR for a few seconds or minutes on some
  appropriate (k,n) pairs.

- Run av_time to get the average time for a given range of n.  If there is insufficient
  information on the timings, you will get an error.

     av_time

  Sample screen output:

     --- Athlon 1GHz ---
     k = 3
     n(min) = 2000000
     n(max) = 5000000

     average time per LLR test: 90019.175 sec


- If you already have an input file (perhaps partially sieved) for LLR, you may want to
  see how long it will take to run LLR on the whole file.
  This information can be obtained by running get_time.  This program takes the name
  of your sieve file as a command line parameter, so simply run it like this:

     get_time input.txt          (replace "input.txt" by your file name)

  Sample screen output:

     --- Athlon 1GHz ---
     number of (k,n) pairs in file: 9357
     estimated total time for LLR testing the whole file: 100376901.970 sec
     average time per LLR test: 10727.466 sec
Attached Files
File Type: zip llrtools_src.zip (10.9 KB, 185 views)

Last fiddled with by Thomas11 on 2005-10-02 at 16:06
Old 2005-10-02, 16:09   #36
Thomas11
 
 
Feb 2003

3·5·127 Posts
Default

Okay, here come the Windows binaries, compiled with the Watcom compiler under Win98.
Attached Files
File Type: zip llrtools_win32.zip (90.2 KB, 229 views)
Old 2005-10-02, 18:24   #37
TTn
 

2×947 Posts
Default

Hi Thomas,

Thanks for the details and specs.
I should be able to make some use of this.

Right on, Bro!
Shane F.
 
Old 2005-10-03, 13:26   #38
fatphil
 
 
May 2003

231₁₀ Posts
Default

Quote:
Originally Posted by VBCurtis
Phil's explanation makes a lot of sense to me, but I think the effect I tried to explain without rigorous math is not taken into account.
Indeed. Quite what effect the change in sieving range has, I really don't want to guess. However, I have a feeling that it could encourage the transition to PRP-ing, i.e. you sieve more shallowly than my model above indicates. This is because moving the lowest band into the PRP-ing state speeds up sieving for the rest of the candidates.

TTn's continuous comparison should help keep the process close to optimal. If his moving average is smooth enough then the deviation from optimal should be negligible. That's one of the rules of thumb when it comes to multi-stage computational tasks - if it's hard to model on paper then simply program the system with adaptive behaviour such that it finds its own local optimum.
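
(A smoothed moving average of the kind described above can be as simple as an exponentially weighted update. A minimal sketch, with a hypothetical smoothing factor:)

Code:
#include <stdio.h>

/* Exponentially weighted moving average: one simple way to smooth a
   measured rate.  alpha is a hypothetical smoothing factor; smaller
   values give a smoother but slower-reacting average. */
static double ema_update(double average, double sample, double alpha)
{
    return average + alpha * (sample - average);
}

int main(void)
{
    double rate = 100.0;  /* initial estimate, e.g. secs per test */
    double samples[] = { 90.0, 110.0, 95.0, 105.0 };
    int i;

    for (i = 0; i < 4; i++) {
        rate = ema_update(rate, samples[i], 0.2);
        printf("smoothed rate: %.2f\n", rate);
    }
    return 0;
}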

Phil
Old 2005-10-04, 05:15   #39
TTn
 

17·547 Posts
Default

Quote:
TTn's continuous comparison should help keep the process close to optimal. If his moving average is smooth enough then the deviation from optimal should be negligible. That's one of the rules of thumb when it comes to multi-stage computational tasks - if it's hard to model on paper then simply program the system with adaptive behaviour such that it finds its own local optimum.
The moving average can be adjusted by the user with the timer option in the advanced section. By default it is set to 1 hour, for titanic primes.

I set the timer to 5 minutes for the timings above.
This is a little smoother, but costs 4-15 seconds depending on the size of the file, PC speed, etc.

Last fiddled with by TTn on 2005-10-04 at 05:16
 
Old 2005-10-06, 05:30   #40
VBCurtis
 
 
"Curtis"
Feb 2005
Riverside, CA

1093₁₆ Posts
Default

I only claimed that, when trying to decide whether splitting off a band of numbers was worthwhile versus waiting until the usual sieve time = average LLR time remaining, you should not simply LLR a number as soon as THAT number's LLR time equals the sieve time.

If you are looking to sieve and LLR a small range of exponents, the traditional rule (average sieve rate = average LLR time) works best. If you are running a range from, say, 100k to 1 million, waiting until the sieve rate reaches the average test time for that entire range (something like the time for n=700k) would cost you time compared with breaking off, say, a 100-200k piece at some point. I was only trying to take a stab at the proper time to break off a piece.

Phil's point is well taken -- sieve somewhere close to optimal, and it really doesn't matter. I've likely spent more time typing these attempts to explain than my computer would save from any such optimization!
Cheers, Curtis
Old 2005-10-06, 06:45   #41
TTn
 

2²·3·7·103 Posts
Default

Breaking off a piece can be a costly thing too,
as it widens the possible error margin.

RMA.NET breaks off 1% CPU to measure the efficiency of the process.
This accounts for large file sizes and hardware resources.
The timer could use a slight optimization.
Only if the rates are equal will LLR receive 99% priority.
You can do this manually, without RMA, in the same way:



Set NewPGen to log the numbers removed, so we can count the progress
in the "NewPGen.del" file.

Find the "lresults.txt" file to count the completed tests.

Alg (a sketch in code follows below):

Test and sieve the same file at once.

Every x hours/day, stop LLR and NewPGen.

Compare the count of the NewPGen.del file versus lresults.txt,
and apply a handicap to the weaker one. (log count * 99) or (results count * 99)

Change LLR's priority from 1-3, if warranted (remember it for the next handicap):
1 for NewPGen priority, 2 for equal priority, and 3 for LLR priority.

Delete numbers that have already been tested from the sieve file.
The completed numbers are, of course, in lresults.txt.

Start the alg over.
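
A minimal sketch of the comparison step (not RMA.NET's code; the file names follow the post, and the priority rule is one plausible reading of the handicap):

Code:
#include <stdio.h>

/* Count newline-terminated lines in a file; returns 0 if it is missing. */
static long count_lines(const char *path)
{
    FILE *f = fopen(path, "r");
    long n = 0;
    int c;

    if (f == NULL)
        return 0;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            n++;
    fclose(f);
    return n;
}

int main(void)
{
    long removed = count_lines("NewPGen.del");   /* candidates removed by sieving */
    long tested  = count_lines("lresults.txt");  /* candidates completed by LLR   */

    /* Whichever process is eliminating candidates faster keeps the CPU. */
    if (removed > tested)
        printf("Sieving eliminates faster: NewPGen priority (1).\n");
    else if (tested > removed)
        printf("Testing eliminates faster: LLR priority (3).\n");
    else
        printf("Rates are equal: equal priority (2).\n");
    return 0;
}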

Last fiddled with by TTn on 2005-10-06 at 06:54
 
Old 2005-10-27, 10:28   #42
fatphil
 
 
May 2003

3·7·11 Posts
Default

Quote:
Originally Posted by paulunderwood
Phil's approach is better... I did say the model I suggested was an over-simplification but it works well enough on a very large "n" range, spanning many FFT changes...

When investigating FFT changes, please note that the FFT changes are different for non-SSE2 and SSE2-enabled computers.
My recent back-filling of non-top-5000 primes has given me a good chance to investigate sieving strategy, and my advice _really_ falls to pieces as the number of FFT bands decreases. It doesn't become wrong, it just becomes inapplicable.

Basically, by the time you're down to only a few FFT ranges, the rate of decrease in the PRP rate from PRP-ing larger and larger candidates is smaller than the rate of decrease in the sieving rate caused by that same PRP-ing.
I.e. PRP-ing is making sieving much slower, and thus redundant, as it noticeably decreases the size of the remaining pool.

An adaptive technique would of course pick up on this trend, and, if possible, should be favoured.
 

