mersenneforum.org A Tool for Studying Aliquot Sequence Data

2021-09-10, 14:05   #89
EdH

"Ed Hall"
Dec 2009

3·37² Posts

Quote:
 Originally Posted by Aillas Just a stupid question: could the regina file have a 'hole' in its index (column A)? Or is it guaranteed that if the file has 1 million lines, then the indexes will be 2 to 1,000,001? It seems there is no hole, but I prefer to be sure... Thanks...
There's no "hole" anywhere, but there is a little potential confusion to be aware of. (The regina file should contain one line fewer than its M value.)

- The sequences run from 2 upwards to whatever Jean-Luc has stopped at for the current iteration, usually an even M, i.e. 18M currently (2-18,000,000).

- 2-18,000,000 only includes 17,999,999 entries, because it starts with 2 rather than 1. If you add up the current numbers for primes, composites and cycles, you should get 17,999,999.

- Because I used a 0-based array and used the element number as the sequence number, I needed to account for M+1 elements, even though there is one less sequence. In actuality, I was basing the seqinfo2 code on 20M, so I used:
Code:
static struct seqdata seqd[20000001];
to cover the 20,000,000 sequences.

BTW, my current run for primes has been sitting at 99% since I first checked it this morning. It still may be a while.

2021-09-10, 17:17   #90
garambois

"Garambois Jean-Luc"
Oct 2011
France

2²×3²×19 Posts

I confirm, there can't be a hole in column A. And yes, there is always one less entry than the limit I announce, as Edwin explains: limit of 18,000,000 and therefore effectively 17,999,999 entries currently, since we start at 2.

This weekend I will be able to publish the Regina file up to 22,000,000. Do you want me to do this, or would you rather continue working with the file up to 18,000,000 and wait until later, when I publish the file at 25 or 30 million?
2021-09-10, 17:46   #91
EdH

"Ed Hall"
Dec 2009

3×37² Posts

Quote:
 Originally Posted by garambois . . . This weekend I will be able to publish the Regina file up to 22,000,000. Do you want me to do this or do you still want to continue working with the file up to 18,000,000 and wait until later when I publish the file at 25 or 30 million ?
The seqinfo2 that I posted will only handle 20M at present, but I have tested it, and you can increase it to 22M by simply changing three instances of 20000001 to 22000001 in the source and recompiling it. I tested 25M and it did not compile. Since the version being worked on by Aillas should handle the larger regina files (and more speedily), I don't know if I'll do more with mine.

2021-09-10, 18:08   #92
EdH

"Ed Hall"
Dec 2009

3·37² Posts

The seqinfo2 primes counting function finally completed:
Code:
$ ./seqinfo2
Data available for sequences 2 through 18000000
Sequence endings - prime: 13623702, cycle: 260439, open: 4115858
Enter sequence (##/a/h/p/p##/q/u): p
This process will take several hours.  Continue? (y/n):y
1151629 unique primes found!
Listing took 100381 seconds to generate.
Enter sequence (##/a/h/p/p##/q/u):
That means 27:53:01 on an i7 3.4GHz machine (single threaded).

The largest terminating prime was only 14604141802777 (14 digits), which shows as occurring 23 times. Is it odd that the largest terminating prime would be so small?
Code:
Enter sequence (##/a/h/p/p##/q/u): p14604141802777
List all sequences that terminate with 14604141802777? (y/n/c/f): y
1923540   2967858   3462540   6232740   8361636   11070756  11079640  11470428
11788640  11792880  12125052  13462620  13896164  14362840  14772844  15091100
16062400  16250036  16560672  16839044  17693060  17849500  17850500
23 sequences found.
Perform Advanced Filtering on these results? (y/n):
Attached Files
 primescount.7z (1.49 MB, 16 views)

2021-09-10, 22:38   #93
Aillas

Oct 2002
France

2·79 Posts

Thanks for the info and the file. I did a file comparison and the results are identical. That's good news; it means I didn't break anything :) Here are the logs for the new version (with debug outputs I will remove later):
Code:
./seqinfo
Verifying regina_file... Done successfully
Reserving a regina vector of 20000000 elements (capacity: 20000000)
Loading regina_file...Loading regina file took 102 seconds. Done successfully
Data available for sequences 2 through 18000000
Sequence endings - prime: 13623702, cycle: 260439, open: 4115858
Shrink regina vector to 17999999 elements
Enter sequence (##/a/h/p/p##/q/u): p
This process will take several hours.  Continue? (y/n):y
Progress: 100%
1151629 unique primes found!
Listing took 6 seconds to generate.
Enter sequence (##/a/h/p/p##/q/u): p14604141802777
List all sequences that terminate with 14604141802777? (y/n/c/f): y
1923540   2967858   3462540   6232740   8361636   11070756  11079640  11470428
11788640  11792880  12125052  13462620  13896164  14362840  14772844  15091100
16062400  16250036  16560672  16839044  17693060  17849500  17850500
23 sequences found.
Perform Advanced Filtering on these results? (y/n): n
Enter sequence (##/a/h/p/p##/q/u):
I will try another thing, just to compare load and computation speed between the 2 versions. The second one should be 'easier' regarding index and sequence number, but maybe slower... We'll see. I was not expecting such a drastic reduction in file generation time.

I'm using a Core i7 3.5 GHz, single threaded (2013).

I hope it will help you and garambois in sequence analysis.

Last fiddled with by Aillas on 2021-09-10 at 22:41 Reason: Missing information
2021-09-10, 23:56   #94
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

1000000001011₂ Posts

It's fantastic! Thank you for all the help. This will definitely shorten some of the analysis times. Specifically, the primes count took too long to do often, but now it's no big deal. I will have to study what you have modified and see if I can learn any of your routines.

How easy will it be to allow for 30M?
2021-09-12, 09:53   #95
garambois

"Garambois Jean-Luc"
Oct 2011
France

2²×3²×19 Posts

6 seconds vs 100381 seconds! This is incredible! Thanks to you for developing such powerful data analysis tools!

On my side, I have published the four files for the 22,000,000 limit (regina, regina_cycles, regina_opens, regina_prems). Edwin, your program does indeed still work normally with this new limit above 20 million. I will republish when my computer reaches the 30 million limit.
2021-09-12, 12:56   #96
Aillas

Oct 2002
France

2·79 Posts

Quote:
 Originally Posted by EdH How easy will it be to allow for 30M?
For test purposes only, I've generated 2 regina files, based on the original regina file.
For the first one, I've duplicated each line 4 times, generating a file of 72 million entries.
For the second one, I've duplicated each line 10 times, generating a file of 180 million entries.
Each line has a proper index...

My hardware: Core i7 3.5 GHz / 24 GB of memory, with a hard drive, no SSD (<- important! :))

Here are the results:

Code:
+------------------------------------------------+
| Regina |    Load file    | Prime List | Memory |
|--------+-----------------+------------+--------|
|   18 M |   98s ( ~1m30s) |     ~5 sec |  ~2 GB |
|--------+-----------------+------------+--------|
|   72 M |  400s ( ~7 min) |    ~19 sec | ~12 GB |
|--------+-----------------+------------+--------|
|  180 M | 1130s (~19 min) |    ~38 sec | ~27 GB |
+--------+-----------------+------------+--------+
I had some issues with the 180M file when some other applications were also running. As my computer has only 24 GB of memory, I need to quit almost all other applications.
I think that with a computer of 32 GB of memory or more, it should be fine.

If I have time, I will test at work (on a computer with an SSD). I think it should reduce the loading time a lot.

With the new program, there won't be any issue with the number of sequences in the regina file. The limitation will be the memory of your computer.

I've done some testing on using std::map instead of std::vector. The advantage of std::map is that index == seqn.
It uses a little more memory (27 GB for 180M), but there is no speed gain.
And when quitting the application, releasing the map is time consuming, versus instantaneous for the vector.
The drawback of the vector is that it needs a single big block of memory.

I will try to finish the update this week.

2021-09-12, 13:41   #97
Aillas

Oct 2002
France

2×79 Posts

Quote:
 Originally Posted by EdH I will have to study what you have modified and see if I can learn any of your routines.
Concerning the primelist method, it's very slow due to an accumulation of things:

1 - The first thing is not using what C++ offers as efficient and simple tools.
If you are new to C++, that's normal. Since C++11, the STL offers a lot of containers and algorithms. It takes time to learn them and just to know what the STL can provide.
Instead of using a C array, you could use, depending on your needs, vector, map, set, ...
The basic container that replaces the C array is the vector.

Using a vector instead of an array (compiling for an x64 target) will remove your memory limitation.

2 - Based on 1, what is very time consuming in your algo:
It's the use of your two parallel arrays for storing the primes and the counts of those primes in the regina file.
For each prime, you sequentially search for that prime in the first array. To simplify: if there are 100,000 unique primes and each line ends in a prime, the program does up to 18M × 100,000 = 1.8 trillion iterations :)
And this just generates an unsorted array.
Afterwards, the program does a bubble sort. This is the easiest one to implement, but the slowest: bubble sort complexity is O(n²).

I replaced these 2 steps with a single very efficient one, using a basic C++ STL container for this operation: std::map.
The key of the map is the prime number; the data associated with this key is the counter. And I don't even need to test or search whether the key already exists.

Code:
primeCountMap[line.elD] += 1;
There are multiple methods to access data in a map, but operator [] is 'special'. If the key exists, it returns the data; if the key doesn't exist, it creates it.
This is exactly what we need: doing += 1 increments either the existing data or the newly created data (value-initialized to 0).
Counting the primes is done in a single pass.
The complexity of [] is logarithmic in the size of the container.

3 - Another thing that is also time consuming:
In seqinfo2, the prime number (column elD) is stored in a string. Converting and comparing strings is very slow versus comparing ints.
But the primes cannot be stored in an 'int' or 'unsigned 32-bit int' (max for UINT32 == 4,294,967,295, ~4 billion).

I've also changed this part to use a uint64_t: max == 18,446,744,073,709,551,615... Heuu... I'll let you write that one out :)

This is enough for seqinfo.

And as I no longer have any memory issue, I've also merged the 2 structs into a single struct that matches the regina file.

If you have question, don't hesitate...

2021-09-12, 14:06   #98
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

3×37² Posts

@Aillas: This is all excellent! Thank you for the explanations and for introducing me to some new C/C++ stuff. I had looked at vector, but not map, when I was trying to increase the capability. I also knew the sort I used was very inefficient. As you saw, I couldn't figure out how to deal with larger numbers without making use of GMP. I was concerned uint64_t might not be large enough, with only 20 decimal digits. But currently the largest ending prime only has 14. Is uint64_t larger than 20 digits? What if a larger terminating prime happens to turn up?
2021-09-14, 16:20   #99
Aillas

Oct 2002
France

2·79 Posts

Quote:
 Originally Posted by EdH @Aillas: This is all excellent! Thank you for the explanations and introducing me to some new C/C++ stuff. I had looked at vector, but not map, when I was trying to increase the capability. I also knew the sort I used was very inefficient. As you saw, I couldn't figure out how to deal with larger numbers without making use of GMP. I was concerned uint64_t might not be large enough, with only 20 decimal digits. But, currently the largest ending prime only has 14. Is uint64_t larger than 20 digits? What if a larger terminating prime happens to turn up?
If a larger terminating prime turns up, then with uint64_t, seqinfo will probably misbehave or crash.
Something to monitor with each new regina_file; I can write a simple check app to verify this point.

For test purposes only, I could try to switch from uint64_t to (long) double, though I'd need to be careful with the conversion and comparison functions.

