2021-09-10, 14:05  #89
"Ed Hall"
Dec 2009
Adirondack Mtns
4192_{10} Posts 
Quote:
The sequences run from 2 upwards to whatever JeanLuc has stopped at for the current iteration, usually an even M, i.e. 18M currently (2-18,000,000). 2-18,000,000 only includes 17,999,999 entries, because it starts with 2 rather than 1. If you add up the current numbers for primes, composites and cycles, you should get 17,999,999.

Because I used a 0-based array and used the element number as the sequence number, even though it will have one less sequence, I needed to account for M+1 elements. In actuality, I was basing the seqinfo2 code on 20M, so I used:
Code:
static struct seqdata seqd[20000001];
BTW, my current run for primes has been sitting at 99% since I first checked it this morning. It still may be a while.

2021-09-10, 17:17  #90
"Garambois JeanLuc"
Oct 2011
France
3^{2}·79 Posts 
I confirm, there can't be a hole in column A.
And yes, there is always one less entry than the limit I announce, as Edwin explains: the limit is 18,000,000 and therefore effectively 17,999,999 entries currently, since we start at 2. This weekend I will be able to publish the Regina file up to 22,000,000. Do you want me to do this, or would you rather continue working with the file up to 18,000,000 and wait until later when I publish the file at 25 or 30 million?
2021-09-10, 17:46  #91
"Ed Hall"
Dec 2009
Adirondack Mtns
1060_{16} Posts 
The seqinfo2 that I posted will only handle 20M at present, but I have tested it, and you can increase it to 22M by simply changing three instances of 20000001 to 22000001 in the source and recompiling. I tested 25M and it did not compile. Since the version being worked on by Aillas should handle the larger regina files (and more speedily), I don't know if I'll do more with mine.

2021-09-10, 18:08  #92
"Ed Hall"
Dec 2009
Adirondack Mtns
2^{5}·131 Posts 
The seqinfo2 primes counting function finally completed:
Code:
$ ./seqinfo2
Data available for sequences 2 through 18000000
Sequence endings - prime: 13623702, cycle: 260439, open: 4115858
Enter sequence (##/a/h/p/p##/q/u): p
This process will take several hours. Continue? (y/n):y
1151629 unique primes found!
Listing took 100381 seconds to generate.
Enter sequence (##/a/h/p/p##/q/u):
The largest terminating prime was only 14604141802777 (14 digits), which shows as occurring 23 times. Is it odd that the largest terminating prime would be so small?
Code:
Enter sequence (##/a/h/p/p##/q/u): p14604141802777
List all sequences that terminate with 14604141802777? (y/n/c/f): y
1923540 2967858 3462540 6232740 8361636 11070756 11079640 11470428
11788640 11792880 12125052 13462620 13896164 14362840 14772844 15091100
16062400 16250036 16560672 16839044 17693060 17849500 17850500
23 sequences found.
Perform Advanced Filtering on these results? (y/n):
2021-09-10, 22:38  #93
Oct 2002
France
3×53 Posts 
Thanks for the info and the file.
I did a file comparison and the results are identical. That's good news; it means I didn't break anything :) Here are the logs for the new version (with debug outputs I will remove later):
Code:
./seqinfo
Verifying regina_file... Done successfully
Reserving a regina vector of 20000000 elements (capacity: 20000000)
Loading regina_file... Loading regina file took 102 seconds. Done successfully
Data available for sequences 2 through 18000000
Sequence endings - prime: 13623702, cycle: 260439, open: 4115858
Shrink regina vector to 17999999 elements
Enter sequence (##/a/h/p/p##/q/u): p
This process will take several hours. Continue? (y/n):y
Progress: 100%
1151629 unique primes found!
Listing took 6 seconds to generate.
Enter sequence (##/a/h/p/p##/q/u): p14604141802777
List all sequences that terminate with 14604141802777? (y/n/c/f): y
1923540 2967858 3462540 6232740 8361636 11070756 11079640 11470428
11788640 11792880 12125052 13462620 13896164 14362840 14772844 15091100
16062400 16250036 16560672 16839044 17693060 17849500 17850500
23 sequences found.
Perform Advanced Filtering on these results? (y/n): n
Enter sequence (##/a/h/p/p##/q/u):
But I was not expecting such a drastic reduction in file generation time. I'm using a Core i7 3.5 GHz, single threaded (2013). I hope it will help you and garambois in sequence analysis.
Last fiddled with by Aillas on 2021-09-10 at 22:41. Reason: Missing information
2021-09-10, 23:56  #94
"Ed Hall"
Dec 2009
Adirondack Mtns
10140_{8} Posts 
It's fantastic! Thank you for all the help. This will definitely shorten some of the analysis times. Specifically, the primes count took too long to do often, but now it's no big deal. I will have to study what you have modified and see if I can learn any of your routines.
How easy will it be to allow for 30M? 
2021-09-12, 09:53  #95
"Garambois JeanLuc"
Oct 2011
France
1307_{8} Posts 
6 seconds vs 100,381 seconds!
This is incredible! Thank you for developing such powerful data analysis tools! On my side, I have published the four files for the 22,000,000 limit (regina, regina_cycles, regina_opens, regina_prems). Edwin, your program does indeed still work normally with this new limit above 20 million. I will republish when my computer reaches the 30 million limit.
2021-09-12, 12:56  #96
Oct 2002
France
3×53 Posts 
For test purposes only, I've generated 2 regina files, based on the original regina file.
For the first one, I duplicated each line 4 times, generating a file of 72 million entries. For the second one, I duplicated each line 10 times, generating a file of 180 million entries. Each line has a proper index... My hardware: Core i7 3.5 GHz / 24 GB of memory, with a hard drive, no SSD (<- important :)). Here are the results:
Code:
+--------+-----------------+------------+--------+
| Regina | Load file       | Prime List | Memory |
+--------+-----------------+------------+--------+
|  18 M  |   98s (~1m30s)  |   ~5 sec   |  ~2 GB |
|  72 M  |  400s (~7 min)  |  ~19 sec   | ~12 GB |
| 180 M  | 1130s (~19 min) |  ~38 sec   | ~27 GB |
+--------+-----------------+------------+--------+
I think that with a computer with 32 GB of memory or more, it should be fine. If I have time, I will test at work (computer with an SSD); I think it should reduce the loading time a lot. With the new program, there won't be any issue with the number of sequences in the regina file; the limitation will be the memory of your computer. I've done some testing with std::map instead of std::vector. The advantage of std::map is that index = seqn. It uses a little more memory (27 GB for 180M), but there is no speed gain. Also, when quitting the application, releasing the map is time consuming, versus instantaneous for the vector. Drawback of the vector: it needs a single big block of memory. I will try to finish the update this week.
2021-09-12, 13:41  #97
Oct 2002
France
237_{8} Posts 
Quote:
1 - The first thing is not using what C++ offers as efficient and simple tools. If you are new to C++, that's normal. Since C++11, the STL offers a lot of "containers" and algorithms; it takes time to learn even just what the STL can provide. Instead of using a 'C' array, you could use, depending on your needs, vector, map, set, ... The basic container that replaces a C array is the vector. Using a vector instead of an array (compiling for an x64 target) will remove your memory limitation.

2 - Based on 1, what is very time consuming in your algo: it's the usage of your "double" array for storing the primes and the counts of these primes in the regina file. For each prime, you sequentially search for this prime in the first array. To simplify, if there exist 100,000 unique primes and each line is a prime, the program will do 18M * 100,000 = 1.8 trillion iterations :) And this is just for generating an unsorted array. After that, the program does a "bubble" sort. This is the easiest one to implement, but the slowest; bubble sort complexity is O(n^2). I replaced these 2 steps with a very efficient one, using a basic C++ STL container for this operation: std::map. The key of the map is the prime number; the data associated with this key is the counter. And I don't even need to test or search whether the key already exists:
Code:
primeCountMap[line.elD] += 1;
This is exactly what we need: += 1 increments the existing counter, or value-initializes a new one to zero and then increments it. Counting the primes is done in a single pass. Complexity of [] is logarithmic in the size of the container.

3 - Another thing that is also time consuming: in seqinfo2, the prime number (column elD) is stored in a string. Converting and comparing strings is very slow versus comparing ints. But a prime could not be stored in an 'int' or 'unsigned int 32 bit' (max for UINT32 == 4 294 967 295, ~4 billion). So I've also changed this part to use a uint64_t: max == 18 446 744 073 709 551 615.. Heuu.. I let you write this one :) This is enough for seqinfo. And as I don't have any more memory issues, I've also merged the 2 structs into a single struct that matches the regina file. If you have questions, don't hesitate...

2021-09-12, 14:06  #98
"Ed Hall"
Dec 2009
Adirondack Mtns
2^{5}·131 Posts 
@Aillas: This is all excellent!
Thank you for the explanations and introducing me to some new C/C++ stuff. I had looked at vector, but not map, when I was trying to increase the capability. I also knew the sort I used was very inefficient. As you saw, I couldn't figure out how to deal with larger numbers without making use of GMP. I was concerned uint64_t might not be large enough, with only 20 decimal digits. But, currently the largest ending prime only has 14. Is uint64_t larger than 20 digits? What if a larger terminating prime happens to turn up? 
2021-09-14, 16:20  #99
Oct 2002
France
10011111_{2} Posts 
Quote:
Something to monitor with the new regina_files; I can write a simple check app to verify this point. For test purposes only, I could try to switch from uint64_t to (long) double. I'd need to be careful with the conversion and comparison functions.
