mersenneforum.org A Tool for Studying Aliquot Sequence Data

2021-04-14, 12:44   #23
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

2³×3×5×31 Posts

Thanks Jean-Luc! This will give me some direction to head in. Some seem that they will be easy to implement, others might be challenging.

I've tentatively added a place holder for a routine to create an update file that can be used for just catching new terminations and merges, but since it would need to run all open-ended sequences and that is probably how you update regina_file, I haven't created that yet. How often do you update the existing file? Would the update feature be of use? My intention was to create a second file, leaving regina_file alone. Then, the extra would update the arrays in the program.

2021-04-14, 14:27   #24
garambois

"Garambois Jean-Luc"
Oct 2011
France

1001001001₂ Posts

Quote:
 Originally Posted by EdH Thanks Jean-Luc! This will give me some direction to head in. Some seem that they will be easy to implement, others might be challenging.
Let's be careful, I don't know if my ideas will work.
Some of my ideas must be very naive, I am not a professional mathematician !
I work more with my intuition than with my reason.
You have seen that I have stated conjectures that can be proved in one line or that were already known.
I hope that some of my ideas will be valid if you orient your work to what I have said, but we have no guarantee of that !
But maybe, at the end of the work, we could also find a new "not naive" conjecture ;-) !

Quote:
 Originally Posted by EdH I've tentatively added a place holder for a routine to create an update file that can be used for just catching new terminations and merges, but since it would need to run all open-ended sequences and that is probably how you update regina_file, I haven't created that yet. How often do you update the existing file? Would the update feature be of use? My intention was to create a second file, leaving regina_file alone. Then, the extra would update the arrays in the program.
How often do I update the existing file ?
That is the whole problem.
To complete regina_file from 1 to 1e6, it took a few hours.
To complete it from 14e6 to 15e6, it takes months.
What takes time in the program is to search for the numbers in the tables which become huge !
I don't change the lines that are already there.
However, if you want to update all the existing lines by scanning the Open-end sequences on FactorDB, that's great and I'll be very interested in this completed "regina_file".
But I think it would be quite complicated to be able to modify all 14 variables in each line without making an error !
I don't know if I understood correctly, is that what you want to do ?
It seems to me very complicated !
Unless you don't modify all 14 variables ?

2021-04-14, 15:07   #25
EdH

"Ed Hall"
Dec 2009

2³·3·5·31 Posts

Quote:
 Originally Posted by garambois . . . But I think it would be quite complicated to be able to modify all 14 variables in each line without making an error ! I don't know if I understood correctly, is that what you want to do ? It seems to me very complicated ! Unless you don't modify all 14 variables ?
I'm not currently making use of all the elements in regina_file, but to have an accurate update file, I will need to include all elements that I do use, i.e. if I start using numbers of maximums and parity changes, those elements will need to be included.

Let's see if I can give a brief description of my thoughts:

1. Leave regina_file unchanged, but load the currently used elements.
2. Run through the sequences, looking only for open-ended (o-e) ones, since the others should not have changed.
3. If an o-e sequence has terminated or merged, write a line (of at least the currently used elements) to the update file.
4. Next time the program is run, after reading the original regina_file, if the update file exists, it is used to update those sequences within the program arrays.

This would give the program the most up to date data based on the most recent update.
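
The four steps above could be sketched roughly as follows in Python. This is only an illustration, not the actual program: the update file name `regina_update` and both function names are made up for the example, and it assumes that update lines use the same bracketed format as regina_file, with the sequence start n as the first element.

```python
import ast
from pathlib import Path


def read_lines(path):
    """Parse bracketed lines such as "[2, 1, 1, ...]" into {n: fields}.

    Returns an empty dict if the file does not exist, so a missing
    update file is simply ignored.
    """
    records = {}
    p = Path(path)
    if not p.exists():
        return records
    with p.open() as fh:
        for raw in fh:
            raw = raw.strip()
            if raw:
                fields = ast.literal_eval(raw)  # safe parse of a list literal
                records[fields[0]] = fields     # assumption: first field is n
    return records


def load_with_updates(base_path="regina_file", update_path="regina_update"):
    """Steps 1 and 4: load regina_file unchanged, then overlay the update file."""
    data = read_lines(base_path)
    data.update(read_lines(update_path))  # updated sequences replace old entries
    return data
```

Because the overlay happens only in memory, regina_file itself is never rewritten, and deleting the update file returns the program to the original data.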

I imagine the update file could be a reference for which elements of regina_file might need further work eventually, but how important is that currency? If only a handful of o-e sequences have changed, are the hours of running an update worth the gain?

As to whether to modify all the variables, based on how many sequences have changed since the regina_file run, it may be better to include all elements. That way, if I add any reports, I wouldn't have to run update again to gather the rest (although running only those in the update file might be relatively quick). My problem there is I don't fully understand all of the elements, yet.

For now, I'll work with the current regina_file and leave the update issue simmering. . .

BTW, if you are simply appending your current expansion for the regina_file, the program should already be able to accept that expanded version, as long as the name remains regina_file. It shouldn't bother the original, but I'd use a copy anyway, instead of the current one, just to be sure. (Maybe, if you have a backup from some point you could copy it into a unique directory with my program and try it. After it reads the new regina_file, it should display the new counts.)

2021-04-15, 05:05   #26
Happy5214

"Alexander"
Nov 2008
The Alamo City

3×7×29 Posts

Quote:
 Originally Posted by garambois 2) A very simple way to visualize the data would also be to be able to launch a regina_file analysis by entering this in a program for example : [n%2==0, a==0, b, c, d, e==0, f, g, h, i, j, 1.7
If you're going to do queries like that, you may want to convert the data into an SQLite database (no telling how big it would be, but it would likely be bigger than the uncompressed regina_file) and add some indices to make querying faster. I wonder if bundling a conversion program for the user to run in a reasonable amount of time is feasible (SQLite is available as two files, a C source and header, so distribution of that library is easy) so they don't have to download an even bigger file.
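
As a rough idea of what such a conversion might look like, here is a minimal Python sketch using the standard `sqlite3` module. The table name `regina`, the column names n and a through m (taken from the letters in the quoted query), the choice of indexed columns, and both function names are all assumptions for illustration only.

```python
import ast
import sqlite3

# Assumption: 14 values per line, named n, a, b, ..., m as in the quoted query.
COLUMNS = ["n"] + list("abcdefghijklm")


def build_db(lines, db_path=":memory:"):
    """Load bracketed regina_file lines into an SQLite table and index it."""
    con = sqlite3.connect(db_path)
    cols = ", ".join(f"{c} NUMERIC" for c in COLUMNS)
    con.execute(f"CREATE TABLE IF NOT EXISTS regina ({cols})")
    placeholders = ", ".join("?" * len(COLUMNS))
    rows = (ast.literal_eval(line) for line in lines if line.strip())
    con.executemany(f"INSERT INTO regina VALUES ({placeholders})", rows)
    # Index the columns that the example query filters on.
    con.execute("CREATE INDEX IF NOT EXISTS idx_regina_a_e ON regina (a, e)")
    con.commit()
    return con


def query_even_a0_e0(con):
    """The n%2==0, a==0, e==0 part of the quoted query, as SQL."""
    cur = con.execute(
        "SELECT n FROM regina WHERE n % 2 = 0 AND a = 0 AND e = 0 ORDER BY n"
    )
    return [row[0] for row in cur]
```

Passing an open file object as `lines` and a file name as `db_path` would write the database to disk; whether the result is larger than the uncompressed regina_file would have to be measured.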

2021-04-15, 17:08   #27
garambois

"Garambois Jean-Luc"
Oct 2011
France

1111₈ Posts

Quote:
 Originally Posted by EdH BTW, if you are simply appending your current expansion for the regina_file, the program should already be able to accept that expanded version, as long as the name remains regina_file. It shouldn't bother the original, but I'd use a copy anyway, instead of the current one, just to be sure. (Maybe, if you have a backup from some point you could copy it into a unique directory with my program and try it. After it reads the new regina_file, it should display the new counts.)
I don't know if I understand you correctly ?
Would it help you if I put the regina_file up to 14460000 online ?
This is where I am at the moment.
(I paused the calculations for a few days, as I am perfecting my program for calculating sequences : I want to be able to use all threads simultaneously for the ecm and NFS methods.)

2021-04-15, 18:01   #28
EdH

"Ed Hall"
Dec 2009

3720₁₀ Posts

Quote:
 Originally Posted by Happy5214 If you're going to do queries like that, you may want to convert the data into an SQLite database (no telling how big it would be, but it would likely be bigger than the uncompressed regina_file) and add some indices to make querying faster. I wonder if bundling a conversion program for the user to run in a reasonable amount of time is feasible (SQLite is available as two files, a C source and header, so distribution of that library is easy) so they don't have to download an even bigger file.
I've never been able to figure out how to use sqlite. I suppose that's something I should do sometime. I think I can implement many of the suggestions in some form; however, I can't grasp the concept of how to display the factor chain for all the composite sequence first terms.

Quote:
 Originally Posted by garambois I don't know if I understand you correctly ? Would it help you if I put the regina_file up to 14460000 online ? This is where I am at the moment. (I paused the calculations for a few days, as I am perfecting my program for calculating sequences : I want to be able to use all threads simultaneously for the ecm and NFS methods.)
I was merely suggesting that the program should already work with your current file, but a copy might be better, if your program that is adding sequences has the file open for processing. I don't think I need the newer one at this point, but in testing my program, you should be able to use the newer one.

2021-04-16, 17:27   #29
garambois

"Garambois Jean-Luc"
Oct 2011
France

3²×5×13 Posts

Quote:
 Originally Posted by EdH I was merely suggesting that the program should already work with your current file, but a copy might be better, if your program that is adding sequences has the file open for processing. I don't think I need the newer one at this point, but in testing my program, you should be able to use the newer one.

OK, great : the program works perfectly with the file up to 1446e4 !

2021-04-16, 21:52   #30
EdH

"Ed Hall"
Dec 2009

E88₁₆ Posts

Quote:
 Originally Posted by garambois OK, great : the program works perfectly with the file up to 1446e4 !
Excellent! Thanks for letting me know.

2021-04-19, 08:15   #31
kar_bon

Mar 2006
Germany

101101001001₂ Posts

As usual I'm using (g)awk for processing text files, and here's my solution for handling the regina_file.

First I delete all brackets and spaces from the original regina_file by calling (here under WIN):

Code:
gawk -f list.awk regina_file

and using the "list.awk" source with

Code:
BEGIN { x=1 }
{
  line=$0
  sub(/\[/,"",$0)                # drop the opening bracket
  sub(/]/,"",$0)                 # drop the closing bracket
  gsub(/ /,"",$0)                # drop all spaces
  print $0 >"regina_file_new"
  x++
}
END { print "last: "x }

This runs ~2 minutes to process all 14 mill. entries and produces a ~200 MByte smaller file.

To run a query you have to split every line into an array and compare it with the wanted values.

Example: The query for "19560" could look like this:

Code:
BEGIN { x=1; found=0 }
{
  line=$0
  split(line,a,",")              # a[1] is n, a[2]..a[14] the other values
  x++
  if (a[1]%2==0)
    if (a[2]==0)
      if (a[6]==0)
        if ((a[12]>1.7) && (a[12]<2.3)) {
          found++
          print found": "a[1]
        }
}
END { print "last: "x }

Note: The query finds many (!) lines with those parameters: you should output the results into a file and also bound the n-values (=a[1]).

The example for "496" is like:

Code:
BEGIN { x=1; found=0 }
{
  line=$0
  split(line,a,",")
  x++
  if (a[4]==496) {
    found++
    print found": "a[1]
  }
}
END { print "last: "x }

This runs ~1.5 min and finds 45 values.

Calling these by the fitted statements like

Code:
gawk -f 496.awk regina_file_new

is no problem.

I think many such queries can be done by nested IF-statements like the above examples.

Disadvantage: On every query run the whole regina_file_new has to be processed again; it cannot be saved into a big array.

Note: To update or insert a new parameter in the original regina_file it's also possible to use awk: a new parameter calculated from the given file or any external file can be added easily without loading the whole file into a text editor or in any other manner.

2021-04-19, 11:25   #32
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

2³·3·5·31 Posts

Thanks kar_bon,

The one issue of constantly reading the file was the main driver for my move to reading it once and then working with the arrays. Right now, I'm working with a new idea that may force a complete rewrite of the source. But, I'm also looking at some additions to the current version. I do appreciate all thoughts.

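
To illustrate the "read once, then work with the arrays" idea in general terms (this is a Python sketch, not the actual program), the file can be parsed a single time and then queried repeatedly from memory. The function names are made up for the example; the loader accepts both the bracketed lines and the stripped comma-separated form, and the two predicates restate the "496" and "19560" queries with 0-based indices.

```python
def load_table(path):
    """Read the whole file once into a list of rows (lists of numbers)."""
    table = []
    with open(path) as fh:
        for raw in fh:
            raw = raw.strip().strip("[]")   # tolerate the bracketed format too
            if not raw:
                continue
            fields = [x.strip() for x in raw.split(",")]
            table.append([float(x) if "." in x else int(x) for x in fields])
    return table


def query(table, predicate):
    """Return n (the first field) for every row matching the predicate."""
    return [row[0] for row in table if predicate(row)]


def is_496(row):
    # awk's a[4] is row[3] with 0-based indexing
    return row[3] == 496


def is_19560_like(row):
    # awk's a[1], a[2], a[6], a[12] are row[0], row[1], row[5], row[11]
    return row[0] % 2 == 0 and row[1] == 0 and row[5] == 0 and 1.7 < row[11] < 2.3
```

After one call to `load_table`, any number of calls such as `query(table, is_496)` run without touching the disk again, at the cost of holding all rows in memory.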
2021-04-19, 13:05   #33
garambois

"Garambois Jean-Luc"
Oct 2011
France

249₁₆ Posts

Many thanks Karsten.

I am rewriting the entire program to add the geometric means.
This is the right time to change the format of the regina_file if necessary.
Do you think I should drop the "[", and the "]" and the "," and replace only the "," with spaces ?

So, in the new regina_file, this line :
[2, 1, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0.5000000000, 0.5000000000, 0.5000000000]
would be replaced by this one :
2 1 1 2 1 0 1 0 0 0 0 0.5000000000 0.5000000000 0.5000000000 0.5000000000

The fifteenth value is the geometric mean.
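
For what it's worth, the pure format change is a one-liner per line. The Python sketch below (function name made up for the example) only rewrites the existing 14 values in the space-separated layout; it does not compute or append the fifteenth value, since how the geometric mean is obtained is not described here.

```python
def to_space_separated(line):
    """Turn "[2, 1, ..., 0.5000000000]" into "2 1 ... 0.5000000000".

    Fields are kept as text, so the decimals are not reformatted.
    """
    inner = line.strip().lstrip("[").rstrip("]")
    return " ".join(field.strip() for field in inner.split(","))


old = "[2, 1, 1, 2, 1, 0, 1, 0, 0, 0, 0, 0.5000000000, 0.5000000000, 0.5000000000]"
print(to_space_separated(old))
```

A space-separated file has the side benefit that awk splits it into $1, $2, ... by default, with no explicit split() call.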

All times are UTC.