mersenneforum.org

Dubslow 2012-07-10 07:28

The Appallingly Blue Page
 
Hey everybody. :max:

I'm rather pleased to announce that I have created a spider to automatically update Aliquot sequence statuses and output them in a really pretty HTML table.

[url]http://.../aliquot/AllSeq.html[/url] [COLOR=Red]EDIT: outdated[/COLOR]
[URL]http://www.rechenkraft.net/aliquot/AllSeq.html[/URL]

Of course, the reason it's not been done before is strain on the FactorDB. Thus my spider is run once per hour, updates 55 sequences, then exits, saving its location for the next hour. This means that each sequence gets updated once over the course of [URL="http://www.wolframalpha.com/input/?i=9229%2F55%20hours"]a week[/URL], at which point it restarts at the beginning and does it again. The idea, of course, is that no sequence is more than a week out of date.
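A minimal sketch of that hourly batching, assuming the list of sequences is held in memory and the position is saved in a small JSON file. The names `next_batch` and `state_file` and the file layout are hypothetical, not the spider's actual code:

```python
import json
import os

BATCH_SIZE = 55  # sequences refreshed per hourly run, as described above


def next_batch(sequences, state_file, batch_size=BATCH_SIZE):
    """Return the next slice of sequences and save the new position."""
    start = 0
    if os.path.exists(state_file):
        with open(state_file) as f:
            start = json.load(f)["position"]
    if start >= len(sequences):
        start = 0  # a full pass is finished, so restart at the beginning
    batch = sequences[start:start + batch_size]
    with open(state_file, "w") as f:
        json.dump({"position": start + len(batch)}, f)
    return batch
```

With 9229 sequences and 55 per run, a full pass takes 168 hourly runs, which is the one week mentioned above.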

In addition, not only can I update sequences, I can also check reservations here on MersenneForum (and the page even shows the last update to the reservations). That's why I [URL="http://www.mersenneforum.org/showthread.php?p=303919#post303919"]asked about subproject reservations[/URL]. For the moment then, the table only shows reservations from the main reservation thread, but there are two ways to fix that: 1) Put all subproject reservations in the lead post of the main reservation thread, or (probably easier for the mods) 2) format the leading subproject posts in the same manner as the main thread lead post.

And finally, though certainly not least, I found an excellent project called [URL="http://datatables.net/index"]DataTables[/URL] that uses JavaScript to allow for efficient sorting and searching of large tables. There's more detail on the page, so I encourage you to read it.

There are two major weaknesses: first, I can't do much about FDB sequence errors, though if you look there is a list of known wrong sequences. Second, this of course requires workers to upload their work to the FDB, though AFAICT this has essentially become the norm. I believe Paul Zimmermann does it semi-regularly, and though Christophe Clavier doesn't update the FDB himself, he does make .elf files available for all his sequences. I created a spider that runs weekly to (as necessary) upload his work to the FDB. Thus the only major work that might be missing is Clifford Stern's, though it's certainly possible that he is still updating the FDB.

To go along with it, I've created a basic [URL="http://dubslow.tk/aliquot/statistics.html"]statistics page[/URL] with very simple stats; it looks kind of silly now, but that's still under construction.

I don't have much left I can think of to add; the last thing on my list was to add the very simple code to pick out drivers and display that just in front of the main "Factors" column. I welcome any suggestions.


What do you think? :whistle:

:grin:

kar_bon 2012-07-10 08:02

Here's an idea to think about:

I don't know how you download the last line of a sequence (perhaps like [url=http://factordb.com/sequences.php?se=1&aq=434040&action=last]this[/url]).

If so, you could run a comparison over all open seqs to find merges, i.e. seqs whose last lines are the same.
However, because the timeframe in which all seqs are downloaded (55 per hour) is long, not all merges would be found if someone worked on such a seq. in the meantime.

This feature is not currently implemented in the FactorDB the way it was a long time ago.

The FDB restrictions are what made me give up updating my pages:
I used to do such a last-line download (on a quad core with 4 threads it took about an hour) and build my pages via script, finding merges and new terminations in seconds.

Dubslow 2012-07-10 08:18

It's fairly easy to query and pause, query and pause like I've done.

I do use "action=last", but as it is the spider doesn't record the actual value/id of the large numbers, just their lengths. It would be fairly easy to modify, but I'd only want to run such a merge-finder once a week, after each complete refresh. Now that I think about it though, such a thing would be easier than I was thinking 15 minutes ago. (When you mentioned it, I was initially thinking like in the "Genealogy" thread, which requires much more logic. :razz:)

I actually got the idea for the stats page from your website. :razz: I would love to help you in any way possible. My script is ~200+ lines of Python (plus a few hundred more lines of HTML templates). How did you update your site?

kar_bon 2012-07-10 08:33

- first, making a file with all open seqs
- creating 4 files for downloading all last lines with wget
- downloading all last lines in 4 threads into 4 folders
- processing the last-line files with an awk script to get lines like this (old ones):
[code]
276 U 1687. 3678759348...6<165> = 2 * 3^2 * 7^2 * 53 * 7869677296...1<160>
552 U 1057. 4238228081...6<179> = 2^2 * 3 * 71 * 145633 * 3415741009...1<171>
564 U 3357. 2239382335...8<172> = 2^2 * 7 * 31 * 103 * 6211 * 26557 * 1499962302625458296587675861761081389<37> * 1012395977...9<123>
660 U 890. 2345292265...0<181> = 2^3 * 3^2 * 5 * 6514700736...9<178>
966 U 893. 8491715927...0<178> = 2^2 * 3^2 * 5 * 83 * 2099 * 2707898746...9<171>
[/code]

Here 'U' stands for 'Unchecked' in the FactorDB. 'P' would mean prime, i.e. the sequence has terminated.

- running another awk script to build the HTML pages (reservations were read from another file)
- running an awk script to make stats like this:
[code]
Counting OES per 100k-ranges:
    000k    100k    200k    300k    400k    500k    600k    700k    800k    900k     Total
     902     953     918     855     889     951     939     927     959     960      9253

Counting OES-lengths:
    000k    100k    200k    300k    400k    500k    600k    700k    800k    900k     Total       Avg
 1323644 1342120 1250274 1121133 1209676 1283284 1238300 1262357 1272105 1257328  12560221  1357.421

Counting OES-sizes:
    000k    100k    200k    300k    400k    500k    600k    700k    800k    900k     Total       Avg
  108573  110816  105032   97341  100646  108090  105993  105027  108294  108318   1058130   114.355
[/code]

- running a sort tool (CMsort) to find merges and terminations
- running an awk script for small queries:
[code]
type=1: all Seqs of range r1 to r2
type=2: all Seqs <400 lines (-> Project 3b)
type=3: all Seqs 150000<n<200000, index<110 (-> Project 9)
type=4: all Seqs 100000<n<150000, index<110 (-> Project 7)
type=5: all Seqs length<100 digits of last index
type=6: all Seqs length<100 of composite
[/code]

So the whole job was done in a little more than 1 hour, and all the data were 'just in time'.
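For anyone who prefers Python to awk, here is a rough sketch of the parsing and per-range counting steps from the list above. It assumes the line layout shown in the first sample (sequence, status letter, index, abbreviated value with its digit count, then the factorization). `parse_last_line` and `count_per_range` are hypothetical names, not the original awk scripts:

```python
import re
from collections import Counter

# sequence, status, index, abbreviated value, digit count, factorization
LINE_RE = re.compile(r"^(\d+)\s+([A-Z])\s+(\d+)\.\s+(\S+?)<(\d+)>\s*=\s*(.+)$")


def parse_last_line(line):
    """Split one last-line record into its fields, or return None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    seq, status, index, _abbrev, digits, factors = m.groups()
    return {
        "seq": int(seq),
        "status": status,          # 'U' = unchecked, 'P' = prime
        "index": int(index),
        "digits": int(digits),
        "factors": [f.strip() for f in factors.split("*")],
    }


def count_per_range(records, width=100_000):
    """Count sequences per range, keyed by the start of each range."""
    counts = Counter()
    for rec in records:
        counts[rec["seq"] // width * width] += 1
    return counts


SAMPLE = [
    "276 U 1687. 3678759348...6<165> = 2 * 3^2 * 7^2 * 53 * 7869677296...1<160>",
    "660 U 890. 2345292265...0<181> = 2^3 * 3^2 * 5 * 6514700736...9<178>",
]
records = [parse_last_line(line) for line in SAMPLE]
```

Summing `rec["index"]` or `rec["digits"]` per range in the same loop would presumably give the other two tables, if "lengths" means the last index and "sizes" means the digit count.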

henryzz 2012-07-10 21:52

Nice page. I especially like the ways of sorting the table. :smile:
Not certain about the colour scheme. It seems a bit in your face to me.

Dubslow 2012-07-10 22:17

[QUOTE=henryzz;304426]Nice page. I especially like the ways of sorting the table. :smile:
Not certain about the colour scheme. It seems a bit in your face to me.[/QUOTE]

I like blue :razz:

I'd be more open to suggestions for the table background than the page bg, but I'll listen to anything.

Edit: Why do the [URL="http://ss64.com/bash/crontab.html"]'e' and 'r'[/URL] keys have to be right next to each other? :cry:

Dubslow 2012-07-12 06:31

Okay, I've now added a 'Driver' [URL="http://dubslow.tk/aliquot/AllSeq.html"]column[/URL].

I have quite a few questions about what you guys want.

--Primarily, though it's called the 'Driver' column, it also lists the [URL="http://www.mersenneforum.org/showpost.php?p=174259&postcount=20"]guides[/URL], as well as defaulting to listing the current power of two if no driver or guide is found.

*Should it keep this behavior?

*Should it not display guides/powers of two?

*Should it display any powers of the non-two factors (e.g. for seeing if a driver is escapable)?

*Should I get rid of the 'Factors' column altogether?
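For reference, a sketch of how the driver lookup with its power-of-two fallback could work. This is not the page's actual code. It assumes the usual Guy-Selfridge definition of a driver (2^a * v with v odd, v dividing 2^(a+1) - 1, and 2^(a-1) dividing sigma(v)), it ignores guides, and it ignores the exponents of the odd primes. `find_driver` is a hypothetical name:

```python
from itertools import combinations
from math import prod


def find_driver(factors):
    """factors: dict mapping prime -> exponent for the current term.

    Returns (a, odd_primes, is_driver). When no driver is found the result
    is (a, (), False), i.e. just the power of two. Returns None if the
    term is odd.
    """
    a = factors.get(2, 0)
    if a == 0:
        return None
    sigma_pow2 = 2 ** (a + 1) - 1  # sigma(2^a)
    # Only odd primes of the term that divide sigma(2^a) can belong to v.
    candidates = [p for p in sorted(factors) if p != 2 and sigma_pow2 % p == 0]
    best = None
    for r in range(len(candidates) + 1):
        for subset in combinations(candidates, r):
            # v is a product of distinct primes that each divide sigma(2^a),
            # so v divides sigma(2^a) automatically.
            sigma_v = prod(p + 1 for p in subset)
            if sigma_v % 2 ** (a - 1) == 0:
                if best is None or prod(subset) > prod(best):
                    best = subset
    if best is None:
        return a, (), False
    return a, best, True
```

For a term containing 2^3 * 3 * 5 this reports the driver 2^3 * 3 * 5, while for 2^3 * 5 without a 3 it falls back to the bare power 2^3.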


--Additionally, as kar_bon suggested, the script now tracks the FDB ID of the last line, and once a week I run a script to check for any duplicates, i.e. merges. Also like he mentioned, the major flaw is that if a merged pair is updated between when I get the first branch and get the second branch, then the merge won't be detected. (It will take a week for the ID list to be fully populated.) (If you look hard enough, the ID is available for each sequence.)

--Would anybody want each row/sequence to be a link to its status page?
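The weekly duplicate check on last-line IDs mentioned above can be sketched roughly like this, assuming the IDs are kept in a dict keyed by sequence. `find_merges` and the IDs in the example are hypothetical, not the actual script or real FDB IDs:

```python
from collections import defaultdict


def find_merges(last_ids):
    """last_ids: dict mapping sequence -> FDB ID of its last line.

    Returns a dict mapping each shared ID to the sorted list of sequences
    that end on it, i.e. the candidate merges.
    """
    by_id = defaultdict(list)
    for seq, fdb_id in last_ids.items():
        by_id[fdb_id].append(seq)
    return {i: sorted(seqs) for i, seqs in by_id.items() if len(seqs) > 1}


# Made-up IDs purely for illustration.
example = {276: 1001, 552: 1002, 564: 1003, 660: 1002}
merges = find_merges(example)
```

As noted above, this only catches a merge if neither branch advanced between the two downloads.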

LaurV 2012-07-12 15:04

Very nice job with that table! I almost forgive you for those colors :razz:

Stupid question: how did the reservations get into that table? Do you add them by hand? In any case, please add me to 4290, which I have been nurturing since it was C126 (see [URL="http://www.mersenneforum.org/showpost.php?p=301301&postcount=21"]here[/URL] about it; if I have some free CPU I will queue the C142)

Dubslow 2012-07-12 17:11

[QUOTE=LaurV;304530]Very nice job with that table! I almost forgive you for those colors :razz:[/quote]
Again, if anybody suggests a different scheme, I'd probably do it.
[QUOTE=LaurV;304530]
Stupid question: how did the reservations get into that table? Do you add them by hand? In any case, please add me to 4290, which I have been nurturing since it was C126 (see [URL="http://www.mersenneforum.org/showpost.php?p=301301&postcount=21"]here[/URL] about it; if I have some free CPU I will queue the C142)[/QUOTE]

[QUOTE=Dubslow;304386]
In addition, not only can I update sequences, I can also check reservations here on MersenneForum (and the page even shows the last update to the reservations). That's why I [URL="http://www.mersenneforum.org/showthread.php?p=303919#post303919"]asked about subproject reservations[/URL]. For the moment then, the table only shows reservations from the main reservation thread, but there are two ways to fix that: 1) Put all subproject reservations in the lead post of the main reservation thread, or (probably easier for the mods) 2) format the leading subproject posts in the same manner as the main thread lead post.[/QUOTE]
[code]import re
from urllib import request

def get_reservations():
    reserves = {}
    # This is the lead post of the main reservations thread
    req = request.Request('http://www.mersenneforum.org/showpost.php?p=165249&postcount=1',
                          headers={'User-Agent': 'Dubslow/AliquotSequences'})
    page = request.urlopen(req).read().decode('utf-8')
    # Date and time of the last edit, taken from the post's edit note
    update = re.search(r'<!-- edit note -->.*Last fiddled with by [A-Za-z_0-9 -]+? on ([0-9a-zA-Z ]+) at <span class="time">([0-9:]{5})</span>', page, flags=re.DOTALL)
    updated = update.group(1)+' '+update.group(2)
    # The reservation list itself is the contents of the <pre> block
    page = re.search(r'<pre.*?>(.*?)</pre>', page, flags=re.DOTALL).group(1)
    for line in page.splitlines():
        # Each reservation line is a sequence number followed by a name
        herp = re.match(r' {0,3}([0-9]{3,6}) ([0-9A-Za-z_ -]{1,16})', line)
        if herp is None:
            continue  # not a reservation line
        name = herp.group(2)
        if 'jacobs and' in name:
            name = 'jacobs and Richard Guy'
        reserves[int(herp.group(1))] = name.strip()
    return reserves, updated[/code]
You'll notice that the "(as of <date/time>)" note in the column header matches the date/time at which the [URL="http://www.mersenneforum.org/showpost.php?p=165249&postcount=1"]lead post[/URL] of the main reservations thread was last edited.

So if you want it to appear, ask the mods :razz: (Somehow my reservation of 484470 was missed in the last edit :razz:)

PS: Regarding merges, SM_88 directed me to [url=http://factordb.com/endings]this page[/url], about which I had no idea. Anybody could fairly easily check for merges with it.

kar_bon 2012-07-12 17:45

[QUOTE=Dubslow;304536]PS: Regarding merges, SM_88 directed me to [url=http://factordb.com/endings]this page[/url], about which I had no idea. Anybody could fairly easily check for merges with it.[/QUOTE]

This page with endings was generated only once, in March, and still contains some errors (missing lines and some duplicates).
I've just downloaded that page and compared it with the one from March: no changes!
Syd said then that this page would not be updated because it causes too many DB accesses.

Dubslow 2012-07-12 17:49

[QUOTE=kar_bon;304538]This page with endings was generated only once, in March, and still contains some errors (missing lines and some duplicates).
I've just downloaded that page and compared it with the one from March: no changes!
Syd said then that this page would not be updated because it causes too many DB accesses.[/QUOTE]

Ah... good thing I'm tracking the IDs then. :smile:


All times are UTC.