mersenneforum.org > Great Internet Mersenne Prime Search > Data > Marin's Mersenne-aries
Old 2019-01-02, 07:53   #859
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
3×977 Posts

I find manually distributing lists to be a bit too much work.

If my systems were dedicated to DC range TC, I'd probably get through 5+ a day. At the LL wavefront it would be closer to 2 or 3. How quickly do they come in?

As far as query tuning, I like your scoring idea. I'd make a new column on the exponents table (assuming there is one). When a matching LL or PRP result is submitted, set the score to 0.0. If a mismatch comes in, have the server set the score to 1.0. Anything otherwise interesting can have its score set with a manual update query.

Then when a machine whose owner has the "interesting" checkbox set requests an exponent, a simple index on (score desc, exponent asc) can make the `select ... from exponents where score > 0 and exponent > $cpu_min_eligible_exponent order by score desc, exponent asc` query a fast index scan.
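A minimal sketch of the idea in T-SQL (table, column, and variable names here are mine, not Primenet's actual schema):

Code:
-- assumed schema: an exponents table keyed by exponent
alter table exponents add score float not null default 0.0;

-- supports the assignment query below as a simple index scan
create index ix_exponents_score on exponents (score desc, exponent asc);

-- matching LL/PRP result submitted: clear the score
update exponents set score = 0.0 where exponent = @p;

-- mismatch submitted: flag the exponent for a triple-check
update exponents set score = 1.0 where exponent = @p;

-- hand the client the most interesting eligible exponent
select top (1) exponent
from exponents
where score > 0 and exponent > @cpu_min_eligible_exponent
order by score desc, exponent asc;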

I should find out if there's a free version of SQL Server so I can play with it. I really enjoy optimizing SQL. Edit: there is a free version (SQL Server Express), but it's limited to 10 GB per database, so not sufficient to store a replica of all the prime data. It could still be used for experiments though.

Last fiddled with by Mark Rose on 2019-01-02 at 07:58
Old 2019-01-02, 15:53   #860
Madpoo
Serpentine Vermin Jar
Jul 2014
37×89 Posts

Quote:
Originally Posted by Mark Rose
... there is a free version (SQL Server Express), but it's limited to 10 GB per database, so not sufficient to store a replica of all the prime data. It could still be used for experiments though.
You'd be amazed what you could fit into 10 GB of GIMPS data. Primenet's largest table, by far, is the one that holds the results entries from each submission (about 30 GB, plus 24 GB more for indices). The tables that hold the data itself, like exponent and result status (verified, unverified, bad, etc.), along with "minor" things like the date, user id, and cpu id, are relatively compact.
For the LL results, that table is currently around 280 MB (although the indices on it are nearly twice that, for performance).

Another somewhat larger table is the one that holds factors... lots of exponents under 1e9 with factors (and many with multiple factors), and those factors are stored as varchar since they can get ginormous. So that one is around 2.4 GB (1 GB for indices).
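(If you ever want to pull per-table numbers like these yourself, SQL Server's built-in sp_spaceused reports them; the table name here is just a placeholder:)

Code:
-- reports rows, reserved space, data size, and index size for one table
exec sp_spaceused 'dbo.ll_results';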

The performance part of picking out exponents needing triple-checks is slow because of the aggregate/counting that has to happen during the query.

In essence it looks like (in a very simple form):
Code:
-- "unverified" stands for whatever constant encodes that result state
select exponent
from ll_results
where result_state = unverified
group by exponent
having count(*) > 1
Making something like that perform well is the trick. On its own, that query finishes in milliseconds, but in reality it takes me about 8 seconds right now, because I also have to join to the assignments table to see which of those are already assigned or available, and to a "factoring effort" table that tracks the TF / P-1 factoring for each of them, to see if any could use extra factoring first. Even when that doesn't apply (most of these exponents have been factored adequately), the factoring data still goes into creating the worktodo entry, which includes the current TF bit level.
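One thing that might help the aggregate part, assuming unverified rows are a small slice of the table, is a filtered index along these lines (index name and state constant are placeholders):

Code:
-- covers only unverified rows, so the group-by scans a small index
create nonclustered index ix_ll_results_unverified
    on ll_results (exponent)
    where result_state = 0;  -- placeholder for the 'unverified' constant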

Anyway... something to think about for me... some type of exponent "score" system...
Old 2019-01-02, 17:51   #861
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
3×977 Posts

Quote:
Originally Posted by Madpoo
You'd be amazed what you could fit into 10 GB of GIMPS data. Primenet's largest table, by far, is the one that holds the results entries from each submission (about 30 GB, plus 24 GB more for indices). The tables that hold the data itself, like exponent and result status (verified, unverified, bad, etc.), along with "minor" things like the date, user id, and cpu id, are relatively compact.
For the LL results, that table is currently around 280 MB (although the indices on it are nearly twice that, for performance).

Another somewhat larger table is the one that holds factors... lots of exponents under 1e9 with factors (and many with multiple factors), and those factors are stored as varchar since they can get ginormous. So that one is around 2.4 GB (1 GB for indices).
Still much more than 10 GB. :-)

Quote:
The performance part of picking out exponents needing triple-checks is slow because of the aggregate/counting that has to happen during the query.

In essence it looks like (in a very simple form):
Code:
-- "unverified" stands for whatever constant encodes that result state
select exponent
from ll_results
where result_state = unverified
group by exponent
having count(*) > 1
Making something like that perform well is the trick. On its own, that query finishes in milliseconds, but in reality it takes me about 8 seconds right now, because I also have to join to the assignments table to see which of those are already assigned or available, and to a "factoring effort" table that tracks the TF / P-1 factoring for each of them, to see if any could use extra factoring first. Even when that doesn't apply (most of these exponents have been factored adequately), the factoring data still goes into creating the worktodo entry, which includes the current TF bit level.
Can you paste the full query? Chances are it can be rewritten to be much faster, especially since the query you pasted above runs in milliseconds.
Old 2019-01-02, 22:42   #862
Madpoo
Serpentine Vermin Jar
Jul 2014
110011011101₂ Posts

Quote:
Originally Posted by Mark Rose
Can you paste the full query? Chances are it can be rewritten to be much faster, especially since the query you pasted above runs in milliseconds.
I actually got to thinking about it earlier and managed to strip out some weird stuff I had in there previously, so now it runs in about a second.

I had some old junk in there before to look at any current assignments and tell me how far along they were, when they had last checked in, etc. When my workers were starving for things to do, I did poach some assignments for triple-checks if that assignment hadn't reported in for a month, or whatever. Bear in mind that quite a few of these triple-checks were for larger exponents where maybe one result was suspicious, so it was done again as a "first time check" which mismatched; and maybe it was previously assigned but had expired, so that expired assignment got converted to a live double (really a triple) check. I didn't feel bad about poaching assignments that hadn't reported in for months or even years.

I also streamlined a hideous "where" clause I was using and turned it into a custom join, which really helps (I'm no longer matching huge datasets and then whittling them down with a WHERE). When I'm doing these types of queries for my own use I don't typically bother to optimize much unless it's really horrible. So yeah, it went from 8 seconds to 1 second, and there went my lunch break. LOL

I even took another look at a different query that looks for self-verified work and got that running in about half the time... it shouldn't have been taking as long as it did (it only returns a pair of results now anyway), but I was sloppy when I wrote it, so it was taking twice as long as it should have. That one's trickier since it joins the table to itself and has to look for cases where the user is the same but the shift and/or date of the result differ, and those are the *only* two results used to verify. Kind of a pain, but it only takes 4 seconds now, and it's a pretty big table it's working with, so I'm happy with it.

(PS - that other query takes 4 seconds because, based on the self-verified work it finds, I'm looking up the factoring effort done on each exponent to help me generate a valid worktodo entry - it's surprising how many people have double-checked their own first-time tests without doing a single bit of P-1.)
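Conceptually that self-verified check is a self-join along these lines (a rough sketch with assumed column names, not the real query):

Code:
-- pairs of unverified results on the same exponent from the same user,
-- differing in shift and/or date; result_id ordering avoids double-counting
select a.exponent
from ll_results a
join ll_results b
  on b.exponent = a.exponent
  and b.user_id = a.user_id
  and b.result_id > a.result_id
  and (b.shift_count <> a.shift_count or b.result_date <> a.result_date)
where a.result_state = unverified
  and b.result_state = unverified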

Last fiddled with by Madpoo on 2019-01-02 at 22:48
Old 2019-01-02, 23:18   #863
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2⁵·13·23 Posts

Any way you can make a super-secret link for some mods to run the query, say once a week? We could keep the list up to date.
Old 2019-01-02, 23:28   #864
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
3×977 Posts

So without the query to modify, I'll have to show a more generic approach. But you can do something like this:

Code:
select needs_tc.exponent, ...
from (
    select exponent
    from ll_results
    where result_state = unverified
    group by exponent
    having count(*) > 1
) as needs_tc
join factoring_effort on needs_tc.exponent = factoring_effort.exponent
left join assignments on needs_tc.exponent = assignments.exponent
  and assignments.assignment_id = (
    select max(assignment_id)
    from assignments
    where exponent = needs_tc.exponent
      and expired is null  -- note: "expired = null" never matches in SQL
  )
where factoring_effort.bits > magic_formula_or_case_statement
  and assignments.assignment_id is null
order by needs_tc.exponent

-- assignments should have an index on (exponent, expired, assignment_id)
The advantage is that the subquery should produce only a small number of rows, which is fast, and only then join to the other tables for a second layer of filtering. If the joins are efficient, the whole thing should run in milliseconds.
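For completeness, the supporting indexes that sketch assumes would look something like this (factoring_effort's layout is a guess):

Code:
create index ix_ll_results_state on ll_results (result_state, exponent);
create index ix_assignments_exp on assignments (exponent, expired, assignment_id);
create index ix_factoring_effort_exp on factoring_effort (exponent, bits);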
Old 2019-01-03, 00:28   #865
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2566₁₆ Posts

Quote:
Originally Posted by Mark Rose
The advantage is that the subquery should produce only a small number of rows, which is fast, and only then join to the other tables for a second layer of filtering. If the joins are efficient, the whole thing should run in milliseconds.
+1!

"Subselects" / "Subqueries" are a very powerful tool in a SQL geeks' tool-belt!

Joins are sometimes a*b in size, if not constrained....
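A trivial illustration of that blow-up, with two hypothetical thousand-row tables:

Code:
-- no join condition: every row pairs with every row, |a| * |b| in total
select count(*) from a cross join b;  -- 1,000 × 1,000 = 1,000,000 rows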
Old 2019-01-03, 02:04   #866
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
3×977 Posts

I realize now that the subquery predicate on the assignments join can be dropped completely: the null check in the outside where takes care of it.
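In other words, something like this, keeping the expired filter in the ON clause so an old expired assignment doesn't hide an exponent:

Code:
-- the left join plus the "is null" test is already an anti-join
-- against live assignments, so the max(assignment_id) subquery goes away
left join assignments
  on needs_tc.exponent = assignments.exponent
  and assignments.expired is null
...
where ...
  and assignments.assignment_id is null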
Old 2019-01-03, 03:39   #867
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
2·4,787 Posts

Quote:
Originally Posted by Mark Rose
I realize now that the subquery predicate on the assignments join can be dropped completely: the null check in the outside where takes care of it.
Some won't get that, even in this rarefied expertise space....
Old 2019-01-03, 17:51   #868
irowiki
Sep 2018
45₁₆ Posts

I would also love a way to just assign machines to triple-checks if needed!
Old 2019-01-04, 15:02   #869
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
2⁵·13·23 Posts

So I took this one:
DoubleCheck=47168477,73,1

And the 2009 result was ok, the 2018 result was bad. That is a rarity for me.