mersenneforum.org > Data > Trippple Checks

2019-01-02, 07:53   #859
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3×977 Posts

I find manually distributing lists to be a bit too much work. If my systems were dedicated to DC-range TC, I'd probably get through 5+ a day. At the LL wavefront it would be closer to 2 or 3. How quickly do they come in?

As far as query tuning, I like your scoring idea. I'd make a new column on the exponents table (assuming there is one). When a matching LL or PRP result is submitted, set the score to 0.0. If a mismatch comes in, have the server set the score to 1.0. Anything otherwise interesting can have its score set with a manual update query. Then when someone's CPU, with the "interesting" checkbox set, requests an exponent, a simple index on (score desc, exponent asc) can turn the

Code:
select ... from exponents
where score > 0 and exponent > $cpu_min_eligible_exponent
order by score desc, exponent asc

query into a fast index scan.

I should find out if there's a free version of SQL Server so I can play with it. I really enjoy optimizing SQL.

Edit: there is a free version, but it's limited to only 10 GB, so not sufficient to store a replica of all the Primenet data. Could still be used for experiments though.

Last fiddled with by Mark Rose on 2019-01-02 at 07:58
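The scoring scheme above can be sketched as a runnable sqlite3 toy. Everything here is an illustrative assumption (the exponents table, its columns, the sample values, and the per-CPU floor), not Primenet's real schema:

```python
import sqlite3

# Sketch of the proposed "score" column: 0.0 = matched, 1.0 = mismatch,
# anything in between set by hand for "interesting" cases.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE exponents (
        exponent INTEGER PRIMARY KEY,
        score    REAL NOT NULL DEFAULT 0.0
    );
    -- the covering index that makes the scan cheap
    CREATE INDEX idx_score ON exponents (score DESC, exponent ASC);
""")
conn.executemany("INSERT INTO exponents VALUES (?, ?)", [
    (47168477, 1.0),   # mismatch, but below this CPU's eligibility floor
    (48112289, 1.0),   # mismatch
    (50000017, 0.0),   # matched, nothing to do
    (52000027, 0.5),   # manually flagged as "interesting"
])

cpu_min_eligible_exponent = 47500000  # hypothetical per-CPU floor
rows = conn.execute(
    """SELECT exponent FROM exponents
       WHERE score > 0 AND exponent > ?
       ORDER BY score DESC, exponent ASC""",
    (cpu_min_eligible_exponent,),
).fetchall()
print([r[0] for r in rows])  # [48112289, 52000027]
```

Worst mismatches come back first, and the ORDER BY is satisfied straight from the index.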
2019-01-02, 15:53   #860
Serpentine Vermin Jar

Jul 2014

37×89 Posts

Quote:
 Originally Posted by Mark Rose ... there is a free version, but it's limited to only 10 GB, so not sufficient to store a replica of all the Primenet data. Could still be used for experiments though.
You'd be amazed what you could fit into 10 GB of GIMPS data. Primenet's largest table, by far, is the one that holds the result entries from each submission (about 30 GB, plus 24 GB more for indices). The other tables hold just the data itself: the exponent, result status (verified, unverified, bad, etc.), along with other "minor" things like the date, user ID, and CPU ID. Those are relatively compact.
For the LL results, it's currently around 280 MB (although the indices on it are nearly twice that, for performance).

Another somewhat larger table is the one that holds factors... lots of exponents under 1e9 with factors (and many with multiple), and those factors are stored as varchar since they can get ginormous. So that one is around 2.4 GB (1GB for indices).

The performance part of picking out exponents needing triple-checks is slow because of the aggregate/counting that has to happen during the query.

In essence it looks like (in a very simple form):
Code:
select distinct exponent
from ll_results
where result_state = unverified
group by exponent
having count(exponent) > 1
Making something like that perform well is the trick. Up front that query finishes in milliseconds, but in reality it takes me about 8 seconds to query right now, because I also have to join to the assignments table to get info on which of those are already assigned/available, and also join to a "factoring effort" table that tracks the TF / P-1 factoring for each of them to see if any of those could use extra factoring first. Even if it doesn't apply at all (most of these exponents have been factored adequately), that still goes into creating the worktodo entry, where it includes the current TF bit level.
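As a sanity check on the aggregate's behavior, here is a rough sqlite3 sketch of that query shape (table and column names are stand-ins for Primenet's real schema). One aside: with GROUP BY exponent the DISTINCT is redundant, since grouping already yields one row per exponent.

```python
import sqlite3

# Toy ll_results table; names and states are assumptions from the post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ll_results (exponent INTEGER, result_state TEXT)")
conn.executemany("INSERT INTO ll_results VALUES (?, ?)", [
    (47168477, 'unverified'), (47168477, 'unverified'),  # two mismatched LLs: needs a TC
    (50000017, 'unverified'),                            # only one result so far
    (48112289, 'verified'),   (48112289, 'verified'),    # already settled
])
# GROUP BY returns one row per exponent, so no DISTINCT is needed.
needs_tc = conn.execute(
    """SELECT exponent FROM ll_results
       WHERE result_state = 'unverified'
       GROUP BY exponent
       HAVING COUNT(*) > 1"""
).fetchall()
print(needs_tc)  # [(47168477,)]
```

Only the exponent with two or more unverified results survives the HAVING filter.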

Anyway... something to think about for me... some type of exponent "score" system...

2019-01-02, 17:51   #861
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3×977 Posts

Quote:
 Originally Posted by Madpoo You'd be amazed what you could fit into 10 GB of GIMPS data. Primenet's largest table, by far, is the one that holds the result entries from each submission (about 30 GB, plus 24 GB more for indices). The other tables hold just the data itself: the exponent, result status (verified, unverified, bad, etc.), along with other "minor" things like the date, user ID, and CPU ID. Those are relatively compact. For the LL results, it's currently around 280 MB (although the indices on it are nearly twice that, for performance). Another somewhat larger table is the one that holds factors... lots of exponents under 1e9 with factors (and many with multiple), and those factors are stored as varchar since they can get ginormous. So that one is around 2.4 GB (1 GB for indices).
Still much more than 10 GB. :-)

Quote:
 The performance part of picking out exponents needing triple-checks is slow because of the aggregate/counting that has to happen during the query. In essence it looks like (in a very simple form): Code: select distinct exponent from ll_results where result_state=unverified group by exponent having count(exponent) > 1 Making something like that perform well is the trick. Up front that query finishes in milliseconds, but in reality it takes me about 8 seconds to query right now, because I also have to join to the assignments table to get info on which of those are already assigned/available, and also join to a "factoring effort" table that tracks the TF / P-1 factoring for each of them to see if any of those could use extra factoring first. Even if it doesn't apply at all (most of these exponents have been factored adequately), that still goes into creating the worktodo entry, where it includes the current TF bit level.
Can you paste the full query? Chances are it can be rewritten to be much faster, especially since the query you pasted above runs in milliseconds.

2019-01-02, 22:42   #862
Serpentine Vermin Jar

Jul 2014

110011011101₂ Posts

Quote:
 Originally Posted by Mark Rose Can you paste the full query? Chances are it can be rewritten to be much faster, especially since the query you pasted above runs in milliseconds.
I actually got to thinking about it earlier and managed to strip out some weird stuff I had in there previously so now it runs in about a second.

I had some old junk in there before to look at any current assignments and tell me how far along they were, when they had last checked in, etc. When my workers were starving for things to do, I did poach some assignments for triple-checks if that assignment hadn't reported in for a month, or whatever. Bear in mind that quite a few of these triple-checks were for larger exponents where maybe one result was suspicious, so it was done again as a "first time check" which mismatched, and maybe it was previously assigned but had expired, so that expired assignment got converted to a live double (really a triple) check. I didn't feel bad about poaching assignments that hadn't reported in for months or even years.

I also streamlined a hideous "where" clause I was using and turned it into a custom join which really helps (I'm not matching huge datasets that I whittle down with a WHERE anyway). When I'm doing these types of queries for my own use I don't typically bother to optimize much unless it's really horrible. So yeah, went from 8 seconds to 1 second, and there went my lunch break. LOL

I even took another look at a different query that looks for self-verified work and got that running in about half the time... it shouldn't have been taking as long as it did (only returns a pair of results now anyway) but I was sloppy when I wrote it so it was taking twice as long as it should have. That's trickier since it's doing a join on the same table but has to look for cases where the user is the same but a different shift and/or date for the result, and those are the *only* two used to verify. Kind of a pain, but only takes 4 seconds now, and it's a pretty big table it's working with so I'm happy with it.

(PS - that other query takes 4 seconds because based on the self-verified work it finds, I'm looking up the factoring effort done on it to help me generate a valid worktodo entry - it's surprising how many people have double-checked their own first-time tests and haven't done a single bit of P-1 on it)
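The self-verified lookup described above (a self-join on the results table, matching pairs from the same user) can be sketched in sqlite3 roughly like this. The table, columns, and pairing rule are guesses reconstructed from the description, not the real query:

```python
import sqlite3

# Toy results table: two runs of the same exponent by the same user should
# NOT count as independent verification.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE ll_results
                (exponent INTEGER, user_id INTEGER, shift INTEGER, result_date TEXT)""")
conn.executemany("INSERT INTO ll_results VALUES (?, ?, ?, ?)", [
    (40000003, 7, 123, '2017-05-01'), (40000003, 7, 456, '2018-11-20'),  # same user twice
    (41000011, 7, 123, '2017-06-01'), (41000011, 9, 789, '2018-12-01'),  # two different users
])
# Self-join: same exponent, same user, but a different shift and/or date.
# The rowid comparison keeps each pair from matching twice.
self_verified = conn.execute(
    """SELECT a.exponent
       FROM ll_results a
       JOIN ll_results b
         ON a.exponent = b.exponent
        AND a.user_id  = b.user_id
        AND (a.shift <> b.shift OR a.result_date <> b.result_date)
       WHERE a.rowid < b.rowid"""
).fetchall()
print(self_verified)  # [(40000003,)]
```

The exponent verified by two different users stays out of the result, as intended.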

Last fiddled with by Madpoo on 2019-01-02 at 22:48

2019-01-02, 23:18   #863
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003

2⁵·13·23 Posts

Any way you can make a super-secret link for some mods to run the query, say once a week? We could keep the list up to date.
2019-01-02, 23:28   #864
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3×977 Posts

So without the query to modify, I'll have to show a more generic approach. But you can do something like this:

Code:
select exponent, ...
from (
    select distinct exponent
    from ll_results
    where result_state = unverified
    group by exponent
    having count(exponent) > 1
) as needs_tc
join factoring_effort
    on needs_tc.exponent = factoring_effort.exponent
left join assignments
    on needs_tc.exponent = assignments.exponent
    and assignments.assignment_id = (
        select max(assignment_id)
        from assignments
        where exponent = needs_tc.exponent
        and expired is null
    )
where factoring_effort.bits > magic_formula_or_case_statement
and assignments.assignment_id is null
order by exponent

-- assignments should have an index on (exponent, expired, assignment_id)

The advantage being the subquery should only produce a small number of rows, which is fast, and only then join the other tables for a second layer of filtering. If the joins are efficient, the whole thing should run in milliseconds.
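That generic shape can be exercised end to end with sqlite3. All table and column names, plus the bit-level cutoff, are illustrative assumptions standing in for the real schema:

```python
import sqlite3

# Derived table first (small), then join out for the second layer of filtering.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ll_results (exponent INTEGER, result_state TEXT);
    CREATE TABLE factoring_effort (exponent INTEGER PRIMARY KEY, bits INTEGER);
    CREATE TABLE assignments (assignment_id INTEGER PRIMARY KEY,
                              exponent INTEGER, expired INTEGER);
""")
conn.executemany("INSERT INTO ll_results VALUES (?, ?)", [
    (47168477, 'unverified'), (47168477, 'unverified'),   # needs TC, unassigned
    (48112289, 'unverified'), (48112289, 'unverified'),   # needs TC, already assigned
])
conn.executemany("INSERT INTO factoring_effort VALUES (?, ?)",
                 [(47168477, 73), (48112289, 74)])
conn.execute("INSERT INTO assignments VALUES (1, 48112289, NULL)")  # live assignment

rows = conn.execute(
    """SELECT needs_tc.exponent, factoring_effort.bits
       FROM (SELECT exponent FROM ll_results
             WHERE result_state = 'unverified'
             GROUP BY exponent
             HAVING COUNT(*) > 1) AS needs_tc
       JOIN factoring_effort ON needs_tc.exponent = factoring_effort.exponent
       LEFT JOIN assignments
              ON needs_tc.exponent = assignments.exponent
             AND assignments.expired IS NULL
       WHERE factoring_effort.bits >= 72          -- stand-in for the real cutoff
         AND assignments.assignment_id IS NULL    -- i.e. not currently assigned
       ORDER BY needs_tc.exponent"""
).fetchall()
print(rows)  # [(47168477, 73)]
```

The already-assigned exponent is dropped by the null check on the left join, and the surviving row carries the TF bit level needed for the worktodo entry.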
2019-01-03, 00:28   #865
chalsall
If I May

"Chris Halsall"
Sep 2002

2566₁₆ Posts

Quote:
 Originally Posted by Mark Rose The advantage being the subquery should only produce a small number of rows, which is fast, and only then join the other tables for a second layer of filtering. If the joins are efficient, the whole thing should run in milliseconds.
+1!

"Subselects" / "subqueries" are a very powerful tool in an SQL geek's tool-belt!

Joins are sometimes a*b in size, if not constrained....

2019-01-03, 02:04   #866
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3×977 Posts

I realize now that the subquery predicate on the assignments join can be dropped completely: the null check in the outer where clause takes care of it.
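A tiny sqlite3 demo of that point (names are illustrative): with a plain LEFT JOIN, an exponent holding any live assignment joins only to non-NULL rows, which the outer null check filters out, while an unassigned exponent yields exactly one all-NULL row that survives, so the MAX(assignment_id) subquery adds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE needs_tc (exponent INTEGER PRIMARY KEY);
    CREATE TABLE assignments (assignment_id INTEGER PRIMARY KEY,
                              exponent INTEGER, expired INTEGER);
""")
conn.executemany("INSERT INTO needs_tc VALUES (?)", [(47168477,), (48112289,)])
# 48112289 has two live assignments; 47168477 has none.
conn.executemany("INSERT INTO assignments VALUES (?, ?, NULL)",
                 [(1, 48112289), (2, 48112289)])

# No MAX(assignment_id) subquery needed: the anti-join does the work.
free = conn.execute(
    """SELECT needs_tc.exponent
       FROM needs_tc
       LEFT JOIN assignments ON needs_tc.exponent = assignments.exponent
                            AND assignments.expired IS NULL
       WHERE assignments.assignment_id IS NULL"""
).fetchall()
print(free)  # [(47168477,)]
```

This LEFT JOIN + IS NULL pattern is the classic SQL anti-join.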
2019-01-03, 03:39   #867
chalsall
If I May

"Chris Halsall"
Sep 2002

2·4,787 Posts

Quote:
 Originally Posted by Mark Rose I realize now that the subquery predicate on the assignments join can be dropped completely: the null check in the outside where takes care of it.
Some won't get that, even in this rarefied expertise space....

2019-01-03, 17:51   #868
irowiki

Sep 2018

45₁₆ Posts

I also would love a way to just assign machines to triple checks if needed!
2019-01-04, 15:02   #869
Uncwilly
6809 > 6502

"""""""""""""""""""
Aug 2003

2⁵·13·23 Posts

So I took this one: DoubleCheck=47168477,73,1

And the 2009 result was ok, the 2018 result was bad. That is a rarity for me.

