View Single Post
Old 2015-04-27, 23:03   #964
Madpoo
Serpentine Vermin Jar
 
Madpoo's Avatar
 
Jul 2014

1100110011012 Posts
Default

Quote:
Originally Posted by James Heinrich View Post
My approach in a case like that, where generating a page with a lot of slow queries but the contents are mostly static, is to cache the generated HTML to a database table. The generate will be slow the first time, but once the first person has looked at it it's lightning-fast for everyone else. Then whenever new data is submitted that would affect the displayed page (new ECM run, factor found, whatever) that row is easily cleared from the database. In a case like this you could even hardcode an exception for known-problem exponents like M1277 (or any of the small exponents with much ECM) to not clear the cache on a trivial update (yet another ECM curve run) but only if a factor is found or similar major change.
Well, I take it back. It actually is something I added that made something like M1277 go so slow.

There's a query that pulls all the data and it uses some loose JOINS that involves duplicates being generated for certain sets of data. In the case of M1277 although there are *only* 1023 log entries for ECM results, by the time the joins to certain other non-distinct tables are involved, the report for M1277 involves no less than 91,520 rows.

Once PHP gets a hold of it all, it removes dupe entries based on the type of data being displayed... still not really optimal but it's also worth noting that this works equally well for single or multiple exponents... all things said it's not too bad.

Where I messed up was when I tried to "fix" the slow display where it would show *all* of the ECM info by letting you toggle the display of "no factors found by ECM" on or off. But I thought I would helpfully calculate how many curves in total had been run as some sort of probably un-useful tip.

Trouble is, I did that in a subselect of the main query and I think it was doing a big SUM calculation on each row... 91,520 times.

So... yeah, it's no wonder that if you tried to include all of the ECM data (which would still sum up the # of curves) it would basically time out.

If you left the "full ECM history" box unchecked then there's only 1496 rows, and while it still does my stupid sum in each row it will finish in 30-60 seconds if you're patient enough.

Solution: I'm removing the mostly useless "recent curves run" from being displayed. Be aware that if you still feel like looking at all the times someone did ECM on a number and found nothing, in the most extreme case like M1277 it will still take a while to show everything (maybe 30-40 seconds).

My advice? If you're crawling for importing data into your own setup, please do yourself a favor and use the XML version of the exponent report. e.g.:
http://www.mersenne.org/report_expon...hp?exp_lo=1277

I wrote that from scratch and it's just using raw SQL "for xml" and it's VERY fast. That reminds me that I meant to put just a little link on each report page so you could click and view the XML data for that exponent. Right now it's kind of a hidden feature.
Madpoo is offline   Reply With Quote