mersenneforum.org  

Old 2016-07-15, 23:33   #1057
Madpoo
Serpentine Vermin Jar
 
Jul 2014

6315₈ Posts

FYI, I've seen sporadic problems throughout the day where the monitoring service reported short-duration outages.

Looking into it, it appears a certain IP address (with a user agent string of "Ruby") has decided to do a massive crawl of the exponent reports (EDIT: by "massive" I mean they're trying to crawl at a rate of 30+ pages per second for extended periods of time... 603K pages hit in nearly the past 24 hours).

Before I dig into it more and try to mitigate, may I suggest that anyone trying to gather data on exponents either download the daily archives of the result logs or crawl the XML-specific page, which operates much more efficiently.

It would be too bad to have to implement some rate controls on the server side, so if you must crawl for data, try to do an appropriate rate limit on the crawler.
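
A minimal sketch of what a client-side rate limit could look like, assuming a crawler written in Python. The `Throttle` class and its parameters are hypothetical names for illustration; nothing here is specific to mersenne.org, and the appropriate rate is for the server operator to say.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        # clock and sleep are injectable so the class can be tested without waiting.
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next request may start

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

A crawler would call `wait()` immediately before each request, so that however fast the rest of the loop runs, requests leave no faster than `max_per_second`.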

Last fiddled with by Madpoo on 2016-07-15 at 23:39

Old 2016-07-16, 03:01   #1058
Madpoo
Serpentine Vermin Jar

Jul 2014

29×113 Posts

Quote:
Originally Posted by Madpoo
> It would be too bad to have to implement some rate controls on the server side, so if you must crawl for data, try to do an appropriate rate limit on the crawler.

Well, like I said, I hated to do it, but after setting up a rate limiter in logging only mode for a few hours and checking to make sure no other traffic is caught in the net, I've turned it on because that user is still aggressively crawling and sometimes forcing other connections to fail.

If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.

Old 2016-07-16, 05:23   #1059
0PolarBearsHere

Oct 2015

2×7×19 Posts

Quote:
Originally Posted by Madpoo
> and we'll work out a better way to do whatever you're doing.

Like not using Ruby on Rails? :P (Wasn't me, by the way.)

Last fiddled with by 0PolarBearsHere on 2016-07-16 at 05:23

Old 2016-07-16, 14:48   #1060
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

2²×1,447 Posts

Quote:
Originally Posted by Madpoo
> If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.

Not me either, but I just want to mention the possibility of a VPN. If it were me, all you'd see is whichever VPN I chose, so perhaps a content/agent block might be better than an IP block?

Old 2016-07-17, 04:11   #1061
Madpoo
Serpentine Vermin Jar

Jul 2014

3277₁₀ Posts

Quote:
Originally Posted by Madpoo
> Well, like I said, I hated to do it, but after setting up a rate limiter in logging only mode for a few hours and checking to make sure no other traffic is caught in the net, I've turned it on because that user is still aggressively crawling and sometimes forcing other connections to fail.
>
> If that user was you (I know what city and ISP but beyond that I haven't tried to narrow it down) just PM me here and we'll work out a better way to do whatever you're doing.

I don't know who it was (I searched for past hits from that IP address to see if I could match to a user, but I couldn't). Best I could tell it was some new user since they checked out the home page, looked at the download page and a few other things and then started crawling the exponent reports for every. single. exponent. one. by. one.

I was on the road all day today and kept getting emails from the monitoring setup about downtime, so now that I'm back at my computer I finally just blocked their IP address altogether.

Not something I wanted to do, but the problem was that even blocking them when they did 50 requests in 5 seconds meant some were still being processed and the backlog it built up made each request take progressively longer, which made other requests queue up. Even though it was unintentional I'd have to classify it as a DoS so a block is appropriate, unfortunately.
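
A minimal sketch of a "50 requests in 5 seconds" check, written as a sliding-window counter in Python. The thread does not say how the actual limiter was implemented, so the `SlidingWindowLimiter` class and its behavior are assumptions for illustration only.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests=50, window_seconds=5.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client, now):
        """Return True and record the request if the client is under the limit."""
        hits = self._hits[client]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

Under this particular policy a client can still get up to 50 requests accepted every 5 seconds, which is a sustained 10 per second. That is consistent with the problem described above: if each accepted request is expensive, the accepted share alone can build a backlog, so a limiter like this caps the rate without guaranteeing the server keeps up.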

So, if you're that person in a certain US state with lots of lakes, and you came here looking for help on why you can't hit mersenne.org any more and your crawler keeps getting 403 errors, PM me and let's work on a better way to do this.

Meanwhile, this will urge me to get off my butt and set up better dynamic restrictions to prevent this from happening in the future. It's also good motivation to dig into that page again and optimize it... something that's been in the back of my mind for a while now.

Old 2016-07-17, 23:12   #1062
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

5467₈ Posts

I have two "Manual testing" CPUs. One says it appears lost. How can I make it go away?

Old 2016-07-18, 00:10   #1063
retina
Undefined

"The unspeakable one"
Jun 2006
My evil lair

1011010011100₂ Posts

Quote:
Originally Posted by Madpoo
> ... I finally just blocked their IP address altogether.

Thankfully it was not my VPN you blocked. But I do hope that in general IP blocks are not going to have to be common things?

Old 2016-07-18, 03:38   #1064
Madpoo
Serpentine Vermin Jar

Jul 2014

6315₈ Posts

Quote:
Originally Posted by retina
> Thankfully it was not my VPN you blocked. But I do hope that in general IP blocks are not going to have to be common things?

I hope not either. It's a pain. Just a result of someone crawling far too aggressively.

I'll reiterate that for anyone looking to capture data on exponents, there are options far better suited to that than crawling the HTML report_exponent pages. XML reports are awesome and faster, and you can specify a range of exponents instead of doing one request per exponent. Or if you want a specific large batch of something or another, ask, and if it's not too cumbersome I may be able to do a BCP package or something. Since that's a manual thing on my part, I won't make any promises on if/when.
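
A minimal sketch of the "range instead of one request per exponent" idea, assuming a Python client. The thread does not give the XML report's URL or parameter names, so this only computes the inclusive ranges a client would ask for; `batch_ranges` and `batch_size` are hypothetical names, and how a range is actually passed to the server is left open.

```python
def batch_ranges(first, last, batch_size):
    """Yield inclusive (low, high) pairs covering first..last in fixed-size batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    low = first
    while low <= last:
        high = min(low + batch_size - 1, last)
        yield (low, high)
        low = high + 1
```

Covering 1,000 consecutive values in batches of 100 takes 10 requests instead of 1,000, which is the whole point of asking for ranges.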

Old 2016-07-18, 04:00   #1065
Mark Rose

"/X\(‘-‘)/X\"
Jan 2013

3²·11·29 Posts

Quote:
Originally Posted by Madpoo
> I hope not either. It's a pain. Just a result of someone crawling far too aggressively.
>
> I'll reiterate that for anyone looking to capture data on exponents, there are options far better suited to that than crawling the HTML report_exponent pages. XML reports are awesome and faster, and you can specify a range of exponents instead of doing one request per exponent. Or if you want a specific large batch of something or another, ask, and if it's not too cumbersome I may be able to do a BCP package or something. Since that's a manual thing on my part, I won't make any promises on if/when.

Perhaps the page should be updated to tell people there are other options.

Old 2016-07-18, 20:41   #1066
henryzz
Just call me Henry

"David"
Sep 2007
Cambridge (GMT/BST)

1663₁₆ Posts

Can I suggest checking after a few days to see whether it is possible to unblock this IP address? If it is a public VPN, the block could affect people other than the one intended.
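
A minimal sketch of a block that lapses on its own after a set period, so that nobody has to remember to lift it by hand. This is an illustration in Python under assumed names (`TemporaryBlockList`, `duration_seconds`); it is not how the server in this thread is configured.

```python
class TemporaryBlockList:
    """Track blocked addresses together with the time each block lapses."""

    def __init__(self):
        self._expires = {}  # address -> time at which the block lapses

    def block(self, address, now, duration_seconds):
        """Block an address from now until now + duration_seconds."""
        self._expires[address] = now + duration_seconds

    def is_blocked(self, address, now):
        """Return True while the block is active; forget it once it has lapsed."""
        expiry = self._expires.get(address)
        if expiry is None:
            return False
        if now >= expiry:
            del self._expires[address]
            return False
        return True
```

If the address turns out to be shared, as with a public VPN, an expiry like this limits how long other users stay caught by the block.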

Old 2016-07-18, 20:58   #1067
James Heinrich

"James Heinrich"
May 2004
ex-Northern Ontario

3107₁₀ Posts

I'm not sure why so many people are jumping to the conclusion that the IP in question is a VPN IP.
In any case, if all the DoS traffic is heading to a single report page, you could perhaps try a soft block: rather than completely blocking the IP address from the server, just insert a couple of lines of code at the top of the report page to prevent running expensive queries while also giving feedback to the spider in question. Something like:
Code:
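// '123.234.345.456' is a placeholder, not a valid address; substitute the offending IP.
if ($_SERVER['REMOTE_ADDR'] === '123.234.345.456') {
  http_response_code(403); // PHP 5.4+; without this, die() would still return 200 OK
  die('You have been blocked for aggressive spidering. Please email madpoo@primenet to discuss better ways of getting the data you want');
}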

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.