mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   OFFICIAL "SERVER PROBLEMS" THREAD (https://www.mersenneforum.org/showthread.php?t=5758)

NBtarheel_33 2014-08-09 05:11

[QUOTE=Prime95;380040]People fought replacing P95-years for a long, long time. I suspect they'd fight changing GHZ-days too.[/QUOTE]

Oh, keep the GHz-day as a unit, but just recalibrate it to, say, a Haswell instead of a Core 2 Duo.

[QUOTE=Prime95]I'm still getting several DCs a day from v4 computers. It turns out people are still downloading v24 prime95. Apparently, one can still create new v4 user IDs, get work, and report results. There is a PHP script that turns v4 requests into v5 requests, so there is no reason not to continue supporting v4 computers.[/QUOTE]

Wow, never would have thunk it. We probably should get any mirror sites to update their outdated Prime95 links as we catch them.

axn 2014-08-09 08:44

[QUOTE=Prime95;380039]I'm 99% there. Thanks, James. The only remaining issue is generating a unique filename for YAFU to write to instead of the fixed siqs.dat.[/QUOTE]

An MD5 hash of the input number should be good enough.

kladner 2014-08-10 03:02

I just got a CGI timeout on a batch of mfaktc results which included two factors. 0300 UTC, 8/10/2014.

tha 2014-08-10 08:47

The [URL="http://mersenne.org/report_recent_cleared/"]recent cleared[/URL] list shows only C for composite and no F for factor for a couple of hours, which is very counterintuitive, or a sign of something being wrong.

kladner 2014-08-10 15:35

[QUOTE=tha;380119]The [URL="http://mersenne.org/report_recent_cleared/"]recent cleared[/URL] list shows only C for composite and no F for factor for a couple of hours, which is very counterintuitive, or a sign of something being wrong.[/QUOTE]

The factors were the hangup. This morning (my time) the entire set of results went through. All the NF results came up with Error 40, not needed. The two factors cleared and displayed credit.

Madpoo 2014-08-11 03:18

[QUOTE=kladner;380124]The factors were the hangup. This morning (my time) the entire set of results went through. All the NF results came up with Error 40, not needed. The two factors cleared and displayed credit.[/QUOTE]

Having looked at the current server now, I can see that there are times when the disk subsystem really starts to get overwhelmed... it will start to page excessively and that results in SQL queries taking longer... if there are any writes involved, then the transaction log starts growing since it can't keep up, and everything else on the system starts to crawl.

I don't know exactly what kinds of things can trigger those episodes, but it is interesting to see the cascade effect in action.

The replacement system has more disks, and much faster ones. More memory and 64-bit help too, because I think the real problem right now is the memory crunch from the large SQL dataset. It could possibly be helped by limiting SQL to 1 GB of the 2 GB total... the overall system could be more responsive in general and not page so much, but SQL does work better the more memory it has. I just wonder how much it really matters whether SQL can use 1 GB or nearly 2 GB when the database is dozens of GB in size.

I'd look at that angle more, reducing the SQL memory allocations, but hopefully that old server's days are numbered.

I'm not as available this coming week for testing but it's set up enough that George is doing more functionality tests... so far I've had v4 and v5 clients connect and get/check-in results and I think all of the website stuff is working. There were a few (minor, I think) PHP incompatibilities with the upgrade from 5.2 to 5.5 along the way but those would have needed fixing at some point anyway, so that's good.

In summary, better things are coming. :smile:

LaurV 2014-08-11 03:50

:tu: Good work! Eagerly waiting...

Mark Rose 2014-08-11 14:40

[QUOTE=Madpoo;380156]The replacement system has more disks, and much faster ones. More memory and 64-bit help too, because I think the real problem right now is the memory crunch from the large SQL dataset. It could possibly be helped by limiting SQL to 1 GB of the 2 GB total... the overall system could be more responsive in general and not page so much, but SQL does work better the more memory it has. I just wonder how much it really matters whether SQL can use 1 GB or nearly 2 GB when the database is dozens of GB in size.[/QUOTE]

I have no experience with MSSQL, but in general, databases can more efficiently manage memory with larger internal caches than relying on an OS file system cache because they have additional insight into the data the OS lacks. If you reduce the memory allocated to MSSQL, the OS will use more memory to cache files recently written to, which is pointless if those files are also cached in MSSQL, and MSSQL will end up doing more disk reads as its caches are smaller.

Running a separate virtual machine for the database server is a good way to avoid having other processes page the database memory out. I would certainly do it.

M29 2014-08-11 15:26

[QUOTE=Madpoo;380156]I can see that there are times when the disk subsystem really starts to get overwhelmed... [/QUOTE]The system seems to be out picking daisies when returning the [i]Top Producers >> Totals Overall[/i].

I think it is supposed to return the Top 500 ("/report_top_500/"). Instead it returns all 4500+.

kladner 2014-08-11 16:44

[QUOTE]In summary, better things are coming. :smile:[/QUOTE]

That's very good to hear. Thanks for all your efforts and contributions.

Madpoo 2014-08-11 19:16

[QUOTE=M29;380179]The system seems to be out picking daisies when returning the [i]Top Producers >> Totals Overall[/i].

I think it is supposed to return the Top 500 ("/report_top_500/"). Instead it returns all 4500+.[/QUOTE]

Oh, weird. You're right. That's more than 500.

That's one of the reports that gets generated hourly, but the server right now does not do any gzip compression, so that large (>1 MB) report gets sent to the client via the scenic route.

I did test out switching the current server to use the built-in PHP compression and that works, but I wasn't entirely sure if the v4/v5 API communications will work okay if the client gets gzipped data back from its requests.

On the replacement server I avoided that by just using built-in IIS compression for the main site, with it disabled for all of the API-related calls. It might work anyway... the clients are probably using whatever built-in HTTP library their OS provides, which *should* be able to decompress any gzipped responses, but I'm not going to test all of them just for that, and those calls don't normally return much data.

Suffice it to say, when doing a webpagetest.org check of that page on the current server it can take 10-20 seconds. On the test server it was taking 1-2 seconds tops. 1+ MB of data compressed down to 150KB or so.

I guess if that report was fixed to just generate the top 500 and not the larger set, it wouldn't take as long anyway.
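Something along these lines in whatever query builds that report would be the whole fix (table/column names invented here, just to illustrate how small the change is):

[CODE]
-- Hypothetical sketch only; the real table/column names are whatever
-- the hourly report job actually uses.
SELECT TOP (500) user_name, total_ghz_days
FROM top_producers
ORDER BY total_ghz_days DESC;
[/CODE]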

Another fun stat is how long it takes the server to run the SQL tasks that generate those hourly stats. On the current server it could take anywhere from 3 minutes to as much as 20 minutes, just depending on how much disk thrashing was going on, but on average I think I saw it taking 4-5 minutes. The test server is consistently doing it in 35-40 *seconds*.

That's not to say the final server replacement will do exactly the same, since we're testing this all on a virtual machine right now (that was easier for me to get going), but it is a close match to the physical box itself.

Madpoo 2014-08-11 19:34

[QUOTE=Madpoo;380203]...when doing a webpagetest.org check of that page on the current server it can take 10-20 seconds.[/QUOTE]

If you're curious, here's the webpagetest I did on the current server for that page:
[URL="http://www.webpagetest.org/result/140805_E3_3QR/"]http://www.webpagetest.org/result/140805_E3_3QR/[/URL]

VictordeHolland 2014-08-12 21:27

The site seems to be more responsive, and uploading manual results was faster than in the past two weeks. Did you guys do anything?

Prime95 2014-08-12 22:22

[QUOTE=VictordeHolland;380249]The site seems to be more responsive, and uploading manual results was faster than in the past two weeks. Did you guys do anything?[/QUOTE]

Nope, you were just lucky.

The site does seem happier with 5GB of free disk space. Madpoo is out of town this week so there won't be any work on the new server til he gets back.

preben s 2014-08-13 03:36

The manual submitting of results has been a little troublesome with the many CGI timeouts.

A few weeks ago, I tried changing the upper trial-factoring bit level in my assignments to a value like 2^73 or 2^74.

While I submitted hundreds of results before, now I just have a few results to submit every time.

Is this an idea others should follow?

LaurV 2014-08-13 04:53

Don't change the pledged factoring level. Or, if you do, then report all the bitlevels at once, when they are finished. Or even better, if you need "higher bitlevels" to factor, go to GPU72.

If you manually reserve TF work, then change the bitlevels, this will almost always result in [U]duplication of work[/U]. As you TF, the lower bitlevels will be finished faster; PrimeNet has no way to know that you are doing higher bitlevels too, so it will assign them to other users, who will duplicate your work. Or you theirs. The server can't "guess your mind" and know that you intend to factor higher unless you "pledge" to do so (by reserving the right work).

Don't factor higher than you pledge for! This is a bad practice.

TheMawn 2014-08-13 07:15

It's one of the unfortunate aspects of having multiple servers handing out assignments. GPU72.com is an invaluable addition to our [STRIKE]empire[/STRIKE] resources but I think it does cause a bit of confusion.

On the other hand, Primenet should not be assigning stuff "owned" by GPU72.

preben s 2014-08-13 07:34

[QUOTE=LaurV;380268]Don't change the pledged factoring level. Or, if you do, then report all the bitlevels at once, when they are finished. Or even better, if you need "higher bitlevels" to factor, go to GPU72.

If you manually reserve TF work, then change the bitlevels, this will almost always result in [U]duplication of work[/U]. As you TF, the lower bitlevels will be finished faster; PrimeNet has no way to know that you are doing higher bitlevels too, so it will assign them to other users, who will duplicate your work. Or you theirs. The server can't "guess your mind" and know that you intend to factor higher unless you "pledge" to do so (by reserving the right work).

Don't factor higher than you pledge for! This is a bad practice.[/QUOTE]


One exponent is only assigned for one test at a time. This can clearly be seen under the "Work Distribution" menu: get an assignment, and the number of available assignments drops correspondingly. Submit results, and the number available will be recalculated at the hourly run.

And yes, the deliver at once option must be set in the ini file to avoid conflicts.

Madpoo 2014-08-13 12:46

[QUOTE=TheMawn;380272]It's one of the unfortunate aspects of having multiple servers handing out assignments. GPU72.com is an invaluable addition to our [STRIKE]empire[/STRIKE] resources but I think it does cause a bit of confusion.

On the other hand, Primenet should not be assigning stuff "owned" by GPU72.[/QUOTE]

Is GPU72 getting exponent assignments through the Primenet API at all, or do you mean Primenet isn't currently storing info on factoring above certain bit depths which GPU72 might be re-doing in some cases?

I can never afford (or at least justify the purchase to the wife) a fancy GPU so I haven't actually looked at the project, but it does sound nice to have such a powerful system doing fun stuff like that even if you're not a gamer.

Madpoo 2014-08-13 12:59

[QUOTE=Prime95;380255]Nope, you were just lucky.

The site does seem happier with 5GB of free disk space. Madpoo is out of town this week so there won't be any work on the new server til he gets back.[/QUOTE]

Yeah, it's been a hectic past couple days, relocating systems from a New York datacenter over to New Jersey. :)

I think when I was looking at the current system I may have been overly aggressive in keeping the database log file small to help with the nightly backups we're doing to the test site. Currently the log file tends to grow pretty quickly, up to 2.5-3 GB after running for a day. When it gets that big, it takes longer to back up over an Internet connection, so I'd been doing a series of a couple log backups in a row with a logfile shrink after each, to get it down to a small size.

But what typically happens during the day is that some long-running transactions naturally fill that log right back up, and logfile expansion is a somewhat expensive I/O operation. I think I'm better off dealing with large log backups and avoiding having that slow disk on the current system expand the log all through the day like that.

It's like a catch-22, where a transaction is slow, so it fills the log more, which causes the logfile to expand, which slows down the system and slows down the transaction, etc. Rinse and repeat through the day and we're right back to a 3GB logfile. I'm just going to leave it like that unless it really starts growing out of control, but 2-4 GB seems to be its daily sweet spot. Doesn't leave much spare room, but enough for now.
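(For the curious, the backup-then-shrink cycle I'd been doing is just the standard commands, roughly like this... the database and file names here are made up:)

[CODE]
-- Hypothetical sketch of the log backup + shrink cycle (names invented).
BACKUP LOG primenet TO DISK = 'D:\Backup\primenet_log_1.trn';
DBCC SHRINKFILE (primenet_log, 100);  -- target size in MB
BACKUP LOG primenet TO DISK = 'D:\Backup\primenet_log_2.trn';
DBCC SHRINKFILE (primenet_log, 100);  -- a second pass releases more of the log tail
[/CODE]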

The main data file is 54 GB but could be shrunk to 45 or so... Out of curiosity I shrunk it on the test server to see how long it took and on there it took a good 15 minutes. I can only imagine how long the current server would take, but I'd guess it'd be shrinking that for the better part of a day or two, and while that's happening the log file would be growing like crazy.

There are also index optimizations that are done in offline mode, and it just takes far too long on the current system to do that. I may try that out on the new system and see how long those take, because that'd be a nice weekly task to do and would help overall performance. It looks like George is making good use of indexes, but they do get worked over and need rebuilding; it just takes so long on the current system that going without indexes while they're optimizing is a long, slow ordeal.

chalsall 2014-08-13 13:16

[QUOTE=Madpoo;380282]Is GPU72 getting exponent assignments through the Primenet API at all, or do you mean Primenet isn't currently storing info on factoring above certain bit depths which GPU72 might be re-doing in some cases?[/QUOTE]

GPU72 has several spiders which reserve candidates from Primenet via the APIs, and then "lend" them out to people for higher-level TF'ing, and then often P-1'ing. There /shouldn't/ be any duplication of work.

VictordeHolland 2014-08-13 13:17

[QUOTE=Madpoo;380282]Is GPU72 getting exponent assignments through the Primenet API at all, or do you mean Primenet isn't currently storing info on factoring above certain bit depths which GPU72 might be re-doing in some cases?[/QUOTE]
Once a factor is found by TF, P-1, or ECM, the current TF bitlevel is 'lost' on the Primenet server. To me this is not much of an issue, since (factor = composite). The information could be handy for those who want to find more factors / fully factor exponents and want to do more TF on them.

James Heinrich 2014-08-13 14:08

[QUOTE=VictordeHolland;380286]Once a factor is found by TF, P-1, or ECM, the current TF bitlevel is 'lost' on the Primenet server.[/QUOTE]Not entirely. The data is still hiding in there (at least for results from the last few years), but previously the web interface may have been more reluctant to talk about pre-factor details. I reworked that page several months ago to get it to show whatever info it has on the "full" report. You can see from [url=http://v5www.mersenne.org/report_exponent/default.php?exp_lo=68687083&full=1]this random example[/url] of a recently-found factor that the prior NF history is still there.

Madpoo 2014-08-13 20:31

[QUOTE=Madpoo;380283]There are also index optimizations that are done in offline mode, and it just takes far too long on the current system to do that. I may try that out on the new system and see how long those take, because that'd be a nice weekly task to do and would help overall performance. It looks like George is making good use of indexes, but they do get worked over and need rebuilding; it just takes so long on the current system that going without indexes while they're optimizing is a long, slow ordeal.[/QUOTE]

I just did an index reorganize task on the test server... took 1 hour, 12 minutes. Considering the sizes of some of those indices it's not surprising it took that long. I checked beforehand and they were severely fragmented. Ran it again right after and it only took 8 1/2 minutes.

It's definitely something to add as a nightly maintenance task on a new system: doing a reorg of those to keep the fragmentation at a low level. Keeping up with it is the key.
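For anyone curious, the maintenance step itself is a one-liner per table, and it runs online (table name invented here; the real schema is whatever George set up):

[CODE]
-- Hypothetical sketch (real table names differ): reorganize all indexes
-- on a table. REORGANIZE is an online operation, unlike REBUILD.
ALTER INDEX ALL ON dbo.ll_results REORGANIZE;

-- Fragmentation can be checked beforehand with:
SELECT index_id, avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.ll_results'), NULL, NULL, 'LIMITED');
[/CODE]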

snme2pm1 2014-08-14 04:25

[QUOTE=Madpoo;380307]... offline ... a nightly maint. task[/QUOTE]

Me wonders as to the definition of "night", and impact on mersenne.org response to people's enquiries and submissions, prime95 and misfit.

James Heinrich 2014-08-14 04:28

[QUOTE=snme2pm1;380335]Me wonders as to the definition of "night"[/QUOTE]Presumably that would be set to the typical period of least average load on the server, which may or may not conform to any particular user's (or even the server's) geographic location's definition of "night".

snme2pm1 2014-08-14 04:39

TF partial range
 
[QUOTE=James Heinrich;380290]I reworked that page several months ago[/QUOTE]

I noticed that there is much more comprehensive result information now visible, at odds with past assertions from some quarters that no record is kept of whether a TF result covered only part of a bit range.
I can see a tell-tale asterisk.
Hopefully that indicator is well agreed among the various agents.
[B]... F 729540971773795344071;[TF:69:70*:mfakto 0.14-Win cl_barrett15_71_gs_2][/B]

NBtarheel_33 2014-08-14 04:59

[QUOTE=snme2pm1;380335]Me wonders as to the definition of "night", and impact on mersenne.org response to people's enquiries and submissions, prime95 and misfit.[/QUOTE]

I bet that anything in the "wee hours" (e.g. 0300) in any of the continental US time zones would work fairly well. Might not be perfect, but should work well.

kladner 2014-08-14 14:35

These results included a Factor Found. I don't remember seeing the error below before:
[CODE]Processing result: nHTTP/1.1 502 Gateway Error Server: Microsoft-IIS/5.0[/CODE]

Mark Rose 2014-08-14 15:33

[QUOTE=snme2pm1;380338]I noticed that there is much more comprehensive result information now visible, at odds with past assertions from some quarters that no record is kept of whether a TF result covered only part of a bit range.
I can see a tell-tale asterisk.
Hopefully that indicator is well agreed among the various agents.
[B]... F 729540971773795344071;[TF:69:70*:mfakto 0.14-Win cl_barrett15_71_gs_2][/B][/QUOTE]

[url=http://v5www.mersenne.org/report_exponent/default.php?exp_lo=68743163&full=1]Nope[/url]. I just factored M68743163 with StopAfterFactor=2 in mfaktc, and this is all it produced: F-PM1 2463386658202793009209. No indication of the agent or that factoring was only partial for the bit depth.

sdbardwick 2014-08-14 15:38

[QUOTE=Mark Rose;380366][url=http://v5www.mersenne.org/report_exponent/default.php?exp_lo=68743163&full=1]Nope[/url]. I just factored M68743163 with StopAfterFactor=2 in mfaktc, and this is all it produced: F-PM1 2463386658202793009209. No indication of the agent or that factoring was only partial for the bit depth.[/QUOTE]
Not sure if that result is representative. It looks like a TF that was interpreted by PrimeNet as P-1; did you report a no-factor found before the TF result to get around the P-1 bug?

Madpoo 2014-08-14 17:30

[QUOTE=snme2pm1;380335]Me wonders as to the definition of "night", and impact on mersenne.org response to people's enquiries and submissions, prime95 and misfit.[/QUOTE]

Fortunately doing an index reorganization is an online task. Performance can be hit a bit since it's doing some disk I/O (well, a lot of disk I/O on large indices).

An index rebuild is an offline task except in the enterprise editions of SQL, which this definitely won't be (that's a lot of extra $$$), so that probably wouldn't happen except in some extreme circumstance. There are some clustered indices in use, so during a rebuild any features that specifically use an indexed view probably wouldn't work, but I don't know for sure. Other queries could run okay but take longer without an index to help them along.

The best time of day to run any maintenance task, whether it's SQL, a disk defrag, or anything else that impacts some subsystem, is determined by looking at traffic levels over the course of a few weeks and finding those dips in activity.

I see that the web page is instrumented with Google Analytics which would help see how the web traffic does.

On the API side of things, check-in and check-out activity is timestamped, so it should be fairly easy to gather some stats for the last XX days or weeks and see if there's any type of pattern. Those are probably more evenly spread because clients don't care what time of day it is, they just check in/out as needed. But maybe there's something surprising in the data... maybe I can do a little analysis on the copy of the DB and see if anything pops out in that regard. Could be interesting, although I suspect it's fairly even.
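If I do run that analysis, it's a pretty simple bucket-by-hour query... something like this (table/column names invented; the real ones are whatever the v5 schema uses):

[CODE]
-- Hypothetical sketch: histogram of check-ins by hour of day, last 30 days.
SELECT DATEPART(hour, checkin_time) AS hour_utc, COUNT(*) AS checkins
FROM assignment_log
WHERE checkin_time >= DATEADD(day, -30, GETUTCDATE())
GROUP BY DATEPART(hour, checkin_time)
ORDER BY hour_utc;
[/CODE]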

Mark Rose 2014-08-14 17:33

[QUOTE=sdbardwick;380367]Not sure if that result is representative. It looks like a TF that was interpreted by PrimeNet as P-1; did you report a no-factor found before the TF result to get around the P-1 bug?[/QUOTE]

Ahh, nevermind. I was just going by what I could see on the web. The actual results submitted should look like:

M68743163 has a factor: 2463386658202793009209 [TF:71:72*:mfaktc 0.20 barrett76_mul32_gs]
found 1 factor for M68743163 from 2^71 to 2^72 (partially tested) [mfaktc 0.20 barrett76_mul32_gs]

Madpoo 2014-08-14 17:35

[QUOTE=kladner;380362]These results included a Factor Found. I don't remember seeing the error below before:
[CODE]Processing result: nHTTP/1.1 502 Gateway Error Server: Microsoft-IIS/5.0[/CODE][/QUOTE]

Weird... 502 error is a gateway related error, like a proxy server in the path had an issue. There's no proxy directly in front of the current website... traffic goes directly to IIS from the outside world, not through a load balancer/caching proxy/whatever.

Any proxy seems like it'd be on the client's side, but the IIS/5.0 is kind of a giveaway that it did manage to hit the server in some way... might have been a request going through a proxy that timed out or something...that's my guess.

kladner 2014-08-14 17:56

Last response from the server, 1754 UTC:
[QUOTE]
[B]Warning[/B]: odbc_pconnect() [[URL="http://www.mersenne.org/manual_result/function.odbc-pconnect"]function.odbc-pconnect[/URL]]: SQL error: [Microsoft][ODBC SQL Server Driver]Timeout expired, SQL state S1T00 in SQLConnect in [B]C:\v5\www\2013\v5server\0.96_database.inc.php[/B] on line [B]21[/B]
pnErrorResult=3 pnErrorDetail=Database unavailable ==END== [/QUOTE]In reference to my previous post, NF results were accepted. I still haven't gotten the Factor Found result to go through.

EDIT: I got in to submit results, but the response is the same. The NF results have already been accepted.
[QUOTE]Processing result: no factor for M68762963 from 2^73 to 2^74 [mfaktc 0.20 barrett76_mul32_gs]
Error code: 40,
error HTTP/1.1 502 Gateway Error
Server: Microsoft-IIS/5.0
Date: Thu, 14 Aug 2014 18:00:04 GMT
Connection: close
Content-Length: 186
Content-Type: text/html [B]
CGI Timeout[/B]

The specified CGI application exceeded the allowed time for processing. The server has deleted the process.[/QUOTE]

James Heinrich 2014-08-14 18:03

[QUOTE=Mark Rose;380366]No indication of the agent or that factoring was only partial for the bit depth.[/QUOTE]How was the result submitted? Manually? with misfit?

Mark Rose 2014-08-14 18:09

[QUOTE=Madpoo;380377]Weird... 502 error is a gateway related error, like a proxy server in the path had an issue. There's no proxy directly in front of the current website... traffic goes directly to IIS from the outside world, not through a load balancer/caching proxy/whatever.

Any proxy seems like it'd be on the client's side, but the IIS/5.0 is kind of a giveaway that it did manage to hit the server in some way... might have been a request going through a proxy that timed out or something...that's my guess.[/QUOTE]

While not IIS, I've seen 502s happen in an Nginx+PHP situation, where PHP fails. If PHP is not running in-process in IIS (like it often is in Apache), a PHP issue may have been what caused that.

Mark Rose 2014-08-14 18:13

[QUOTE=James Heinrich;380380]How was the result submitted? Manually? with misfit?[/QUOTE]

By calling this file: [url]https://github.com/MarkRose/primetools/blob/master/mfloop.py[/url]

It should send every line from mfakt?'s results.txt with a mersenne number string (M123456...) in it.

chalsall 2014-08-14 18:20

[QUOTE=Madpoo;380377]Weird... 502 error is a gateway related error, like a proxy server in the path had an issue. There's no proxy directly in front of the current website... traffic goes directly to IIS from the outside world, not through a load balancer/caching proxy/whatever.[/QUOTE]

I can support the observation that "502s" are quite common. Is it possible that Primenet is behind a proxy you don't know about?

I can further say, but (quickly checking some of my spiders' logs) can't document, that such errors are highly correlated with other errors, such as "CGI Timeouts", "Database unavailable", etc.

If it would help I could instrument my spiders to collect deeper error statistics.

James Heinrich 2014-08-14 18:39

[QUOTE=sdbardwick;380367]It looks like a TF that was interpreted by PrimeNet as P-1[/QUOTE]I would tend to agree. I would poke a little closer at the data... if mersenne.org was working right now.

Once we get the server migrated and more responsive I'll re-implement my results parsing code and hopefully put the whole misinterpreted results issue behind us.

Madpoo 2014-08-14 18:41

[QUOTE=James Heinrich;380389]I would tend to agree. I would poke a little closer at the data... if mersenne.org was working right now.

Once we get the server migrated and more responsive I'll re-implement my results parsing code and hopefully put the whole misinterpreted results issue behind us.[/QUOTE]

I'm trying to login to the current server right now and it's being exceptionally slow, so I'm not sure what's up. I'll probably just look and not touch in case George is already on there as well checking on it. Unless I see something obviously broken, there's the "too many cooks" thing and George would know from past experience what the problem probably is.

chalsall 2014-08-14 19:18

[QUOTE=Madpoo;380391]Unless I see something obviously broken, there's the "too many cooks" thing and George would know from past experience what the problem probably is.[/QUOTE]

Or not...

All of my spiders, both GPU72's and worker clients', have been experiencing many errors today. I suspect many others have been observing the same.

Perhaps you could keep trying to log into Primenet and, iff finally successful, simply observe what's going on without changing anything?

BTW, I would be very happy to underwrite the transport of your new donated server to its new home via UPS or FedEx.

Madpoo 2014-08-14 19:39

[QUOTE=chalsall;380398]All of my spiders, both GPU72's and worker clients', have been experiencing many errors today. I suspect many others have been observing the same.

Perhaps you could keep trying to log into Primenet and, iff finally successful, simply observe what's going on without changing anything?[/QUOTE]

I got in and there's still 3.5 GB of free space, so it's not crunched for that. It's not a lot, but enough. CPU use is low, but the system is still super laggy and I'm assuming it's doing a lot of paging again due to the memory pressure. I'd start up performance monitor and check but it's so slow it might take a while and only add to the overburdened system.

[QUOTE=chalsall;380398]BTW, I would be very happy to underwrite the transport of your new donated server to its new home via UPS or FedEx.[/QUOTE]

Once I get back from my super fun work trip near Newark (</sarcasm>) I'll check back in with George and James to see how things are looking. James supplied some fun new PHP to eliminate the old methods that used a SQL extended proc to verify found factors and run msieve, which I think was the last thing we were dealing with.

I was talking to George about the migration plans to the new system and one approach we were tossing back and forth was to make this test system "live" for long enough to send the new box to Scott and get it swapped out. If we go that route then we could see things get better even sooner.

But as I'm sure we'd all agree, we really just want to make sure that all of the functionality is still good... v4 and v5 clients are able to do everything they need to do and the web site works as expected, just faster, faster and faster. :smile:

Prime95 2014-08-14 20:06

[QUOTE=Madpoo;380391]Unless I see something obviously broken, there's the "too many cooks" thing and George would know from past experience what the problem probably is.[/QUOTE]

Don't worry about getting in this cook's way. I have no idea what is wrong. I follow the only recipe I know: kill IIS, restart MSSQL, restart IIS.

Reboot complete - sorry it took so long, I was out having fun this morning.

Madpoo 2014-08-14 20:51

[QUOTE=Prime95;380402]Don't worry about getting in this cook's way. I have no idea what is wrong. I follow the only recipe I know: kill IIS, restart MSSQL, restart IIS.

Reboot complete - sorry it took so long, I was out having fun this morning.[/QUOTE]

I'm doing some additional perfmon analysis... I've confirmed that really, truly, it's just memory pressure and excessive swapping. It's nothing that can really be solved given the 32-bit nature of the system.

One thing I think might help is to limit how much memory SQL is able to use so that, for instance, the website running on there as well won't be constantly fighting with SQL over how much physical RAM they can get.

I can tell by looking that SQL is already hitting memory limits with just 2 GB so limiting it to 1.5 GB and leaving that other 512MB for IIS and the OS isn't going to matter much to SQL but could matter a whole lot to the other components. In other words, it won't make SQL faster, but won't really make it much slower either so it's a benefit to everything else.

Page faults/sec is averaging 1600 or so... if you've ever had to diagnose memory pressure in Windows you'll know that's pretty high. :) That's with a current SQL tps of just 28-29 on average at the moment.

Anyway, to that end I set SQL to use just 1.5 GB (1536 MB) and restarted it, so we should see soon if it's having any impact on the responsiveness of everything else.

I probably should have set SQL's minimum memory to the same thing, but anyway, right now SQL is still filling its memory buffer based on the current transactions it's dealing with. Things will reach a steady state in maybe 30-60 minutes, I'm guessing. Then we'll know if this is any better. :smile:
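(For the record, the knob in question is the standard sp_configure setting... something like this would do it, for anyone wanting to try the same on their own box:)

[CODE]
-- Cap SQL Server's memory use at 1.5 GB ('max server memory' is in MB).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 1536;
RECONFIGURE;
[/CODE]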

M29 2014-08-14 20:52

[QUOTE=Prime95;380402]sorry it took so long, I was out having fun this morning.[/QUOTE]Did you play 9 or 18? :grin:

Prime95 2014-08-14 20:56

[QUOTE=M29;380404]Did you play 9 or 18? :grin:[/QUOTE]

Guilty as charged -- 18.

James Heinrich 2014-08-14 21:01

[QUOTE=James Heinrich;380389]...hopefully put the whole misinterpreted results issue behind us.[/QUOTE]I see that the extended results data is stored correctly so it will be possible at a future date to go back and fix all the false-PM1 records that were found by mfakt*, at least where the mfakt* version string was recorded. There's no point fixing the data right at this moment since the old results-parsing code is still live and will continue to introduce false records, but when that's changed the data is fixable.

For now, however, I have hack-fixed the display code to display correctly (this is the same factor that is really stored as "F-PM1" in the database but is now shown as "F"):
[url]http://v5www.mersenne.org/report_exponent/default.php?exp_lo=68743163&full=1[/url]
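When the old parser is swapped out, the data cleanup itself should amount to a one-shot update along these lines (a sketch only; the real table and column names are whatever the v5 schema actually uses):

[CODE]
-- Hypothetical sketch: re-tag factors recorded as P-1 results whose
-- program string shows they actually came from mfaktc/mfakto.
UPDATE results
SET result_type = 'F'
WHERE result_type = 'F-PM1'
  AND program_version LIKE 'mfakt%';
[/CODE]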

kladner 2014-08-14 21:07

Yay! Got the factor accepted! 2107 UTC

snme2pm1 2014-08-14 22:28

[QUOTE=Madpoo;380403]just memory pressure[/QUOTE]

Sometimes a fear of memory leaks has been mentioned in this place.
Have you spotted any hint of resource leakage?

Madpoo 2014-08-14 22:34

[QUOTE=snme2pm1;380413]Sometimes a fear of memory leaks has been mentioned in this place.
Have you spotted any hint of resource leakage?[/QUOTE]

I haven't seen anything to make me think there's a leak. SQL itself is unlikely to leak and that's the only thing running that's chewing up that much RAM.

On the other hand, it's SQL 2005 RTM... it hasn't had any service packs or rollups applied at all and maybe somewhere in the service pack history is something about a memory leak. I'd be tempted to install SP4 and cumulative update #3 which was the last one for SQL 2005, but with the low disk space I'm not even sure it would succeed in installing. :smile:

Mark Rose 2014-08-14 23:35

Have you looked at disabling any unnecessary background services to free up memory?

Madpoo 2014-08-15 00:18

[QUOTE=Mark Rose;380420]Have you looked at disabling any unnecessary background services to free up memory?[/QUOTE]

In fact, I did, just a little bit earlier. SQL Integration Services got installed and was running, taking a little bit of RAM, so I stopped that. Ditto the DFS service, since there isn't any DFS setup (and it had a surprising number of page faults since the server started, but probably just because it's happy to give up any physical RAM it has).

Other than that the system is pretty well minimized. A lot of other stuff is already stopped like the HP management software. It's been well squeezed by this point.

Xyzzy 2014-08-15 01:33

[QUOTE]Performance can be hit a bit since it's doing some disk I/O (well, a lot of disk I/O on large indices).[/QUOTE]
Would using enterprise-quality SSD drives improve anything?

snme2pm1 2014-08-15 02:34

[url]http://www.mersenne.org/report_top_500/[/url] seems without refresh
[url]http://www.mersenne.org/primenet/[/url] similarly old
[url]http://www.mersenne.org/account/default.php?details=1[/url] totals also stale

Madpoo 2014-08-15 03:28

[QUOTE=snme2pm1;380436][url]http://www.mersenne.org/report_top_500/[/url] seems without refresh
[url]http://www.mersenne.org/primenet/[/url] similarly old
[url]http://www.mersenne.org/account/default.php?details=1[/url] totals also stale[/QUOTE]

It should be updated now. Maybe it was when I restarted SQL on there after adjusting its memory settings, but the next few times that job ran it had an error. The last hourly run kicked off okay though, so it's all good.

Madpoo 2014-08-15 03:37

[QUOTE=Xyzzy;380431]Would using enterprise-quality SSD drives improve anything?[/QUOTE]

SSD drives are *always* a good thing. That said, they're also expensive, especially enterprise level.

Thankfully for all but the busiest systems you can get "good enough" I/O with a sufficient RAID of spinners.

On the desktop level especially, SSD is my #1 go-to for increasing performance. I just haven't had too many good business cases with the stuff I work with to justify SSDs on the servers. Our DBs tend to be small enough, and the cluster nodes have enough RAM, that eventually the whole DB is essentially cached in RAM, and they're typically more read-heavy than write-heavy.

Someday I'll have a blast configuring a SQL server with heavy write requirements and I'll have a budget to match. Until then, storage arrays of 12+ SAS 6Gb 15K disks will have to do. :yucky:

TheMawn 2014-08-15 18:21

[QUOTE=Madpoo;380442]Until then, storage arrays of 12+ SAS 6Gb 15K disks will have to do. :yucky:[/QUOTE]

oh dear god why

Mark Rose 2014-08-15 20:34

[QUOTE=TheMawn;380477]oh dear god why[/QUOTE]

Because if you get bored you can yell at it:

[YOUTUBE]tDacjrSCeq4[/YOUTUBE]

ROFLMFAO! :smile:

Many years ago I used to do the "Wabbit Twist". This was way back when SCSI was king, and Amiga was pretty cool.

A hard drive would fail to spin up (because of stiction), and because of inertia, giving it a strong rotational impulse around the spindle might unstick it. For a very little while...

Edit: Oh, shite... Sorry. In my haste I edited your message rather than posting my own. I'm currently using Firefox because Chrome has decided to randomly crash, but only the latter can view YouTube videos without installing Flash.

Madpoo 2014-08-16 14:58

[QUOTE=Mark Rose;380482]Because if you get bored you can yell at it...[/QUOTE]

I only yell at them *after* they fail, so... I think I'm okay. :smile:

[QUOTE=Mark Rose;380482]A hard drive would fail to spin up (because of stiction), and because of inertia, giving it a strong rotational impulse around the spindle might unstick it. For a very little while...[/QUOTE]

Been there, done that. It's also fun to show someone who doesn't know a thing about hard drives (or physics in general?) the effect of holding a drive in their hand that you just removed from a running system. They're usually surprised by the strong gyroscopic effect. I guess I probably was too the very first time.

Although, it's more fun with the larger 3.5" drives at 15K... just not the same as a 2.5" drive.

Madpoo 2014-08-16 15:21

[QUOTE=TheMawn;380477]oh dear god why[/QUOTE]

Spinning disks are still *VERY* common in datacenters, for better or worse. It's cost versus capacity, pure and simple.

On my work trip this past week, we were moving into a brand new facility so I did a little tour around the floor to see who else had moved in and what kind of kit they run... and let's face it, for the larger customers (the ones who rent cages, not just cabinets), most of the rack space is acres and acres of platters in SANs.

I chatted up another customer who just moved a couple dozen cabinets over from the closing location... he works for a major cable TV company and we're talking about... I dunno... a LOT, petabytes of space to store pre- and post-edited shows that get sent over the Internet to the cable cos for distribution (yeah, I know, I thought it was all done with satellite tech, but not so much anymore). And all of that is SAN-replicated to some other datacenter for redundancy. :)

Everyone knows SSDs are fast but the enterprise models with hundreds of thousands of write cycles are still cost prohibitive. Fortunately costs have been coming way down and new tech is making them more reliable over the long term.

I think it's safe to say we're seeing the transition from horse-and-buggy to motor cars, if I could borrow an overused analogy. :) We're still in the phase where the motor car is out there and people are buying them but there are still far more piles of horse manure in the streets than puddles of leaking oil. :smile:

I predict that very soon, a spinning disk will be about as rare and anachronistic as the horse drawn carriages in Central Park. Maybe 5 more years? In the server world, that is. For desktops, I think SSD's should be mandatory... there's not a whole lot of reason to pick a spinner over an SSD in any laptop/desktop anymore, and the performance is so worth it. Stick with rotating platters if you must for your archive of ripped blu-rays but let's boot and run programs from solid state please. :)

In the meanwhile I could see hybrid drives maybe taking a larger chunk of market share.

TheMawn 2014-08-16 16:47

The hybrids are good for people who don't know how to manage two drives, since I believe they just cache the most commonly used files to the SSD portion.

I think the SSD sweet spot has moved from 128GB to 256GB. The 64GB ones aren't much cheaper than 128GB at this point. Even the 256GB ones are getting to be a bit more economical. It's all very exciting.

The thing I'm really looking forward to is the m.2 format that plugs directly into the motherboard and draws its (measly) power needs directly from the board. That's two fewer cables :smile:

retina 2014-08-16 16:56

[wildly offtopic]
SSDs are still too unreliable and have very poor endurance. For short term use, one year or less, after which you intend to throw them away (which is wasteful) then perhaps they have a use but I never use anything for such a short time to justify it. Plus they are harder to securely erase and thus require the use of always on FDE to ensure data security.
[/wildly offtopic]

Mark Rose 2014-08-17 01:06

[QUOTE=Mark Rose;380482]Edit: Oh, shite... Sorry. In my haste I edited your message rather than posting my own. I'm currently using Firefox because Chrome has decided to randomly crash, but only the latter can view YouTube videos without installing Flash.[/QUOTE]

Dude. So confusing.

kracker 2014-08-17 03:23

[QUOTE=Mark Rose;380547]Dude. So confusing.[/QUOTE]

[url]http://www.computerworld.com/s/article/9174581/Google_s_Chrome_now_silently_auto_updates_Flash_Player[/url] :whistle:

[QUOTE=retina;380533][wildly offtopic]
SSDs are still too unreliable and have very poor endurance. For short term use, one year or less, after which you intend to throw them away (which is wasteful) then perhaps they have a use but I never use anything for such a short time to justify it. Plus they are harder to securely erase and thus require the use of always on FDE to ensure data security.
[/wildly offtopic][/QUOTE]

That was years ago... they still aren't as reliable as HDDs, I think, but still...

EDIT: For example: Specs on the Intel 730 SSD: [url]http://ark.intel.com/products/81038/Intel-SSD-730-Series-240GB-2_5in-SATA-6Gbs-20nm-MLC[/url]

Madpoo 2014-08-17 03:34

[QUOTE=retina;380533][wildly offtopic]
SSDs are still too unreliable and have very poor endurance. For short term use, one year or less, after which you intend to throw them away (which is wasteful) then perhaps they have a use but I never use anything for such a short time to justify it. Plus they are harder to securely erase and thus require the use of always on FDE to ensure data security.
[/wildly offtopic][/QUOTE]

Yeah... I mean, consumer MLCs have write cycles of maybe 3,000. Wear leveling helps and all that, along with having 10% or so of space reserved for replacing failed sections... but still.

Enterprise MLC drives are the current hotness for servers, hitting a sweetish spot of price/performance, but even then they have maybe 30,000 write cycles, and that's only done by picking the best chips out of the bin.

For truly awesome reliability it still boils down to SLC modules with over 100,000 write cycles but they cost quite a few pennies more.

Still, for typical consumer use, 3,000 write cycles plus wear leveling means you could write and re-write a lot of data each day for years and years before you reach any limits. For typical server use, the same holds about true with 30,000 cycles... I have spinning disks start failing after maybe 5 years and I'd expect an enterprise MLC to last about as long.

Where you really start to see the argument fall apart is for really high write applications, like constant SQL writes happening, extremely busy email servers, etc. I wouldn't mind getting some HGST SSDs but their price is kind of incredible.

Anyway, yeah, this is pretty off-topic. :)

Let's just say that for the relatively minor needs of Primenet, it doesn't take *too* much to get it running. Its problem right now is the hammering of the DB on a single pair of old U320 drives. All that data backed by only 2 GB of RAM doesn't help either. I guess any system, even SSD, is going to suffer if it has to page RAM to disk... SSD is fast but not as fast as DRAM.

Which reminds me... applications that require the very fastest speed and reliability use DRAM boards that back their data to disk in the event of power failure or reboots. Yup... even with SSDs getting better and better, DRAM-based boards are still very popular with the highest-end servers.

That's one reason I like to kit out my SQL servers with enough system RAM to hold most or all of the data in memory... SQL's caching can keep the readable bits in memory and unless we're doing a ton of writes for some reason, it's very fast to query once it reaches a steady state.

Batalov 2014-08-17 03:44

(Perhaps the tuning thread needs spawning?)

Mark Rose 2014-08-17 03:51

[QUOTE=kracker;380551][url]http://www.computerworld.com/s/article/9174581/Google_s_Chrome_now_silently_auto_updates_Flash_Player[/url] :whistle:
[/quote]

I meant about him editing my post.

[quote]
That was years ago... they still aren't as reliable as HDDs, I think, but still...

EDIT: For example: Specs on the Intel 730 SSD: [url]http://ark.intel.com/products/81038/Intel-SSD-730-Series-240GB-2_5in-SATA-6Gbs-20nm-MLC[/url][/QUOTE]

I would say SSDs as a whole are reliable enough. I wouldn't run that drive in a database machine, but it would be excellent for desktop use. Perhaps I'll get one for home. At work, all our MySQL and Cassandra machines are SSD only. Our desktop machines use SSDs for the OS. I've seen more spinning rust fail than SSDs fail over the last two years.

Madpoo 2014-08-17 03:58

[QUOTE=kracker;380551]...they still aren't as reliable as HDDs, I think, but still...
[/QUOTE]

I love my SSD on my laptop and desktop (and the wife's laptop). But you're right, they can and will eventually die. In fact I had one in my laptop that died last year... it was under warranty so I got a brand new one (and the next model up since Kingston retired the model I had).

But that's just an exclamation point on the notion that EVERYONE should be backing up their important data. Yeah, my drive failed but it was all backed up so it was more just the hassle of reinstalling the OS on a new one, but at least my docs and photos and stuff were okay.

I've had too many times in the past where something died and my backups were either broken or, duh, I forgot to make any. That moment when you format what you thought was a spare floppy (remember those days?) only to find out it had the only copy of your term paper. That kind of stuff changes a man... I back up all my home systems to no less than 3 separate terabyte size drives and keep them in different locations in case of theft/fire/whatever. Maybe I'm paranoid, but I think my wife would kill me if all her photos got lost. :smile:

Let's all be glad George is backing up the Primenet database (and we have copies on the test server now too). It would suck to lose track of which exponents were done, their residues, factors found, etc. How many years of CPU work involved? :) We're talking cloud backups too for the future... don't keep all the eggs in one basket (or two, or even three) if it's important enough.

Madpoo 2014-08-17 04:10

[QUOTE=Mark Rose;380555]I would say SSDs as a whole are reliable enough. I wouldn't run that drive in a database machine, but it would be excellent for desktop use. Perhaps I'll get one for home. At work, all our MySQL and Cassandra machines are SSD only. Our desktop machines use SSDs for the OS. I've seen more spinning rust fail than SSDs fail over the last two years.[/QUOTE]

If I had SSD's on a server, I'd be watching them for signs of failure. At least they report stats on how many bad blocks have been mapped out, and once they use up all of their reserved capacity for that, they really do need to be tossed.

Admins will still use at least RAID 1 no matter what. I mean, I *could* go with RAID 0 or JBOD on a dev box and not lose anything critical, but hey, my time is expensive too and I don't want to spend a day rebuilding a box when I can RAID 1 the thing for an extra couple hundred up front.

Anyway, let's say you have a consumer grade SSD rated at 3,000 write cycles, and let's say you try to maintain a good 20% of free space on the drive, which helps with wear leveling, and it has 10% of capacity reserved for bad block mapping.

You could overwrite the entire contents of the drive several times a day for several years before you really use the thing up, and desktop/laptops don't really do that much writing in general.
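To put rough numbers on that: a 240 GB drive rated for 3,000 write cycles can absorb on the order of 240 GB x 3,000 = 720 TB of writes. Even at a punishing 100 GB of writes per day, that's 7,200 days, or close to twenty years... well past the useful life of the hardware anyway.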

Enterprise MLCs, like the HGST or Kingston E100 models rated at 30,000 write cycles... well, I think HGST for instance rates one of their models to write the entire drive contents 25 times a day for 5 years or something before you'd see any degradation.

The good news is that by the time they did fail, there's probably something better, plus I only ever expect the server hardware itself to last about 5 years before it's time to put out to pasture.

Mark Rose 2014-08-17 05:18

[QUOTE=Madpoo;380558]If I had SSD's on a server, I'd be watching them for signs of failure. At least they report stats on how many bad blocks have been mapped out, and once they use up all of their reserved capacity for that, they really do need to be tossed.

Admins will still use at least RAID 1 no matter what. I mean, I *could* go with RAID 0 or JBOD on a dev box and not lose anything critical, but hey, my time is expensive too and I don't want to spend a day rebuilding a box when I can RAID 1 the thing for an extra couple hundred up front.[/QUOTE]

We get around that by having redundancy at the machine level. I can knock any one database server offline and the system stays up. So we use RAID 0. If any part of the hardware fails we launch on new hardware from an image and synchronize the new machine to the cluster (which takes a few hours).

My dev box has no RAID. It basically consists of a large screen for running Chrome, [url=http://konsole.kde.org/]Konsole[/url], and [url=http://kate-editor.org/]Kate[/url]. All my work is remote, even the source code editing. I can do a clean reinstall of everything I need in about 15 minutes. I might spend another 15 minutes setting up mprime (for SoB), mfaktc.exe, and tweaking settings (I should really get around to putting my dotfiles on GitHub...). It's really not worth the expense of RAID.

At home my /home is on a RAID of spinning rust. I'm keen on replacing my / SSD with RAIDed SSDs and putting my /home on them, too. Backup is rsync to a remote machine (having a 175 Mbps symmetric connection is handy).

TheMawn 2014-08-17 22:29

[QUOTE=Batalov;380554](Perhaps the tuning thread needs spawning?)[/QUOTE]

I'd say move everything since we started talking about a new server to an Official "New Server" Thread.

I think the discussion is worth having but this thread was really meant for server problems.

LaurV 2014-08-18 02:11

[QUOTE=TheMawn;380593]I'd say move everything since we started talking about a new server to an Official "New Server" Thread.

I think the discussion is worth having but this thread was really meant for server problems.[/QUOTE]
+1

Prime95 2014-08-18 12:59

The Primenet server will be down starting around 7:00PM EDT. If all goes well, it will be revived on a new temporary home with new IP addresses.

Madpoo 2014-08-18 16:42

[QUOTE=Prime95;380627]The Primenet server will be down starting around 7:00PM EDT. If all goes well, it will be revived on a new temporary home with new IP addresses.[/QUOTE]

Yeehaw. We're all expecting it will work great, and we've tested the things that could be tested, but not under any kind of load.

At that point, I expect this "server problems" thread will once again be devoted to server problems, but more along the lines of "this used to work and now doesn't". Well, I'm hopeful that does NOT happen, but there were a few changes along the way.

Note what George said... this will be a temporary solution using a virtual machine, but the software on it is what the replacement physical machine will use. If nothing else, the disk access on the replacement physical box might be faster since it won't be shared with other virtual boxes, but that's about the only difference. This host machine I'm using to temporarily house the Primenet services is beefy. Dual 10-core E5 v2's in there and a bunch of 300GB SAS 15K drives. It'll do the trick for now.

chalsall 2014-08-18 18:04

[QUOTE=Mark Rose;380555]I meant about him editing my post.[/QUOTE]

Truly sorry about that. I made a mistake.

I thought I had clicked on "Quote", but instead it appears I clicked on "Edit". This was to get the "raw" YouTube code using Firefox so I could watch the video using Chrome (until it crashed (yet) again).

I've been trying to "make friends with my hardware" (which, by the way, I paid for) for a little while now. It seems like Chrome crashes randomly if its windows are on either of the two Intel screens, but doesn't crash if they're on the (slow) Nvidia 1800 driven display.

Hmmmm....

chalsall 2014-08-18 18:15

[QUOTE=Madpoo;380643]Yeehaw. We're all expecting it will work great, and we've tested the things that could be tested, but not under any kind of load.[/QUOTE]

Extreme coolness. :smile:

Mark Rose 2014-08-18 18:27

[QUOTE=chalsall;380656]Truly sorry about that. I made a mistake.

I thought I had clicked on "Quote", but instead it appears I clicked on "Edit". This was to get the "raw" YouTube code using Firefox so I could watch the video using Chrome (until it crashed (yet) again).

I've been trying to "make friends with my hardware" (which, by the way, I paid for) for a little while now. It seems like Chrome crashes randomly if its windows are on either of the two Intel screens, but doesn't crash if they're on the (slow) Nvidia 1800 driven display.

Hmmmm....[/QUOTE]

It's probably something to do with hardware acceleration. Flash and Chrome both use it. Have you tried disabling it through chrome://flags to see if the crashing stops?

chalsall 2014-08-18 18:48

[QUOTE=Mark Rose;380662]Have you tried disabling it through chrome://flags to see if the crashing stops?[/QUOTE]

No. Not worth my time.

I can do without videos; and I will never enable Flash.

chalsall 2014-08-18 19:19

[QUOTE=Mark Rose;380662]It's probably something to do with hardware acceleration. Flash and Chrome both use it. Have you tried disabling it through chrome://flags to see if the crashing stops?[/QUOTE]

I find it interesting that Chrome's default settings are not explicitly defined.

And then in the following list I find many options I didn't explicitly agree to, all set to "Default".

"Careful, these experiments may bite"....

Edit: Carol Burnett and Robin Williams:

[url]https://www.youtube.com/watch?v=jfDyTUiL8xs[/url]

snme2pm1 2014-08-18 22:57

[QUOTE=Prime95;380627]The Primenet server will be down starting around 7:00PM EDT[/QUOTE]

Type Domain Name IP Address TTL
A mersenne.org 66.181.10.42 10 min

Madpoo 2014-08-18 23:08

[QUOTE=snme2pm1;380688]Type Domain Name IP Address TTL
A mersenne.org 66.181.10.42 10 min[/QUOTE]

That's the current (old) address...

I've just taken the old web server offline so I can do a final SQL log ship to the new server.

George will be making the DNS changes shortly, but if you're itching to see the new site in action before that's done, you can modify your hosts file to point to the new server.

As of this moment, we're considering the old server's data "dirty" and the new server as "clean", so rest assured that if you manually point to the new server, any exponent activity will survive.

The entries you'd want to add are:

63.251.213.73 [url]www.mersenne.org[/url]
63.251.213.73 mersenne.org
63.251.213.73 v5.mersenne.org
63.251.213.73 ftp.mersenne.org
63.251.213.73 v5www.mersenne.org

I'd remind you to remove those HOSTS entries once the actual DNS changes are made, because at some point in the near future we'll migrate the services over to the replacement hardware and everything will go back to the old IP addresses.

Madpoo 2014-08-18 23:10

[QUOTE=Madpoo;380689]The entries you'd want to add are:

63.251.213.73 [url]www.mersenne.org[/url]
63.251.213.73 mersenne.org
63.251.213.73 v5.mersenne.org
63.251.213.73 ftp.mersenne.org
63.251.213.73 v5www.mersenne.org

I'd remind you to remove those HOSTS entries once the actual DNS changes are made, because at some point in the near future we'll migrate the services over to the replacement hardware and everything will go back to the old IP addresses.[/QUOTE]

And on that note, the new server is LIVE, pending those DNS changes George will be making.

I was trying to think of something pithy to say at a moment such as this, but all I can come up with are: "Let 'er rip!"

kracker 2014-08-18 23:48

[QUOTE=Madpoo;380691]And on that note, the new server is LIVE, pending those DNS changes George will be making.

I was trying to think of something pithy to say at a moment such as this, but all I can come up with are: "Let 'er rip!"[/QUOTE]

Getting:
[code]
Not Found

HTTP Error 404. The requested resource is not found.
[/code]

TheMawn 2014-08-18 23:48

As of this moment the website is inaccessible for me. Not to sound impatient or anything, just letting you know in case this is unusual.

retina 2014-08-18 23:51

DNS propagation can sometimes take a while.

[code]c:\>nslookup mersenne.org 8.8.8.8
Server: google-public-dns-a.google.com
Address: 8.8.8.8

Non-authoritative answer:
Name: mersenne.org
Address: 66.181.10.42


c:\>nslookup mersenne.org 4.2.2.2
Server: b.resolvers.Level3.net
Address: 4.2.2.2

Non-authoritative answer:
Name: mersenne.org
Address: 66.181.10.42
[/code]

Madpoo 2014-08-18 23:52

[QUOTE=TheMawn;380697]As of this moment the website is inaccessible for me. Not to sound impatient or anything, just letting you know in case this is unusual.[/QUOTE]

Yup. Like I said, we're waiting on some DNS changes right now but if you want to leap forward, you can add those entries to your HOSTS file.

I guess I could put something up on the old server... a place holder saying "if you're seeing this, you're seeing the old server", but that would just confuse anyone who doesn't read this thread and doesn't know what's happening. :smile:

kracker 2014-08-18 23:58

[QUOTE=Madpoo;380699]Yup. Like I said, we're waiting on some DNS changes right now but if you want to leap forward, you can add those entries to your HOSTS file.

I guess I could put something up on the old server... a place holder saying "if you're seeing this, you're seeing the old server", but that would just confuse anyone who doesn't read this thread and doesn't know what's happening. :smile:[/QUOTE]

Got in :razz:

It's fast, but then there's probably no load on it yet...(?)

Madpoo 2014-08-19 00:09

[QUOTE=kracker;380701]Got in :razz:

It's fast, but then there's probably no load on it yet...(?)[/QUOTE]

Yeah, it's fast. I *hope* it's just as fast when more people are using it.

FYI, George is doing some sanity checks on the migrated database to make sure we're not missing anything critical.

He may also take this time to do some much needed SQL tweaks with the indexes.

Once everything checks out, those DNS changes will happen.

Just on the off chance it's all fubar on the new setup, I'll hold myself personally responsible for taking any changes made on the new site by anyone using a HOSTS file entry to hit the new box, and migrating those changes back to the old setup... if it comes to that, of course.

Always have a roll-back plan.

TheMawn 2014-08-19 00:28

Cool, cool. I'm in, too.

It certainly is very fast. Work distribution map or top producers list comes up instantly.

I notice the customize... button in the top producers (if you wanted to look at the top-x for all-time stats) is still fairly slow. Small potatoes.

Mark Rose 2014-08-19 00:32

I would lower the TTL on the DNS records to something like 300 seconds instead of 3600 for the next migration so people don't have stale entries for long. Though the current setting of an hour isn't really that bad.

Prime95 2014-08-19 00:38

I'm holding off on the DNS changes while you folks are playing around using the Hosts settings. Consider yourselves beta testers.

I've done a count on the factors and LL results tables on the old server and new server. The counts match. IMO, those are the 2 absolutely critical tables.

retina 2014-08-19 00:44

[QUOTE=Mark Rose;380711]I would lower the TTL on the DNS records to something like 300 seconds instead of 3600 for the next migration so people don't have stale entries for long. Though the current setting of an hour isn't really that bad.[/QUOTE]Even so not all resolvers will honour such short durations. It still can take up to 24 hours to complete an IP address transition. Depending upon your local DNS server selection you might get a new IP within moments or many hours.

Madpoo 2014-08-19 00:49

[QUOTE=retina;380717]Even so not all resolvers will honour such short durations. It still can take up to 24 hours to complete an IP address transition. Depending upon your local DNS server selection you might get a new IP within moments or many hours.[/QUOTE]

Interesting tidbit... last week I was moving equipment around between datacenters so we were temporarily hosting some of our websites out of our LA servers.

We'd lowered our TTL's to 300 seconds during the moves and not too surprisingly, all of the live traffic seemed to move over pretty quick.

What did surprise me was one of the clients in particular that REFUSED to look up the new IP address, even days later.

The bad apple? Pinterest. People can pin our pages, and when they do Pinterest will hit the site to get title, default image, etc and it kept hitting the old IP address even after 4 days. Really weird.

Besides Pinterest everything else got the new one within no more than an hour.

In the past I've seen some really misbehaved proxy servers in particular. I forget the brand name, but it was horrible... it would cache DNS lookups forever, or until it restarted.

preben s 2014-08-19 00:51

Weird!

Entered new IP addr in hosts, and I can connect and login.
If I click anything in the menu from "summary" to "results" the connection will just reset.

Rebooted and still resetting.

Btw, the CSS garbles somewhat if I address [URL="http://www.mersenne.org"]www.mersenne.org[/URL] instead of mersenne.org.

Prime95 2014-08-19 00:56

We're tracking down a PHP warning message right now....

preben s 2014-08-19 01:07

[QUOTE=preben s;380720]Weird!

Entered new IP addr in hosts, and I can connect and login.
If I click anything in the menu from "summary" to "results" the connection will just reset.

Rebooted and still resetting.

Btw, the CSS garbles somewhat if I address [URL="http://www.mersenne.org"]www.mersenne.org[/URL] instead of mersenne.org.[/QUOTE]

I guess I have a problem with the transparent proxies the ISP/country here is using.

