mersenneforum.org  

mersenneforum.org > Prime Search Projects > No Prime Left Behind
2021-08-02, 06:18   #584
gd_barnes
May 2007
Kansas; USA
2935₁₆ Posts

Hell no! :-)
2021-08-06, 22:04   #585
gd_barnes
May 2007
Kansas; USA
2935₁₆ Posts

Help needed!

Here is the status for SQL databases:

The NPLB stats database is back up.

The PRPnet servers are not back up.

I ran into a space issue on my server machine so I deleted a bunch of old stuff and now there is plenty of space.

The PRPnet servers were previously corrupted as a result of attempting to update themselves when there was not enough space. There appears to be little to no loss of data except maybe the last few results that were submitted.

What I need to do now is dump the databases, drop them, recreate them, and reload them so that the corruption is fixed. This is similar to what I did in Jan. 2020 when there was a corruption. I've done it before with help from others, so I have a good idea of the process involved. As before, I've run into various issues, such as MySQL not starting. I got that issue fixed with this:

In /etc/mysql/my.cnf I put:
innodb_force_recovery = 5
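
A side note on that setting: the MySQL documentation describes innodb_force_recovery levels 0 through 6 and warns that values of 4 or greater can permanently corrupt data files, so the usual advice is to take it back out once the dump is done. Below is a minimal Python sketch for checking the level in a config fragment. It assumes the line sits under a [mysqld] section (the post does not say), the function name is made up for illustration, and it parses a string rather than the real file, since a full my.cnf may contain !includedir lines that configparser cannot read.

```python
import configparser

# Hypothetical fragment; the [mysqld] section header is an assumption.
FRAGMENT = """
[mysqld]
innodb_force_recovery = 5
"""

def force_recovery_level(cnf_text: str) -> int:
    """Return innodb_force_recovery from a my.cnf-style fragment (0 if unset)."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    parser.read_string(cnf_text)
    return parser.getint("mysqld", "innodb_force_recovery", fallback=0)

level = force_recovery_level(FRAGMENT)
risky = level >= 4  # 4 and above are the levels the manual warns about
print(level, risky)
```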

After a reboot, that worked and MySQL started again. Now I need to dump the databases, but I am stuck. I'm logged in as root. I ran the following to dump one of the databases:

root@jeepford:/etc/mysql# mysqldump -u gary -p prpnet2000 > dump2000.sql

After I entered that, it prompted me for my password. I entered it, but I'm getting access denied.

Here is the message:
mysqldump: Got error: 1045: Access denied for user 'gary'@'localhost' (using password: YES) when trying to connect

I've also tried it with -u root. Same problem.

I did not have this problem dumping the databases back in Jan. 2020. My password is a standard one that I've used for the life of the machine so that cannot be the issue.

I've done a lot of Google searching for this, and nothing I have tried seems to work.

I've tried it with SQL both started and stopped. The SQL status looks good at the moment.

Any help would be appreciated.
2021-08-07, 01:32   #586
gd_barnes
May 2007
Kansas; USA
10100100110101₂ Posts

I have answered my own question in the last post. Now I have another problem. I have now dumped all of the databases. I am now trying to drop them. I get the following error message:

ERROR 1010 (HY000): Error dropping database (can't rmdir './prpnet1400', errno: 13)

Once again I've done a lot of googling. It told me that this is some sort of permission or access issue, but I've been unable to fix it.
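
For reference, the errno: 13 in that message is the operating system's EACCES code ("Permission denied"): the MySQL server process was refused when it tried to remove the database directory. Here is a minimal Python sketch that decodes the number and tests whether the current user could remove an entry from a directory's parent. The helper names are hypothetical, the demo uses a temporary directory rather than the real MySQL data directory (its path is not given in this thread), and the check only reflects the account running the script, so it would have to run as the account mysqld uses to be meaningful.

```python
import errno
import os
import tempfile

def describe_errno(code: int) -> str:
    """Return the symbolic name and OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"

def parent_allows_rmdir(path: str) -> bool:
    """rmdir needs write and search permission on the parent directory."""
    parent = os.path.dirname(os.path.abspath(path))
    return os.access(parent, os.W_OK | os.X_OK)

print(describe_errno(13))

# Stand-in directory for illustration only.
with tempfile.TemporaryDirectory() as base:
    demo_dir = os.path.join(base, "prpnet1400")
    os.mkdir(demo_dir)
    demo_ok = parent_allows_rmdir(demo_dir)
    print(demo_ok)
```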

Once again any help would be appreciated.
2021-08-07, 03:17   #587
gd_barnes
May 2007
Kansas; USA
7×11×137 Posts

Once again I answered my own question in the last post.

All of the PRPnet servers are back up and running! No data appears lost.

It is not clear to me whether or not the NPLB stats were corrupted. To be safe, I'm going to drop the database, restart it, and reload it. The dump is a huge file (4.3 GB) that will likely take a couple of hours to reload, so the NPLB stats will be down for a couple of hours while that is done.

I have extended the expiration time to 1 week for tests that were left in limbo while the servers were down. Since the servers have been down ~5-6 days, that will give everyone ~1-2 days to return their completed tests. If your clients have some completed tests, you can restart them and they should return those tests before requesting new work.
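
The arithmetic behind that window, as a small Python sketch (the function name is made up for illustration): with a 7-day expiration and 5 to 6 days of downtime, 1 to 2 days remain.

```python
from datetime import timedelta

def time_left(expiration: timedelta, downtime: timedelta) -> timedelta:
    """Time remaining before a test expires, never negative."""
    return max(expiration - downtime, timedelta(0))

expiration = timedelta(weeks=1)
shortest = time_left(expiration, timedelta(days=6))
longest = time_left(expiration, timedelta(days=5))
print(shortest.days, longest.days)
```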

Sorry that the problem occurred right before I left town.

Last fiddled with by gd_barnes on 2021-08-07 at 03:23
2021-08-17, 15:54   #588
AMDave
Jan 2006
deep in a while-loop
2·3²·37 Posts

Hi Gary. Got your DM. Hi there NPLB! Good to see you again. I will be home tomorrow night (AEST) to log in, examine and report.
I have kept my DEV server and the NPLB DR server in 'stasis' in my RedBack rack. Ooh. A few years now, I would guess - checking - yup, since late 2016. It has been a while.
I will fire them up for testing. Let's see if those capacitors are as good as the price I paid for them! Else I will yank the HDDs into the workstation. No time to waste.
If the NPLB backup files are good on your end, as we designed them to be, then we have a very high probability of a great outcome.
:)
Sorry everyone. I do not have any pleasant hold music to offer you. You will have to go online and stream your own. It's 2021. Otherwise I will hum. Hum hum hum hum hum. No. You don't want that. Come on. Seriously. Move along please. You can come back later.

Last fiddled with by AMDave on 2021-08-17 at 16:29
2021-08-17, 17:21   #589
AMDave
Jan 2006
deep in a while-loop
1010011010₂ Posts

Cursory remote examination from afar:
Log files show that result reception was absent between 02-Aug and 07-Aug, but since then all is good with result reception, and the processing mechanism itself appears to be OK also.
The project stats pages appear to be working (at face value).
I will need to check the port-based databases and statistics pages.
From this thread it looks like you had an issue with port 2000, but it currently appears to be pumping data.

Maybe one of you fixed the principal issue?
Did you turn it off and on again? :)

I will still log in, shirt-front it and look it up and down and ask it "Are you still a working server?"
Until it says yes ;)

Last fiddled with by AMDave on 2021-08-17 at 17:30
2021-08-17, 20:36   #590
gd_barnes
May 2007
Kansas; USA
7×11×137 Posts

Quote:
Originally Posted by AMDave
Cursory remote examination from afar:
Log files show that result reception was absent between 02-Aug and 07-Aug, but since then all is good with result reception, and the processing mechanism itself appears to be OK also.
The project stats pages appear to be working (at face value).
I will need to check the port-based databases and statistics pages.
From this thread it looks like you had an issue with port 2000, but it currently appears to be pumping data.

Maybe one of you fixed the principal issue?
Did you turn it off and on again? :)

I will still log in, shirt-front it and look it up and down and ask it "Are you still a working server?"
Until it says yes ;)

I'm very happy to hear from you Dave! Thank you for responding!

Around Aug. 1st the hard drive filled up. This caused a serious corruption of all SQL servers.

Here is what I did on or about Aug. 7th:
1. Deleted a lot of old files to clear up some space. There is now 80-100 GB of free space.
2. Backed up all the SQL servers.
3. Dropped them and re-created them.
4. Reloaded them from the backups.
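
Steps 2 to 4 can be sketched as a sequence of standard mysqldump and mysql client commands. The Python below only builds and prints the command strings; it does not run anything. The database name, user, and dump file name are the ones from the dump command earlier in this thread, while the drop, create, and reload commands are an assumption about a typical sequence, not a record of what was actually typed. The function name is made up for illustration.

```python
import shlex

def rebuild_commands(db: str, user: str, dump_file: str) -> list[str]:
    """Build dump, drop, create and reload commands for one database."""
    q = shlex.quote
    login = f"-u {q(user)} -p"  # -p makes the client prompt for the password
    return [
        f"mysqldump {login} {q(db)} > {q(dump_file)}",      # step 2: back up
        f"mysql {login} -e {q(f'DROP DATABASE `{db}`')}",    # step 3: drop
        f"mysql {login} -e {q(f'CREATE DATABASE `{db}`')}",  # step 3: re-create
        f"mysql {login} {q(db)} < {q(dump_file)}",           # step 4: reload
    ]

commands = rebuild_commands("prpnet2000", "gary", "dump2000.sql")
for command in commands:
    print(command)
```

It is worth confirming that each dump file is complete and readable before running the drop, since the dump is the only copy of the data at that point.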

Yep, I've rebooted the machine several times.

This all appeared to work fine. Since then here is what is happening:
1. The PRPnet servers are working fine.
2. The hourly NPLB stats update is not working properly.
3. Sometimes the hourly Port Report works and sometimes it doesn't. At this moment there is nothing in the recent progress by port.
4. The hourly update appears to happen at random times. It is now ~15:35 local time. The last update was at 15:23, just a few minutes ago, but the one before that was at 13:27. It should update once an hour and finish at around 5-10 minutes after the hour, as it has always done.
5. At this moment the statistics for top participants, top teams, and stats by server have no data in them. For reference, top participants are shown here: http://www.noprimeleftbehind.net/sta...ticipant_stats
When they do work, they show only the total stats for the last few months instead of the entire project.
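
The irregular timing in item 4 can be checked mechanically. This is a minimal Python sketch that flags gaps between consecutive update times that are longer than expected. The 15-minute tolerance and the function name are assumptions made for illustration, and the times are assumed to fall on the same day.

```python
from datetime import datetime, timedelta

def late_gaps(times, expected=timedelta(hours=1), tolerance=timedelta(minutes=15)):
    """Return the gaps between consecutive HH:MM times that exceed expected + tolerance."""
    stamps = [datetime.strptime(t, "%H:%M") for t in times]
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        gap = later - earlier
        if gap > expected + tolerance:
            gaps.append(gap)
    return gaps

# The two update times quoted in item 4.
observed = late_gaps(["13:27", "15:23"])
print(observed)
```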

When I reloaded the NPLB stats DB, the grand total stats were all there as of Aug. 1st. But for some reason it wants to completely rebuild them each hour, or at whatever random time the update actually occurs. I have concluded that the statistics are normally "added to" once an hour: the results received in the prior hour are added to the grand total that was calculated the previous hour. But for some reason that grand total keeps getting wiped out.

My opinion as to what is happening:
1. Somehow the grand total stats from the previous update keep getting wiped out. I suspect this is because the process has no "pointer" to the cut-off at which the total stats were last calculated.
2. The server, not sensing any previous stats, attempts to bring in as many results as it can in order to update the total stats.
3. It realizes there are too many "new" results, since it cannot sense the pointer in #1, so it finally stops processing at some random point. This sometimes seems to take as long as 2-3 hours, hence the random times of the updates.
4. Based on #3, sometimes the new stats are left blank and sometimes they are left with stats totals for only the last few months. As I write this they are blank.

I have limited knowledge of SQL DBs. The process for the NPLB stats is complex due to the hourly update. For the PRPnet servers it's not too bad, so I was able to get them all reset and running properly. We need some help on the NPLB stats.

Thank you!

Gary

Last fiddled with by gd_barnes on 2021-08-17 at 20:40
2021-08-18, 15:10   #591
AMDave
Jan 2006
deep in a while-loop
29A₁₆ Posts

Your port servers are still live and available, so keep 'em coming ;)

Stats maintenance in progress.
Some issues found.
Some tables rebuilt.
Stats refreshes are back online.
Tomorrow I will add a preventative patch.

Last fiddled with by AMDave on 2021-08-18 at 15:21
2021-08-18, 19:05   #592
gd_barnes
May 2007
Kansas; USA
7×11×137 Posts

At this moment (14:05 local time) the hourly port report, top participants, and top teams all show nothing. This includes the progress_crosstab, participant_stats, and team_stats tables.

The stats by server (server_stats) has only the 3 most recent servers, and the stats are all duplicated.

The last update was at 13:09 local time, so the timing is as expected.

***

Edit: Another update at 14:09. Things are different now. The hourly port report (progress_crosstab), top participants (participant_stats), and stats by server (server_stats) all show nothing. The top teams (team_stats) only has recent teams in it.

This is fairly similar to what was happening before. The main difference is that it updates hourly when it should.

***

Edit 2: Another update at 15:10. Different again. progress_crosstab is empty, participant_stats has only the last few months' data in it, team_stats is empty, and server_stats has only the servers from the last few months in it.

It only seems to want to include recent stats. Older stats seem to be wiped out. Another difference right now is that the hourly port report (progress_crosstab) has been empty every hour. Before (since Aug. 7th) it was working a majority of the time.

Last fiddled with by gd_barnes on 2021-08-18 at 21:04
2021-08-19, 14:31   #593
AMDave
Jan 2006
deep in a while-loop
2·3²·37 Posts

Patches applied.
Stats are back out of maintenance mode.
Monitoring in progress.
2021-08-19, 19:28   #594
gd_barnes
May 2007
Kansas; USA
24465₈ Posts

Looks great!

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.