mersenneforum.org  

2020-05-07, 14:29   #1
Stampeder
 
Jan 2018

5 Posts
ntprime64 service sometimes stops communicating with server

Hi,

I'm not sure if this has been reported before, but when ntprime64.exe runs as a Windows service on Windows 10 v1909 (I also saw this on v1903 before updating), it frequently fails to check in with PrimeNet. The late check-in notification option is enabled for all my PCs on mersenne.org, so I know when a computer is late in reporting.

The prime.log on each affected computer shows no new entries since the last check-in.

Computer setup:
- All 4 desktops are on the same network.
- The firewall is disabled on all PCs and managed at the domain server instead; access has been verified.
- The Prime95 folder is either at the root of D:\ or in the Public user's Public Documents folder (the service account's access has been verified on all PCs).
- All computers run 24 hours a day.

I can't think of any other factor that would prevent ntprime64.exe from updating prime.log AND checking in to PrimeNet. Since the setups are so similar, I would expect that if one computer had a problem, they all would. The PCs may restart automatically for Windows Updates, but the symptoms don't correlate with those events (the issue occurs even when there are no updates or restarts).

And the weird thing is that the symptom does not always occur.

For example, one day I might receive the late check-in notification email for PC1 & PC2. Another day it might be PC3 & PC4, another day PC1 & PC4, etc. When a late check-in occurs, I log in to the computer remotely and restart the service. That usually makes it check in with PrimeNet immediately. Sometimes it doesn't, but it usually still checks in later, when the next scheduled communication time comes around. Once, a PC did not check in for 3 days even though I had restarted the service twice during that period; I had to stop the service and run prime95.exe to send the results, then exit prime95 and start the service again.

ntprime64.exe is still using the CPU as expected (similar to prime95.exe's usage), and the .bu files are still being modified when I log in to check the affected computers. There is just no communication with the server, and prime.log shows no interim updates or any other activity. The last entry is usually the last communication time, which matches what mersenne.org shows in the CPU details.
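
For anyone who wants to spot this state without waiting for the late check-in email, here is a minimal Python sketch of the symptom described above: the save files are still being written, but prime.log has gone quiet. The function name, the `*.bu*` glob pattern, and the 24-hour threshold are assumptions for illustration, not anything Prime95 defines.

```python
import time
from pathlib import Path


def log_is_stalled(folder, max_log_age_s=24 * 3600, backup_glob="*.bu*", now=None):
    """Return True when save files are fresh but prime.log is stale.

    `backup_glob` and the default threshold are illustrative assumptions;
    adjust them to match the files actually present in the Prime95 folder.
    """
    folder = Path(folder)
    now = time.time() if now is None else now
    log = folder / "prime.log"
    if not log.exists():
        return False  # nothing to compare against
    log_age = now - log.stat().st_mtime
    backup_ages = [now - p.stat().st_mtime for p in folder.glob(backup_glob)]
    if not backup_ages:
        return False  # no save files found, so the worker state is unknown
    # The worker counts as active if any save file was written recently.
    worker_active = min(backup_ages) < max_log_age_s
    return worker_active and log_age > max_log_age_s
```

It only compares file modification times, so it can flag the "working but not logging" case, not explain it.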

The ntprime service version is v29.8 build 6. The GUI version is the same. Both are 64-bit.
This is my first time using the service version; I've always used the normal GUI version over the past couple of years.
I believe I've set up the service correctly, according to the instructions. I unzipped the prime95 version into the directory, then unzipped the service version into the same directory, overwriting everything. I then ran prime95 to set things up for new computers, or to reconfigure the settings for existing computers, before installing the service and running only the service. I made sure the services were running and communicating with the server before concluding that they were set up correctly.


Any suggestions to overcome this anomaly would be greatly appreciated, if it is not a bug in the program. Let me know if you guys need more information.


Thanks!

Last fiddled with by Stampeder on 2020-05-07 at 14:35 Reason: Clarified on how I set up the Service.
2020-05-07, 20:21   #2
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

3²×853 Posts

Look in prime.log for a potentially useful message.

You can add this in prime.txt to get more detailed comm data:

[PrimeNet]
Debug=2
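
If the same change has to be made on several machines, the edit above could be scripted. The following is only a sketch: the function name is hypothetical, it assumes prime.txt is a plain text file with `Key=Value` lines and `[Section]` headers, and it assumes section and key names can be matched case-insensitively.

```python
def set_primenet_debug(text, level=2):
    """Return prime.txt content with Debug=<level> under [PrimeNet].

    Assumes a simple INI-like layout; matching names case-insensitively
    is an assumption made for this sketch.
    """
    out = []
    in_section = False    # currently inside [PrimeNet]
    seen_section = False  # [PrimeNet] header exists somewhere
    done = False          # a Debug line has been written
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [PrimeNet] without having seen a Debug line: add one.
            if in_section and not done:
                out.append(f"Debug={level}")
                done = True
            in_section = stripped.lower() == "[primenet]"
            seen_section = seen_section or in_section
        elif in_section and stripped.lower().startswith("debug="):
            line = f"Debug={level}"  # overwrite an existing value, e.g. Debug=0
            done = True
        out.append(line)
    if not done:
        if not seen_section:
            out.append("[PrimeNet]")
        out.append(f"Debug={level}")
    return "\n".join(out) + "\n"
```

The function works on text only, so reading and writing the file is left to the caller. Stopping the program or service before editing prime.txt seems the safer order.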
2020-05-19, 01:10   #3
Stampeder
 
Jan 2018

5 Posts

Hi Prime95,

Thank you! I've since changed the Debug=0 to Debug=2 in prime.txt.

Today, on one of those machines, the ntprime64.exe simply stopped performing work. The last update was on 17 May, and I received the email notification that it was late for reporting on 18 May. I connected to it just now (19 May) and found that it wasn't utilising any CPU at all.

(1) I stopped the NTPrimeService service, added Debug=2 to prime.txt, then started the service again. prime.log showed the usual comms with the server, and one past assignment was uploaded successfully, but the service still didn't consume any CPU afterwards. It appears that NTPrimeService is still able to communicate with the server to send updates, but doesn't do any actual work.

(2) I stopped the service and then launched prime95.exe to see what the GUI says. The GUI reported nothing out of the ordinary. It started with the pre-test checks and then proceeded to work on the current assignment. I checked prime.log, but there were no additional lines since the update in (1).

(3) I then exited prime95.exe and started the NTPrimeService service. It used a bit of CPU for a few seconds (similar to doing pre-test checks) before stopping completely.

(4) I then tried restarting Windows, but it didn't change the situation.
Next, I downloaded a fresh set of the NTPrimeService files, replaced all existing files in the folder, and started the service, but that didn't change the situation either.
prime.log hasn't shown any new text since (1), despite Debug=2 being set.

(5) I have now fallen back to using prime95.exe to do the work, which still appears to work as expected.
The remaining 3 machines are still working fine with the Service version.

The mystery continues...

Last fiddled with by Stampeder on 2020-05-19 at 01:15 Reason: added additional info about the other machines to point (5)
2020-07-13, 20:10   #4
jbpace
 
"Jon Pace"
Jan 2018
Germantown, TN

13 Posts
Thank the heavens it's not just me!!

Quote:
Originally Posted by Stampeder View Post
when running the ntprime64.exe installed as a Windows service on Windows 10, it frequently does not check-in with PrimeNet
I'm currently running ntprime64 on 17 Win10/Server19 machines and experiencing somewhat similar problems. Here's what I'd identified before finding your post:
  • communicating properly - PCs with Win7 in-place upgrade to Win10
  • not communicating - PCs with Win10/Server19 clean install
  • (restarting ntprime64 on problem machines always results in immediate check-in)

After reading George's (@Prime95) reply, I set debug=2 on one problematic machine and checked the log after 7 days (i.e. during my [annoying manual] weekly ntprime64 service restarts):
  • communication (check-in) with PrimeNet immediately upon ntprime64 restart
  • communication (check-in) with PrimeNet once after that (I think it was 6 or 12 hours later)
  • zero further communication attempts

Unlike yours, all my machines continue processing as long as they have work queued. My machines simply never check in, not even after completing assignments.

Since I'd noted a difference depending on whether ntprime64 was already installed when upgraded to Win10, I've been (unsuccessfully) trying to figure out if it's some sort of permission setting.

Manually restarting services is getting pretty old, so I'd love to find a solution to whatever is causing the problem. Does anyone have any idea what could be causing this problem?
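
Until the cause is found, the manual restart could at least be automated. Below is a rough Python sketch, not a fix: it calls the built-in Windows `sc` tool to stop and start the service, using the service name NTPrimeService mentioned earlier in the thread. The function name and the fixed 10-second wait are assumptions for illustration, and the `run` and `sleep` parameters exist only so the logic can be tried without touching a real service.

```python
import subprocess
import time


def restart_service(name="NTPrimeService", wait_s=10,
                    run=subprocess.run, sleep=time.sleep):
    """Stop and then start a Windows service with `sc`.

    Returns True if the start command reported success. The fixed wait
    is a crude assumption; a slow shutdown may need longer.
    """
    stop = ["sc", "stop", name]
    start = ["sc", "start", name]
    run(stop, check=False)   # non-zero here may just mean it was already stopped
    sleep(wait_s)            # give the service time to shut down
    result = run(start, check=False)
    return result.returncode == 0
```

It needs to run from an elevated account, and something like Task Scheduler would have to call it on whatever interval suits (weekly, in my case).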

Quote:
(late check-in notification option is activated for all PCs on mersenne.org so that I will know when a computer is late in reporting)
Are you receiving late check-in notices? All my machines have that flag set on mersenne.org, but I haven't received a notice in over two years. I just assumed the notification system was broken.

Thanks,
JP

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.