mersenneforum.org  

2013-02-05, 21:32   #1
TObject

CLOCK_WATCHDOG_TIMEOUT (101) analysis

Does Prime95 issue inter-processor interrupts (IRQL 29, hex 1D)?

Since IRQL 29 is higher than the clock interrupt's level (IRQL 28), a processor that does not finish handling an inter-processor interrupt before the timeout keeps the clock interrupt blocked, and that can trigger the bug check.

I have ntprime running on a dual-processor (four logical cores) 32-bit Xeon machine. One logical core on each processor is dedicated to LL testing its own exponent. The throttle setting in prime.txt is below 100.

The system is running at stock speed, with no overclocking of any kind, and has ECC memory.

Code:
CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 00000030, Clock interrupt time out interval in nominal clock ticks.
Arg2: 00000000, 0.
Arg3: 80739120, The PRCB address of the hung processor.
Arg4: 00000003, 0.


0: kd> !prcb 3
PRCB for Processor 3 at 80739120:
Current IRQL -- 0
Threads--  Current 86f89418 Next 00000000 Idle 8073d1e0
Number 3 SetMember 8
Interrupt Count -- 08dd14e2
Times -- Dpc    0000016d Interrupt 000001b4 
         Kernel 0088cd33 User      085447af 

0: kd> !pcr 3
KPCR for Processor 3 at 80739000:
    Major 1 Minor 1
	NtTib.ExceptionList: ffffffff
	    NtTib.StackBase: 00000000
	   NtTib.StackLimit: 00000000
	 NtTib.SubSystemTib: 8073b130
	      NtTib.Version: 065f49b9
	  NtTib.UserPointer: 00000008
	      NtTib.SelfTib: 7ffdd000

	            SelfPcr: 80739000
	               Prcb: 80739120
	               Irql: 0000001d
	                IRR: 00000000
	                IDR: ffffffff
	      InterruptMode: 00000000
	                IDT: 80740950
	                GDT: 80740550
	                TSS: 8073b130

	      CurrentThread: 86f89418
	         NextThread: 00000000
	         IdleThread: 8073d1e0

	          DpcQueue: 

3: kd> !thread
THREAD 86f89418  Cid 00e4.0204  Teb: 7ffdd000 Win32Thread: 00000000 RUNNING on processor 3
Not impersonating
DeviceMap                 8d008728
Owning Process            86f7cd90       Image:         ntprime.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      148706537      Ticks: 203 (0:00:00:03.171)
Context Switch Count      39261706       IdealProcessor: 3             
UserTime                  25 Days 06:17:22.093
KernelTime                00:03:07.812
Win32 Start Address 0x01e0c145
Stack Init 8f091000 Current 8f090d00 Base 8f091000 Limit 8f08e000 Call 0
Priority 1 BasePriority 1 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr  Args to Child              
00000000 00000000 00000000 00000000 00000000 0x0

The analysis above seems to suggest that ntprime (the service version of Prime95) may have been responsible for keeping the processor at IRQL 29 for too long. I realize Prime95 is an extremely stable and well-regarded piece of software engineering, so I am not claiming that it is buggy.

Would anyone like to discuss my findings, and maybe suggest additional debugging techniques to figure out why the computer crashed?
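
As a rough cross-check of the numbers in the dump, here is a small Python sketch. It assumes the tick length can be inferred from the `!thread` line (203 ticks shown as 3.171 s) and that the "nominal clock ticks" in Arg1 are the same length, which may not be exactly true. The IRQL names are the x86 ones from the Windows driver headers; the helper names are made up for illustration.

```python
# Values are copied from the debugger output above; helper names are illustrative only.

# Named x86 IRQLs as defined in the Windows driver headers (wdm.h).
# Levels not listed here are mostly device interrupt levels.
X86_IRQL_NAMES = {
    0: "PASSIVE_LEVEL",
    1: "APC_LEVEL",
    2: "DISPATCH_LEVEL",
    27: "PROFILE_LEVEL",
    28: "CLOCK_LEVEL",
    29: "IPI_LEVEL",
    30: "POWER_LEVEL",
    31: "HIGH_LEVEL",
}


def irql_name(hex_text):
    """Convert the hex Irql field from !pcr into (level, name)."""
    level = int(hex_text, 16)
    return level, X86_IRQL_NAMES.get(level, "unnamed (device) level")


def tick_length_ms(ticks, seconds):
    """Infer the length of one tick from a 'Ticks: N (h:mm:ss.mmm)' line."""
    return seconds * 1000.0 / ticks


irql = irql_name("0000001d")            # Irql field of !pcr 3
timeout_ticks = int("00000030", 16)     # Arg1 of the bug check
tick_ms = tick_length_ms(203, 3.171)    # from the !thread output
timeout_ms = timeout_ticks * tick_ms    # assumes Arg1 uses the same tick length

print("IRQL:", irql)
print("Timeout: %d ticks, about %.0f ms" % (timeout_ticks, timeout_ms))
```

If those assumptions hold, the watchdog interval is 48 ticks, somewhere around three quarters of a second, while the thread had been sitting there for over three seconds.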

2013-02-05, 21:50   #2
TObject

BTW, processor 1 (running another Prime95 LL thread) was also at IRQL 29 at the time:

Code:
1: kd> !pcr
KPCR for Processor 1 at 806d1000:
    Major 1 Minor 1
	NtTib.ExceptionList: ffffffff
	    NtTib.StackBase: 00000000
	   NtTib.StackLimit: 00000000
	 NtTib.SubSystemTib: 806d3130
	      NtTib.Version: 03aa5b5c
	  NtTib.UserPointer: 00000002
	      NtTib.SelfTib: 7ffdb000

	            SelfPcr: 806d1000
	               Prcb: 806d1120
	               Irql: 0000001d
	                IRR: 00000000
	                IDR: ffffffff
	      InterruptMode: 00000000
	                IDT: 806d8950
	                GDT: 806d8550
	                TSS: 806d3130

	      CurrentThread: 86f896d0
	         NextThread: 00000000
	         IdleThread: 806d51e0

	          DpcQueue: 
1: kd> !thread
THREAD 86f896d0  Cid 00e4.000c  Teb: 7ffdb000 Win32Thread: 00000000 RUNNING on processor 1
Not impersonating
DeviceMap                 8d008728
Owning Process            86f7cd90       Image:         ntprime.exe
Attached Process          N/A            Image:         N/A
Wait Start TickCount      148706535      Ticks: 205 (0:00:00:03.203)
Context Switch Count      15073898       IdealProcessor: 1             
UserTime                  25 Days 06:18:46.234
KernelTime                00:02:53.109
Win32 Start Address 0x01e0c145
Stack Init 8f309000 Current 8f308d00 Base 8f309000 Limit 8f306000 Call 0
Priority 1 BasePriority 1 PriorityDecrement 0 IoPriority 2 PagePriority 5
ChildEBP RetAddr  Args to Child              
00000000 00000000 00000000 00000000 00000000 0x0

Could the two Prime95 threads deadlock on each other?
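
One thing the two `!thread` outputs do allow is a quick consistency check: adding each thread's Wait Start TickCount to its elapsed Ticks should give the same current tick count if both were captured at the same instant. A Python sketch with the values from the two dumps (the variable and function names are mine, not anything from the debugger):

```python
# Values copied from the two !thread outputs; names are illustrative only.
threads = {
    3: {"wait_start": 148706537, "ticks": 203},  # processor 3
    1: {"wait_start": 148706535, "ticks": 205},  # processor 1
}


def current_tick(thread):
    """Tick count at the time of the dump, as implied by one thread's fields."""
    return thread["wait_start"] + thread["ticks"]


def set_member(cpu):
    """Affinity bit for a processor number, e.g. 'Number 3 SetMember 8'."""
    return 1 << cpu


now = {cpu: current_tick(t) for cpu, t in threads.items()}
gap = abs(threads[3]["wait_start"] - threads[1]["wait_start"])

print("Implied current tick per processor:", now)
print("Wait start values differ by", gap, "ticks")
print("SetMember for processor 3:", set_member(3))
```

Both threads imply the same current tick, and their wait start values are only 2 ticks apart, so whatever happened seems to have hit both LL threads at nearly the same moment.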

2013-02-05, 22:07   #3
Prime95

Quote:
Originally Posted by TObject
Could the two Prime95 threads deadlock on each other?

Deadlocks and race conditions are some of the hardest bugs to find and fix. Could there be a multithreaded problem in prime95? Certainly. I'd expect such problems to be rare, as this is the first I've heard of IRQL 29.

2013-02-05, 23:26   #4
TObject

Thank you for the quick reply. Unfortunately, the system was configured to create a kernel memory dump rather than a complete memory dump, so the memory pages that could shed light on exactly what was going on are missing.

The Complete memory dump option is not available on computers that are running 32-bit Windows with 2 gigabytes or more of RAM.
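
For what it's worth, the dump type is stored in the `CrashDumpEnabled` registry value under `HKLM\SYSTEM\CurrentControlSet\Control\CrashControl`, and as far as I know setting it to 1 there requests a complete dump even when the dialog does not offer the option. I have not verified that on this machine. Below is a hedged Python sketch that only reads and decodes the value. The value meanings are from Microsoft's documentation as I remember it, the function names are illustrative, and the registry read works on Windows only.

```python
import sys

# Meanings of CrashDumpEnabled as I understand them from Microsoft's documentation.
CRASH_DUMP_TYPES = {
    0: "none",
    1: "complete memory dump",
    2: "kernel memory dump",
    3: "small memory dump",
    7: "automatic memory dump",
}


def describe_crash_dump_setting(value):
    """Translate a CrashDumpEnabled value into a readable name."""
    return CRASH_DUMP_TYPES.get(value, "unknown (%d)" % value)


def read_crash_dump_setting():
    """Read CrashDumpEnabled from the registry (Windows only)."""
    import winreg  # standard library, available only on Windows

    key_path = r"SYSTEM\CurrentControlSet\Control\CrashControl"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "CrashDumpEnabled")
    return value


if sys.platform == "win32":
    print(describe_crash_dump_setting(read_crash_dump_setting()))
```

On the machine above I would expect this to report a kernel memory dump.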

2013-02-05, 23:53   #5
Prime95

Quote:
Originally Posted by TObject
Thank you for the quick reply. Unfortunately, the system was configured to create a kernel memory dump rather than a complete memory dump, so the memory pages that could shed light on exactly what was going on are missing.

The Complete memory dump option is not available on computers that are running 32-bit Windows with 2 gigabytes or more of RAM.

No worries. Sorry, but I'm not going to do any debugging by looking at a memory dump.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.