there are bugs in cpus!

so don’t always say “the problem sits in front of the computer” – that’s simply not true when the hardware and software leave the factory with problems “preinstalled”.

the problem started with the invention of the computer… and it goes on and on and on… and causes a lot of stress.

people are not gods… they make mistakes… and when these mistakes accumulate, you get something like Chernobyl.

or a 3rd world war…

or a company going out of business because its IT infrastructure has collapsed like a house of cards.

MS: “This problem occurs because incorrect interrupts are generated on the computer that uses Intel processors that are code-named Nehalem.

These interrupts are caused by a known erratum that is described in the following Intel documents.”

so if intel is a fair player, they will take back their faulty cpus and replace them with ones that actually work.

… this problem only seems to happen when the cpu is under load.

what tech companies say and suffer

“We confirm the same exact error with 2K8/R2 HyperV servers with 5500 Series Xeons.

The problem seems intermittent and hits us at least every other day when the machines are under load.

We are losing credibility quickly with our customers after selling them on the stability benefits of R2 and the incredible performance advantages.”

http://support.microsoft.com/kb/975530

what “arrogant” intel says (i don’t understand a word)

AAK1. MCi_Status Overflow Bit May Be Incorrectly Set on a Single Instance of a DTLB Error

Problem: A single Data Translation Look Aside Buffer (DTLB) error can incorrectly set the Overflow (bit [62]) in the MCi_Status register. A DTLB error is indicated by MCA error code (bits [15:0]) appearing as binary value, 000x 0000 0001 0100, in the MCi_Status register.

Implication: Due to this erratum, the Overflow bit in the MCi_Status register may not be an accurate indication of multiple occurrences of DTLB errors. There is no other impact to normal processor functionality.

Workaround: None identified.

Status: For the steppings affected, see the Summary Tables of Changes.
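
the erratum gets a lot easier to digest once you see it as plain bit arithmetic: MCi_Status is a 64-bit machine-check register, and the erratum talks about two of its fields, the overflow flag (bit 62) and the MCA error code (bits 15:0). a minimal python sketch, assuming you already have the raw register value from somewhere (e.g. a machine-check log); the helper name and the sample value are made up for illustration:

```python
# field positions taken from the AAK1 erratum text above
OVERFLOW_BIT = 62              # "Overflow (bit [62])"
MCA_CODE_MASK = 0xFFFF         # "MCA error code (bits [15:0])"

# DTLB pattern 000x 0000 0001 0100: bit 12 is the "x" (don't care)
DTLB_DONT_CARE = 1 << 12
DTLB_PATTERN = 0b0000_0000_0001_0100   # 0x0014


def decode_mci_status(status: int) -> dict:
    """hypothetical helper: split a raw 64-bit MCi_Status value into
    the fields the erratum talks about"""
    mca_code = status & MCA_CODE_MASK
    overflow = bool((status >> OVERFLOW_BIT) & 1)
    # clear the don't-care bit, then compare with the DTLB pattern
    dtlb_error = (mca_code & ~DTLB_DONT_CARE) == DTLB_PATTERN
    return {
        "mca_code": mca_code,
        "overflow": overflow,
        "dtlb_error": dtlb_error,
        # AAK1: a single DTLB error can set the overflow bit wrongly,
        # so here it does not reliably mean "several DTLB errors happened"
        "overflow_may_be_spurious": dtlb_error and overflow,
    }


if __name__ == "__main__":
    # made-up example value: overflow bit set + DTLB code with x = 1
    sample = (1 << OVERFLOW_BIT) | 0x1014
    print(decode_mci_status(sample))
```

in short: if the error code says DTLB, don’t read a set overflow bit as “many DTLB errors”. according to intel, that is the only impact of this particular erratum.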

http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-5500-specification-update.pdf

what microsoft says

http://social.technet.microsoft.com/Forums/windowsserver/en-US/116a0220-6082-47d7-9bcf-bdde87c3ddf7/hyperv-server-2008-r2-bluescreenprobably-clockwatchdogtimeout?forum=windowsserver2008r2virtualization

Consider the following scenario:

A computer is running Windows Server 2008 R2 and has the Hyper-V role installed. This computer uses one or more Intel CPUs that are code-named Nehalem. For example, the Nehalem CPU for a server is from the Intel Xeon processor 5500 series, and the Nehalem CPU for a client is from the Intel Core-i processor series. In this scenario, you receive the following Stop error message:

0x00000101 (parameter1, 0000000000000000, parameter3, 000000000000000c) CLOCK_WATCHDOG_TIMEOUT

This problem occurs because incorrect interrupts are generated on the computer that uses Intel processors that are code-named Nehalem. These interrupts are caused by a known erratum that is described in the following Intel documents. To view these Intel documents, click the following links:

Intel Xeon Processor 5500 Series Specification Update, September 2009

http://www.intel.com/assets/pdf/specupdate/321324.pdf

Intel Core i7-800 and Intel Core i5-700 Desktop Processor Series Specification Update, October 2009

http://download.intel.com/design/processor/specupdt/322166.pdf

possible workaround

disable the ACPI C-states in the bios or, if the bios has no such option, set a value in the windows registry:

You can disable the Advanced Configuration and Power Interface (ACPI) C-states by using a BIOS firmware option on the computer.

If the firmware does not include this option, a software workaround is available.

You can disable the ACPI C2-state and C3-state by setting a registry key.

To do this, follow these steps:

At a command prompt, run the following command:

reg add HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /t REG_DWORD /d 0x0007e066

Restart the computer.

Note: The computer's idle power consumption will increase significantly if the deeper ACPI C-states (processor idle sleep states) are disabled. Windows Server 2008 R2 uses these deeper C-states on the Xeon 5500 series as a key energy-saving feature.

To continue to benefit from these energy-saving states, remove the registry value that you set above after you install the hotfix described in KB 975530. To remove the registry value, follow these steps:

At a command prompt, run the following command:

reg delete HKLM\System\CurrentControlSet\Control\Processor /v Capabilities /f

Restart the computer.
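
if you have to roll this out (and later roll it back) on several hyper-v hosts, a tiny script saves you from typos (note how easily the backslashes in the key path get lost). a minimal python sketch, assuming python is installed on the host; the function names are made up, while key, value name and data are the ones from the KB text above. by default it only prints the command:

```python
import subprocess
import sys

# key/value exactly as in the KB 975530 workaround
KEY = r"HKLM\System\CurrentControlSet\Control\Processor"
VALUE_NAME = "Capabilities"
VALUE_DATA = "0x0007e066"  # per the KB: disables the ACPI C2/C3 states


def disable_c_states_cmd() -> list:
    """hypothetical helper: argument list for the 'reg add' step"""
    return ["reg", "add", KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", VALUE_DATA]


def restore_c_states_cmd() -> list:
    """hypothetical helper: argument list for the 'reg delete' step
    (only after the hotfix is installed)"""
    return ["reg", "delete", KEY, "/v", VALUE_NAME, "/f"]


def run(cmd: list, dry_run: bool = True) -> None:
    """print the command line; execute it only on windows with dry_run=False"""
    print(subprocess.list2cmdline(cmd))
    if not dry_run and sys.platform == "win32":
        # writing to HKLM needs an elevated (administrator) prompt,
        # and the change only takes effect after a restart
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(disable_c_states_cmd())
```

keeping dry_run on by default means you can check the exact command line on any machine before letting it loose on a production host.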

complete crashdump

Loading User Symbols
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 101, {60, 0, fffff880009ea180, 1}

Probably caused by : ntkrnlmp.exe ( nt! ?? ::FNODOBFM::`string'+4e3e )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

CLOCK_WATCHDOG_TIMEOUT (101)
An expected clock interrupt was not received on a secondary processor in an
MP system within the allocated interval. This indicates that the specified
processor is hung and not processing interrupts.
Arguments:
Arg1: 0000000000000060, Clock interrupt time out interval in nominal clock ticks.
Arg2: 0000000000000000, 0.
Arg3: fffff880009ea180, The PRCB address of the hung processor.
Arg4: 0000000000000001, 0.

Debugging Details:
------------------

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  DRIVER_FAULT

BUGCHECK_STR:  0x101

CURRENT_IRQL:  d

LAST_CONTROL_TRANSFER:  from fffff8000282e443 to fffff80002881f00

STACK_TEXT:
fffff880`0315c508 fffff800`0282e443 : 00000000`00000101 00000000`00000060 00000000`00000000 fffff880`009ea180 : nt!KeBugCheckEx
fffff880`0315c510 fffff800`0288a5f7 : fffff800`00000000 fffff800`00000001 00000000`00002626 00000000`00000000 : nt! ?? ::FNODOBFM::`string'+0x4e3e
fffff880`0315c5a0 fffff800`02e01090 : 00000000`00000000 fffff880`0315c750 fffff800`02e1c3c0 fffff800`00000000 : nt!KeUpdateSystemTime+0x377
fffff880`0315c6a0 fffff800`0287e3f3 : 00000000`2c9cd682 fffff800`02e1c3c0 00000000`00001000 fffff780`00000001 : hal!HalpRtcClockInterrupt+0x130
fffff880`0315c6d0 fffff800`0288fdf8 : fffffa80`00000000 00000000`00000001 00000000`00000000 00000000`00000000 : nt!KiInterruptDispatchNoLock+0x163
fffff880`0315c860 fffff800`02895f70 : 00000000`00003d55 00000000`00000001 00000000`00000002 00000000`00003d4d : nt!KeFlushMultipleRangeTb+0x258
fffff880`0315c930 fffff800`02902a0e : fffff800`02a0cb40 fffff880`00000001 00000000`00000001 fffff880`0315cbb0 : nt!MiAgeWorkingSet+0x651
fffff880`0315cae0 fffff800`028966e2 : 00000000`000065ab 00000000`00000000 fffffa80`00000000 00000000`00000008 : nt! ?? ::FNODOBFM::`string'+0x49926
fffff880`0315cb80 fffff800`0289696f : 00000000`00000008 fffff880`0315cc10 00000000`00000001 fffffa80`00000000 : nt!MmWorkingSetManager+0x6e
fffff880`0315cbd0 fffff800`02b25166 : fffffa80`018cbb60 00000000`00000080 fffffa80`018ad040 00000000`00000001 : nt!KeBalanceSetManager+0x1c3
fffff880`0315cd40 fffff800`02860486 : fffff800`029fae80 fffffa80`018cbb60 fffff800`02a08c40 00000000`00000000 : nt!PspSystemThreadStartup+0x5a
fffff880`0315cd80 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxStartSystemThread+0x16

FOLLOWUP_IP:
nt! ?? ::FNODOBFM::`string'+4e3e
fffff800`0282e443 cc               int     3

SYMBOL_STACK_INDEX:  1

FOLLOWUP_NAME:  MachineOwner

SYMBOL_NAME:  nt! ?? ::FNODOBFM::`string'+4e3e

MODULE_NAME:  nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  4a5bc600

STACK_COMMAND:  kb

FAILURE_BUCKET_ID:  X64_0x101_nt!_??_::FNODOBFM::_string_+4e3e

BUCKET_ID:  X64_0x101_nt!_??_::FNODOBFM::_string_+4e3e

Followup: MachineOwner
---------
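
that dump is a lot of hex. a minimal python sketch (function names and regexes are my own, not part of windbg) that pulls the bugcheck code and its four arguments out of the BugCheck line and turns STACK_TEXT lines into a readable call chain, oldest caller first; the argument labels are taken from the !analyze -v output above:

```python
import re

# labels from "!analyze -v" for bugcheck 0x101; arg2 and arg4 are only described as "0." there
ARG_NAMES = ("timeout_clock_ticks", "arg2", "hung_processor_prcb", "arg4")

BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")
# STACK_TEXT line: "<child-sp> <ret-addr> : <4 args> : module!function+offset"
FRAME_RE = re.compile(r":\s*([\w.]+)!(.+?)(?:\+0x[0-9a-fA-F]+)?\s*$")


def parse_bugcheck(line: str):
    """return (code, {label: value}) from a 'BugCheck 101, {...}' line"""
    m = BUGCHECK_RE.search(line)
    if m is None:
        raise ValueError("no BugCheck line found")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, dict(zip(ARG_NAMES, args))


def call_chain(stack_text: str) -> list:
    """module!function for each STACK_TEXT line, oldest caller first"""
    frames = []
    for line in stack_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(f"{m.group(1)}!{m.group(2)}")
    return list(reversed(frames))  # windbg prints the newest frame first


if __name__ == "__main__":
    code, args = parse_bugcheck("BugCheck 101, {60, 0, fffff880009ea180, 1}")
    print(f"bugcheck {code:#x}")
    for name, value in args.items():
        print(f"  {name}: {value:#x}")
    # paste the STACK_TEXT lines from the dump here (two shown for brevity)
    stack = (
        "fffff880`0315c5a0 fffff800`02e01090 : 00000000`00000000 fffff880`0315c750 "
        "fffff800`02e1c3c0 fffff800`00000000 : nt!KeUpdateSystemTime+0x377\n"
        "fffff880`0315c6a0 fffff800`0287e3f3 : 00000000`2c9cd682 fffff800`02e1c3c0 "
        "00000000`00001000 fffff780`00000001 : hal!HalpRtcClockInterrupt+0x130"
    )
    print(" -> ".join(call_chain(stack)))
```

for this dump, arg1 is 0x60, i.e. a timeout interval of 96 nominal clock ticks, and reading the full stack oldest-first shows the bugcheck being raised on the clock interrupt path (hal!HalpRtcClockInterrupt → nt!KeUpdateSystemTime → … → nt!KeBugCheckEx).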

admin