Capturing Application Crash Dumps

Following on from our posts on the Basic Troubleshooting Toolkit and Basic Debugging of an Application Crash, let's talk about actually capturing Application crash dumps and failures. Most administrators are familiar with the Dr. Watson for Windows tool that has been around since the days of Windows NT. An updated version of this tool, DrWtsn32, still exists in Windows XP and Windows Server 2003 - but not in Windows Vista or Windows Server 2008. So how do we capture user-mode dump files? We're going to cover several different methods for capturing dump files for User-mode application crashes.

First - let's quickly cover Dr. Watson for Windows XP and Windows Server 2003. Dr. Watson captures user-mode dump information. Whenever a user-mode process (such as Internet Explorer or the Print Spooler) crashes, Dr. Watson creates a text file, DrWtsn32.log. Dr. Watson can also be configured to create a crash dump file that can be loaded into a debugger. Let's look at the configuration for Dr. Watson. The first thing we have to do is configure Dr. Watson as our default debugger. To do this we run the following command: drwtsn32 -i. What this does is modify two registry values located in HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug. The values are as follows:

Value Name = Auto
Type = String (REG_SZ)
Data Value = 1 or 0. (Default is 1)

Value Name = Debugger
Type = String (REG_SZ)
Data Value = drwtsn32 -p %ld -e %ld -g

NOTE: This data value (drwtsn32 -p %ld -e %ld -g) is specific to Dr. Watson. Alternative debuggers will have their own values and parameters.
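To make the Debugger value more concrete: when a process crashes, Windows fills in the two %ld placeholders with the process ID of the failing process and the handle of an event that the debugger signals once it has attached. The short Python sketch below only mimics that substitution. The function name and the sample numbers are made up for illustration, and nothing here reads or writes the registry.

```python
# Illustrative only: mimic how the AeDebug "Debugger" template is expanded.
DRWATSON_TEMPLATE = "drwtsn32 -p %ld -e %ld -g"


def expand_debugger_command(template, process_id, event_handle):
    """Fill the two %ld placeholders with the process ID and event handle."""
    return template % (process_id, event_handle)


if __name__ == "__main__":
    # 1234 and 88 are made-up sample values, not real identifiers.
    print(expand_debugger_command(DRWATSON_TEMPLATE, 1234, 88))
```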

So now that Dr. Watson is our default debugger, it's time to go set our parameters. Run the drwtsn32 command, to bring up the configuration options shown below:

The first two options are fairly self-explanatory - the location in which Dr. Watson should save the Log File and Crash Dump when they are generated. By default, this is in the All Users profile path: drive:\Documents and Settings\All Users.WINNT\Application Data\Microsoft\Dr Watson. If you change these locations, you must ensure that all users have write permission to the new location. Otherwise, users will be prompted to select a location in which to save the files.

The "Number of Instructions" parameter specifies how many instructions preceding and following the faulty instruction are included in the disassembly portion of the log file. The possible values for this parameter range from 0 to 500. The disassembly portion includes the function being executed when the error occurred, the memory address, raw machine instruction, and decoded machine instruction for each adjacent instruction and an analysis of the faulty instruction. The default value is 10 (0xA in Hexadecimal).

The "Number of Errors to Save" parameter specifies how many errors Dr. Watson maintains in its application error viewer and in the Event Viewer application log. By default, this value is 10 (0xA in Hexadecimal). The possible values range from 0 to 4,294,967,295! In reality though, you would not want to set the value this high. When the number of recorded errors reaches the value of this entry, Dr. Watson will continue to add errors to the log file and the dump - but will not add errors to its own log viewer or the Application Event log until it is reset using the "Clear" button or the value is increased.
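The two numeric ranges above are easy to mix up, so here is a small Python sketch that checks candidate values against them before you type them into the dialog. The function and parameter names are hypothetical and do not correspond to Dr. Watson's registry value names. The upper limit for saved errors is simply the largest 32-bit unsigned number (2**32 - 1).

```python
MAX_INSTRUCTIONS = 500
MAX_ERRORS_TO_SAVE = 2**32 - 1  # 4,294,967,295


def validate_drwatson_settings(instructions=10, errors_to_save=10):
    """Return the settings if they fall in the documented ranges, else raise."""
    if not 0 <= instructions <= MAX_INSTRUCTIONS:
        raise ValueError("Number of Instructions must be between 0 and 500")
    if not 0 <= errors_to_save <= MAX_ERRORS_TO_SAVE:
        raise ValueError("Number of Errors to Save must fit in 32 bits")
    return {"instructions": instructions, "errors_to_save": errors_to_save}


if __name__ == "__main__":
    # Defaults from the dialog: 10 instructions, 10 errors.
    print(validate_drwatson_settings())
```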

The "Dump Symbol Table" option dumps the symbol table for each module. Selecting this option can cause log files to become very large! The "Dump all Thread Contexts" option specifies whether Dr. Watson will log a state dump for each thread in the program that failed or only the faulting thread. The other options are self-explanatory. Each of the parameters in this dialog is stored in the HKEY_LOCAL_MACHINE\Software\Microsoft\DrWatson key. The Log File and Dump File path are not present by default - if you change the location of these files, you will see the Registry values for these options.

So that wraps up Dr. Watson - but what if you're running Windows Vista or Windows Server 2008, or you prefer to use something other than Dr. Watson? There are two different options to use - DebugDiag or ADPlus. For pre-Vista Operating Systems, you can also use Userdump.

Let's look at ADPlus first. ADPlus is a VBScript that is included with the Debugging Tools for Windows. ADPlus has two different modes - Hang and Crash. ADPlus is configured through the use of command line switches to specify the parameters. The basic command line switches are as follows:

Switch              Function
-crash              Runs ADPlus in crash mode
-hang               Runs ADPlus in hang mode
-p <PID>            Defines the Process ID to be monitored
-pn <ProcessName>   Defines the Process Name to be monitored
-o <directory>      Defines the directory where dumps & logs are created

So for example, to run ADPlus to monitor the Print Spooler and output the files to a folder on the D: drive, you would run the following command: cscript adplus.vbs -crash -pn spoolsv.exe -o d:\Spooler_Dumps. A key feature to remember about ADPlus is that unlike Dr. Watson or DebugDiag, which we are about to discuss, it can capture crash dumps of a failing 64-bit process. However, there are some caveats with ADPlus. For example, when you use ADPlus in hang mode, you must wait until the process or processes stop responding before you run the script (unlike crash mode, hang mode is not persistent). You can find more information regarding ADPlus usage in the Debugging Tools help file and Microsoft KB Article 286350.

Moving on, let's take a look at the Userdump tool. Similar to ADPlus, Userdump can be used to generate a process dump of an application that has crashed or is hanging. When you install Userdump, the setup program installs a kernel-mode driver. Once the program is installed, launch the Process Dumper Applet from the Control Panel. Click the "New" button and enter the name of the executable to monitor (for example WINWORD.EXE or SPOOLSV.EXE). You do not need to enter the full path. Once the application has been added to the monitor it should look like the screenshot on the right.

Now, highlight the app and click the "Default Settings" button. Ensure that the "All Exceptions" checkbox is not selected. Click on the "Select All" button (which is different than selecting the "All Exceptions" checkbox). Your configuration window should look like the screenshot to the left.

Let's discuss a couple of the other options here. The option "Bugcheck after Dumping" will cause the machine to create a Kernel dump after creating the user-mode dump. Be very careful with this option!

The other feature to note here is under the "Exit Monitor" section. In some instances you will encounter processes that seem to close gracefully for no apparent reason. If you check the "Monitor Process Exit" box, you will get a dump file created every time the process is closed. Again - this is an option to use judiciously - even if a user closes the application being monitored deliberately you will create a dump file. For an application like Internet Explorer, Word, or Excel, this could result in a large number of unnecessary dump files (and quite a bit of wasted disk space!)

Finally, let's look at the Debug Diagnostic Tool (aka DebugDiag). DebugDiag was originally released as part of the IIS Diagnostic Toolkit. Version 1.1 is a standalone tool that can be used to troubleshoot hangs or crashes in any 32-bit user-mode process. Following the initial installation of DebugDiag, the configuration process is wizard-driven. So let's go ahead and set up a rule to monitor our machine for spooler crashes.

Since IIS is not installed on this machine, all of the IIS options are disabled. After selecting "Crash", and clicking "Next" we are presented with the option to select our Target Type - which could be a specific process (such as Internet Explorer, Excel, Outlook etc), a specific MTS / COM+ application, an IIS web application pool, or a specific NT service. For our example, we will go ahead and select "A specific NT service" and click "Next".

Now we can select our target as shown below:

Remember that we can only use DebugDiag against 32-bit user-mode processes. If you were to try and monitor the spooler on a 64-bit machine, you would receive an error similar to this:

After Clicking "Next", we are presented with the Advanced Configurations screen. For our purposes, we leave this at the default settings, and click "Next", which allows us to name our rule, and select a location for our dump files (see the image below). Once we have completed these options, we click next and activate our rule. Now the Spooler process is being monitored for crashes.

If / When the spooler crashes, we will find a new folder created for each instance of the crash in the Userdump location listed above. This will include the dump file and the log file.

Finally, there is one last option to capture a process dump that is available on Windows Vista. If you open up Task Manager, you can right-click on an application name, the process name or the service and select the option to "Create Dump File". This will create a dump of the process, but not terminate it - so you can capture multiple dumps of a running process!

And that brings us to the end of our post on Application Crash Dumps. Until next time ...

How to disable or enable Dr. Watson for Windows

To disable Dr. Watson

1. Click Start, click Run, type regedit.exe in the Open box, and then click OK.

2. Locate and then click the following registry key:

HKEY_LOCAL_MACHINE\Software\Microsoft\Windows NT\CurrentVersion\AeDebug

NOTE: Steps three and four are optional. However, they are necessary if you want to restore the default use of Dr. Watson.

3. Click the AeDebug key, and then click Export Registry File on the Registry menu.

4. Type a name and location for the saved registry file, and then click Save.

5. Delete the AeDebug key.

Registry entries for debugger programs are located in the AeDebug key in Windows. The Dr. Watson program is installed by default in Windows, and is configured to run when an application error occurs (with a data value of 1 for the Auto value). The default values are as follows:

Value Name = Auto

Type = String (REG_SZ)

Data Value = 1 or 0. (Default is 1)

Value Name = Debugger

Type = String (REG_SZ)

Data Value = drwtsn32 -p %ld -e %ld -g

NOTE: This data value (drwtsn32 -p %ld -e %ld -g) is specific to Dr. Watson. Alternative debuggers will have their own values and parameters.

To enable Dr. Watson

1. At a command prompt, type the following line, and then press ENTER:

drwtsn32 -i

2. Double-click the .reg file that you created in steps three and four that were discussed earlier.

To create a dump (.dmp) file for a process that shuts down with an exception

1. Run the Setup.exe program for your processor.

Notes

o By default, this Setup.exe program is included with the Userdump.exe tool in the C:\kktools\userdump8.0 folder.

o This Setup.exe program installs a kernel-mode driver, installs the Userdump.sys file, and creates the Process Dump icon in Control Panel.

o Unless you have a specific need, disable the "dump on process termination" feature when you run the Setup.exe program.

2. In Control Panel, double-click Process Dump.

3. On the Exception Monitoring tab, click New, add the appropriate program name to the Monitor

list, and then click OK. For example, add a program name such as Lsass.exe, Winlogon.exe,

Mtx.exe, or Dllhost.exe.

4. In the Monitor box, click the program name that you added in step 3, and then click Rules.

5. Click to select Custom Rules, select the type of error that you want to trigger for the program that

you added in step 3 in the Custom rules list, and then click OK.

For example, select the Access violation (c0000005) error type.

When the monitored program generates an access violation error message, the Userdump.exe tool starts,

and then the Userdump.exe tool creates a dump (.dmp) file in the %SystemRoot% folder. By analyzing

this .dmp file, you may be able to isolate the cause of Winlogon access violation error messages.

To create a dump file for a hanging process

1. Run the Setup.exe program for your processor.

o By default, this Setup.exe program is included with the Userdump.exe tool in the C:\kktools\userdump8.0 folder.

o This Setup.exe program installs a kernel-mode driver, installs the Userdump.sys file, and creates the Process Dump icon in Control Panel.

o Unless you have a specific need, disable the "dump on process termination" feature when you run the Setup.exe program.

2. When the program stops responding, open a command prompt, change to the folder that contains the version of Userdump.exe for your processor, and then type the following command:

userdump PID

Note In this command, PID is a placeholder for the process ID (PID) of the program that has stopped

responding. To obtain the PID of the program, open Task Manager, and then click the Process tab.

When you run the userdump PID command, a .dmp file is generated. You can use this dump file to perform

post-mortem debugging with a program such as the Windbg.exe tool.

If you run Setup to install the Userdump.exe tool, some additional features are enabled. These additional

features of the Userdump.exe tool are described in detail in the Userdocs.doc file that accompanies the

Userdump.exe tool. These additional features include the following:

Process self-dumping. You can configure the Userdump.exe tool to automatically create a dump

file for a certain program when that program encounters a certain kind of error, such as an

exception handler block or a top-level unhandled exception filter.

Hot-key process snapshot. You can associate a single keystroke with an image binary to trigger

the Userdump.exe tool to create a dump file.

Exception monitoring. The Userdump.exe tool can monitor programs for exceptions and can

automatically generate dump files when certain exceptions occur. You can configure whether an

exception triggers a dump file for each program by using the Process Dump utility. You can access

the Process Dump utility from Control Panel.

New features for ADPlus Version 6.0

ADPlus V6.0 has been completely rewritten. The tool has new switches and new capabilities. You can now configure the tool through an external configuration file. You can view updated information about the new features and switches in the debugger help file (Debugger.chm) that is included in the Microsoft Windows Debuggers package. To obtain the package, visit the following Microsoft Web site:

Debugger.chm is located in the same folder as ADPlus.vbs. To locate the documentation for ADPlus, click the

Contents tab, and then click through the following items:

Using Debugging Tools for Windows

Crash Dump Files

User-Mode Dump Files

Creating a User Mode Dump File

ADPlus

You can also find documentation for ADPlus by clicking the Index tab. Type ADPlus in the keyword text box.

What does ADPlus do?

ADPlus is a console-based Microsoft Visual Basic script. It automates the Microsoft CDB debugger to produce memory dumps and log files that contain debug output from one or more processes. Each time that you run ADPlus, debugging information (memory dumps and text files that contain debug information) is put in a new, uniquely named folder (such as C:\Temp\Crash_Mode__Date_01-22-2001__Time_09-41-08AM) on the local file system or on a remote network share. Additionally, each file that ADPlus creates has a unique name (such as PID-1708__Inetinfo.exe__Date_01-22-2001__Time_09-41-08AM.log) to avoid overwriting older files with newer ones.
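If you need to find or sort ADPlus output with your own scripts, it helps to see how those names are put together. The Python sketch below reproduces the pattern of the two example names quoted above. It is inferred from those examples only, not taken from the ADPlus source, and the function names are hypothetical.

```python
from datetime import datetime


def _stamp(when):
    """Build the Date_..__Time_.. part, e.g. Date_01-22-2001__Time_09-41-08AM."""
    am_pm = "AM" if when.hour < 12 else "PM"
    return "Date_{:%m-%d-%Y}__Time_{:%I-%M-%S}{}".format(when, when, am_pm)


def output_folder_name(mode_label, when):
    """mode_label is the leading text of the folder, such as 'Crash_Mode'."""
    return "{}__{}".format(mode_label, _stamp(when))


def log_file_name(pid, process_name, when):
    """Per-process log name that embeds the PID, image name, and time stamp."""
    return "PID-{}__{}__{}.log".format(pid, process_name, _stamp(when))


if __name__ == "__main__":
    run_time = datetime(2001, 1, 22, 9, 41, 8)
    print(output_folder_name("Crash_Mode", run_time))
    print(log_file_name(1708, "Inetinfo.exe", run_time))
```

Because the time stamp is part of every name, two runs against the same process never collide as long as they start at least one second apart.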

ADPlus works with any user mode process or service such as Internet Information Services (IIS), Microsoft

Transaction Server (MTS), or Microsoft COM+ applications.

The following are some of the features of ADPlus:

ADPlus uses the latest Microsoft debuggers for improved features, speed, and reliability.

When ADPlus is dumping memory for multiple processes, it does so asynchronously so that each

process is frozen and dumped at the same time. This method can provide an effective

"snapshot" of the whole application at the time that ADPlus was run. You must capture all the

processes that compose an application, and all the processes that the application uses at the

same time, to capture the state of the application at the time that the problem occurs. This is

especially important for applications that make remote procedure calls to other processes.

ADPlus has a command-line interface. Because ADPlus does not have a graphical user interface,

you can run it in quiet mode (to suppress dialog boxes) from a remote command shell (a

command shell that is remoted out by using Remote.exe). In quiet mode, errors appear in the

console and are written to the event log. For more information about how to run ADPlus from a

remote command shell, see the "Usage Scenarios" section of this article.

If you use the -notify switch when ADPlus monitors for crashes, and the Windows Messenger

service is started, ADPlus can alert a user or computer of a crash through the Windows

Messenger service.

When ADPlus monitors a process in crash mode, if a crash occurs, ADPlus sends important

information about the type of crash to the event log.

ADPlus supports XCOPY deployment. If you install the debuggers package that is included with

ADPlus on a test computer, you can copy the folder where the debuggers were installed to

another computer. Additionally, ADPlus does not require that you register any custom

Component Object Model (COM) components on the system. Because of this, you can use ADPlus

on production servers that have a locked-down software configuration. To remove ADPlus, delete

the folder where it was installed or copied to.

When should you use ADPlus?

ADPlus is intended to provide Microsoft PSS support professionals with debugging information that they must

have to isolate the cause of problems that occur in complex environments.

Use ADPlus to capture debugging information if you are experiencing the following problems:

Processes that stop responding.

Processes that consume 100 percent CPU on a single processor computer, 50 percent CPU on a

dual processor computer, 25 percent CPU on a quad processor computer, and so on.

Processes that crash or shut down unexpectedly.
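The CPU figures in the list above follow a simple pattern: a single runaway thread can keep at most one processor busy, so the total CPU percentage it shows is 100 divided by the number of processors. A minimal Python sketch of that arithmetic, with a hypothetical function name:

```python
def pegged_thread_cpu_percent(processor_count):
    """Total CPU percentage shown when one thread saturates one processor."""
    if processor_count < 1:
        raise ValueError("processor_count must be at least 1")
    return 100.0 / processor_count


if __name__ == "__main__":
    for count in (1, 2, 4):
        print(count, pegged_thread_cpu_percent(count))
```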

When should you not use ADPlus?

Do not use ADPlus in the following situations:

If you must troubleshoot a program or process that quits unexpectedly during startup. You can only

use ADPlus for processes that start successfully. To troubleshoot processes that quit unexpectedly

during startup, User Mode Process Dump may be a better solution. For more information about User

Mode Process Dump, click the following article number to view the article in the Microsoft Knowledge

Base:

253066 (http://support.microsoft.com/kb/253066/) OEM Support Tools Phase 3 Service Release 2 availability

Alternatively, you can use the latest debuggers to manually debug the process. For more

information about the latest debuggers, visit the following Microsoft Web site:

http://www.microsoft.com/whdc/devtools/debugging/default.mspx

If there is a noticeable effect on performance when you use ADPlus in crash mode. Typically, this is

caused by dynamic-link libraries (DLLs) or programs that throw many Microsoft Visual C++ EH

exceptions. (These exceptions occur when you use the C++ throw statement or when you use

try/catch blocks.) Programs that write a lot of information to the debug output stream can also

cause performance to decrease. In the vast majority of cases, ADPlus does not affect performance

noticeably when it is running in crash mode.

If you are running in a clustering environment certain precautions should be taken when you use

ADPlus. For more information, click the following article number to view the article in the Microsoft

Knowledge Base:

841673 (http://support.microsoft.com/kb/841673/) A server in a cluster may fail over when you try to create a dump file of the information store using ADPlus or Userdump in Exchange 2000 Server or Exchange Server 2003

Where do you obtain ADPlus?

ADPlus is included with the latest Microsoft Debugging Tools for Windows. To obtain the latest Microsoft

Debugging Tools for Windows, visit the following Microsoft Web site:



How does ADPlus work?

ADPlus has two modes of operation:

"Hang" mode is used to troubleshoot process hangs, 100 percent CPU utilization, and other

problems that do not involve a crash. When you use ADPlus in hang mode, you must wait until the

process or processes stop responding before you run the script (unlike crash mode, hang mode is

not persistent).

"Crash" mode is used to troubleshoot crashes that result in Dr. Watson errors, or any other type of

error that causes a program or service to quit unexpectedly. When you use ADPlus in crash mode,

you must start ADPlus before the crash occurs. You can configure ADPlus to notify an administrator

or a computer of a crash through the -notify switch.




Hang mode

In this mode, ADPlus immediately produces full memory dumps for all the processes that are specified on the

command line after the script has completed. Each .dmp file that is created is put in a folder that contains

the date/time stamp when ADPlus was run. Each file name contains the process name, the process ID, and

the date/time stamp when ADPlus was run. While the process memory is being dumped to a file, the process

is frozen. After the memory dump file has been created, the process is resumed by using a noninvasive

attach/detach with the CDB debugger.

Usage Tip You can use ADPlus in hang mode instead of Userdump.exe to dump the memory for one or more

processes. Additionally, hang mode works inside a Terminal Server session.

Crash mode

In this mode, ADPlus attaches the CDB debugger to all processes that are specified on the command line.

ADPlus automatically configures the debugger to monitor for the following types of exceptions:

Invalid Handle

Illegal Instruction

Integer Divide by Zero

Floating Point Divide by Zero

Integer Overflow

Invalid Lock Sequence

Access Violation

Stack Overflow

C++ EH Exception

Unknown Exception

You can use ADPlus in crash mode instead of the IIS Exception Monitor or Userdump.exe when you are

troubleshooting these types of exceptions. Because crash mode uses an "invasive" attach through the CDB

debugger, it does not work inside a Microsoft Windows NT 4.0 or Windows 2000 Terminal Server session.

Only hang mode works inside a Terminal Server session on these operating systems because they require the use of a noninvasive attach. For more information about how to attach to a process invasively and noninvasively with the latest debuggers, see the "Using Debugging Tools for Windows: Attaching to a Running Process (User Mode)" section in the debuggers help.

Note Crash mode is supported in a Terminal Server session on Windows XP and Microsoft Windows Server

2003 operating systems.

When ADPlus is running in crash mode, a debugger remains attached to each process that is specified on the

command line for the lifetime of that process until a fatal exception is trapped and the process quits

unexpectedly, or until a user presses the CTRL+C key combination to detach the debugger from that

process. To manually detach the debugger from the process, you must maximize the debugger window, and

then press CTRL+C to break into the debugger.

When you press CTRL+C, ADPlus traps this command, starts to list the stacks for all threads to a log file, and

then produces a mini memory dump record of the process before it detaches from the debugger. Because

crash mode performs an invasive attach, the process is stopped when the debugger is detached. You must

restart the process. If it is an MTS or COM+ process, the process is restarted automatically the next time that

a call is made to a component in that package.

First chance exceptions

Each type of exception (such as an access violation or a stack overflow) can be raised to a debugger as

either a first chance exception or a second chance exception. By definition, a first chance exception is non-

fatal unless it is not handled correctly by using an error handler. If this problem occurs, the exception is

raised again as a second chance exception (only a debugger can handle these). If no debugger handles a

second chance exception, the application quits.
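The first chance and second chance sequence described above can be traced with a small Python model. This is a simplified illustration of the flow in the preceding paragraph, not a description of how Windows implements exception dispatching. The function name and the step wording are made up.

```python
def exception_outcome(handled_by_application, debugger_attached):
    """Return the ordered steps an exception goes through in this model."""
    steps = []
    if debugger_attached:
        steps.append("debugger sees first chance exception")
    if handled_by_application:
        # A handled first chance exception is non-fatal.
        steps.append("application handler deals with it; process continues")
        return steps
    if debugger_attached:
        steps.append("debugger sees second chance exception")
    else:
        steps.append("no debugger handles it; application quits")
    return steps


if __name__ == "__main__":
    for step in exception_outcome(handled_by_application=False,
                                  debugger_attached=True):
        print(step)
```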

For more information about first and second chance exceptions and the Windows NT SEH (structured

exception handling), click the following article number to view the article in the Microsoft Knowledge Base:

105675 (http://support.microsoft.com/kb/105675/ ) First and second chance exception handling

By default, when ADPlus detects a first chance (non-fatal) exception for all types of exceptions except

unknown and EH exceptions, it takes the following actions:

1. Pauses the process to log the date and time that the exception occurred in the log file for the

process that is being monitored.

2. Logs the thread ID and call stack for the thread that raised the exception in the log file for the process that is being monitored.


3. Produces a uniquely named mini memory dump record (.dump -u /m) of the process at the time

that the exception occurred, and then resumes the process.

Note By default, ADPlus does not produce a unique mini memory dump record for first chance EH and unknown exceptions because these exceptions occur frequently. Typically, such exceptions are handled by error handling code in a process or DLL. Because these are handled exceptions, they do not become second chance (unhandled) exceptions and they do not end the process.

However, you can configure ADPlus to produce unique mini memory dumps for first chance EH and unknown

exceptions. To do this, you must use a configuration file to customize ADPlus.

Second chance exceptions

When ADPlus detects a second chance (fatal) exception for all types of exceptions (including EH and

unknown exceptions), it takes the following actions:

1. Pauses the process to log the date and time that the exception occurred in the log file for the process that is being monitored.

2. Logs the thread ID and call stack for the thread that raised the exception in the log file for the process that is being monitored.


3. Produces a full memory dump of the process at the time that the fatal exception occurred, and then

exits the debugger. This action destroys the process.
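Putting the first chance and second chance rules side by side, the default dump behaviour can be summarised as a small lookup. The Python sketch below encodes only the defaults described in this article. The function name and return strings are hypothetical, and a configuration file can change the first chance behaviour.

```python
# Exception types that, by default, do not get a unique first chance mini dump.
FREQUENT_EXCEPTIONS = {"C++ EH Exception", "Unknown Exception"}


def default_dump_action(exception_type, chance):
    """Return the default dump taken for an exception (chance is 1 or 2)."""
    if chance == 2:
        # Any second chance exception is fatal and gets a full dump.
        return "full dump"
    if chance == 1:
        if exception_type in FREQUENT_EXCEPTIONS:
            return "no unique dump"
        return "mini dump"
    raise ValueError("chance must be 1 or 2")


if __name__ == "__main__":
    print(default_dump_action("Access Violation", 1))
    print(default_dump_action("C++ EH Exception", 1))
    print(default_dump_action("C++ EH Exception", 2))
```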

Note For Microsoft PSS support professionals to analyze memory dumps, they may have to obtain copies of

any custom components or DLLs and their corresponding symbol files. For more information about how to

create symbol files for your DLLs, click the following article numbers to view the articles in the Microsoft

Knowledge Base:

121366 (http://support.microsoft.com/kb/121366/ ) PDB and DBG files - what they are and how they work

291585 (http://support.microsoft.com/kb/291585/ ) How to create debug symbols for a Visual C++ application

For more information about how to obtain symbols for Microsoft products (necessary for analyzing memory

dumps with the debuggers), visit the following Microsoft Web site:

http://www.microsoft.com/whdc/DevTools/Debugging/symbolpkg.mspx

ADPlus command line switches

To use ADPlus, you must specify a series of command line switches or arguments to the script. At a

minimum, ADPlus requires two switches: one that specifies the mode of operation, and one that specifies a

target process to operate against.

The following are the most frequently used switches. You can also view the complete list of switches by running ADPlus -help, or by viewing the debuggers help file (Debugger.chm).



-hang

This switch configures ADPlus to run in hang mode. You must use this switch with the -iis, -pn, or -p

switches. You cannot use -hang with the -crash switch.

Note When ADPlus is running in hang mode, you must start ADPlus after the process stops

responding or is consuming a high percentage of the CPU.

-crash

This switch configures ADPlus to run in crash mode. You must use this switch with the -iis, -pn, or -p

switches. You cannot use -crash with the -hang switch.

Note When ADPlus is running in crash mode, you must start ADPlus before the process quits unexpectedly or becomes unstable.

-pn process name

The -pn switch is used to specify a process name that you want ADPlus to analyze. To specify more

than one process, use multiple -pn process name switches. For example:

-pn process1.exe -pn process2.exe

-p process ID

The -p switch is used to specify the process ID (PID) of a process that you want ADPlus to analyze.

To specify more than one process, use multiple -p PID switches. For example:

-p 1896 -p 1702

-sc spawning command

Unlike the -pn and -p switches, which specify processes that are already running to attach the debugger to, the -sc switch defines the application and parameters to be started (or spawned) in the debugger. For example:

-sc "c:\windows\system32\notepad.exe"

-iis

The -iis switch is used for debugging server computers that are running Internet Information Server

(IIS) 4.0 or later. When you use ADPlus with the -iis switch, ADPlus monitors all the IIS in-process

(Inetinfo.exe) and out-of-process (Mtx.exe/Dllhost.exe) applications. You can use the -iis switch with

the -pn switch or the -p switch, or you can use it alone to analyze IIS and all running MTS/COM+

applications in either crash mode or hang mode.

If you are trying to analyze a server computer running IIS 3.0 or earlier, use the -pn switch and

specify Inetinfo.exe as the process to monitor.

-notify computer name or user name

This switch is only valid when ADPlus is running in crash mode. This switch instructs ADPlus to alert

the specified user name or computer name of a crash. When the debugger detaches from the

process because of a second chance exception, or when a user presses CTRL+C to stop debugging,

a notification is sent to the remote user or computer through the local messenger service. This

notification occurs only if the local messenger service is started on the computer that is being

debugged.

-quiet

This switch instructs ADPlus to suppress all modal dialog boxes. This switch is useful if you are

running ADPlus from a remote command shell where modal dialog boxes can cause ADPlus to wait

indefinitely for a user to click OK. For best results, make sure that this is the first switch that is

passed to ADPlus.vbs.

-o output directory

This switch instructs ADPlus where to put the debug output files. If you use long file names, you must enclose them in double quotation marks. Additionally, you can use a UNC path (\\server\share). If you use a UNC path, ADPlus creates a new folder immediately below the UNC path that you specified. The folder is named for the server where ADPlus is running (for example, \\server\share\Web1 or \\server\share\Web2). This switch is useful if ADPlus is running on multiple computers in a Web farm that are all putting their output on the same network share.
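When several servers write to one share, a collection script needs to know where each server's output lands. The Python sketch below joins a UNC share and a server name the way the -o description above lays it out. The function name is hypothetical, and the path is only computed, not created.

```python
from pathlib import PureWindowsPath


def per_server_output_dir(share, server_name):
    """Return the folder below the UNC share that is named for the server."""
    return str(PureWindowsPath(share) / server_name)


if __name__ == "__main__":
    for server in ("Web1", "Web2"):
        print(per_server_output_dir(r"\\server\share", server))
```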

Run ADPlus for the first time

By default, debuggers install to the C:\Program Files\Debugging Tools for Windows folder. To change the

installation folder, do a custom install when you install the debuggers, and specify a different folder.

Alternatively, if a typical installation was performed, copy the contents of the Program Files\Debugging Tools

for Windows folder to a different folder.

To run ADPlus, open a command shell, switch to the folder where the debuggers were installed or copied,

and then type ADPlus.vbs.

You may be prompted to change your default script interpreter from Wscript.exe to Cscript.exe. Microsoft

strongly recommends that you allow ADPlus to configure CScript as the default script interpreter.

Syntax

ADPlus uses the following syntax:

ADPlus.vbs <mode of operation> <processes to monitor> <optional switches>

where <mode of operation> is -hang or -crash,

<processes to monitor> is -iis, -pn process.exe, or -p PID,

and <optional switches> is -notify, -o, or -quiet.
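As a rough illustration of how the three parts of this syntax fit together, the following Python sketch assembles an ADPlus argument list. The helper name build_adplus_command and its parameters are hypothetical and are not part of ADPlus; only switches named in this article are used, and -quiet is placed first as recommended above.

```python
from typing import List, Optional, Sequence


def build_adplus_command(mode: str,
                         iis: bool = False,
                         process_names: Sequence[str] = (),
                         pids: Sequence[int] = (),
                         notify: Optional[str] = None,
                         output_dir: Optional[str] = None,
                         quiet: bool = False) -> List[str]:
    """Assemble an ADPlus.vbs argument list (hypothetical helper)."""
    if mode not in ("hang", "crash"):
        raise ValueError("mode must be 'hang' or 'crash'")
    if not (iis or process_names or pids):
        raise ValueError("specify -iis, a process name, or a PID")
    if notify and mode != "crash":
        raise ValueError("-notify is only valid in crash mode")

    args = ["ADPlus.vbs"]
    if quiet:
        args.append("-quiet")            # recommended to be the first switch
    args.append("-" + mode)              # mode of operation
    if iis:
        args.append("-iis")              # processes to monitor
    for name in process_names:
        args += ["-pn", name]
    for pid in pids:
        args += ["-p", str(pid)]
    if notify:
        args += ["-notify", notify]      # optional switches
    if output_dir:
        args += ["-o", output_dir]
    return args


if __name__ == "__main__":
    print(" ".join(build_adplus_command("hang", pids=[1896])))
```

The resulting list could be joined into a single command line or passed to a process launcher together with cscript.exe; how it is launched is left open here.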


Prepare the server for crash mode debugging

Before you run ADPlus in crash mode, you must prepare the server to obtain the most information from the

ADPlus crash mode debugging sessions.

Steps to prepare a Windows 2000-based server for debugging in crash mode

1. Install the Windows 2000 SP1 or SP2 symbols to the C:\WINNT\Symbols folder on your servers. You

can download the symbols from the following Microsoft Web sites:

Windows 2000 SP1

http://download.microsoft.com/download/win2000platform/SP/SP1/NT5/EN-US/SP1SYM.exe


Windows 2000 SP2

http://download.microsoft.com/download/win2000platform/SP/SP2/NT5/EN-US/SP2SYM.exe


After you download Sp1sym.exe or Sp2sym.exe, run the file from the designated folder.

2. When you are prompted, extract the files to a new temporary folder, such as C:\Sp1sym or C:\

Sp2sym, or to a drive or folder that has sufficient disk space.

3. Run C:\Sp1sym\Support\Debug\Symbols\i386\Symbols_sp.exe or C:\Sp2sym\Support\Debug\Symbols\i386\Symbols_sp.exe (where C:\Sp1sym or C:\Sp2sym is the folder where you extracted the files in the previous step).

4. When you are prompted with the EULA, click Yes.

5. When you are prompted for a folder where you can extract the files, click C:\WINNT\Symbols, and

then click OK. Notice that a new C:\WINNT\Symbols folder appears. This folder contains various

subfolders that have names such as DLL and EXE.

6. Copy the symbols for your custom DLLs and any post SP1 or SP2 hotfixes to the C:\WINNT\Symbols\

Dll folder.


7. Copy the symbols for your custom .exe files to the C:\WINNT\Symbols\Exe folder. Additionally, you

must obtain any .pdb or .dbg files from your developers, and then put these files in the C:\WINNT\

Symbols\Dll folder.

8. Overwrite any .dbg or .pdb files that already exist in the C:\WINNT\Symbols\Dll folder with versions

from your hotfixes.

Note You can use the latest version of Winzip to open hotfix packages. You can extract the symbols

from the \Debug subfolder. The \Debug subfolder is contained in each hotfix self-installer.

9. Create an _NT_SYMBOL_PATH environment variable, and then set it equal to C:\WINNT\Symbols.

This variable can be either a system variable or a user environment variable.

Steps to prepare a Windows NT 4.0-based server for debugging in crash mode

1. Assume that you are running Windows NT 4.0 Service Pack 6a. Install the Windows NT 4.0 SP6a

symbols to the C:\WINNT\Symbols folder on your servers.

For more information about Windows NT 4.0 Service Pack 6/6a, click the following article number to

view the article in the Microsoft Knowledge Base:

241211 (http://support.microsoft.com/kb/241211/ ) List of bugs fixed in Windows NT 4.0 Service Pack 6/6a

(Part 1)

After you download Sp6symi.exe, run it from the designated folder.

2. When you are prompted, extract the files to the C:\WINNT folder (or substitute the appropriate \

WINNT folder if the symbols were not installed to C:\WINNT). Notice that a new C:\WINNT\Symbols

folder appears that has various subfolders named DLL, EXE, and others.

3. Copy the subfolders from the C:\WINNT\Symbols\IIS4 folder to C:\WINNT\Symbols. When you are

prompted to overwrite all the files, click Yes.

4. Copy the symbols for your custom DLLs and any post SP6a hotfixes to the C:\WINNT\Symbols\Dll

folder.

5. Copy the symbols for your custom .exe files to the C:\WINNT\Symbols\Exe folder. Additionally, you

must obtain any .pdb or .dbg files from your developers, and then put these files in the C:\WINNT\

Symbols\Dll folder.

6. Overwrite any .dbg or .pdb files that already exist in the C:\WINNT\Symbols\Dll folder with the

versions from your hotfixes.


Note You can use the latest version of Winzip to open hotfix packages. You can extract the symbols

from the \Debug subfolder. This subfolder is included in each hotfix self-installer.

7. Create an _NT_SYMBOL_PATH environment variable, and then set it equal to C:\WINNT\Symbols.

This variable can be either a system variable or a user environment variable.

Although you do not have to download and install symbols on the servers that you are debugging, it is highly

recommended. When you download and install symbols on the server, the output that the log files capture is

much more useful to Microsoft PSS.

For more information about how to obtain Microsoft Debug Symbols, click the following article number to

view the article in the Microsoft Knowledge Base:

268343 (http://support.microsoft.com/kb/268343/ ) Umdhtools.exe: How to use Umdh.exe to find memory leaks

After you configure your servers, you can run ADPlus in crash mode. This mode is described in the "Typical

ADPlus Usage Scenarios" section.

Typical ADPlus usage scenarios

This section describes some of the typical scenarios where you may have to run ADPlus.

Process stops responding or consumes 100 percent CPU utilization

In this scenario, a process may randomly consume 100 percent CPU for sustained periods or indefinitely. Run

ADPlus in hang mode to obtain a memory dump of the process or processes that are consuming the CPU

after the problem occurs. For example, use one of the following command syntaxes:

ADPlus -hang -p 1896

This command runs ADPlus in hang mode and produces a full memory dump file of a process that has the

PID 1896.

ADPlus -hang -pn myapp.exe

This command runs ADPlus in hang mode and produces full memory dump files of all processes that are

named Myapp.exe.

ADPlus -hang -iis -pn myapp.exe -o c:\temp

This command runs ADPlus in hang mode and produces full memory dump files of IIS, all instances of

Mtx.exe or Dllhost.exe, and all processes that are named Myapp.exe. It then puts the memory dump files in

the C:\Temp folder.

When you run ADPlus in hang mode during the 100 percent CPU condition, the tool produces memory dump

files of the process or processes that you specify on the command line.

Note In certain rare situations, the debugger may not be able to attach to the process after the 100 percent

CPU condition or hang has occurred. If you run ADPlus in hang mode after the problem has occurred, the tool

may not produce memory dump files. In these scenarios, it may be best to attach the debugger before the

problem has occurred. To do this, use one of the following command syntaxes to run ADPlus in crash mode:

ADPlus -crash -p 1896

This command runs ADPlus in crash mode for a process that has the PID 1896. ADPlus waits for an exception

to occur, or for a user to press CTRL+C in the minimized debugger window, to generate a memory dump file

and to detach the debugger.

ADPlus -crash -pn myapp.exe

This command runs ADPlus in crash mode for the process named Myapp.exe. ADPlus waits for an exception

to occur, or for a user to press CTRL+C in the minimized debugger window, to generate a memory dump file

and to detach the debugger.

ADPlus -crash -iis -pn myapp.exe -o c:\temp

This command runs ADPlus in crash mode for all instances of the processes named Myapp.exe and

Inetinfo.exe, and for all instances of Mtx.exe or Dllhost.exe. ADPlus waits for an exception to occur, or for a

user to press CTRL+C in one or more of the minimized debugger windows, to generate the memory dump

file (or files) and to detach the debugger (or debuggers). ADPlus puts the memory dump files and the log

files in the C:\Temp folder.

Then, after the process hangs or is consuming 100 percent CPU utilization, the user can press CTRL+C in the

minimized debugger window (or windows) that ADPlus generates so that the debugger can generate a

memory dump file for the process (or processes).

Note By default, ADPlus only produces mini memory dump records when the user presses CTRL+C. This

setting conserves disk space. In this scenario, it may be useful to configure ADPlus to generate a full memory

dump file when the user presses CTRL+C. To do this, use the -CTCF switch. Additionally, it is frequently

helpful to capture a performance log file or a system monitor log file for the time period up to and including

the 100 percent CPU utilization condition. At a minimum, this log file should capture the following objects at

1 to 5 second intervals:

Memory

Process

Processor

System

Thread

Process quits unexpectedly

In this scenario, a process may randomly quit (or crash) unexpectedly. Before the problem occurs, run ADPlus in crash mode to obtain a memory dump file of the process or processes that quit. For example, use one of the following command syntaxes:

ADPlus -crash -iis

This command runs ADPlus in crash mode and causes it to attach the CDB debugger to Inetinfo.exe and to all

Mtx.exe or Dllhost.exe processes that are running on the computer. ADPlus then waits for any first chance

and second chance exceptions to occur. By default, ADPlus puts all files in a subfolder of the installation

folder because the -o switch is omitted.

ADPlus -quiet -crash -iis -notify remote computer -o c:\temp

This command runs ADPlus quietly (no dialog boxes, log all output to the event log) in crash mode and

causes it to attach the CDB debugger to Inetinfo.exe and to all Mtx.exe or Dllhost.exe processes that are

running on the computer. Because the -notify switch is used, the debuggers notify all users who are logged

on to the computer named remote computer whenever a crash is detected or when the process that is being

monitored quits. Because the -o switch is used, ADPlus puts all output in the C:\Temp folder. If the folder

does not exist, ADPlus creates it.

ADPlus -crash -iis -o \\server\share

This command is the same as the previous command except that it logs all output to a network server.

ADPlus creates a new subfolder in \\server\share and names the subfolder for the local computer. Therefore,

if you are running ADPlus in a Web farm, each server in the farm that has ADPlus running logs its own unique

folder under \\server\share. (You do not have to create unique folders for each server. ADPlus does this

automatically.)

Note If you are running ADPlus in crash mode from the local console (instead of from a remote command

shell as described in the next section), you must remain logged on to the console for the duration of the

debug session.

For example, assume that you start ADPlus in crash mode and you use the -iis switch to monitor IIS. When

you log out of the console, the copies of Cdb.exe that are running on the console (and all other running

applications) quit. As a result, debugging stops, and the process that is being monitored is ended.

To avoid this issue, you can lock the console session (press the CTRL+ALT+DEL key combination, and then

click Lock Computer) or run ADPlus from a remote command shell that you have scheduled to run non-

interactively (that is, it does not require an interactive logon).

For more information about how to schedule a remote command shell to run non-interactively, see the

"Typical ADPlus Usage Scenarios: Run in Crash Mode Remotely" section.

MTS or COM+ server application quits unexpectedly

Custom Component Object Model (COM) components that run in an MTS or COM+ server application actually

run in a surrogate process (Mtx.exe or Dllhost.exe). These surrogate processes have properties and settings

that you can configure through the MTS Explorer (for Windows NT 4.0) or through the Component Services

Microsoft Management Console (MMC) snap-in (for Windows 2000, Windows XP, and Windows Server 2003).

By default, MTS or COM+ server applications are configured to quit after three minutes of idle time. To make

sure that these processes remain running while the debugger is attached and monitoring for exceptions, you

must configure them to Leave running when idle.

Additionally, MTS and COM+ implement a failfast. A failfast is a safeguard that is designed to fail (or quit)

MTS/COM+ processes that generate unhandled access violations.

By default, the failfast is enabled in MTS or COM+ applications that raise unhandled access violation

exceptions. As a result, a failing MTS/COM+ server application cannot raise a second chance access violation

exception (that is, it quits after the first chance access violation). By default, ADPlus is configured to produce

only a mini memory dump record when first chance exceptions occur.

To successfully debug MTS/COM+ server applications, follow these steps:

1. Configure the MTS/COM+ server application to Leave running when idle.

2. Use the -FullOnFirst switch to create full dump files on first chance exceptions.

3. Run ADPlus in crash mode, and then wait for the application to fail.

Note Because MTS and COM+ shut down a server application, and because the failfast policy prevents the

process from raising a second chance exception, you may only be able to obtain a first chance access

violation memory dump file.

Run in crash mode remotely

There are many occasions when you must initiate ADPlus in crash mode from a local client computer to

monitor a process that quits unexpectedly on one or more remote servers in a server farm. Typically, on

Windows 2000, you do this through Windows Terminal Services. However, you cannot debug applications

that are running in different window stations on Windows NT 4.0 and Windows 2000. Therefore, ADPlus

disables crash mode functionality when it detects that it is running in a Terminal Services session. To resolve

this issue, share the remote server by using the Remote.exe utility, create a batch file that starts a command

shell on the remote server, and then schedule this batch file to run on the target server by using the AT

command. (The AT command causes the command shell to run non-interactively, similar to a service.) The

remote command shell is then connected to a local workstation or client computer that uses the same

Remote.exe utility that you used to start the command shell.

To start a remote command shell on a server by using the AT command, follow these steps:

On the remote server

Assume that the debuggers are installed to C:\Debuggers. Follow these steps:

1. In the C:\Debuggers folder, create a new batch file named Remoteshell.cmd.

2. Add the following line to this batch file:

c:\debuggers\remote.exe /s "cmd.exe" remoteshell

3. At the console on the server, or in a Terminal Services session, open a new command shell,

and then type the following command:

AT 15:00 c:\debuggers\remoteshell.cmd

where 15:00 is one minute later than the current time. For example, if the current time is

14:59, type 15:00.

4. Wait for the AT command to run.

5. At the command prompt, type AT with no parameters to verify that the task has run with no

errors.
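The "one minute later than the current time" rule in step 3 can be sketched in Python as follows. The function name at_command is hypothetical, and the script path is the example path used in the steps above; the sketch only builds the command text and does not schedule anything.

```python
from datetime import datetime, timedelta


def at_command(script_path: str, now: datetime, delay_minutes: int = 1) -> str:
    """Build an AT command line that starts `script_path` shortly after `now`."""
    start = now + timedelta(minutes=delay_minutes)
    # AT takes a 24-hour HH:MM start time, as in the example above.
    return f"AT {start:%H:%M} {script_path}"


if __name__ == "__main__":
    print(at_command(r"c:\debuggers\remoteshell.cmd", datetime.now()))
```

Adding the delay with timedelta rather than incrementing the minute field by hand keeps the result correct when the hour or day rolls over.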

On the local client:

Install the debuggers on the local client computer or (at a minimum) copy the Remote.exe utility

locally. (By default, the utility is installed with the debuggers in the root installation folder.)

Assume that the debuggers and the Remote.exe utility are installed to C:\Debuggers. Follow these

steps:

1. At a command prompt, switch to the C:\Debuggers folder.

2. Type the following command:

remote.exe /c remote server remoteshell

where remote server is the name of the remote server.

3. Your local command shell is now connected to the remote command shell that is running on

the server, and all commands that you type locally will be carried out on the remote server

(the DIR c:\ command lists the contents of drive C on the remote server).

4. In the remote command shell, you can now run ADPlus in crash mode as if you were running

it locally from the console. However, you must use the -quiet switch to suppress all dialog

boxes that ADPlus generates by default. If you do not use the -quiet switch, the remote

command shell will stop responding after you run ADPlus, and will not return to a prompt. If

this problem occurs, you must quit the remote command shell (Cmd.exe) on the server, and

then start a new instance.

5. To send a debug break (CTRL+C) to a process that ADPlus is currently debugging remotely

through crash mode, you must use the Breakin.exe utility. By default, Breakin.exe is

installed with the debuggers in the root of the debuggers folder. For example, to stop

debugging IIS (Inetinfo.exe) that is running with a process ID of 1975, type the following

command in the remote command shell:

breakin.exe 1975

Alternatively, you can use the Kill.exe command (located in the root debuggers folder) to quit

any processes that are being debugged.

Additional information and known issues

How can you determine if ADPlus has captured information about a crash or if a process that is

being monitored in crash mode has quit?

There are several ways to determine this:

o Use the -notify switch, and verify that the messenger service is started on the server that

is being debugged and on the client computer that will receive the notifications.

o In a text editor, open the .log file that appears in the output folder for each process, and

then scroll to the end of the file. Locate the following text:

0:070> * -------- AutodumpPlus 4.01 finished running at: --------
0:070> .time
Debug session time: Mon Aug 06 15:25:15 2001
System Uptime: 3 days 17:00:34
Process Uptime: 1 days 3:10:38
0:070> * -------------------------------------------------------

o In the output folder, locate any .dmp files that contain the phrase "__2nd_chance". If this

phrase appears in the label of a memory dump record, a process has quit unexpectedly.

o In the output folder, locate any .dmp files that contain the phrase

"__Process_was_shutdown". If this phrase appears in the label of a memory dump record, an

administrator quit the process or, if it is an MTS/COM+ application, the process quit because

it reached the idle limit.

o In the output folder, locate any .dmp files that contain the phrase "__CTRL-C". If this phrase

appears in the label of a memory dump record, either a debug break exception was thrown

from a DLL that was running in the process or someone pressed CTRL+C from the console

(or used Breakin.exe if ADPlus is running remotely) to stop the current debugging session.
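The file-name checks described above could be automated with a short Python sketch such as the following. The three labels come from this article; the function names, the result strings, and the sample file name in the demo are made up for illustration, and real ADPlus file names may differ.

```python
from pathlib import Path
from typing import Dict

# Labels described in this article, mapped to a short meaning.
DUMP_LABELS = {
    "__2nd_chance": "process quit unexpectedly",
    "__Process_was_shutdown": "process was shut down or reached the idle limit",
    "__CTRL-C": "debug break, CTRL+C, or Breakin.exe",
}


def classify_dump(file_name: str) -> str:
    """Return the meaning of the first known label found in a dump file name."""
    for label, meaning in DUMP_LABELS.items():
        if label in file_name:
            return meaning
    return "other"


def summarize_output_folder(output_dir: str) -> Dict[str, str]:
    """Classify every .dmp file found below the ADPlus output folder."""
    return {p.name: classify_dump(p.name) for p in Path(output_dir).rglob("*.dmp")}


if __name__ == "__main__":
    # Hypothetical file name, used only to exercise the function.
    print(classify_dump("PID-1975__INETINFO.EXE__2nd_chance_sample.dmp"))
```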

You must install Windows Scripting Host components on the system for ADPlus to run. To download

the Windows Scripting Host, visit the following Microsoft Web site:

http://msdn2.microsoft.com/en-us/library/ms950396.aspx


Note Windows Scripting Host components may already be installed if you have any of the following

Microsoft products installed:

o Microsoft Internet Explorer 5

o Microsoft Office 2000

o Microsoft Visual InterDev 6.0

o Microsoft Visual Studio 6.0

o Microsoft Windows NT Option Pack

o Microsoft Windows 2000


o Microsoft Windows XP

o Microsoft Windows Server 2003

o Microsoft Windows Vista

The -iis switch works only if Internet Information Server (IIS) 4.0 or Internet Information Services

(IIS) 5.0.x is installed.

When you run ADPlus in quiet mode, the tool logs information to the event log.

If you use the -o switch, the specified path must not contain more than one nonexistent folder. For

example:

1. You specify -o c:\temp1\temp2. However, the C:\Temp1 and \Temp2 folders do not exist.

2. You receive an error message from ADPlus that states that the folders do not exist, and

ADPlus will not create them.

If you specify only -o c:\temp1, ADPlus creates the C:\Temp1 folder if it does not exist, and then puts

all output files in that folder. If you want to specify multiple subfolders, and you use the -o switch,

verify that all the subfolders exist before you run ADPlus.
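One way to check this rule before you run ADPlus is to count how many trailing folders of the intended -o path are missing. The following Python sketch does that; the function names are hypothetical, and the check reflects only the behavior described in this article (at most one nonexistent folder).

```python
import os
import tempfile
from pathlib import Path


def missing_folder_count(path: str) -> int:
    """Count how many trailing folders in `path` do not exist yet."""
    current = Path(path)
    missing = 0
    while not current.exists():
        missing += 1
        if current.parent == current:   # reached the root without a match
            break
        current = current.parent
    return missing


def output_path_is_usable(path: str) -> bool:
    """True if ADPlus would have to create at most one folder."""
    return missing_folder_count(path) <= 1


if __name__ == "__main__":
    base = tempfile.mkdtemp()           # an existing folder for the demo
    print(output_path_is_usable(os.path.join(base, "temp1")))
    print(output_path_is_usable(os.path.join(base, "temp1", "temp2")))
```

If the check fails, creating the missing parent folders first (for example with os.makedirs) avoids the error described above.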

In COM+, you can configure a server package to start in the debugger on the Advanced tab in the

Properties dialog box of the package. If you enable the Launch in Debugger option, ADPlus

cannot attach the CDB debugger to a process. By default, only one debugger can be attached to a

process at a time.

When a remote procedure call (RPC) is made from a process that ADPlus is analyzing in crash mode

to another process that has quit (intentionally or unexpectedly), the log file that ADPlus creates for

the process that it is analyzing may contain one or more of the following exceptions:

Unknown exception - code 80010105 (first chance)

Unknown exception - code 800706be (first chance)

Unknown exception - code 800706ba (first chance)

These exceptions are typical. RPC raises these exceptions when a call is made from a process that is

being monitored to a nonexistent or failed process.

Additionally, if ADPlus is monitoring Inetinfo.exe, the following exception may appear in the ADPlus

debug log for that process:

Unknown exception - code 800706bf (first chance)

This exception typically appears after IIS makes a call to an out-of-process (high-isolation) Web site

that has failed. It may be followed by two instances of the following exception:

Unknown exception - code 800706ba (first chance)
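When you review a long ADPlus log, it can help to filter out these typical RPC exceptions so that anything else stands out. The following Python sketch assumes log lines in the "Unknown exception - code ... (first chance)" form shown above; the function name is hypothetical, and treating 800706bf as expected is only appropriate for Inetinfo.exe logs, as noted above.

```python
import re
from typing import List, Tuple

# Codes that this article describes as typical first chance RPC exceptions.
EXPECTED_RPC_CODES = {"80010105", "800706be", "800706ba", "800706bf"}

_EXCEPTION_LINE = re.compile(
    r"Unknown exception - code ([0-9a-fA-F]{8}) \((first|second) chance\)"
)


def unexpected_exceptions(log_text: str) -> List[Tuple[str, str]]:
    """Return (code, chance) pairs that are not typical first chance RPC errors."""
    results = []
    for match in _EXCEPTION_LINE.finditer(log_text):
        code, chance = match.group(1).lower(), match.group(2)
        if code in EXPECTED_RPC_CODES and chance == "first":
            continue                      # typical, so skip it
        results.append((code, chance))
    return results


if __name__ == "__main__":
    sample = (
        "Unknown exception - code 800706ba (first chance)\n"
        "Unknown exception - code 80004005 (first chance)\n"   # made-up example line
    )
    print(unexpected_exceptions(sample))
```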

How to create a user-mode process dump file

To create a user-mode process dump file in Windows Vista or Windows 7, use one of the following methods.

Method 1: Use Task Manager

Windows Vista

To use Windows Task Manager to create a user-mode process dump file in Windows Vista, follow these steps:

1. Start Task Manager. To do this, use one of the following methods:

o Right-click an empty area of the task bar, and then click Task Manager.

o Press CTRL+SHIFT+ESC.

2. Click the Processes tab.

3. Right-click the name of the process that you want, and then click Create Dump File.

If you are prompted for an administrator password or confirmation, type your password or click

Continue.

A dump file for the process is created in the following folder:

Drive:\Users\UserName\AppData\Local\Temp

4. When you receive a message that states that the dump file was successfully created, click OK.

Windows 7

To use Windows Task Manager to create a user-mode process dump file in Windows 7, follow these steps:

1. Start Task Manager. To do this, use one of the following methods:

o Right-click an empty area of the task bar, and then click Start Task Manager.

o Press CTRL+SHIFT+ESC.

2. Click the Processes tab.

3. Right-click the name of the process that you want, and then click Create Dump File.

If you are prompted for an administrator password or confirmation, type your password or click

Continue.

A dump file for the process is created in the following folder:

Drive:\Users\UserName\AppData\Local\Temp

4. When you receive a message that states that the dump file was successfully created, click OK.

Method 2: Use the ADPlus tool

You can use the ADPlus tool to create a user-mode process dump file. The ADPlus tool is included in

"Debugging Tools for Windows." For more information about "Debugging Tools for Windows" and about how

to download it, visit the following Microsoft Web site.



For more information about how to use the ADPlus tool, click the following article number to view the article

in the Microsoft Knowledge Base:

286350 (http://support.microsoft.com/kb/286350/ ) How to use ADPlus to troubleshoot "hangs" and "crashes"


How to determine the approximate size of a user-mode process dump file that will

be created

You can use Performance Monitor to determine the approximate size of a user-mode process dump file that

will be created. To do this, follow these steps:

1. Click Start, type perfmon in the Start Search box, and then click perfmon.exe in the Programs list.

If you are prompted for an administrator password or confirmation, type your password or click Continue.

2. Expand Monitoring Tools, and then click Performance Monitor.

3. Right-click an empty area of the display pane, and then click Add Counter.





4. Under Available counters in the Add Counters dialog box, click the down arrow next to the

Process performance object, and then click the Virtual Bytes counter.

5. Under Instances of selected object, click the name of the process, click Add, and then click OK.

The value that appears is the approximate size of the dump file. When you create a user-mode

process dump file, make sure that sufficient free space is available on the hard disk where the dump

file will be stored.
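Once you have the Virtual Bytes value, the free-space check can be sketched in Python as follows. The function name and the safety_factor margin are assumptions made for illustration; the article itself only says that sufficient free space must be available.

```python
import shutil


def enough_space_for_dump(virtual_bytes: int, target_dir: str,
                          safety_factor: float = 1.2) -> bool:
    """Compare the Virtual Bytes estimate with free space on the target drive.

    `safety_factor` is an arbitrary margin chosen for this sketch.
    """
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= virtual_bytes * safety_factor


if __name__ == "__main__":
    # 512 MB is a made-up Virtual Bytes reading for the demo.
    print(enough_space_for_dump(512 * 1024 * 1024, "."))
```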

To collect user-mode dumps

Important This section, method, or task contains steps that tell you how to modify the registry. However,

serious problems might occur if you modify the registry incorrectly. Therefore, make sure that you follow

these steps carefully. For added protection, back up the registry before you modify it. Then, you can restore

the registry if a problem occurs. For more information about how to back up and restore the registry, click

the following article number to view the article in the Microsoft Knowledge Base:

322756 (http://support.microsoft.com/kb/322756/ ) How to back up and restore the registry in Windows

This feature is not enabled by default. Enabling the feature requires administrator privileges. To save

user-mode memory dumps locally by using Windows Error Reporting, create the following registry key, and then set the DumpType value in that key:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps

Value Name = DumpType

Data type: REG_DWORD

Value Data = 1

Data Values Descriptions:

0 = Create a custom dump

1 = Mini dump

2 = Full dump
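As one possible way to apply this setting, the following Python sketch generates the text of a .reg file for the key and value described above. The function name is hypothetical; the sketch only produces text and does not touch the registry. Importing the resulting file requires administrator privileges, and the registry backup advice above still applies.

```python
# Key path and DumpType values as described in this article.
LOCAL_DUMPS_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows"
    r"\Windows Error Reporting\LocalDumps"
)
DUMP_TYPES = {"custom": 0, "mini": 1, "full": 2}


def local_dumps_reg(dump_type: str = "mini") -> str:
    """Return .reg file text that sets DumpType under the LocalDumps key."""
    value = DUMP_TYPES[dump_type]        # raises KeyError for unknown names
    return "\n".join([
        "Windows Registry Editor Version 5.00",
        "",
        f"[{LOCAL_DUMPS_KEY}]",
        f'"DumpType"=dword:{value:08x}',
        "",
    ])


if __name__ == "__main__":
    print(local_dumps_reg("full"))
```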

Windows Error Reporting in Windows Vista generates the following information:

A report manifest file (Report.wer)

Once enabled, the report can be found at %LOCALAPPDATA%\Microsoft\Windows\WER. Reports can be

viewed using the wercon.exe tool. The following error information is sent to the Windows Error Reporting

Server:


Operating system version information (.version.txt)

Application information (.appcompat.txt)

A heap dump file (.hdmp)

A mini-dump file (.mdmp)

For more information about custom dump types, see the following page:

http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx


Dump files are stored in the default location %LOCALAPPDATA%\CrashDumps

For more information about collecting user-mode dumps, visit the following Microsoft Web site:


Windows Error Reporting (WER): Classifications

The Microsoft Windows Error Reporting (WER) service captures both kernel-mode (operating system) and user-mode (application) crashes, including information on drivers and applications, as well as on the modules (controls and plug-ins) running at the time of the crash.

When an end user chooses to send an error report to Microsoft over the Internet, the WER service collects technical information about the crash. This data is used for quality control purposes only and is not used for tracking individual users or installations for any marketing purpose. If information is available that will help the end user solve the problem, Windows displays a message to the user with a link to that information.

WER classifies error reports for the same problem into one bucket. When a customer sends an error report, WER determines if a bucket for that problem already exists. If it does, then the report is added to the existing bucket. If not, then a new bucket is created.

The types of data collected and the schemas for defining a bucket are different for user-mode crashes and for kernel-mode crashes.

Classifying Kernel-Mode Crashes

Kernel-mode crashes are first grouped by stop codes and then by additional parameters, depending on the individual stop code. The bucket name is based on the type of error and the device. For example:

Bucket name                       Error

OLD_IMAGE_SAMPLE.SYS_DEV_3577     Crash caused by an old version of sample.sys on device ID 3577

0x44_BUGCHECKING_DRIVER_SAMPLE    Driver sample.sys may have caused bugcheck 0x44

POOL_CORRUPTION_SAMPLE            Driver sample.sys may have caused pool corruption

0xBE_sample!bar+1a                Driver sample.sys crashed in routine bar

An error report for a kernel-mode crash consists of a minidump file generated at the time of the crash and an XML file generated when the computer restarts and is about to send the error report.

When Windows stops responding, it reverts to a low-level troubleshooting mode. In this mode, a dump file is captured that contains low-level operating system data structures that identify what was happening in the computer at the time of the crash. These data structures include the functions being executed by the processor at the time of the crash, the CPU register state, and stack, thread, and process information. This data can be viewed in a debugger and used to identify the faulting component.

The dump file also contains the list of all drivers loaded in the computer at the time of the crash. This data is used by the debugger to determine which driver images and symbols need to be loaded to debug the crash. The list of modules also helps determine whether known bad or outdated drivers are running on the computer.

In Windows XP Service Pack 1 (SP1) and later, the dump files have been enhanced to allow a driver to store information in the crash dump file that can be used for troubleshooting. The routine for collecting crash data from a driver is KeRegisterBugCheckCallback.

Classifying User-Mode Crashes

User-mode crashes are classified according to the following parameters:

•Application name — for example, winword.exe

•Application version — for example, 10.0.2627.0

•Module name — for example, mso.dll

•Module version — for example, 10.0.2613.1

•Offset into module — for example, 00003cbb

The .cab files for user-mode crashes include this information plus a minidump file. The minidump file for user-mode crashes contains the state of the process at the time the crash occurred, specifically the registers and stack for every thread in the application. This information is used to identify which application component caused the crash. The minidump also includes a list of all modules loaded in the application at the time of the crash, so you can get information about each module loaded in the process and get symbols for each of these modules.
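To make the bucketing idea concrete, the following Python sketch groups crash reports by the five parameters listed above. It is an illustration only and not WER's actual implementation; the type and function names are hypothetical, and the case-insensitive comparison of names and offsets is an assumption.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class UserModeBucket(NamedTuple):
    app_name: str
    app_version: str
    module_name: str
    module_version: str
    offset: str


def bucket_key(app_name: str, app_version: str, module_name: str,
               module_version: str, offset: str) -> UserModeBucket:
    """Normalize the five parameters so equal crashes compare equal."""
    return UserModeBucket(app_name.lower(), app_version,
                          module_name.lower(), module_version, offset.lower())


def count_by_bucket(reports: Iterable[UserModeBucket]) -> Counter:
    """Count how many reports fall into each bucket."""
    return Counter(reports)


if __name__ == "__main__":
    reports = [
        bucket_key("winword.exe", "10.0.2627.0", "mso.dll", "10.0.2613.1", "00003cbb"),
        bucket_key("WINWORD.EXE", "10.0.2627.0", "mso.dll", "10.0.2613.1", "00003CBB"),
    ]
    for bucket, count in count_by_bucket(reports).items():
        print(count, bucket)
```

A report that matches an existing key increments that bucket's count; any difference in one of the five fields produces a new bucket, which mirrors the "existing bucket or new bucket" decision described earlier.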

Windows Error Reporting (WER) is a set of Windows technologies that capture software crash and hang data from end-users of Windows. Through the Winqual website, software and hardware vendors can access these reports in order to analyze, fix and respond to these problems. WER technologies were originally implemented in Windows XP/Windows Server 2003, and are still a part of current Windows releases.

Broad-based trend analysis of error reporting data shows that across all the issues that exist on the affected Windows platforms and the number of incidents received:

• Fixing 20 percent of the top-reported bugs can solve 80 percent of customer issues.

• Addressing 1 percent of the bugs would address 50 percent of the customer issues.

Vendors can use WER to view error reports with no recurring charges. This service is available for all products, even those that do not qualify for the Windows hardware or software logo, although we strongly recommend that you submit your products to the Windows Logo Program. Note: A class 3 VeriSign certificate is required to sign up for and use the service.

How to View Error Reports

Microsoft sorts error reports received through Windows Error Reporting into virtual "buckets." A bucket is a categorization of all instances of a specific error associated with a particular version of a driver, application, Windows feature, or other component.

You can use the Winqual website to view driver-specific, application-specific, or operating system-specific errors associated with your organization. Each error report provides details related to that bucket, and you can then request a file of the associated data.

To view error reports:

1. Establish a Winqual account.

To protect companies from impersonation and to ensure that the error reports go to a representative from the correct company, the Winqual Web site requires your company to have a valid VeriSign ID.

• Check with your Legal Department; your company might already have a VeriSign ID (also called a Software Publisher's Digital ID for Authenticode).

• Check on Winqual to see if your company already has an account.

2. Accept the Windows Error Reporting Agreement.

3. Sign in to the Winqual site.

4. Click Windows Error Reports.

If you do not see your company's error reports, users of your products might not have submitted error reports to Microsoft. However, it might also be because Microsoft does not have sufficient information to match your company with error reports related to your products.

How to Map Files: Matching Error Reports with Your Company's Products

For you to be able to view error reports for your company's software, Microsoft needs to know which software associated with a specific bucket belongs to your organization. Making this connection in Windows Quality Online Services is referred to as "mapping files."

To associate particular error report buckets as belonging to your company:

1. Complete the Request file mapping or Request file unmapping forms on the Winqual Web site. (Use the resources listed at the bottom of this page to find up-to-date directions.)

2. When you are viewing the files associated with your organization, click a file name to add or remove a file from the list.

https://winqual.microsoft.com/default.asp

https://winqual.microsoft.com/member/LAC/DocumentDetails.aspx?id=420&type=0

https://winqual.microsoft.com/SignUp/

https://winqual.microsoft.com/Help/Default.htm#obtaining_a_verisign_class_3_digital_id.htm

How System Manufacturers Can Obtain and Analyze System Data

System manufacturers can choose to include a special file, called a "marker file," on their systems. This file is used to help associate WER data with specific computer models, so that the manufacturer can view and analyze crash dumps from those systems.

To help system manufacturers identify and resolve issues related to kernel-mode error data, Microsoft can provide the following assistance:

• Driver vendor and other developer support contacts.

• Help facilitating discussions between vendors and manufacturers.

• Data mining and trend analysis, upon request.

Minidump files can be made available on a case-by-case basis, based on the signed terms of use agreement. Driver vendors may also choose to share their minidumps directly with specific OEMs.

Features of Windows Error Reports on Winqual

Home page on Winqual for Windows Error Reporting:

• Quick links to crash reports for applications and drivers

• Management charts with summaries of user-mode and kernel-mode crash data

Reporting and filtering features for kernel-mode drivers:

• Ability to view driver crashes by filename, link date, operating system version and edition, date range, and responses.

• Ability to search, sort, or jump to any crash bucket (category) from anywhere on the site.

• Ability to download cabinet (.cab) files that contain mini-dump files for a particular crash bucket. Your developers can use these mini-dump files to debug the problem.

• Ability to filter downloaded .cab files by filename, link date, device Plug and Play ID, OEM machine, and operating system version and edition.

Views, sorting, and management features for user-mode applications:

• Automatic mapping of application crashes to vendors

• Summary views for company, product, and impact summary: X% of crashes map to Y% of your customer issues

• Search and sorting capabilities

• Response Management Center

• Updated help for end users

(http://msdn.microsoft.com/en-us/library/bb787181(VS.85).aspx)

Memory Management - Demystifying /3GB

As promised - here's the long awaited post on the infamous /3GB switch. At least once a week we have this discussion with a Systems Administrator somewhere who has this set in the boot.ini file on all of the servers but doesn't know why. Maybe someone added it to the server build process at some point or perhaps someone read about the /3GB switch somewhere and thought that it would improve performance or enable them to see the full 4GB or 8GB of physical memory installed on the system. So let's start by dispelling a few /3GB myths:

• /3GB won't enable you to see the additional 4GB or 8GB of RAM you added to your new server

• /3GB doesn't necessarily make your application 50% more efficient

• /3GB should not be a standard for your environment (there are exceptions, and we'll get to those)

OK - so what does the /3GB switch really do? If you recall, from the Memory Management 101 post, Windows 32-bit Operating Systems implement a virtual memory system based on a flat 32-bit address space. This address space translates into 4GB of virtual memory - no more, and no less. By default, the address space is divided into two equal chunks. 2GB for the Kernel and 2GB for User-mode processes. The Kernel space is common for all applications and the User-mode processes each get their own 2GB address space to work with.

So where does the /3GB switch come in? Windows 2000 Advanced Server, Windows 2000 Datacenter Server, Windows XP SP2 and later and all versions of Windows Server 2003 support the /3GB boot-time option to allow the user mode address space to grow to 3GB. The /3GB option was intended as a short term solution to allow applications such as database servers to maintain more data in memory than a 2GB address space allowed. However, using the /3GB method to increase the user-mode memory space comes at a cost.

Remember that we only have a 4GB total address space to work with. If we have to allocate an additional 1GB of this address space to the user-mode space, then the System space is cut in half. Drivers, Heap, Paged & NonPaged Memory all have only half the resources to work with now. However, because of the way memory mapping works, cutting the kernel space in half does a lot more than just reducing the address space. Many of the structures within the kernel virtual memory space are cut back by far more than 50%. For example, we took a Windows Server 2003 Enterprise R2 machine with 1GB of RAM installed and compared some values with and without the /3GB switch enabled.

Default OS Build:

Free System PTE's    251,980 (1,007,920 kb)
NonPaged Pool Max    206,848 kb

With /3GB enabled:

Free System PTE's    34,884 (139,536 kb)
NonPaged Pool Max    129,312 kb

As you can see, the number of Free System PTE's drops by over 200,000. Keep in mind that this is only a test server that isn't under any sort of load. A machine under medium to heavy load could quite easily run out of free PTE's - meaning that the system can no longer map system pages such as I/O space, kernel stacks and memory descriptor lists. In addition, look at NonPaged Pool after the /3GB parameter is enabled. The NonPaged Pool maximum is now under 130MB. Drivers use the NonPaged Pool for many of their requirements because it can be accessed at any IRQL. Once we run into NonPaged pool depletion, we're looking at our old friend, the Event ID 2019.
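The kb figures shown next to the PTE counts follow from the fact that each PTE maps one page. A minimal sketch of that arithmetic, assuming the standard 4 KB x86 page size (the function name is ours, for illustration only):

```python
PAGE_SIZE_KB = 4  # assumption: standard x86 page size; each PTE maps one page


def ptes_to_kb(free_ptes: int) -> int:
    """Address space (in KB) that a given number of free system PTEs can map."""
    return free_ptes * PAGE_SIZE_KB


default_ptes = 251_980   # value measured on the default OS build
with_3gb_ptes = 34_884   # value measured with /3GB enabled

lost_ptes = default_ptes - with_3gb_ptes
percent_lost = 100 * lost_ptes / default_ptes

print(ptes_to_kb(default_ptes), ptes_to_kb(with_3gb_ptes), lost_ptes)
```

The percentage makes the point from the text explicit: the kernel address space was halved, but Free System PTE's fell by far more than 50%.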

OK - so let's quickly recap what we've discussed so far. The /3GB switch is not related to the amount of physical memory installed in a system. It is useful if you have an application that can take advantage of a larger address space. For a process to access the full 3GB address space, the image file must have the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in the image header.

If the flag is not set in the image header, then the OS reserves the third gigabyte so that the application won't see virtual addresses greater than 0x7FFFFFFF. You set this flag by specifying the linker flag /LARGEADDRESSAWARE when building the executable. This flag has no effect when running the application on a system with a 2-GB user address space. Therefore if you enable the /3GB switch, then applications that do not have this flag set can only use the standard 2GB of User mode memory, and the Kernel is still limited to the 1GB space - which means that 1GB of virtual memory is basically wasted!
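If you want to check whether an executable has this flag set, the bit lives in the Characteristics field of the COFF file header (IMAGE_FILE_LARGE_ADDRESS_AWARE is 0x0020). The following is a minimal sketch in Python that reads the field directly from the image bytes; the helper that builds a synthetic header exists only so the example can be exercised without a real binary, and both function names are our own.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field


def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image header has the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew at offset 0x3C holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # The 20-byte COFF header follows the 4-byte signature;
    # Characteristics is its last 2-byte field (offset 18).
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_synthetic_header(characteristics: int) -> bytes:
    """Build a minimal fake header for demonstration (not a loadable image)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature right after DOS stub
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\0\0" + coff
```

For a real file, you would pass the file contents instead, for example `is_large_address_aware(open(path, "rb").read())`, where `path` is whatever executable you want to inspect.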

Earlier, we mentioned that there were some applications that benefit from the use of the /3GB switch. The predominant scenario where the /3GB switch is not only recommended, but actually required is with Microsoft Exchange servers that house public folders and / or mailboxes. Due to the way that Exchange handles memory management, the additional 1GB of user mode memory is required to ensure that the Store.exe process does not run out of virtual address space. However, in order to guard against System PTE depletion, the system can be tuned using the /USERVA switch in conjunction with the /3GB switch. This tunes the actual amount of memory for the Address space. For example, setting USERVA=3030 means that the process space is actually only 3,030MB and not 3,072MB (which would be the process space with only the /3GB switch present). The additional 42MB is used for System PTE usage. The USERVA value can safely be tweaked as low as 2800 - however, if it is necessary to set USERVA this low, then you probably want to start thinking about scaling your Exchange environment to spread the load!
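The USERVA arithmetic above is simple enough to sketch. This assumes only what the text states: /3GB alone gives 3,072MB of user space, and whatever USERVA takes away from that is returned to the kernel. The function name and the 2,048MB lower bound (the default user space) are our own framing.

```python
FULL_3GB_USER_MB = 3072   # user-mode space with /3GB alone
DEFAULT_USER_MB = 2048    # user-mode space without /3GB


def kernel_space_regained_mb(userva_mb: int) -> int:
    """MB of address space handed back to the kernel for a given USERVA value."""
    if not DEFAULT_USER_MB <= userva_mb <= FULL_3GB_USER_MB:
        raise ValueError("USERVA is expected to lie between 2048 and 3072")
    return FULL_3GB_USER_MB - userva_mb


print(kernel_space_regained_mb(3030))
print(kernel_space_regained_mb(2800))
```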

Ideally, there should always be at least 24,000 Free System PTE's at boot time. Depending on server workload there may be wide variances in the amount of Free System PTE's during the course of a normal duty cycle, so it may be necessary to implement some long-term monitoring to ensure that the server does not fall below 10,000 Free PTE's.
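If you collect Free System PTE's samples over time, the two guideline values above can be turned into a simple check. This is only a sketch; the function name and the status strings are hypothetical, and how you gather the samples (Perfmon logs, for instance) is up to you.

```python
BOOT_TIME_MINIMUM = 24_000   # guideline from the text: minimum at boot time
RUNTIME_FLOOR = 10_000       # guideline from the text: floor during normal load


def free_pte_status(free_ptes: int, at_boot: bool = False) -> str:
    """Classify a Free System PTE's sample against the guideline values."""
    threshold = BOOT_TIME_MINIMUM if at_boot else RUNTIME_FLOOR
    return "ok" if free_ptes >= threshold else "below guideline"


# Hypothetical samples taken during a duty cycle.
samples = [34_884, 21_500, 9_200]
statuses = [free_pte_status(s) for s in samples]
```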

So there you have it - the /3GB switch demystified. Hopefully this post, as well as the others in our Memory Management series will help you understand a bit more about how and why the Operating System behaves the way it does. Remember that the /3GB switch is intended to be used in very specific instances - and now you know why!

Additional Resources:

Microsoft® Windows® Internals, Fourth Edition: Microsoft Windows Server™ 2003, Windows XP, and Windows 2000 (Chapter 7 covers Memory Management)

Memory Management: What Every Driver Writer Needs to Know

Windows DDK: /3GB

Microsoft KB833721 Available switch options for the Windows XP and the Windows Server 2003 Boot.ini files

Microsoft KB823440 Use of the /3GB switch in Exchange Server 2003 on a Windows Server 2003-based system

Microsoft KB316739 How to use the /userva switch with the /3GB switch to tune the User-mode space to a value between 2 GB and 3 GB

Microsoft KB810371 Using the /Userva switch on Windows Server 2003-based computers that are running Exchange Server

Microsoft KB274750 Configuring SQL Server to use more than 2GB of Memory

Raymond Chen: Summary of /3GB posts

Memory Management - x86 Virtual Address Space

In previous posts, we've discussed the Basics of Memory Management, Pool Resources and of course the /3GB Switch. Today we're going to take a look at the Virtual Address Space Layouts on a 32-bit system. We'll cover the 64-bit system specifics in a later post. First, let's cover some basic concepts dealing with the Virtual Address Space in Windows.

There are three main types of data that are mapped into the virtual address space in Windows:

• per-process private code and data

• sessionwide code and data

• systemwide code and data

As we discussed previously, each process has its own private address space that cannot be accessed by other processes unless they have permission to open the process for read or write access. Threads within the process cannot access virtual addresses outside the private address space unless they map to shared memory sections or use cross-process memory functions.

On systems with multiple sessions, such as Windows 2000 Server with Terminal Services and Windows XP and later operating systems, the session space contains information that is global to each session. A session consists of all the processes and other system objects that represent a single user's logon session. We covered Sessions, Desktops and Window Stations a couple of months ago. All sessionwide data structures are mapped into a region of system space called session space. When a process is created, this range of addresses is mapped to the pages appropriate to the session to which the process belongs.

Finally, system space contains the global OS code and data structures that are visible to each process. The following components are part of the system space:

• System Code: the OS Image, HAL and Device Drivers used to boot the system

• System Mapped Views: Used to map the loadable kernel-mode part of the Windows subsystem (Win32k.sys) and its kernel-mode graphics drivers

• Hyperspace: Used to map the process working set list and temporarily map other physical pages for operations such as zeroing pages, invalidating page table entries and process creation

• System Working Set List: Structures that describe the system working set

• System Cache: Used to map files that are open in the system cache

• Paged Pool: Pageable system memory heap

• System PTE's: Pool of system PTE's used to map system pages such as I/O space, kernel stacks etc.

• NonPaged Pool: Nonpageable system memory heap

• Crash Dump Information: Information about the state of a system crash

• HAL usage: memory reserved for HAL-specific structures

That covers the basic concepts of the virtual address space. Don't be too alarmed if you don't fully understand the individual components of the system space listed above - the key to remember is that there are three main data types to consider: Process, Session and System. With that in mind, let's take a look at the layout of Virtual Address Space on a 32-bit (x86) system.

If you recall, each user-mode process on a 32-bit Windows system can have up to 2GB of private address space, with the rest being reserved for the Operating System (we're assuming that the /3GB switch is not in play at the moment). The default address space is shown below on the left. On the right is what the address space would look like if you were to use the /3GB switch.

As you can see, the system space is dramatically decreased. For more on how this affects PTE's and NonPaged Pool (among others) please refer back to our Demystifying /3GB post. Now, let's take a look at the different system variables and how they map to system space:

System Variable                 Description                                    x86 with 2GB system space
MmSystemRangeStart              Start Address of System Space                  0x80000000
MmSystemCacheWorkingSetList     System Working Set List                        0xC0000000
MmSystemCacheStart              Start of System Cache                          Calculated
MmSystemCacheEnd                End of System Cache                            Calculated
MiSystemCacheStartExtra         Start of system cache or system PTE extension  Calculated
MiSystemCacheEndExtra           End of system cache or PTE extension           0xC0000000
MmPagedPoolStart                Start of Paged Pool                            Calculated
MmPagedPoolEnd                  End of Paged Pool                              Calculated (max = 650MB)
MmNonPagedSystemStart           Start of System PTE's                          Calculated (lowest value is 0xEB000000)
MmNonPagedPoolStart             Start of NonPaged Pool                         Calculated
MmNonPagedPoolExpansionStart    Start of NonPaged Pool Expansion               Calculated
MmNonPagedPoolEnd               End of NonPaged Pool                           0xFFBE0000

That brings us to the end of our overview of the 32-bit Virtual Address Space. We'll go over the 64-bit address spaces in a future post. Until next time ...

Memory Management - Understanding Pool Resources

Following up on our Memory Management 101 post, we're moving on to a discussion of Pool Resources and Pool Resource Depletion. First of all - what are Pool Resources? When a machine boots up, the Memory Manager creates two dynamically sized memory pools that kernel-mode components use to allocate system memory. These two pools are known as the Paged Pool and NonPaged Pool. Each of these pools starts at an initial size that is based upon the amount of physical memory present in the system. Pool memory is a subset of available memory and is not necessarily contiguous. If necessary, these pools can grow up to a maximum size that is determined by the system at boot time.

So - what distinguishes Paged Pool and NonPaged Pool memory? The first difference is that Paged Pool is exactly what its name implies - it can be paged out. The NonPaged Pool cannot be paged out. Drivers use the NonPaged Pool for many of their requirements because it can be accessed at any Interrupt Request Level (IRQL). The IRQL defines the hardware priority at which a processor operates at any given time (there's a link to a document covering Scheduling, Thread Context and IRQL's in the Additional Resources section at the end of this post).

Getting back to our Pool Resources, it is important to remember that these resources are finite. The table below outlines some sample maximum values for Paged / NonPaged Pool on x86 systems that are not configured with the /3GB switch in the system's boot.ini file. We'll cover /3GB and its effects on memory in a future post. We'll also cover Kernel Changes to Windows Vista separately. It's important to note that x64 systems don't suffer from the same Virtual Address Space limitations!

Windows 2000

System RAM    NonPaged Max    Paged Max    Paged Max (TS)
512 MB        131 MB          264 MB       160 MB *
1024 MB       212 MB          268 MB       160 MB *
1536 MB       256 MB          340 MB       160 MB *
2048 MB       256 MB          340 MB       160 MB *

* If Terminal Services is installed on Windows 2000, Paged Pool is lowered down to 160 MB unless a registry change is made to the server to set the Paged Pool Size to its maximum value (see below).

Windows 2003 SP1

System RAM    NonPaged Max    Paged Max
512 MB        125 MB          184 MB
1024 MB       202 MB          168 MB
1536 MB       254 MB          352 MB
2048 MB       252 MB          352 MB

On Windows 2003 systems, Terminal Services are enabled by default.

On both Windows 2000 and Windows 2003, the HKLM\System\CurrentControlSet\Control\Session Manager\Memory Management\PagedPoolSize value can be set to 0xFFFFFFFF (or reset to 0) to ensure that the Virtual Address Space used for Paged Pool is maximized.

Also - here are the theoretical maximums for pre-Vista Operating Systems:

Region                   IA-64      x64        x86
Process Address Space    7152 GB    8192 GB    2 to 3 GB*
Paged Pool               128 GB     128 GB     470 to 650 MB
NonPaged Pool            128 GB     128 GB     256 MB

* depends on whether or not /3GB is enabled

Now that we know what the maximum value ranges should look like, here's how to verify what those values look like on your own system using Process Explorer:

1. Download Process Explorer from the Microsoft Sysinternals Site.

2. Unzip the contents of the Zip file to C:\ProcessExplorer

3. Run ProcExp.exe

4. The first thing you'll want to do is configure the Symbols and the dbghelp.dll path. If you leave the default path alone (see the image below), you'll get an error message that you haven't configured symbols - even if you have the Symbol path filled in (the Procexp.chm file in the ProcessExplorer folder provides instructions).

5. To get the most out of Process Explorer however, you'll need to install the Microsoft Debugging Tools (we'll be needing these later anyway when we start getting into troubleshooting). This is important because you can specify the proper version of dbghelp.dll. Once you have Process Explorer fully configured (see the image below for the settings), you're ready to check out your Pool Resources:

6. Within Process Explorer, select View ... System Information ... and look at the Kernel Memory Section. The Paged Limit and NonPaged Limit show the current maximum values for the system you are examining.

Now, let's examine what type of items reside in each of these pools. Within the NonPaged pool, you would find handles that are used by applications in the user-mode space as well as Kernel-Mode drivers (typically ending in a .sys file extension). Examples of Paged Pool items are Token Objects, Kernel-Mode drivers and the Registry.

Here's where things get really interesting - what happens to a system when these Pool Resources get depleted? Some of the most common symptoms exhibited are:

• the machine becomes sluggish

• users can no longer log on to the machine

• console access is sluggish

• users cannot connect to file shares or other shared resources

• system hangs, including the console itself being unresponsive

Symptoms such as these are usually the first indication that something is causing an issue with the machine.

If the NonPaged Pool on a server has become depleted, the machine will log an Event in the System Log as shown below:

Event ID 2019
Event Type: Error
Event Source: Srv
Event Category: None
Event ID: 2019
Description: The server was unable to allocate from the system NonPaged pool because the pool was empty.

Paged Pool Depletion is logged as an Event 2020:

Event ID 2020
Event Type: Error
Event Source: Srv
Event Category: None
Event ID: 2020
Description: The server was unable to allocate from the system paged pool because the pool was empty.

What are these error messages telling us beyond the fact that there is an issue with Pool Depletion? A common misunderstanding of this message is that the problem is being caused by the Server Service (srv.sys). Usually the Server Service is the first component to experience the issue because it is trying to satisfy a request and cannot allocate the appropriate Pool Memory.

We'll cover 2019 & 2020 troubleshooting in greater depth in a future post including how to use tools such as Poolmon and Perfmon in conjunction with a Memory Dump to find the culprits! However, to get you started, I've included links to a great post by our CPR team on their blog as well as some more general information regarding Pool Memory. The links are below.

Coming soon - Memory Tuning, Troubleshooting Memory Issues and Using the /3GB switch

Memory Management 101

Memory Management issues make up a considerable portion of the support incidents that we handle. At some point during the support incident we invariably engage in a discussion of Memory Management, Memory Tuning, the use of the infamous /3GB switch and more. There's far too much information to compress into a single blog post, so think of this as the first part in a series. In this post we'll cover the basics of 32-bit Memory architecture and the difference between Kernel and User mode memory. So let's dive right in ...

Windows 32-bit Operating Systems implement a virtual memory system based on a flat 32-bit address space. 32-bits of address space translates into 4GB of virtual memory. A process can access up to 4GB of memory address space (using the /3GB switch changes this behavior - and we'll cover that in a later post).

You can't have a discussion of Memory Management basics without distinguishing between Kernel-mode and User-mode memory. The system space (aka Kernel space) is the portion of the address space in which the OS and kernel-mode drivers reside. Only kernel-mode code can access this space. User-mode threads can access data only in the context of their own process. They cannot access data within another process's space directly, nor can they access the system address space directly. Kernel-mode drivers are trusted by the OS and can access both kernel and user space. When a driver routine is called from a user thread, the thread's data remains in the user-mode space. However, the kernel-mode driver can access the user-mode data for the thread and access the kernel-mode space.

OK - so looking at the diagram above, we can see how the 4GB memory address space is divided. Windows allocates the lower half of the 4GB address space (from 0x00000000 to 0x7FFFFFFF) to processes for their own unique private storage, and reserves the other half (from 0x80000000 to 0xFFFFFFFF) for the Operating System's use. Virtual memory provides a view of memory that does not necessarily correspond to the physical layout of memory.
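As a quick illustration of the split described above, here is a small sketch that classifies a 32-bit virtual address as belonging to the user or system half. The default boundary (0x80000000) comes straight from the ranges above; the /3GB boundary of 0xC0000000 is simply 3GB expressed in hex. The function name is ours.

```python
DEFAULT_SYSTEM_START = 0x80000000    # system space starts at 2GB by default
THREE_GB_SYSTEM_START = 0xC0000000   # with /3GB, system space starts at 3GB


def address_region(address: int, three_gb: bool = False) -> str:
    """Return 'user' or 'system' for a 32-bit virtual address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    boundary = THREE_GB_SYSTEM_START if three_gb else DEFAULT_SYSTEM_START
    return "user" if address < boundary else "system"


print(address_region(0x7FFFFFFF))                  # last user-mode address
print(address_region(0x80000000))                  # first system address
print(address_region(0x80000000, three_gb=True))   # user space under /3GB
```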

This is usually the point in the discussion where the majority of folks start getting confused and their eyes start to glaze over. In simplistic terms, the memory manager translates the virtual memory addresses into physical addresses where the data is stored. Every page in virtual memory is listed in a page table which in turn identifies the correct physical page. The system and CPU use the information from the virtual address to find the correct page table entry for a specific page.

So, looking at the diagram on the left, we can see that a virtual address points to a specific location on a virtual page. The virtual address contains a byte offset and several index values that are used to locate the page table entry that maps the virtual page into physical memory. After the memory manager finds the page table entry, it uses the offset to find a byte in physical memory - identified by a physical address.
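To see what "a byte offset and several index values" looks like in practice, here is a sketch that splits a 32-bit virtual address into its parts. It assumes the classic non-PAE x86 layout (10-bit page directory index, 10-bit page table index, 12-bit byte offset), which the post does not spell out; with PAE enabled the split is different.

```python
def split_virtual_address(va: int):
    """Split a 32-bit virtual address (non-PAE x86 layout assumed).

    Returns (page_directory_index, page_table_index, byte_offset).
    """
    if not 0 <= va <= 0xFFFFFFFF:
        raise ValueError("not a 32-bit address")
    directory_index = va >> 22           # top 10 bits select the page table
    table_index = (va >> 12) & 0x3FF     # next 10 bits select the PTE
    byte_offset = va & 0xFFF             # low 12 bits: offset within the 4KB page
    return directory_index, table_index, byte_offset


directory_index, table_index, byte_offset = split_virtual_address(0x80001234)
```

The two index values locate the page table entry; the entry supplies the physical page, and the byte offset is carried over unchanged to form the physical address.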

And there you have it - a quick look at the basics of Memory Management. Over the course of the next few posts on Memory Management, we'll talk a bit more about the following topics: Pool Memory, Memory Tuning, Troubleshooting Memory Issues and the infamous /3GB switch.

RAM, virtual memory, pagefile, and memory management in Windows

Processes and address spaces

All processes (for example, application executables) that are running under 32-bit versions of Windows are assigned virtual memory addresses (a virtual address space), ranging from 0 to 4,294,967,295 (2^32 - 1, which gives a 4 GB address space), regardless of how much RAM is actually installed on the computer.

In the default Windows configuration, 2 gigabytes (GB) of this virtual address space are designated for the private use of each process, and the other 2 GB is shared between all processes and the operating system. Typically, applications (for example, Notepad, Word, Excel, and Acrobat Reader) use only a fraction of the 2 GB of private address space. The operating system assigns RAM page frames only to those virtual memory pages that are being used.

Physical Address Extension (PAE) is the feature of the Intel 32-bit architecture that expands the physical memory (RAM) address to 36 bits. PAE does not change the size of the virtual address space (which remains at 4 GB), but just the volume of actual RAM that can be addressed by the processor. For more information, click the following article number to view the article in the Microsoft Knowledge Base:

268363 (http://support.microsoft.com/kb/268363/) Intel Physical Addressing Extensions (PAE) in Windows 2000

The translation between the 32-bit virtual memory address that is used by the code that is running in a process and the 36-bit RAM address is handled automatically and transparently by the computer hardware according to translation tables that are maintained by the operating system. Any virtual memory page (32-bit address) can be associated with any physical RAM page (36-bit address).
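The address widths above translate directly into sizes. A minimal sketch of the arithmetic, assuming the 4 KB page size used elsewhere in this article (the function name is ours):

```python
PAGE_SIZE = 4096  # bytes per page
GB = 2 ** 30


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for a given address width."""
    return 2 ** address_bits


virtual_bytes = addressable_bytes(32)    # per-process virtual address space
physical_bytes = addressable_bytes(36)   # RAM addressable with PAE

virtual_pages = virtual_bytes // PAGE_SIZE
physical_frames = physical_bytes // PAGE_SIZE

print(virtual_bytes // GB, "GB virtual,", physical_bytes // GB, "GB physical")
```

So PAE leaves each process with the same 4 GB view while giving the operating system sixteen times as many physical page frames to back those views.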

The following list describes how much RAM the various Windows versions and editions support (as of May 2010):


Windows NT 4.0                            4 GB
Windows 2000 Professional                 4 GB
Windows 2000 Standard Server              4 GB
Windows 2000 Advanced Server              8 GB
Windows 2000 Datacenter Server            32 GB
Windows XP Professional                   4 GB
Windows Server 2003 Web Edition           2 GB
Windows Server 2003 Standard Edition      4 GB
Windows Server 2003 Enterprise Edition    32 GB
Windows Server 2003 Datacenter Edition    128 GB
Windows Vista                             4 GB
Windows Server 2008 Standard              4 GB
Windows Server 2008 Enterprise            64 GB
Windows Server 2008 Datacenter            64 GB
Windows 7                                 4 GB


Pagefile

RAM is a limited resource, whereas for most practical purposes, virtual memory is unlimited. There can be many processes, and each process has its own 2 GB of private virtual address space. When the memory being used by all the existing processes exceeds the available RAM, the operating system moves pages (4-KB pieces) of one or more virtual address spaces to the computer's hard disk. This frees that RAM frame for other uses. In Windows systems, these "paged out" pages are stored in one or more files (Pagefile.sys files) in the root of a partition. There can be one such file in each disk partition. The location and size of the page file are configured in System Properties (click Advanced, click Performance, and then click the Settings button).

Users frequently ask "how big should I make the pagefile?" There is no single answer to this question because it depends on the amount of installed RAM and on how much virtual memory the workload requires. If there is no other information available, the typical recommendation of 1.5 times the installed RAM is a good starting point. On server systems, you typically want to have sufficient RAM so that there is never a shortage and so that the pagefile is basically not used. On these systems, it may serve no useful purpose to maintain a really large pagefile. On the other hand, if disk space is plentiful, maintaining a large pagefile (for example, 1.5 times the installed RAM) does not cause a problem, and this also eliminates the need to worry over how large to make it.
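The 1.5x starting point is easy to compute. The sketch below does nothing more than apply that multiplier; the function name is hypothetical, and the result is only the starting point described above, not a recommendation for any particular workload.

```python
def starting_pagefile_mb(installed_ram_mb: int, factor: float = 1.5) -> int:
    """Starting-point pagefile size: a multiple of installed RAM (default 1.5x)."""
    if installed_ram_mb <= 0:
        raise ValueError("installed RAM must be positive")
    return int(installed_ram_mb * factor)


print(starting_pagefile_mb(2048))
print(starting_pagefile_mb(4096))
```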

Performance, architectural limits, and RAM

On any computer system, as the load increases (the number of users, the volume of work), performance decreases, but in a nonlinear manner. Any increase in load or demand, beyond a certain point, causes a significant decrease in performance. This means that some resource is in critically short supply and has become a bottleneck.

At some point, the resource that is in short supply cannot be increased. This means that an architectural limit has been reached. Some frequently reported architectural limits in Windows include the following:

1. 2 GB of shared virtual address space for the system (kernel)

2. 2 GB of private virtual address space per process (user mode)

3. 660 MB of system PTE storage (Windows Server 2003 and earlier)

4. 470 MB of paged pool storage (Windows Server 2003 and earlier)

5. 256 MB of nonpaged pool storage (Windows Server 2003 and earlier)

This applies to Windows Server 2003 specifically, but this may also apply to Windows XP and to Windows 2000. However, Windows Vista, Windows Server 2008, and Windows 7 do not all share these architectural limits. The limits on user and kernel memory (numbers 1 and 2 here) are the same, but kernel resources such as PTEs and various memory pools are dynamic. This new functionality enables paged and nonpaged pool, PTEs, and session pool to grow beyond the limits that were discussed earlier, up to the point where the whole kernel address space is exhausted.

Statements such as the following are frequently found and quoted:

"With a Terminal Server, the 2 GB of shared address space will be completely used before 4 GB of RAM is used."

Such statements may be true in some cases. However, you have to monitor your system to know whether they apply to your particular system or not. In some cases, these statements are conclusions from specific Windows NT 4.0 or Windows 2000 environments and do not necessarily apply to Windows Server 2003. Significant changes were made to Windows Server 2003 to reduce the probability that these architectural limits will in fact be reached in practice. For example, some processes that were in the kernel were moved to non-kernel processes to reduce the memory used in the shared virtual address space.

Monitoring RAM and virtual memory usage

Performance Monitor is the principal tool for monitoring system performance and for identifying the location of the bottleneck. To start Performance Monitor, click Start, click Control Panel, click Administrative Tools, and then double-click Performance Monitor. Here is a summary of some important counters and what they tell you:

Memory, Committed Bytes: This counter is a measure of the demand for virtual memory.

This shows how many bytes were allocated by processes and to which the operating system has

committed a RAM page frame or a page slot in the pagefile (or perhaps both). As Committed Bytes

grows greater than the available RAM, paging will increase, and the pagefile size that is being used

will also increase. At some point, paging activity starts to significantly affect performance.

Process, Working Set, _Total: This counter is a measure of the virtual memory in "active" use.

This counter shows how much RAM is required so that the virtual memory being used for all

processes is in RAM. This value is always a multiple of 4,096, which is the page size that is used in

Windows. As demand for virtual memory increases beyond the available RAM, the operating system

adjusts how much of a process's virtual memory is in its Working Set to optimize available RAM

usage and minimize paging.

Paging File, %pagefile in use: This counter is a measure of how much of the pagefile is actually

being used.

Use this counter to determine whether the pagefile is an appropriate size. If this counter reaches

100, the pagefile is full, and things will stop working. Depending on the volatility of your workload,

you probably want the pagefile large enough so that it is generally no more than 50-75 percent

used. If much of the pagefile is being used, having more than one pagefile on different physical disks may

improve performance.

Memory, Pages/Sec: This counter is one of the most misunderstood measures.

A high value for this counter does not necessarily imply that your performance bottleneck stems

from a shortage of RAM. The operating system uses the paging system for purposes other than

swapping pages because of memory over-commitment.

Memory, Pages Output/Sec: This counter shows how many virtual memory pages were written to

the pagefile to free RAM page frames for other purposes each second.

This is the best counter to monitor if you suspect that paging is your performance bottleneck. Even if

Committed Bytes is greater than the installed RAM, if Pages Output/sec is low or zero most of the

time, there is no significant performance problem from insufficient RAM.

Memory, Cache Bytes,

Memory, Pool Nonpaged Bytes,

Memory, Pool Paged Bytes,

Memory, System Code Total Bytes,

Memory, System Driver Total Bytes:

The sum of these counters is a measure of how much of the 2 GB of the shared part of the 4-GB

virtual address space is actually being used. Use these to determine whether your system is

reaching one of the architectural limits that were discussed earlier.

Memory, Available MBytes: This counter measures how much RAM is available to satisfy

demands for virtual memory (either new allocations, or for restoring a page from the pagefile).

When RAM is in short supply (for example, Committed Bytes is greater than installed RAM), the

operating system will try to keep a certain fraction of installed RAM available for immediate use by

copying virtual memory pages that are not in active use to the pagefile. Therefore, this counter will

not reach zero and is not necessarily a good indication of whether your system is short of RAM.
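The counter guidance above can be turned into a small check. The following is a minimal sketch, not a definitive tool: it assumes you have already collected one sample of the counters into a dictionary, and the key names, the assess_memory function, and the pages_output_threshold default are all hypothetical choices made for illustration. Only the 75 percent pagefile figure and the 2 GB shared address space come from the text.

```python
GIB = 1024 ** 3
KERNEL_SPACE_BYTES = 2 * GIB  # shared (kernel) part of the 4-GB virtual address space

# Hypothetical key names for the five counters whose sum is discussed above.
KERNEL_COUNTERS = (
    "cache_bytes",
    "pool_nonpaged_bytes",
    "pool_paged_bytes",
    "system_code_total_bytes",
    "system_driver_total_bytes",
)


def assess_memory(sample, installed_ram_bytes, pages_output_threshold=10.0):
    """Return plain-text findings for one sample of memory counters.

    pages_output_threshold is an assumed cut-off for "low" Pages Output/sec;
    tune it for your own workload.
    """
    findings = []
    overcommitted = sample["committed_bytes"] > installed_ram_bytes
    paging_out = sample["pages_output_per_sec"] > pages_output_threshold
    if overcommitted and paging_out:
        findings.append("commit exceeds RAM and pages are being written out")
    elif overcommitted:
        findings.append("commit exceeds RAM, but page output is low")
    if sample["pagefile_pct_in_use"] > 75:
        findings.append("pagefile is more than 75 percent used")
    kernel_used = sum(sample[name] for name in KERNEL_COUNTERS)
    percent = 100 * kernel_used / KERNEL_SPACE_BYTES
    findings.append("kernel address space in use: %.0f%% of 2 GB" % percent)
    return findings


if __name__ == "__main__":
    example = {
        "committed_bytes": 6 * GIB,
        "pages_output_per_sec": 0.5,
        "pagefile_pct_in_use": 40,
        "cache_bytes": 512 * 1024 ** 2,
        "pool_nonpaged_bytes": 128 * 1024 ** 2,
        "pool_paged_bytes": 128 * 1024 ** 2,
        "system_code_total_bytes": 128 * 1024 ** 2,
        "system_driver_total_bytes": 128 * 1024 ** 2,
    }
    for line in assess_memory(example, installed_ram_bytes=4 * GIB):
        print(line)
```

Note that Available MBytes is deliberately left out of the check, because the text explains that it is not a reliable indicator of a RAM shortage.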

Terminal Server and Printer Redirection

A common class of issue we work on is Printer Redirection with Terminal Services on Windows Server 2003. Although printer redirection seems fairly simple on the surface, the issues that we work on can get somewhat convoluted. So today we're going to look at some common troubleshooting steps for Printer Redirection issues. However, before we get to the troubleshooting steps, let's take a quick look at Printer Redirection itself.

Printer redirection was first implemented in Windows 2000 Server. Printer redirection enables users to print to their locally installed printer from a Terminal Services session. The Terminal Server client enumerates the local print queues to detect the locally installed printers. This list is presented to the server, and the server creates the print queue in the session. The TS client provides the driver string name for the locally installed printers, and if the server has matching drivers installed, the printers will be redirected. When we look at Printers on the Terminal Server, a redirected printer will have a name similar to what is shown below:

As you can see, the naming convention follows this format:

Client Printer Name (from Client Computer Name) in Session Number

The printer queues created in this manner are referred to as automatic printer queues. However, the Terminal Server administrator could also create manual print queues by selecting the redirected ports in the Add Printer Wizard.
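If you need to pick automatic printer queues out of a list of queue names, the naming convention above can be matched with a regular expression. This is a minimal sketch that assumes the literal wording "(from ...) in session N"; the function name and the sample queue name are hypothetical.

```python
import re

# Matches "<printer> (from <client>) in session <number>".
_REDIRECTED = re.compile(
    r"^(?P<printer>.+) \(from (?P<client>[^)]+)\) in session (?P<session>\d+)$",
    re.IGNORECASE,
)


def parse_redirected_printer(queue_name):
    """Return (printer, client, session) for an automatic queue, else None."""
    match = _REDIRECTED.match(queue_name)
    if match is None:
        return None
    return match["printer"], match["client"], int(match["session"])


if __name__ == "__main__":
    # Hypothetical queue name used only to exercise the parser.
    print(parse_redirected_printer("HP LaserJet 4 (from CLIENT01) in session 3"))
```

A queue name that does not follow the convention, such as a manually created queue, returns None.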

Seems pretty simple, right? Let's start looking at some problem scenarios ...

Scenario 1: Printer Redirection fails for all clients

The most common scenario is when Printer Redirection does not work for any of the Terminal Server clients. There are a number of troubleshooting steps to take when this occurs:

Check the Windows Printer Mapping Check Box:

Launch the Terminal Services Configuration utility. Double-click the Connections folder. In the right-hand pane, right click on RDP-Tcp and select properties.

Once you have the property sheet open, select the Client Settings tab. At the bottom of the tab, there is a section labeled Disable the following: as shown below. Ensure that the Windows Printer mapping checkbox is not selected. If it is, that means that the Terminal Server will not allow client Printer Redirection.

Group Policy Settings:

Printer Redirection can also be configured through Group Policy. The setting to prevent client printer redirection is located in the following container: Computer Configuration\Administrative Templates\Windows Components\Terminal Services\Client / Server Data Redirection. The name of the policy setting is "Do not allow client printer redirection" as shown below

If this policy is enabled, it will prevent client printer redirection. In addition, the Windows printer mapping checkbox in the Terminal Server Configuration console is disabled.

Terminal Server Device Redirector:

The server side component responsible for printer redirection is RDPDR.SYS. You can check the status of this driver in Device Manager as shown below. If the Terminal Server Device Redirector is disabled, as in the screenshot below, then device redirection will not work.

Make sure that the Terminal Server Device Redirector is enabled. You can also use the following command to recreate this device: devcon -r install %windir%\inf\machine.inf root\rdpdr. More information on the DevCon utility is available in Microsoft KB Article 311272.

Registry key verification

If you have implemented the registry change mentioned in Microsoft KB Article 268065, it will prevent printer redirection from functioning. As per this KB, if the registry value fEnablePrintRDR is set to 0, Printer Redirection will fail even if the Print Spooler is started on the Terminal Server. No events are logged when this happens. As part of your standard troubleshooting you should check whether the following value exists: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Terminal Server\Wds\rdpwd\fEnablePrintRDR.
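The registry check can also be scripted. The sketch below uses Python's standard winreg module, which is only available on Windows, and the helper names are hypothetical. It assumes that a missing value means redirection has not been disabled this way; verify that against the KB article for your build.

```python
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Terminal Server\Wds\rdpwd"
VALUE_NAME = "fEnablePrintRDR"


def print_redirection_blocked(value):
    """Interpret the registry data: only an explicit 0 blocks redirection.

    value is the data read from fEnablePrintRDR, or None if the value
    does not exist (assumed here to mean "not disabled").
    """
    return value == 0


def read_fenableprintrdr():
    """Read fEnablePrintRDR from HKLM; returns None if it is not present."""
    import winreg  # Windows-only, so imported lazily

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            data, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return data
    except FileNotFoundError:
        return None
```

On the Terminal Server itself, print_redirection_blocked(read_fenableprintrdr()) returning True points at this registry value as the cause.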

Scenario 2: Printer Redirection fails for a single client

When Printer redirection fails for a single client, there are a couple of things to check. First, check the settings on the Remote Desktop client on the problem machine and ensure that the Printers check box is checked to allow redirection:

The other thing to check (often overlooked) is the Remote Desktop Client version on the problem machine. You should always ensure that you are running the latest RDP Client on your client machines.

Scenario 3: Certain Printers are not Redirected

In some situations, individual printers are not redirected. On the Terminal Server itself you may see an error similar to the one below:

Type: Error Event ID: 1111 Description: Driver drivername required for printer printertype is unknown. Contact the administrator to install the driver before you log in again.

In many instances, the quickest way to work around this situation is to install the printer driver directly on the Terminal Server. However, with Windows Server 2003 Service Pack 1, you can use the Fallback Driver Capability instead. The policy for this is located under Computer Configuration\Administrative Templates\Windows Components\Terminal Services\Client / Server data redirection\Terminal Server Fallback Printer Driver Behavior

When you enable the policy, you have the following options:

An extremely useful tool to use when troubleshooting Printer Redirection is the TS Printer Redirection Wizard tool. This utility scans the Terminal Server's System Event log and detects all Event ID 1111 events with a source of "TermServDevices". The tool then scans the registry for installed Version 3 MINI drivers, and prompts the Administrator to substitute an installed driver for each of the printers that failed redirection. Any changes are written to a file where the custom redirected printer mappings are stored.
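To get a quick list of which drivers are missing without the wizard, the Event ID 1111 descriptions can be parsed directly. This is a minimal sketch, not how the wizard itself works: it assumes you have already exported the event descriptions as strings, and the function name and sample driver names are hypothetical.

```python
import re

# Pattern taken from the Event ID 1111 description text quoted earlier.
_EVENT_1111 = re.compile(
    r"Driver (?P<driver>.+?) required for printer (?P<printer>.+?) is unknown"
)


def missing_drivers(descriptions):
    """Map each unknown driver name to the sorted printers that needed it."""
    needed = {}
    for text in descriptions:
        match = _EVENT_1111.search(text)
        if match:
            needed.setdefault(match["driver"], set()).add(match["printer"])
    return {driver: sorted(printers) for driver, printers in needed.items()}


if __name__ == "__main__":
    # Hypothetical event descriptions for illustration only.
    events = [
        "Driver Acme Laser 100 required for printer Front Desk is unknown. "
        "Contact the administrator to install the driver before you log in again.",
        "Driver Acme Laser 100 required for printer Back Office is unknown. "
        "Contact the administrator to install the driver before you log in again.",
    ]
    print(missing_drivers(events))
```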

Sometimes multifunction print devices may not be redirected unless you are running Windows Server 2003 on your local computer because they use DOT4 ports. Only W2K3 redirects printer port names that do not begin with COM, LPT, or USB. If you are using a Windows XP machine use the workaround mentioned in Microsoft KB Article 302361.
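The port-name rule above is easy to express as a check. This is a minimal sketch of that rule only; the function name and the sample port names are hypothetical.

```python
# Prefixes that, per the rule above, are redirected by clients other than
# Windows Server 2003.
REDIRECTED_PREFIXES = ("COM", "LPT", "USB")


def port_redirected_by_older_clients(port_name):
    """True if the port name begins with COM, LPT, or USB (case-insensitive)."""
    return port_name.upper().startswith(REDIRECTED_PREFIXES)


if __name__ == "__main__":
    for port in ("LPT1:", "usb001", "DOT4_001"):
        print(port, port_redirected_by_older_clients(port))
```

A DOT4 port fails the check, which is why a multifunction device on a Windows XP client may need the workaround from the KB article.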

Last but by no means least, please do not forget to make sure that your printing subsystem is functioning properly on both the Terminal Server and the TS Clients even before you start troubleshooting printer redirection issues in Terminal Services.

And just as a quick note, Citrix uses different methods for redirecting printers and these troubleshooting steps may not be applicable for Citrix servers. And that brings us to the end of our post on Terminal Server and Printer Redirection. I hope you find this information useful, and if you have any feedback, please don't hesitate to let us know!

Additional Resources:

How to redirect only the default printer of a Terminal Services client to a Windows Server 2003 Terminal Server session

Windows 2000 Terminal Services does not redirect network printers

Windows Server 2003 Terminal Services

Basic Printing Architecture

Printer sharing, information retrieval, and data storage are among the most frequently used network services. This also means that when something major happens to a print server or file server - lots of people are adversely affected. Issues with Print Servers and the Print Spooler are very common issues for the Performance team. So today, we're going to kick off our series of posts on Printing with an overview of the Windows Printing Architecture and the Print Spooler.

Broken down into basic elements, the Windows printing architecture consists of a print spooler and a set of print drivers. The print spooler is the primary component of the printing interface. Most administrators are familiar with the spooler as an executable file (spoolsv.exe) that manages the printing process. In a default configuration, the spooler is loaded at system startup and continues to run until the operating system is shut down. The print spooler consists of a set of Microsoft and vendor components that perform the following tasks:

Determining whether a print job should be handled locally or across a network

Accepting a data stream created by GDI, in conjunction with a printer driver, for output on a particular printer type

Spooling the data to a file

Selecting the first available physical printer in a logical printer queue

Converting a data stream from a spooled format such as EMF to a format that can be sent to the printer hardware (such as PCL)

Sending a data stream to printer hardware

Maintaining the registry database for spooler components and printer forms

Below is a simplified view of the print spooler components. When looking at this diagram, if the printer hardware is local to the system, then the "client" and "server" pieces are all on the same machine.

Application - The print application creates a print job by calling Graphics Device Interface (GDI) functions

GDI - GDI includes user-mode and kernel-mode components. The user-mode component, Microsoft Win32 GDI, is used by Win32 applications that require graphics support. The kernel-mode component, the graphics engine, exports services and functions that graphics device drivers can use

Winspool.drv - This is the client interface into the spooler.

Spoolsv.exe - This is the spooler's API server. It is implemented as a service that is started when the OS is started

Spoolss.dll - Spoolss.dll acts as a router, determining which print provider to call based on a printer name or handle supplied with each function call. It then passes the function call to the correct provider.

OK - so let's discuss print providers. The print provider is responsible for several functions, including directing print jobs to local or remote print devices and print queue management operations such as starting, stopping and enumerating print queues. Print providers implement a common set of capabilities that are defined by a set of API functions. These functions are called by spoolss.dll. The diagram below illustrates possible flow paths involving different print providers.

The following print providers are supplied by Microsoft in Windows 2000, XP & Windows Server 2003:

localspl.dll - this is the local print provider which handles all print jobs directed to local printers

win32spl.dll - this is the Windows network print provider. All print jobs directed to remote servers are handled by this provider. When the job arrives at the remote server it is passed to the server's local print provider

nwprovau.dll - Novell NetWare print provider

inetpp.dll - HTTP print provider which handles print jobs sent to a URL

Printer manufacturers may also create their own network print providers.
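The routing idea can be pictured with a toy model. The sketch below is not the spooler's actual implementation: the provider names come from the list above, but the matching rules (URL prefix, UNC prefix, everything else local) and the ordering are simplifying assumptions made purely for illustration.

```python
# Toy model of a router choosing a print provider by printer name.
# The recognition rules are assumptions, not the real spoolss.dll logic.
PROVIDERS = [
    ("inetpp.dll", lambda name: name.lower().startswith(("http://", "https://"))),
    ("win32spl.dll", lambda name: name.startswith("\\\\")),
    ("localspl.dll", lambda name: True),  # fallback: treat the rest as local
]


def route(printer_name):
    """Return the first provider whose rule recognizes the printer name."""
    for provider, recognizes in PROVIDERS:
        if recognizes(printer_name):
            return provider


if __name__ == "__main__":
    # Hypothetical printer names for illustration only.
    for name in ("HP LaserJet 4", "\\\\PRINTSRV\\Laser1", "http://printsrv/printers/laser1/.printer"):
        print(name, "->", route(name))
```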

And that brings us to Print Processors. Print processors are user-mode DLL files that convert the spooled data from a print job into a format that is understood by a print monitor. When a print job is spooled, the data is contained in a spool file. The print processor reads the file, performs the conversion on the data stream and writes the converted data to the spooler. The spooler sends the data to the correct print monitor. Print processors are associated with printer drivers during driver installation. The default Print Processor provided with the operating system is winprint.dll.

Lastly, let's talk about Print Monitors. These are user-mode DLLs responsible for directing a print data stream from the spooler to an appropriate port driver. When we talk about print monitors we are actually referring to two different types of monitor. First, there is the language monitor, which provides a full-duplex communications path between the print spooler and bi-directional printers that are capable of providing software-accessible status information. In addition, language monitors add printer control information to the data stream. It is important to note that language monitors can be used to add any post-spooling processing. In addition to the language monitors, we also have port monitors. These are responsible for providing a communications path between the user-mode print spooler and the kernel-mode port drivers that access I/O port hardware. Port monitors are, as the name suggests, responsible for management and configuration of the printer ports on a server.

And that will do it for our look at the Basic Printing Architecture. In future posts, we will be looking at drivers, troubleshooting and policies. Until next time ...

Windows Vista - Point & Print

Here on the Perf team, we deal with quite a few printing issues. An issue we've had a few calls on since the release of Windows Vista concerns the changes made to the Point & Print functionality. Point & Print is a Windows feature that enables users to connect to a shared printer without the need to manually install the necessary printer driver software. Point & Print automatically downloads and installs the required printer drivers when a user connects to a shared printer. It also updates the printer driver on the client computer when the printer driver or the printer driver configuration is updated on the print server.

So - how is Point & Print different on Windows Vista?

Because Point & Print installs software on the client computer, Point & Print features are subject to the enhanced security model of Windows Vista. New configuration settings were added to the Point & Print Restrictions group policy in Windows Vista.

Point & Print Security Best Practices

The Point & Print Restrictions Group policy can be edited using gpedit.msc. The policies are located in User Configuration\Administrative Templates\Control Panel\Printers. We're going to outline several different configuration scenarios.

Scenario 1: Using Deployed Printers

With Deployed Printers, only the printers defined for a user or group will be installed on the client computers that are managed by the group policy. This is considered the most secure practice because the client computers only have the printers installed that are defined in the Group Policy. To configure Deployed Printers, use the Print Management Console (printmanagement.msc) to create the GPO and define the printers to deploy.

Configuration: Configure the GPO settings below.

After you configure the deployed printers, configure the Point and Print Restrictions group policy as follows:

Point and Print Restrictions: Enabled. When installing drivers for a new connection: Do not show warning or elevation prompt. When updating drivers for an existing connection: Show warning only.

User Experience: After you configure the deployed printers and the Point and Print Restrictions group policy, the deployed printers will automatically be installed on the client computer the next time the user logs on. The user will not see any warning messages when the printers are installed for the first time. However, if the printer configuration has been updated on the print server after the deployed printers have been installed on the client computer, the user will see a warning message that informs them that Point and Print must update the driver or configuration for the printer.

Scenario 2: Using the Default Security Settings

The default printer security settings of Windows Vista provide a high degree of security and warn the user before software is installed on the client computer. The default security settings also restrict software installation to only users with administrator-level privileges. Trustworthy printer drivers, such as those provided in-box or in printer driver packages, do not require the user to have administrator-level privileges to install them with the default security. In-box printer drivers are those printer drivers found on the Windows distribution media.

Configuration: No additional configuration is necessary.

User Experience: If a user connects to a shared printer and the required printer driver is not on their computer, or if the driver for an installed printer has been updated on the print server, Point and Print begins the installation process. First, the user sees a warning message similar to the image below.

After a user with administrator-level privileges clicks Install driver, a dialog box is displayed to prompt for permission to continue.

After a non-privileged user clicks Install Driver, the UAC dialog box is displayed. The user must be able to enter a password for an account that has administrator-level privileges in this dialog or the printer installation will fail.

Scenario 3: Using Point and Print on Specific Print Servers Only

The Point and Print Restrictions group policy enables you to limit the servers to which a user can Point and Print. You can configure specific print servers to use only printers with trustworthy printer drivers or printers that do not require printer drivers to be downloaded, such as printers that have in-box drivers.

Configuration: First, configure the print servers so that they share only printers that have trustworthy printer drivers or printers with drivers that do not need to be downloaded. These can be printers that have:

in-box printer drivers

printer drivers in driver packages

printer drivers you have tested and found to be trustworthy

printer drivers that are already installed on the client computers

Then set the following options in the Group policy

Point and Print Restrictions: Enabled. Users can only point and print to these servers: Checked. Enter the fully qualified server names in the text box and separate each name with a semicolon. When installing drivers for a new connection: Do not show warning or elevation prompt. When updating drivers for an existing connection: Do not show warning or elevation prompt.

User Experience: When a user connects to a printer that is shared on a print server listed in the Point and Print Restrictions group policy, Point and Print installs the necessary printer drivers and does not require any additional user interaction. If the user connects to a shared printer on any other print server, Point and Print will not download a printer driver to the client computer. The user may still be able to use the printer but only if they do not need to download the printer driver.
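When auditing a policy like this, it can help to check a server name against the semicolon-separated list the same way you would read it by eye. This is a minimal sketch under stated assumptions: the function names and server names are hypothetical, and the case-insensitive exact match is an assumption about how names should be compared, not a description of how Windows evaluates the policy.

```python
def parse_server_list(policy_text):
    """Split a semicolon-separated server list into a set of lowercase names."""
    return {name.strip().lower() for name in policy_text.split(";") if name.strip()}


def driver_download_allowed(server_fqdn, policy_text):
    """True if the server appears in the policy's list of approved servers."""
    return server_fqdn.strip().lower() in parse_server_list(policy_text)


if __name__ == "__main__":
    # Hypothetical fully qualified server names for illustration only.
    approved = "print01.corp.example.com; print02.corp.example.com"
    print(driver_download_allowed("PRINT01.corp.example.com", approved))
    print(driver_download_allowed("print09.corp.example.com", approved))
```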

Scenario 4: Use Printers with In-Box Drivers Only

Printers with in-box printer drivers can be installed without downloading any software from the print server. If all printers hosted by your print servers have in-box printer drivers, users will not see any warning dialog boxes when they connect to a shared printer.

Configuration: Verify that all shared printers have in-box drivers for the versions of Windows that are installed on the client computers in your enterprise.

User Experience: When the user connects to a shared printer that has an in-box printer driver, the printer driver will be installed by using software that is available on the client computer. Point and Print will not download any software and the user will not see any warning dialog boxes.

Scenario 5: Use Windows XP-Level Security

You can use the Point and Print Restrictions group policy to provide a client computer with the same level of Point and Print security on Windows Vista as it had with Windows XP.

Configuration: Configure the Point and Print Restrictions Properties group policy and set:

Point and Print Restrictions: Enabled. When installing drivers for a new connection: Do not show warning or elevation prompt. When updating drivers for an existing connection: Do not show warning or elevation prompt.

User Experience: Users will not see any additional warning messages when they connect to a shared printer and Point and Print installs a new printer driver or when Point and Print updates the printer driver for an existing connection.

Scenario 6: Use Printers with Printer Driver Packages

Windows Vista introduces printer driver packages. A printer driver package is a signed group of files that make up a printer driver. Printer driver packages are secure and they can be installed by users who do not have administrator-level privileges.

Configuration: Confirm that the shared printers on your print servers have a printer driver package (the printer driver packages should be supplied by the printer manufacturer). Note that only computers running Windows Vista can use printer driver packages. Computers that are running earlier versions of Windows and share printers cannot use printer driver packages.

User Experience: Because printer driver packages are secure, they are downloaded and installed without presenting any warning messages to the user.

OK - that's it for this post. Hopefully this helps to clear up some of the confusion concerning Point & Print on Windows Vista. Until next time ...

These links might be helpful for your 2nd Round of Technical Interview

The Case of the Mysterious Blank Desktop

Hello folks, Prabhakar Shettigar here once more with another odd case from the trenches. Recently my colleague Sumesh wrote about how Not All Systems Are Truly Equal (http://blogs.technet.com/askperf/archive/2009/03/10/not-all-systems-are-truly-equal.aspx). Following on from his post, I thought I’d share some interesting anecdotes about a couple of strange cases I worked involving one of our more common Desktop Shell issues that we see on the Performance team - the dreaded “Blank Desktop”. I’m sure most of you have run into this at one time or another – you’ve entered your credentials and … nothing but a pretty blue screen. Now, these issues can be somewhat frustrating to troubleshoot under the best of circumstances, but in this particular instance, the problems began long before the user tried to log on …

Once upon a time there was an Administrator who had to deploy hundreds of desktops to his organization. He was told that the deployed desktops should be running Windows Vista and should have all the common user business applications installed and ready to go when the user received their machine and logged on. Our administrator duly created a new image and deployed it to his first group of users, having first pre-staged the computer accounts in his Active Directory. All appeared to be going smoothly, until … the first user logged on with their domain credentials. Nothing. No icons, no taskbar … nothing. A beautiful blue background was all that he was presented with. The system was present on the network and had applied the requisite group policy settings; on the surface, all seemed well. What happened?

After some fairly straightforward troubleshooting, we discovered that during the build process the administrator had disabled User Account Control (UAC) in the interests of saving time when installing applications. When the systems were joined to the domain, the domain policy was set to enforce the use of UAC. In and of itself, this wouldn’t seem to be an insurmountable issue … except that somewhere down the road, there was a second change made to the system – the one that ultimately caused the issue. Before we get to the real culprit, let’s quickly review some of the architecture pieces of UAC. The excerpt below is taken from an MSDN Article – Understanding and Configuring User Account Control in Windows Vista.

While the Windows Vista logon process externally appears to be the same as the logon process in Windows XP, the internal mechanics have greatly changed. The following illustration details how the logon process for an administrator differs from the logon process for a standard user.

When an administrator logs on, the user is granted two access tokens: a full administrator access token and a "filtered" standard user access token. By default, when a member of the local Administrators group logs on, the administrative Windows privileges are disabled and elevated user rights are removed, resulting in the standard user access token. The standard user access token is then used to launch the desktop (Explorer.exe). Explorer.exe is the parent process from which all other user-initiated processes inherit their access token. As a result, all applications run as a standard user by default unless a user provides consent or credentials to approve an application to use a full administrative access token. Contrasting with this process, when a standard user logs on, only a standard user access token is created. This standard user access token is then used to launch the desktop.

This all seems fairly straightforward, but what exactly does it have to do with our scenario? What had happened here was that there was a new policy in place that had modified the Users group on the new systems. Authenticated Users and the NT AUTHORITY\INTERACTIVE account had been removed from the group. We discovered this from the output of a GPRESULT scan. Under the section titled “The User is part of the following security groups”, neither NT AUTHORITY\Authenticated Users nor NT AUTHORITY\INTERACTIVE was listed. A comparison of this result to a working system verified that this was the problem. When a domain user tried to log on, since they were not part of the Users group, they had no permissions to … well, anything. Remember that even for administrative accounts on Windows Vista, a split token is created when UAC is turned on. If UAC is disabled, an administrative account only has a full privilege access token.

http://technet.microsoft.com/en-us/library/cc709628.aspx

After determining the issue, the resolution itself was relatively straightforward – put Authenticated Users and NT AUTHORITY\INTERACTIVE back into the Users group and all was well. I know that seems like a bit of an anti-climax, but oftentimes the strangest problems have some fairly simple solutions. Take care!

The Case of the Randomly Launching Internet Explorer Processes

A while ago, I got the opportunity to work on an interesting case where the customer’s Explorer process was showing a continuous increase in handle count. Using Process Explorer we could see that these handles were open to various Iexplore.exe processes, which were showing as terminated. Interestingly however, these Iexplore.exe processes were not being started by any user. They seemed to get created randomly, about one every half hour and almost immediately showing up as a terminated process handle under Explorer.exe.

So what was causing these processes to be launched? Putting these processes under a debugger with a breakpoint set on CreateProcess was an option; however, we did not have access to the server, and getting internet access on the server would be difficult. So I thought of giving Process Monitor a try. The idea was to capture a log for the processes Iexplore.exe and Explorer.exe for the operations process create, process start, and thread create. Also, we wanted to ensure that when we left this running, Process Monitor did not fill up the pagefile, which is used as the default backing file.

So we did the following:

1. Launched Process Monitor with the following syntax “procmon /backingfile:E:\processlaunch.pml”

2. In the Filters menu, checked the option “Drop Filtered Events”.

3. Set filters for processes Explorer.exe and Iexplore.exe and also for operations process create, process start and thread create.

With this done, we let the server run for a couple of hours and got the logs. Here’s what we saw.

Now, looking at the thread stack for the Explorer process create operation, we see an unknown module, with addresses 0x10003d2f, 0x10002298, and 0x10002629.

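First off, 0x10000000 converts to 268435456, which is 256 MB. That is well below the 2 GB user-mode virtual address space limit, so it is a valid user-mode address whether or not the box runs with the /3GB switch (this one did) and whether or not Explorer.exe and Iexplore.exe are /LargeAddressAware (they are not). What makes it stand out is that 0x10000000 is the default base address the linker gives a DLL, whereas Windows system DLLs are normally rebased to much higher addresses, so an unknown module loaded there definitely looks suspicious.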

http://blogs.technet.com/b/askperf/archive/2007/03/23/memory-management-demystifying-3gb.aspx

http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx

http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Now looking at the stack information of the thread creation of Iexplore.exe we see the following:

Seems we have a binary, Linkinfo.dll, and it's loading from the %windir% directory. Now the file name seems genuine; however, a legitimate version of a system file like Linkinfo.dll is supposed to be loaded from the System32 directory and not the %windir% directory. Also, the box we were working with was running Windows Server 2003, and Windows Server 2003 file versions start with 5.2.3790.xxxx. This, in combination with the load address of 0x10000000, makes this look out of the ordinary.

Doing a Bing search on Linkinfo.dll in the %windir% directory led me to this link:

http://www.microsoft.com/security/portal/Threat/Encyclopedia/Entry.aspx?Name=Virus:Win32/Almanahe.B

Running a free OneCare online scan from the following link confirmed this, and successfully cleaned up the infection.

The Curious Case of Event ID: 56 with Source TermDD

Hi folks! It’s been a long time since I wrote the Terminal Services and Graphically Intensive Applications post. Today’s post is a short one; we will be discussing a curious case of Event ID: 56 on Windows Server 2008/R2 with the Remote Desktop Services Role. The clients were being disconnected by the server and the following error was generated:

Log Name: System
Source: TermDD
Event ID: 56
Level: Error
Description: The Terminal Server security layer detected an error in the protocol stream and has disconnected the client.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="TermDD" />
    <EventID Qualifiers="49162">56</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="" />
    <EventRecordID></EventRecordID>
    <Channel>System</Channel>
    <Computer> </Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\Termdd</Data>
    <Data></Data>
    <Binary>00000400010000000000000038000AC00000000038000AC000000000000000000000000000000000840100D0</Binary>
  </EventData>
</Event>

This happened in conditions of heavy traffic to the server along with large client packets (i.e. lot of input activity on the client). As a result, the data stream gets corrupted and the TS server disconnects the client.

To track this down, I looked at the binary data attached to the event. The last DWORD is the error code, converted to an HRESULT.

For example if you have the following binary data attached to the event…

<Binary>00000400010000000000000038000AC00000000038000AC000000000000000000000000000000000840100D0</Binary>

…we first take the last 4 bytes: 840100D0.

You first have to reverse the byte order to get a readable error code. You don’t reverse the individual hex digits; you reverse the order of the bytes (the two-digit pairs). So D0 moves to the front, followed by 00, and so on. After reversing you’ll get this: D0000184. To make it even messier, the D is actually a result of converting an NTSTATUS code into an HRESULT (normally an HRESULT would start with “8”), so we then have to replace the “D” with “C”.

Finally, we now have an NTSTATUS error of C0000184. You can look up this error code using something like Err.exe and get STATUS_INVALID_DEVICE_STATE.
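The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Microsoft tool: the function name decode_termdd_status and the small lookup table are hypothetical, and the table contains only the codes mentioned in this post. The one outside fact it relies on is that the HRESULT_FROM_NT conversion sets the 0x10000000 bit (FACILITY_NT_BIT), which is what turns the leading C into a D.

```python
# Bit that HRESULT_FROM_NT sets when it wraps an NTSTATUS in an HRESULT.
FACILITY_NT_BIT = 0x10000000

# Only the codes discussed in this post; use Err.exe for anything else.
KNOWN_CODES = {
    0xC0000184: "STATUS_INVALID_DEVICE_STATE",
    0xC00000B5: "STATUS_IO_TIMEOUT",
    0xC000006F: "STATUS_INVALID_LOGON_HOURS",
    0x80090330: "SEC_E_DECRYPT_FAILURE",
}


def decode_termdd_status(binary_hex: str) -> int:
    """Return the error code stored in the last DWORD of the event's binary data."""
    last_dword = bytes.fromhex(binary_hex.strip()[-8:])      # e.g. "840100D0"
    value = int.from_bytes(last_dword, byteorder="little")   # reverse the byte order
    return value & ~FACILITY_NT_BIT                           # D0000184 -> C0000184


event_binary = (
    "00000400010000000000000038000AC00000000038000AC0"
    "00000000000000000000000000000000840100D0"
)
code = decode_termdd_status(event_binary)
print(f"{code:08X}", KNOWN_CODES.get(code, "unknown, look it up with Err.exe"))
```

Clearing the bit is harmless for a value that was a genuine HRESULT to begin with, such as 80090330, because that bit is not set there.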

This most likely indicates that the server was trying to send data to the client after the connection was broken. It does not tell us why the connection was broken. Additional codes might be more informative:

C00000B5 - STATUS_IO_TIMEOUT - the connection has timed out.

C000006F - STATUS_INVALID_LOGON_HOURS - The user account has time restrictions and may not be logged onto at this time.

80090330 - SEC_E_DECRYPT_FAILURE – the data on the wire got corrupted

To decipher the codes, you can download Err.exe from:

http://www.microsoft.com/downloads/details.aspx?familyid=be596899-7bb8-4208-b7fc-09e02a13696c&displaylang=en

113996 INFO: Mapping NT Status Error Codes to Win32 Error Codes

http://support.microsoft.com/default.aspx?scid=kb;EN-US;113996

Another way to troubleshoot the error is more inclined towards the driver development community, which is to use Windows Software Trace Preprocessor (WPP) to trace a driver's operation; it enhances WMI event tracing by adding conventions and mechanisms that simplify tracing a driver's operation. It is an efficient mechanism for user-mode applications and kernel-mode drivers to log real-time binary messages. The logged messages can subsequently be converted to a human-readable trace of the driver's operation.

The Case of the Mysterious Driver

The other day I used Process Explorer to examine the drivers loaded on a home system to see if I’d picked up any Sony or Starforce-like digital rights management (DRM) device drivers. The DLL view of the System process, which reports the currently loaded drivers and kernel-mode modules (such as the Hardware Abstraction Layer – HAL), listed mostly Microsoft operating system drivers and drivers associated with the DVD burning software I have installed, but one entry, Asctrm.sys caught my attention because its company information is “Windows (R) 2000 DDK provider”:

http://www.star-force.com/

http://www.sysinternals.com/utilities/processexplorer.html

This is the company name included in the version information of drivers that have been based on sample code from the Windows 2000 Device Driver Kit (DDK) and it’s obviously unusual to see it in production images. The driver’s description is equally unenlightening: “TR Manager”. My suspicions aroused, I set about investigating.

My first step was to right-click on the entry and “Google” the driver image name. The resulting Google search reveals that others have this driver and that in some cases it had been identified as the cause of system crashes, but although several spyware databases have entries for it, none of the ones I checked conclusively tied the driver with an application or vendor.

I next looked for clues in the image itself by double-clicking on the driver entry in the DLL view to open the Process Explorer DLL properties dialog. The image page revealed nothing of interest other than the fact that the driver had been linked in December of 2004. I turned my attention to the Strings tab to look for some hint as to the driver’s reason for existence. None of the few intelligible strings Process Explorer found in the image were unique except for the last one:

http://www.google.com/search?ie=UTF-8&oe=UTF-8&q=asctrm.sys

When a driver is built, the linker stores in the image the path to the debug information file it generates, which has the extension .pdb. The path in this case appears to include the name of a company, “AegiSoft”. However, the http://www.aegisoft.com/ web site describes Aegis Software, Inc. as a company that creates “powerful, sophisticated and easy to use trading software and services for financial companies that demand performance, robustness, availability, and flexibility.” That doesn’t sound like a company that ships device drivers.

On a whim I did a Google search of “aegis” and came across this January 2001 news item announcing RealNetworks’ acquisition of Aegisoft Corp. (notice the difference in name from Aegis Software, Inc.). I knew I had RealPlayer installed on the system so I ran RealPlayer and confirmed that it uses the driver by doing a handle search for “asctrm”, the name of the device object I had seen in one of the driver’s strings:

Newer versions of RealPlayer don’t appear to include a device driver, but I have an old version on this system. I haven't gotten new release notifications because after installing RealPlayer I always use Autoruns to delete the HKLM\Software\Microsoft\Windows\CurrentVersion\Run item that the RealPlayer setup creates to launch the Real Networks Scheduler at each boot. That Run entry, incidentally, is “TkBellExe”, another misleading label.

So the driver is not malicious after all (but is related to DRM, so agreement with that view depends on your feelings about DRM). However, this example highlights the need for all software vendors (Microsoft included!) to clearly identify their applications and drivers in their version resources and in any associated Registry keys or values.

The Case of the Periodic System Hangs

A few months ago I began experiencing periodic system freezes of about a second where even my mouse would pause during a movement. Needless to say, this became very annoying very quickly. A few minutes with Process Explorer, however, and I not only determined the cause, but came up with a fix.

One apparent clue as to the cause of the hangs was that I only experienced the freezes when I had the beta release of VMware 5 running. That fact alone wasn’t enough to blame VMware for the spikes and in any case my reliance on VMware prevented a workaround of simply not using it. I therefore wanted to determine if VMware was really the cause and so the first step of my investigation was to look at the system’s overall CPU history in Process Explorer’s System Information dialog. I clicked on the mini-CPU history chart in the Process Explorer toolbar and in the larger graph confirmed a CPU spike every 10 seconds. I moved the mouse over a spike and the graph’s tooltip reported the System process as the major contributor to CPU usage at the time of the spike:

The System process serves as the host process to operating system worker threads, such as the modified and mapped page writer threads, as well as dedicated device driver threads, so my investigation wasn’t complete: I needed to look inside the System process to see what thread or threads were responsible for the spikes. To do that I double-clicked on the System process to open its Process Properties dialog and selected the Threads tab. I pressed the space bar to pause Process Explorer’s refresh updates and at the next CPU spike (it might take two or three tries to get the timing right) I pressed F5 to cause a manual refresh:

Two threads contributed the majority of CPU usage. One had a start address in Http.sys and the other in Ftser2k.sys. Http.sys was introduced in Windows XP to serve as an in-kernel Web server accelerator that can serve cached content directly from kernel-mode. I didn’t know what Ftser2k.sys was so I clicked on the Module button to view the file properties for the driver’s file and saw that it describes itself as the “FTDIBUS Serial Device Driver”. This wasn’t very helpful so I investigated some more and found that it’s a driver that provides a virtual serial port interface for USB devices so that applications that aren’t USB-aware can interact with USB devices. I had recently installed XM Satellite Radio’s XM PC Player (which XM has since discontinued) and suspected that it was the application that required the Ftser2k driver. Closing the XM Player resulted in Ftser2k’s CPU usage dropping to 0, confirming that it was the application using the driver’s services. However, I continued to experience CPU spikes in Http.sys.

http://msdn.microsoft.com/msdnmag/issues/02/03/IIS6/default.aspx

My attention therefore turned to Http.sys. Since Http.sys simply implements a cache I theorized that IIS, which I run on my Windows XP system to host the Sysinternals staging site, would function even when Http.sys wasn’t present. I opened a command prompt and typed “Net stop http” to stop the driver and was informed that several dependent services would also stop if I stopped the driver, but answered affirmatively anyway:

Within a few seconds I verified that stopping the driver ended the CPU spikes. I spent a few more minutes testing IIS and VMware to make sure that the driver’s absence had no adverse effect and came to the conclusion that the system was functioning fine without it. My next step was therefore to disable the driver permanently. I opened the Device Manager, selected “Show Hidden Devices” in the View menu, navigated to HTTP in the “Non-Plug and Play Drivers” node and double-clicked to open its properties dialog. On the Driver tab I changed its startup type from “Demand” to “Disabled”, which would prevent the driver from starting the next time the system booted (note that I could have just as easily navigated in Regedit to HKLM\System\CurrentControlSet\Services\Http and changed the Start value from 3 to 4).
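For reference, the Start value mentioned above is one of five standard service start types. The sketch below maps them to names and, on Windows only, reads the value for a given service with Python’s standard winreg module. The helper names describe_start and read_service_start are made up for illustration, and the labels follow Device Manager’s wording where this post uses it.

```python
import sys

# Standard values of the Start entry under
# HKLM\System\CurrentControlSet\Services\<name>.
START_TYPES = {
    0: "Boot",
    1: "System",
    2: "Automatic",
    3: "Demand",
    4: "Disabled",
}


def describe_start(value: int) -> str:
    """Translate a Start value into its start-type name."""
    return START_TYPES.get(value, f"Unknown ({value})")


def read_service_start(service: str) -> int:
    """Read the Start value for a service or driver (Windows only)."""
    import winreg  # imported lazily so the rest of the sketch runs anywhere

    path = "System\\CurrentControlSet\\Services\\" + service
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        value, _value_type = winreg.QueryValueEx(key, "Start")
    return value


if __name__ == "__main__" and sys.platform == "win32":
    print("Http:", describe_start(read_service_start("Http")))
```

The sketch only reads the value. Changing it, as described above, requires administrative rights and takes effect the next time the system boots.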

My CPU spiking mystery was solved. I might have spent time trying to determine why Http.sys was causing the spikes in the first place, but since they only occurred when the beta of VMware 5 was running it obviously had something to do with VMware’s networking subsystem. Since I wasn’t using any applications that required Http’s services I turned my attention back to the work I had interrupted with the investigation. I recently re-enabled Http.sys with the final release of VMware 5 running and the spikes no longer occur.

So why do I use VMware instead of Virtual PC, even for my presentations at Microsoft and Microsoft conferences like TechEd? I can answer with one word: snapshots. Snapshots are a VMware feature that allows you to save the state of a VM, make modifications to the VM and later return its state to that of the snapshot. I create a baseline “clean” operating system installation snapshot and perform experiments, including installing software, and can always restart with the clean installation.

VMware 5 takes snapshots a step further by introducing snapshot “trees”. Using snapshot trees I can start with my clean OS snapshot, get the VM into a state that demonstrates an interesting behavior, and then create another snapshot. I make more modifications and snapshot again, or go back to the original clean snapshot and take the VM in a different direction to make another branch of the snapshot tree.

In my malware talk, for example, I use two snapshots to demonstrate how RootkitRevealer can detect the HackerDefender rootkit. I start with the clean OS snapshot, copy HackerDefender to the system, open an Explorer window to the files and create a new snapshot. Then I activate HackerDefender and run RootkitRevealer through its scan where it reports the presence of the cloaked HackerDefender files and Registry keys. Then I snapshot again. During the presentation I resume the first snapshot and show how HackerDefender’s files disappear when I activate the HackerDefender executable. A little later I resume the second snapshot and display RootkitRevealer’s ability to detect the cloaked files, but without making the audience wait through a scan, something which takes several minutes.

The Case of the Delayed Windows Vista File Open Dialogs

http://www.microsoft.com/events/EventDetails.aspx?CMTYSvcSource=MSCOMMedia&Params=~CMTYDataSvcParams%5E~arg+Name%3D%22ID%22+Value%3D%221032274950%22%2F%5E~arg+Name%3D%22ProviderID%22+Value%3D%22A6B43178-497C-4225-BA42-DF595171F04C%22%2F%5E~arg+Name%3D%22lang%22+Value%3D%22en%22%2F%5E~arg+Name%3D%22cr%22+Value%3D%22US%22%2F%5E~sParams%5E~%2FsParams%5E~%2FCMTYDataSvcParams%5E

I was in Barcelona a couple of weeks ago speaking at Microsoft’s TechEd/ITForum conference, where I delivered several sessions (two, Advanced Malware Cleaning and Windows Vista Kernel Changes, were rated the #1 and #2 breakout sessions for the week - you can see an interview of me at the conference here). The conference was a huge success and Windows Vista, which I had taken on the road for the first time, performed great. However, as I was running through some demos before one of my sessions, I noticed that the file open dialog, which is common to all Windows applications, would often take between 5 and 15 seconds to appear.

I didn’t have time to investigate before my talk, so the delays caused me consternation when they showed up during my Windows Vista Kernel Changes session immediately afterward. The behavior felt uncannily like the one I wrote up a few blog posts ago in The Case of the Process Startup Delays. In that case, Windows Defender’s Remote Procedure Call (RPC) communications during process startup tried to contact a domain controller, which resulted in hangs when the system was disconnected from its domain. I mumbled excuses on behalf of Windows Vista and tried to distract the audience by explaining the subsequent demonstrations.

It wasn’t until the plane ride home that I got a chance to look into it. I followed steps similar to the ones I had followed when I explored the Windows Defender hangs. I launched Notepad from within Debugging Tools for Windows’ Windbg tool, typed Ctrl+O to open the File Open dialog, and when I got the hang, broke in and looked at the stack of Notepad’s main thread:

If you haven’t seen a stack before, it’s a history from most recent to least of nested functions called by a thread. You read it from bottom to top, so the stack shows that Notepad had loaded Browseui.Dll and called its CAddressBand::SetNavigationState function. That function called CBreadcrumbBar::SetNavigationState, which called CBreadcrumbBar::SetIDList, and so on.

A look at the function names on the stack immediately told me what was happening: when you access the Open dialog the first time within an application it navigates to your documents folder. On Windows Vista my folder is C:\Users\Markruss\Documents, but the shell wants to make the path in the dialog’s new “bread crumb” bar pretty by displaying it as “Mark Russinovich\Documents”, and so it calls GetUserNameEx to look up my account’s display name as it’s stored in my User object in Active Directory. I confirmed my theory by verifying that the first parameter SHGetUserDisplayName passes to GetUserNameEx, which is interpreted as the EXTENDED_NAME_FORMAT enumeration, is 3: NameDisplay.
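To make the parameter check concrete, here is a small Python sketch of the EXTENDED_NAME_FORMAT enumeration and a call to GetUserNameExW through ctypes. The enumeration values are the documented ones from the Windows SDK; the wrapper function get_user_name_ex is a hypothetical helper, and the call itself only works on Windows.

```python
import ctypes
import sys
from enum import IntEnum


class ExtendedNameFormat(IntEnum):
    """Values of the Windows EXTENDED_NAME_FORMAT enumeration."""
    NameUnknown = 0
    NameFullyQualifiedDN = 1
    NameSamCompatible = 2
    NameDisplay = 3
    NameUniqueId = 6
    NameCanonical = 7
    NameUserPrincipal = 8
    NameCanonicalEx = 9
    NameServicePrincipal = 10
    NameDnsDomain = 12


def get_user_name_ex(name_format: ExtendedNameFormat) -> str:
    """Call Secur32!GetUserNameExW (Windows only); raises OSError on failure."""
    secur32 = ctypes.WinDLL("secur32", use_last_error=True)
    secur32.GetUserNameExW.restype = ctypes.c_ubyte  # the API returns a BOOLEAN
    size = ctypes.c_ulong(256)
    buffer = ctypes.create_unicode_buffer(size.value)
    if not secur32.GetUserNameExW(int(name_format), buffer, ctypes.byref(size)):
        raise ctypes.WinError(ctypes.get_last_error())
    return buffer.value


# The value observed in the debugger maps back to the NameDisplay member.
print(ExtendedNameFormat(3).name)

if __name__ == "__main__" and sys.platform == "win32":
    print(get_user_name_ex(ExtendedNameFormat.NameDisplay))
```

On a machine that cannot reach its domain, the call would be expected to raise an error rather than return a display name, which corresponds to the failure path described next.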

I set a breakpoint on the call’s return and hit it after the delay completed. GetUserNameEx returned the ERROR_NO_SUCH_DOMAIN error code, and stepping through SHGetUserDisplayName revealed that it falls back to calling GetUserName. Instead of looking up the user’s display name, that function just obtains the Security Identifier (SID) of the user from the process token (the kernel data structure that defines the owner of a process) and calls LookupAccountSid to translate the SID to its account name, which in my case is simply “markruss”. Thus, the dialog that appeared looked like this:

As opposed to this, which is what I saw when I got back to the office and connected to the corporate network:

I had solved the case, but was curious to know where exactly the delay was taking place and so continued by researching what was happening on the other end of the Secur32!CallSPM call that’s on top of the stack listing. I knew that the Local Security Authority Subsystem Service (LSASS) process is responsible for authentication, including interactions with domain controllers and account name translations, so I attached Windbg to the Lsass.exe process (make sure that you detach the debugger from LSASS with the “qd” command when you exit, otherwise LSASS will terminate and the system will begin a 30-second shutdown). I figured that Secur32.Dll acts like both a client and server and confirmed that it was loaded into LSASS, but I needed to determine the server-side function that corresponds to Secur32!SecpGetUserName. I did so by brute force: I dumped the functions implemented by Secur32.Dll and looked for ones with “name” in them:

I set breakpoints on several of them and when I reproduced the delay I hit the one on SecpGetUserName and stepped through it to eventually get to this stack:

The DsGetDcName function is documented as returning the name of a domain controller in the specified domain. SecpTranslateName obviously needs to find a domain controller to which to send the account display name query. I traced further, and discovered that LSASS caches the result of the lookup for 45 seconds, which explained why I didn’t see the delay if I ran a different application and accessed the File Open dialog immediately after getting a delay. Then I hit a temporary dead-end when Netapi32!DsrGetDcNameEx2 executed an RPC request.

Again, figuring that Netapi32 acts like a client and a server, I dumped its symbols and set breakpoints on functions containing “dc”. I let LSASS continue executing and to my surprise hit the exact same function, Netapi32!DsrGetDcNameEx2. I traced into the call deeper and deeper until the thread finally called into the kernel (Ntdll!KiFastSystemCallRet):

I was close to the end of my investigation. The last question I had was what device driver was Netlogon calling to send a browser datagram? I answered this by looking at the first parameter it passed to NlBrowserDeviceIoControl, which I guessed was a handle to a file object. Then I opened Windbg in Local Kernel Debugging mode (note that on Windows Vista you have to boot in debugging mode to do this), which lets you look at live kernel data structures, and dumped the handle’s information. That showed me the device object that was opened, which told me that the driver is Bowser.sys, the “NT Lan Manager Datagram Receiver Driver”:

I thought my investigation was complete, but when I later tried to reproduce the delays I failed. I retraced my footsteps and found that LsapGetUserNameForLogonSession caches the display name for 30 minutes. Further, an account’s display name is cached with cached credentials so you won’t experience the delays for the first 30 minutes after logging in or disconnecting from the corporate network. I confirmed that by waiting 30 minutes and reproducing the hangs.

My investigation had come to a close. I had determined that Windows Vista’s File Open dialog tries to look up a user’s display name for the “bread crumb” bar when showing the documents folder and in the process tries to locate a domain controller by sending a Lan Manager datagram via the Bowser.sys device driver. I also knew that there’s no workaround for the delayed dialogs and that anyone that has a domain joined system that’s not connected to their domain will experience the same delays - at least until Windows Vista Service Pack 1.

The Case of the System Process CPU Spikes

As you’ve probably surmised by my blog posts and other writings, I like knowing exactly what my systems are doing. I want to know if a process is running away with the CPU, causing memory pressure, or hitting the disk. Besides keeping my computers running smoothly, my vigilance sometimes helps me spot performance and reliability problems in Windows and third-party code.

The main way I keep tabs on things is to configure Process Explorer to run automatically when I log in. Whenever I configure a new computer, I add a shortcut to Process Explorer to my profile’s Start directory that includes the /t (minimize) switch. Process Explorer runs otherwise hidden with a tray icon that shows a small historical view of CPU activity level. Because I want access to detailed information about system processes, as well as my own, I also specify the /e option on Vista, which causes Windows to present a UAC prompt on logon that allows me to grant Process Explorer administrative rights.

Because I keep an eye out for CPU spikes in Process Explorer’s tray icon, which show up as green or red for user-mode (application) and kernel-mode (operating system and drivers) CPU usage, respectively, I’ve identified several application bugs over the last few months. In this post, I’ll share how I used both Process Explorer and another tool, Kernrate, to identify a problem with a third-party driver and followed the problem through to a fix by the vendor.

Not long after I got a new laptop several months ago, I noticed that the system sometimes felt sluggish. Process Explorer’s tray icon corroborated my perception by displaying a mini-graph of red CPU activity. The icon opens a tooltip that reports the name of the process consuming the most CPU when you move the mouse over it, and in this case the tooltip showed the System process as being responsible:

The first few times I noticed the problem, it resolved itself shortly after and I didn’t have a chance to troubleshoot. However, I could see by opening Process Explorer’s System Information dialog that the CPU spikes were significant:

The System process is special because it doesn’t host an executable image like other processes. It exists solely to host operating system threads for the memory manager, cache manager, and other subsystems, as well as device driver threads. These threads execute entirely in kernel mode, which is why System process CPU usage shows up as red in Process Explorer’s graphs.

I suspected that a third-party device driver was the cause of the problem, so the first step in my investigation was to figure out which thread was using CPU, which would hopefully point me at the guilty party. I watched vigilantly for signs of trouble every time I switched networks and jumped the first time I saw one. Process Explorer shows the threads running in a process on the Threads page of the Process Properties dialog, so I double-clicked on the System process and switched to the Threads page the next time I noticed the CPU spike:

The “ntkrnlpa.exe” prefix on each thread’s start address identified the ones I saw at the top of the CPU usage sort order as operating system threads (Ntkrnlpa.exe is the version of the kernel loaded on 32-bit client systems that have no-execute memory protection or server systems that need to address more than 4GB of memory). Because I had previously configured Process Explorer to retrieve symbols for operating system images from the Microsoft public symbol server, the thread list also showed the names of the thread start functions. The most active threads began in the ExpWorkerThread function, which means that they were worker threads that perform work on behalf of the system and device drivers. Instead of creating dedicated threads that consume memory resources, the system and drivers can throw work at the shared pool of operating system worker threads.

Unfortunately, knowing that worker threads were causing the CPU usage didn’t get me any closer to identifying a root cause. I really needed to know what functions the worker threads were calling, because the functions would be inside the device driver or operating system component on whose behalf the threads were running. One way to look inside a thread’s execution is to look at the thread’s stack with Process Explorer. The stack is a memory region that stores function invocations and Process Explorer will show you a thread’s stack when you select the thread and press the Stack button, or double-click on the thread. On Vista, however, you get this error when you try to look at the stack for threads in the System process:

The System process is a special type of process on Vista called a “protected process” that doesn’t allow any access to its threads or memory. Protected processes were introduced to support Digital Rights Management (DRM) so that hi-definition content providers can store content encryption keys with a reduced risk of an administrative user using DRM-stripping tools to reach into the process and read the keys.

That approach foiled, I had to find another way to see what the worker threads were doing. For that, I turned to Kernrate, a command-line profiling tool that’s a free download from Microsoft. Kernrate can profile user-mode processes and kernel-mode threads. It uses the sample-based profiling facility that was introduced in the first release of Windows NT, which records the unique addresses at which the CPU is executing when the profiling interval timer fires. When you stop a profile capture, Kernrate retrieves the information from the kernel, maps the addresses to the loaded device drivers into which they fall, and can even use the symbol engine to report the names of functions.
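The address-to-driver mapping step can be illustrated with a short Python sketch. It is not how Kernrate is implemented; it only shows the idea of bucketing sampled instruction addresses by the loaded module whose address range contains them. The base addresses, sizes, and sample addresses below are invented for the example.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical loaded-module list: (base address, size in bytes, name).
MODULES = sorted([
    (0x81800000, 0x003B0000, "ntkrnlpa"),
    (0x8A200000, 0x00040000, "b57nd60x"),
    (0x8A300000, 0x00020000, "http"),
])
BASES = [base for base, _size, _name in MODULES]


def module_for(address: int) -> str:
    """Return the name of the module whose range contains the address."""
    index = bisect_right(BASES, address) - 1
    if index >= 0:
        base, size, name = MODULES[index]
        if address < base + size:
            return name
    return "<unknown>"


def profile(samples):
    """Count how many sampled addresses fall into each module."""
    return Counter(module_for(address) for address in samples)


# Made-up samples, as if recorded each time the profiling timer fired.
samples = [0x81812345, 0x8A201000, 0x8A201010, 0x81900000, 0x8A2FFFFF, 0x10000000]
for name, hits in profile(samples).most_common():
    print(f"{name:10} {hits}")
```

A real profile collects many thousands of samples, so the modules with the most hits stand out clearly, which is what the output in the next step shows.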

I wouldn’t need symbols if the trace identified a device driver, so I ran Kernrate without passing it any arguments. Despite the fact that there’s no officially supported version of Kernrate for Vista, the version for Windows XP, Kernrate_i386_XP.exe, works on Vista 32-bit (you can also use the recently-released xperf tool to perform similar profiling - xperf requires Vista or Server 2008, but works on 64-bit versions). I let the profile run through heavy bursts of CPU and then hit Ctrl+C to print the results to the console window:

In first place were hits in the kernel, but in second was a driver that I didn’t recognize, b57nd60x. Most driver files are located in the %systemroot%\system32\drivers directory, so I could have opened that folder and viewed the file’s properties in Explorer, but I had Process Explorer open so a quicker way to check the driver’s vendor and version was to open the DLL view for the System process. The DLL view shows the DLLs and files mapped into the address space of user-mode processes, but for the System process it shows the kernel modules, including drivers, loaded on the system. The DLL view revealed that the driver was for my laptop’s NIC, was from Broadcom, and was version 10.10:

Now that I knew that the Broadcom driver was causing the CPU usage, the next step was to see if there was a newer version available. I went to Dell’s download page for my system, but didn’t find anything. Suspecting that what I noticed might not be a known issue, I decided to notify Broadcom. I used contacts on the hardware ecosystem team here at Microsoft to find the Broadcom driver representative and emailed him a detailed description of the symptoms and my investigation. He forwarded my email to the driver developer, who acknowledged that they didn’t know the cause and within a few days sent me a debug version of the driver with symbols so that I could capture a Kernrate profile that would tell them what functions in the driver were active during the spikes. The problem reoccurred a few days later and I sent back the Kernrate output with function information.

The developer explained that my trace revealed that the driver didn’t efficiently interact with the PCIe bus when processing specific queries and the problem seemed to be exacerbated by my particular hardware configuration. He gave me a new driver to try and after a few weeks of monitoring my laptop closely for issues, I confirmed that the problem appeared to be resolved. The updated driver has not yet been posted to Dell’s support site, but I expect it to show up there in the near future. Another case closed, this time with Process Explorer, Kernrate, and a helpful Broadcom driver developer.

If you like these troubleshooting blog posts, you’ll enjoy the webcast of my “Case of the Unexplained…” session from TechEd/ITforum. Its 75 minutes are packed with real-world troubleshooting examples, including the one written up in this post and others, as well as some that I haven’t documented. At the end of the session I ask the audience to send me screenshots, log files and descriptions of their own troubleshooting success stories, in return for which I’ll send back a signed copy of Windows Internals. The offer stands, so remember to document your investigation and you can get a free book. I’ve gotten a number of great examples and my next blog post will be a guest post by someone that watched the webcast and used Process Monitor to solve a problem with their web server.

Finally, if you want to see me speak live, come to TechEd US/IT Pro in June in Orlando where I’ll be delivering “The Case of the Unexplained…”, “Windows Server 2008 Kernel Advances”, and “Windows Security Boundaries”. Hope to see you there!

The Case of the Failed File Copy

The other day a friend of mine called me to tell me that he was having a problem copying pictures to a USB flash drive. He’d been able to copy over two hundred files when he got this error dialog, after which he couldn’t copy any more without getting the same message:

Unfortunately, the message, “The directory or file cannot be created”, provides no clue as to the underlying cause and the dialog explains that the error is unexpected and does not suggest where you can find the “additional help” to which it refers. My friend was sophisticated enough to make sure the drive had plenty of free space and he ran Chkdsk to check for corruption, but the scan didn’t find any problem and the error persisted on subsequent attempts to copy more files to the drive. At a loss, he turned to me.

I immediately asked him to capture a trace with Process Monitor, a real-time file system and registry monitoring tool, which would offer a look underneath the dialogs to reveal actual operating system errors returned by the file system. He sent me the resulting Process Monitor PML file, which I opened on my own system. After setting a filter for the volume in question to narrow the output to just the operations related to the file copy, I went to the end of the trace to look back for errors. I didn’t have to look far, because the last line appeared to be the operation with the error causing the dialog:

To save screen space, Process Monitor strips the “STATUS” prefix from the errors it displays, so the actual operating system error is STATUS_CANNOT_MAKE. I’d never seen or even heard of this error message. In fact, the version of Process Monitor at the time showed a raw error code, 0xc00002ea, instead of the error’s display name, and so I had to look in the Windows Device Driver Kit’s Ntstatus.h header file to find the display name and add it to the Process Monitor function that converts error codes to text.
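The code-to-name conversion described above can be sketched as a simple lookup. This is only an illustration, not Process Monitor’s actual implementation: the table holds a handful of values from Ntstatus.h, and the function names `display_name` and `severity` are made up for the example. The severity decoding relies on the NTSTATUS layout, in which the top two bits of the code hold the severity.

```python
# A tiny excerpt of Ntstatus.h; a real table would hold many more entries.
KNOWN_STATUS = {
    0x00000000: "STATUS_SUCCESS",
    0xC0000022: "STATUS_ACCESS_DENIED",
    0xC00002EA: "STATUS_CANNOT_MAKE",
}

# Severity is stored in the top two bits of an NTSTATUS value.
SEVERITY = ("success", "informational", "warning", "error")


def display_name(code):
    """Return the name without its STATUS_ prefix, or the raw hex code if unknown."""
    name = KNOWN_STATUS.get(code)
    if name is None:
        # Unknown codes fall back to the raw value, as the older tool version did.
        return "0x{:08X}".format(code)
    prefix = "STATUS_"
    return name[len(prefix):] if name.startswith(prefix) else name


def severity(code):
    """Decode the two-bit severity field of an NTSTATUS value."""
    return SEVERITY[(code >> 30) & 0x3]
```

A code that is missing from the table comes back as raw hex, which is the situation that sends you to the header file in the first place.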

At that point I could have cheated and searched the Windows source code for the error, but I decided to see how someone without source access would troubleshoot the problem. A Web search took me to this old thread in a newsgroup for Windows file system developers:

Sure enough, the volume was formatted with the FAT file system and the number of files on the drive, including those with long file names, could certainly have accounted for the use of all available 512 root-directory entries.
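To see how long file names eat into the 512 root-directory entries, here is a rough estimate in Python. It assumes the usual FAT long-file-name layout, where each file takes one short-name entry plus one extra entry per 13 characters of long name. The test for what counts as a plain 8.3 name is deliberately simplified, and the helper names are hypothetical.

```python
import math
import re

ROOT_DIR_ENTRIES = 512    # root-directory limit cited above
CHARS_PER_LFN_ENTRY = 13  # characters stored in each long-name entry

# Simplified test for a plain uppercase 8.3 name (an assumption for this sketch;
# the real rules allow more characters and handle case differently).
_SHORT_NAME = re.compile(r"^[A-Z0-9_~-]{1,8}(\.[A-Z0-9_~-]{1,3})?$")


def entries_for(name):
    """Estimate the directory entries one file name consumes."""
    if _SHORT_NAME.match(name):
        return 1
    return 1 + math.ceil(len(name) / CHARS_PER_LFN_ENTRY)


def files_that_fit(names, limit=ROOT_DIR_ENTRIES):
    """Return (files stored, entries used) before the root directory fills up."""
    used = 0
    count = 0
    for name in names:
        need = entries_for(name)
        if used + need > limit:
            break
        used += need
        count += 1
    return count, used
```

Under these assumptions a name like "Beach 001.jpg" costs two entries, so only 256 such files fit in the root, which is in line with a copy that fails somewhere past two hundred files. A volume label or any existing directories would lower the number further.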

I had solved the mystery. I told my friend he had two options: he could create a subdirectory off the volume’s root and copy the remaining files into there, or he could reformat the volume with the FAT32 file system, which removes the limitation on entries in the root directory.

One question remained, however. Why was the volume formatted as FAT instead of FAT32? The answer lies with both the USB drive makers and the Windows Format dialog. I’m not sure what convention the makers follow, but my guess is that many format their drives with FAT simply because it’s the file system guaranteed to work on virtually any operating system, including those that don’t support FAT32, like DOS 6 and Windows 95.

As for Windows, I would have expected it to always default to FAT32, but a quick look at the Format dialog’s pick for one of my USB drives showed I was wrong:

I couldn’t find the guidelines used by the dialog anywhere on the Web, so I looked at the source and found that Windows defaults to FAT for non-CD-ROM removable volumes that are smaller than 4GB in size.

I’d consider this case closed, but I have two loose ends to follow up on: see if I can get the error message fixed so that it’s more descriptive, and lobby to get the default format changed to FAT32. Wish me luck.

The Case of the Notepad that Wouldn't Run

Dave Solomon was on campus a couple of weeks ago presenting a Windows internals seminar to Microsoft developers. Before I joined Microsoft I taught the classes here at Microsoft with him, but now with my other responsibilities here I step into the class and guest present a module or two if my schedule permits. This time I presented the security module, which describes logon (authentication) and the access check (authorization) model. It also includes a separate section on Vista’s User Account Control (UAC) feature, which consists of several technologies including virtualization and a new Mandatory Integrity Control (MIC) security model that’s layered on top of the existing Discretionary Access Control model that Windows NT introduced in its first release.

UAC allows for users, even administrators, to run as standard users most of the time, while giving them the ability to run executables with administrator rights when necessary. There are several mechanisms by which executables can trigger a request for administrator rights:

1. If the executable image includes a Vista manifest file that specifies a desire or need for administrator rights (this would be added by the developer who creates the image).

2. If the executable is in Vista’s application compatibility database as a legacy application that Microsoft has identified as requiring administrator rights to run correctly.

3. If the user explicitly requests elevation using Explorer’s “Run as administrator” item in the context menu for executables (this can also be set as an advanced shortcut property). Note that this does not run the executable under the Administrator account, but rather under the account of the logged-in user, with the Administrators group enabled in the process security token.

4. If the executable is determined to be a setup or installer program (for example, if the word “setup” or “update” is in the image’s name).
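The name check in the fourth mechanism can be sketched as follows. This covers only the two example keywords mentioned above and assumes the match is a case-insensitive substring test on the file name; the actual detection logic in Windows is not described here and involves more than this. The function name is hypothetical.

```python
import ntpath

# Only the two keywords named in the list above; treated as an illustrative subset.
INSTALLER_KEYWORDS = ("setup", "update")


def looks_like_installer(image_path):
    """Guess from the file name alone whether an image resembles a setup program."""
    # ntpath handles Windows-style paths regardless of the host platform.
    file_name = ntpath.basename(image_path).lower()
    return any(keyword in file_name for keyword in INSTALLER_KEYWORDS)
```

A name-only test like this would flag Notepad-setup.exe but not Notepad.exe, regardless of what the program actually does.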

Perhaps the most common need for administrator rights comes from setup programs, which generally can’t install properly without write access to HKLM\Software and \Program Files, two locations that only administrators can modify. As an ad-hoc demonstration of the last request method, during the presentation I copied \Windows\Notepad.exe to my account’s profile directory, renaming it to Notepad-setup.exe in the process. Then I launched it, expecting to see a Consent dialog like the one below ask me to grant the renamed Notepad administrative rights:

To my consternation, no such dialog appeared. In fact, nothing happened. I reran it and got the same result. I was thoroughly confused, but didn’t have time to investigate in front of the class, so I moved on.

When I later got a chance to investigate what had happened, I started Notepad-setup.exe using Windbg (part of the free Debugging Tools for Windows) by clicking “File->Open Executable” followed by “Debug->Go” (or you can press F5). I then stepped through the initial instructions of Notepad’s entry point, Winmain. I saw it call an initialization function named NPInit that invokes LoadAccelerators to load Notepad’s keyboard accelerators. Strangely, LoadAccelerators was failing, causing NPInit to return an error to Winmain and Notepad to silently exit. But why would Notepad fail to load its accelerators, which should be included in the Notepad image itself?

My next step was to see if the file’s name was somehow causing the different behavior so I tried running a copy of Notepad.exe with the original name from my user directory, but got the same behavior (or lack thereof). It was time to watch what was happening with Filemon.

This scenario called for logging the operation of Notepad’s successful execution and comparing that to the log of the failing execution. I started Filemon, set the Include filter to Notepad.exe and the Exclude filter to list the processes that reference Notepad’s image when Notepad launches, including Svchost (where the prefetcher runs) and Explorer (which I was using to launch Notepad):

I collected both traces, but before I could compare them I had to remove the columns that always differ between execution traces: Sequence, Timestamp, and Process. To do this I loaded the traces into Excel, selected the data in the first three columns, deleted it, and saved the traces back out as tab-delimited text. You can get the two trace files here.

There are a number of text comparison tools available, but one that’s both free and that serves the needs of this type of comparison is Microsoft’s Windiff. Simply open both files and red and yellow lines highlight differences.
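The same strip-and-compare workflow can be scripted instead of going through Excel and Windiff. The sketch below is a stand-in, not the method used in this investigation: it assumes each trace is tab-delimited text whose first three columns are Sequence, Timestamp, and Process, and it uses Python’s standard difflib module in place of Windiff. The function names are hypothetical.

```python
import difflib


def strip_leading_columns(lines, count=3):
    """Drop the first `count` tab-separated columns from every trace line."""
    stripped = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        stripped.append("\t".join(fields[count:]))
    return stripped


def compare_traces(good_lines, bad_lines):
    """Return unified-diff lines between two traces, ignoring the leading columns."""
    good = strip_leading_columns(good_lines)
    bad = strip_leading_columns(bad_lines)
    # n=0 suppresses context lines so only the differences are listed.
    return list(difflib.unified_diff(good, bad, "good", "bad", lineterm="", n=0))
```

Lines prefixed with "-" appear only in the successful trace and lines prefixed with "+" only in the failing one, which corresponds roughly to what Windiff shows with its colored highlights.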

The first few lines that Windiff flags are Notepad reading its prefetch file, which has a different name in each trace because the name encodes the full path of the Notepad image it is associated with in a hash number:

The next set of differences are operations present only in the successful run of Notepad, and appear to be queries of some kind of global Windows resource cache that’s new to Windows Vista:

It wasn't clear to me why one run references the cache and the other doesn’t, so I continued to scan through the differences. The next group of differences are at lines 47-51 and are simply due to the different paths of the two Notepad copies:

Finally, at line 121 I came across something that looked like it might be the source of the problem:

The execution of \Windows\Notepad.exe successfully reads a file named Notepad.exe.mui from the \Windows\En-us subdirectory. Further, at line 172 in the trace comparison the failed launch of Notepad tries to read a file of the same name from an En-us subdirectory, but fails because the subdirectory doesn’t exist:

I knew that .mui files store language-dependent resources like strings and accelerators, so I was pretty certain that Notepad’s failure to load its accelerators was due to its inability to find the appropriate resource file for my locale, US English (En-us). To verify this I made an En-us subdirectory in my profile directory and copied Notepad.exe.mui into it, reran Notepad from my directory, and it worked.
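The lookup pattern seen in the traces, a locale-named subdirectory next to the image containing a file named after the image plus ".mui", can be written down as a small helper. This is a sketch of the pattern observed here only; the real resource loader has its own language fallback rules that are not modeled, and the function names and the `existing_files` parameter are made up for illustration.

```python
import ntpath


def expected_mui_path(image_path, locale="en-US"):
    """Build the path where the language resource file for an image would sit."""
    # ntpath handles Windows-style paths regardless of the host platform.
    folder, image_name = ntpath.split(image_path)
    return ntpath.join(folder, locale, image_name + ".mui")


def mui_is_missing(image_path, existing_files, locale="en-US"):
    """Check a list of known file paths for the expected .mui file (case-insensitive)."""
    wanted = expected_mui_path(image_path, locale).lower()
    return wanted not in {path.lower() for path in existing_files}
```

Copying an executable without its locale subdirectory, as in the Notepad experiment, is exactly the case where the second function reports a missing file.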

Previous versions of Windows used .mui files to separate language-specific data from executables, but I didn’t know that in Windows Vista this capability is exposed for applications to use. The nice thing about the .mui support is that resource-related functions like LoadAccelerators and FindResourceEx handle the language-specific resource files automatically, so application developers don’t need to do any special coding to take advantage of it.

Now that I had Notepad working outside of the Windows directory I turned my attention to why I hadn’t been presented with a UAC Consent dialog asking me to give it permission to run with administrator rights. What I discovered empirically and then confirmed later in the Understanding and Configuring User Account Control in Windows Vista article on Microsoft.com, is that heuristic setup detection only applies to files that don’t have an embedded manifest that specifies a security TrustLevel. Notepad, like all the Windows executables in Windows Vista, does include a manifest. You can see it when you do a dump of Notepad’s strings with the Sysinternals Strings utility:

So, thanks to Filemon, the case of the Notepad that wouldn’t run was closed!

Technical Skills & experience: Minimum

· Basic hardware and peripheral troubleshooting

· Installation, configuration & troubleshooting of Windows 2000, Windows XP, Windows 2003 and Windows Vista in standalone and server environments

· Backup, Storage and Recovery

· Deployment of servers, VSS, RAID Configuration

· Basic understanding of DNS, DHCP, WINS, TCP/IP, Routing, Antivirus, Firewalls etc.

· Excellent Troubleshooting skills

For PERF

Strong Experience in any of the following technologies:

· Strong Experience in System Performance tuning & server hang related issues

· Terminal Server, Windows Installer Service, Print Servers etc.

· IE6 and IE7