
Abnormal End (ABEND) and Debugger Information




NetWare is designed to protect itself from most cases of kernel corruption. It accomplishes this by trapping most hardware-related errors, checking its internal allocation processes, and verifying the parameters passed to critical internal functions. If NetWare detects a situation that could put the kernel at risk, it will ABnormally END (ABEND) the currently running process and display a screen similar to the following:
              System halted [date and time]
              Abend: [reason (Error code #)]

              OS version: [description string]
              Running Process: [name of the current process]
              Stack: [hex dump] .. .. .. .. .. .. .. .. ..

              Press "Y" to copy diagnostic image to diskette. Otherwise Power off and back on to restart.

Due to its non-preemptive task switching design, NetWare can often isolate and identify the specific source of the error. However, it will occasionally mis-identify the source due to the pseudo preemption which takes place during certain I/O operations. For these situations, and just to generally confirm any suspicions, it can prove useful to initiate NetWare's internal debugger and collect additional information.

Pressing the Left Shift, Right Shift, Alt, and Escape keys simultaneously (or entering '386debug' at an ABEND screen) will activate the Debugger. Before entering the internal debugger, record the exact ABEND error information, including the first line of the Stack hex dump, since it may prove useful to someone providing NetWare kernel support. Very rarely will it be necessary "to copy diagnostic image to diskette", since the resulting image can only be reviewed by the most knowledgeable engineers at Novell and is rarely requested by Novell's technical support engineers.

(Note: If a diagnostic image is desired, be prepared to insert an unformatted, high density diskette for each Mbyte of installed RAM. This process may take some time . . . and patience.)

At this point, it is safe to collect more information. To enter the internal debugger, a relatively simple, albeit convoluted, combination of keys must be pressed on the Server keyboard. By pressing down and holding both the left and right shift keys, plus the left Alt key and the Escape key, the Server should enter the internal debugger. The user should see the ABEND error string, a set of registers (EAX, EBX, etc.), and a '#' prompt.

To narrow the cause of the problem further, enter 'i61'. NetWare should respond with a hex value which can be interpreted as follows:

              i61 return value         Possible conflict
              ----------------         -----------------
              greater than 80h         Bus, CPU, or RAM related
              40h - 7Fh                Interface or adapter card
              less than 3Fh            Software related
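
The table above can be expressed as a small lookup. The following is only a sketch of that mapping, not a Novell tool: the function names `classify_i61` and `parse_hex` are hypothetical, and the value is assumed to fit in one byte. The ranges are applied exactly as written in the table, so 3Fh and 80h themselves are reported as not covered.

```python
def parse_hex(text: str) -> int:
    """Accept debugger-style hex such as '7F', '7Fh', or '0x7F'."""
    return int(text.strip().lower().rstrip("h"), 16)


def classify_i61(value: int) -> str:
    """Map an i61 return value to the 'Possible conflict' column of the table."""
    # Assumption: the value is a single byte (00h-FFh).
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a value between 00h and FFh")
    if value > 0x80:
        return "Bus, CPU, or RAM related"
    if 0x40 <= value <= 0x7F:
        return "Interface or adapter card"
    if value < 0x3F:
        return "Software related"
    # 3Fh and 80h fall between the ranges as the table is written.
    return "Not covered by the table"


if __name__ == "__main__":
    for sample in ("10", "40h", "C0h"):
        print(sample, "->", classify_i61(parse_hex(sample)))
```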
            

To identify the location of the aborted instruction, enter '?eip'. NetWare should reply with the name of the NLM where the instruction is located and its code offset.

To confirm the currently executing process and to list those processes in the queue awaiting processing time, enter '.p'. This information may be useful for the technical personnel in charge of supporting the offending NLM.

To determine additional information about the currently running process, enter '.r'. This information may be useful for the technical personnel in charge of supporting the offending NLM.

It is also possible to scroll through the Server screens by entering 'v' then pressing enter after each screen until returned to the internal debugger. This can be useful to determine if an error message appeared on any of the NLM screens which could further help isolate the source of the ABEND.

At this point, unless the ABEND was related to a hardware failure, entering 'q' and responding 'y'es should abort the internal debugger. If DOS has not been removed from the Server, the system will return to an MS-DOS prompt. If DOS has been removed from the Server, the system should perform a warm boot.



Common ABEND Types

While there are numerous possible sources for an ABEND, some of the most common ABENDs which are likely to occur are:

General Protection Processor Exception (GPPE)
This type of ABEND occurs when the processor is placed in an abnormal state or an invalid processor operation is attempted. While this can be caused by corrupted RAM altering the code being executed, it is most often the result of a miscoded routine attempting to access memory outside of the installed range (i.e., attempting to access a memory location greater than the amount of RAM installed in the Server) or a problem with the stack being overrun.

This type of ABEND can occasionally be recovered from but only if the user is well skilled in assembler and understands 32-bit flat memory model programming techniques. In most cases, it is best to collect as much information as possible and then restart the Server anew.

Upon encountering a GPPE ABEND, enter the Debugger (either by typing 386debug on the keyboard or by pressing the Left Shift, Right Shift, Alt, and Escape keys simultaneously).

At the Debugger prompt (#), type '? eip' and note the module name and code offset that is reported. In most cases, that will be the culprit.

However, if it reports CLIB as the offender, a module probably called a CLIB routine and passed it an invalid pointer. In that case, walking the stack is one option.

Under NetWare v4, examining the stack is very easy since Novell added a specific debugger command for providing such information. To examine the stack under NetWare v4, enter the following at the debugger prompt:

            # dds
            
      xxxxxxxx --yyyyyyyy ?
      (If the value cannot be resolved to a valid memory offset.)

      xxxxxxxx yyyyyyyy ( "NLM" |(Code Start)+xxxxxxxx)
      (If the value resolves to a valid code offset within an NLM.)

      xxxxxxxx yyyyyyyy ( "NLM" |(Data Start)+xxxxxxxx)
      (If the value resolves to a valid data offset within an NLM.)


The debugger should list the last 16 possible offset values that may have been pushed onto the stack. The first line which displays a valid NLM code offset is the most likely source of the API request. Record the NLM name and code offset.
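
If the 'dds' output has been copied down or captured, picking out the first code reference can be sketched as follows. This is an illustration only: the regular expression assumes the lines look exactly like the templates shown above, the helper name `first_code_reference` is hypothetical, and the sample lines (including EXAMPLE.NLM and every address) are invented for the demonstration.

```python
import re

# Assumed layout, taken from the template above:
#   xxxxxxxx yyyyyyyy ( "NLM" |(Code Start)+xxxxxxxx)
CODE_LINE = re.compile(
    r'^\s*(?P<stack>[0-9A-Fa-f]{8})\s+(?P<value>[0-9A-Fa-f]{8})\s*'
    r'\(\s*"?(?P<nlm>[^"|]+?)"?\s*\|\s*\(Code Start\)\s*\+\s*'
    r'(?P<offset>[0-9A-Fa-f]+)\s*\)'
)


def first_code_reference(dds_lines):
    """Return (NLM name, code offset) for the first line that resolves to code."""
    for line in dds_lines:
        match = CODE_LINE.match(line)
        if match:
            return match.group("nlm"), int(match.group("offset"), 16)
    return None  # no line resolved to a code offset


if __name__ == "__main__":
    # Hypothetical sample output; real values will differ.
    sample = [
        '00F3A008 --00000010 ?',
        '00F3A00C 00B20040 ( "EXAMPLE.NLM" |(Data Start)+00000040)',
        '00F3A010 00A41C22 ( "EXAMPLE.NLM" |(Code Start)+00001C22)',
    ]
    print(first_code_reference(sample))
```

Unresolved values and data offsets are skipped, which matches the advice to record the first line that shows a valid NLM code offset.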

Under NetWare v3, examining the stack is more difficult since there is not a specific debugger command for providing such information. To examine the stack under NetWare v3, it will be necessary to repeat the same command multiple times using different stack offsets. The first such command is as follows:

              # ?[desp+00]
              

    Address in ________.___ at code start +________h Current: ________ ________h


This debugger command queries specific offsets on the stack in an attempt to identify those which point to a valid NLM code offset. It may be necessary to repeat this command numerous times, increasing the stack offset by 4 each time (in hex: 00, 04, 08, 0C, 10, 14, 18, 1C, 20, 24, 28, 2C, 30, etc.) until a valid offset is located.
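
The sequence of commands to type can be listed ahead of time. The short sketch below simply prints them; the helper name `desp_queries` and its default count of 13 (offsets 00 through 30) are assumptions chosen to match the offsets listed above.

```python
def desp_queries(count: int = 13, step: int = 4):
    """Return debugger commands that query successive stack offsets in hex."""
    return [f"?[desp+{offset:02X}]" for offset in range(0, count * step, step)]


if __name__ == "__main__":
    for command in desp_queries():
        print("#", command)
```

Increase `count` if no valid code offset turns up within the first thirteen entries.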

After each entry, the Debugger will reply with a memory location. Some of the entries may result in an 'Unknown' memory location due to the fact that they are really values passed as parameters rather than pointers. What is being sought is a code offset within an NLM. In most cases, it should be the first reference which is not SERVER.NLM or CLIB.NLM.
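
Once the replies have been written down, the selection rule can be sketched as below. This is a hedged illustration: the reply format is assumed from the template shown earlier ("Address in ... at code start +...h"), the function name `first_suspect` is hypothetical, and the sample replies (module names and offsets) are invented.

```python
import re

# Assumed reply layout: Address in NAME.NLM at code start +XXXXXXXXh
REPLY = re.compile(
    r'Address in\s+(?P<nlm>\S+)\s+at code start\s*\+\s*(?P<offset>[0-9A-Fa-f]+)h'
)

# Modules the text says to skip when looking for the likely caller.
SYSTEM_MODULES = {"SERVER.NLM", "CLIB.NLM"}


def first_suspect(replies):
    """Return (NLM name, code offset) of the first reply outside SERVER/CLIB."""
    for reply in replies:
        match = REPLY.search(reply)
        if match is None:
            continue  # 'Unknown' or a parameter value, not a code pointer
        name = match.group("nlm").upper()
        if name not in SYSTEM_MODULES:
            return name, int(match.group("offset"), 16)
    return None


if __name__ == "__main__":
    # Hypothetical replies for offsets 00, 04, 08, 0C.
    sample = [
        "Unknown",
        "Address in CLIB.NLM at code start +00012A40h",
        "Address in EXAMPLE.NLM at code start +00000C1Eh",
        "Address in SERVER.NLM at code start +0004F000h",
    ]
    print(first_suspect(sample))
```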

While this technique is not fool-proof and makes assumptions that cannot be proved without additional Debugger research, in many cases it will provide insight into the potential source of the problem.

At this point, enter 'q' at the Debugger prompt (#). If DOS has not been removed, NetWare should ask whether you want to return to DOS; type 'y' to do so, at which point you can restart the Server. If DOS has been removed, the Server will reboot.


Invalid OpCode Processor Exception (IOPE)
This type of ABEND occurs when the processor encounters code in the execution path which is not valid for the installed processor. Such errors are most often the result of corrupted RAM or a miscoded routine changing the code execution path to an invalid memory location (i.e., the execution path is changed to a memory location which contains data instead of code).

Upon encountering an IOPE ABEND, enter the Debugger (either by typing 386debug on the keyboard or by pressing the Left Shift, Right Shift, Alt, and Escape keys simultaneously).

At the Debugger prompt (#), type '? eip' and note the module name and code offset that is reported. In most cases, that will be the culprit.

If the return indicates a data offset or some other invalid code location, chances are that the stack has become corrupted for the active process and it attempted to return execution to an invalid address after a function call.


Non-Maskable Interrupts (NMI)
Non-Maskable Interrupts are one of three types of interrupts (software, maskable hardware, and non-maskable hardware) which can be handled by the main processor and are most often triggered by hardware problems. This is a non-recoverable ABEND (i.e., typical Debugger techniques are of little use) which is most often caused by power fluctuations or parity errors (either on the Bus or in memory). Only on very rare occasions will software cause an NMI ABEND.

The ABEND "NMI parity error generated by System Board" indicates that the problem is hardware and related to the System Board (Bus, memory, or CPU SRAM cache). These ABENDs are often the result of power fluctuations or the failure of a RAM module. A mismatch in the CPU Cache SRAM can also cause NMI ABENDs. Disabling the CPU's external Cache (internal if the ABEND reoccurs) can often eliminate this ABEND.

The ABEND "NMI parity error generated by IO check" is most often memory related. However, there are rare cases where software can trigger this ABEND. Unfortunately, the catastrophic nature of the error makes diagnosing such software problems difficult, at best.

While an NMI error may occur for the first time shortly after a new NLM has been loaded, it is often the result of previously unused memory being exercised, rather than being a problem with the new NLM. If this error occurs more than once, the RAM should be removed from the Server and checked with a hardware based RAM testing device, or replaced. Testing the Server RAM with software utilities will rarely isolate the problem, because NetWare typically manipulates the Server RAM far more thoroughly than such utilities.

With newer, high performance CPUs, heat can also become a factor. Check that there is adequate cooling in the Server for all components, especially the CPU and memory.


Page Fault Protection Exceptions (PFPE)
Page Fault ABENDs occur when a process attempts to access memory outside of its allocated or allowed range.

    NetWare v3 allows NLMs to access most any memory region without conflict.

    NetWare v4 has memory protection which prevents access outside of allocated or common regions.


To minimize the possibility of ABENDs in such situations, enable the following SET Parameters:

              SET ALLOW INVALID POINTERS = ON
              SET READ FAULT EMULATION = ON
              SET READ FAULT NOTIFICATION = ON
              SET WRITE FAULT EMULATION = ON
              SET WRITE FAULT NOTIFICATION = ON
            

The emulation parameters will allow the operation to succeed while the notification parameters will report the occurrence via Console Alerts. Note the Console Alerts (which will also display the Module name and the code offset) then report them to the developer.



Other Operational Failures

Another event which can cause the Server to cease normal operation is if the code execution path changes to a memory location that NetWare recognizes as being invalid. This can occur if the stack becomes corrupted causing an invalid return point; or if a function pointer is not properly defined and references an invalid code location. Such events are usually manifested by the following message appearing on the System Console:

              Breakpoint at 00000001 because of INT 3 breakpoint
            

The Server then enters the Debugger.

Typically, a "Breakpoint at 00000001" debugger message is the result of an NLM trying to change execution to a NULL pointer. NetWare writes hex 0xCC (Intel's internal opcode for INT 3 - Breakpoint Interrupt) in the first four bytes of memory starting at location zero. Memory location zero is the beginning of the Interrupt Vector Table and, as such, is an invalid location for code execution. Thus, if the stack or a function pointer becomes corrupted and the NLM tries to change execution to a NULL pointer, the Server enters the internal debugger to minimize potential damage caused by invalid code execution.
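
The mechanism can be pictured with a toy model. The sketch below is not NetWare code: it uses a small byte array to stand in for memory, and the names `make_memory` and `jump_to` are hypothetical. It assumes the reported address is the instruction pointer after the one-byte INT 3 at location zero has executed, which would explain why the message reads 00000001 rather than 00000000.

```python
INT3 = 0xCC  # opcode named in the text


def make_memory(size: int = 16) -> bytearray:
    """Build a tiny stand-in for memory with INT 3 in the first four bytes."""
    memory = bytearray(size)
    memory[0:4] = bytes([INT3] * 4)
    return memory


def jump_to(memory: bytearray, address: int) -> str:
    """Toy model: 'execute' the byte at address and describe the result."""
    if memory[address] == INT3:
        # INT 3 is a single byte, so the instruction pointer ends up one
        # past it (assumption: that is the address the debugger reports).
        return f"Breakpoint at {address + 1:08X} because of INT 3 breakpoint"
    return f"Executing at {address:08X}"


if __name__ == "__main__":
    null_pointer = 0  # a corrupted return address or function pointer
    print(jump_to(make_memory(), null_pointer))
```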

At the Debugger prompt (#), type '? [desp]' and note the module name and code offset that is reported. In most cases, that will be the culprit.



Additional NetWare Debugger and ABEND Debugging Information Sources:

Novell has also produced the following documents on Debugging issues:

    Jun. '99 Novell AppNotes (PN# 464-000056-006)
      Using Stack-Walking to Troubleshoot a NetWare Abend

    Apr. '99 Novell AppNotes (PN# 464-000056-004)
      Reducing Abend Resolution Time With Novell's Abend Analyzer

    Oct. '97 Novell AppNotes (PN# 464-000052-010)
      Troubleshooting Server Problems Using the ABEND.LOG File and Memory Images (Core Dumps)

    Mar. '97 Novell AppNotes (PN# 464-000052-003)
      IntranetWare Server Automated Abend Recovery

    Jun. '95 Novell AppNotes (PN# 164-000047-006)
      ABEND Recovery Techniques for NetWare 3 and 4

    Aug. '91 Novell AppNotes (PN# 164-000030-008)
      Using the NetWare 3.x Internal Debugger

Any of these can be ordered by calling 800/377-4136 (303/297-2725).


This document is copyright © 1999 by avanti technology, inc.
