PC Hardware

Troubleshooting

Troubleshooting is perhaps the most difficult task of a computer professional. After a problem has been diagnosed, there are usually several resources, or given procedures, available to correct the problem. Frequently, the problems as reported are really just symptoms, not the cause. To make matters worse, computers never fail at a convenient time. They fail in the middle of a job or when there is a deadline and the user must have the problem fixed now.

Troubleshooting is more of an art form than an exact science. However, to be an efficient and effective troubleshooter, you must approach the problem in an organized and methodical manner. Remember, you are looking for the cause, not the symptom. As a troubleshooter, you must be able to quickly and confidently eliminate as many alternatives as possible so that you can focus on the things that might be the cause of the problem. In order to do this, you must be organized.

Understanding the following five phases of troubleshooting will help you focus on the cause of the problem and lead you to a permanent fix.

Phase I: Define the Problem

The first phase is the most critical and, often, the most ignored. Without a complete understanding of the entire problem, you can spend a great deal of time working on the symptoms instead of the cause. The only tools required for this phase are a pad of paper, a pen (or pencil), and good listening skills.

The client or coworker (the computer user) is your best source of information, so listen carefully. Don't assume that just because you are the expert, the user doesn't know what caused the problem. Remember, you might know how the computer works and be able to find the technical cause of the failure, but the user was there before and after the problem started and is likely to recall the events that led up to the failure.

Ask a few specific questions to help identify the problem and list the events that led up to the failure. You might want to create a form that contains the standard questions that follow (and other questions specific to the situation) for taking notes.

  • When did you first notice the problem or error?
  • Has the computer been moved recently?
  • Have you made any changes to software or hardware?
  • Has anything happened to the computer? Was it dropped or was something dropped on it? Was coffee or soda spilled on the keyboard?
  • When exactly does the problem or error occur? During the startup process? After lunch? Only on Monday mornings? After using e-mail?
  • Can you reproduce the problem or error?
  • If so, how do you reproduce the problem?
  • What does the problem or error look like?
  • Describe any changes in the computer coinciding with the problem (such as noise, screen changes, lights, and so forth).
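The note-taking form suggested above could be kept electronically as well as on paper. The following is a minimal sketch in Python, not a prescribed tool: the short question keys, the `IntakeForm` class, and its method names are all hypothetical, and the question text is abbreviated from the list above.

```python
from dataclasses import dataclass, field
from datetime import date

# Standard questions from the list above. The short keys are hypothetical labels.
STANDARD_QUESTIONS = {
    "first_noticed": "When did you first notice the problem or error?",
    "moved": "Has the computer been moved recently?",
    "changes": "Have you made any changes to software or hardware?",
    "incidents": "Has anything happened to the computer?",
    "timing": "When exactly does the problem or error occur?",
    "reproducible": "Can you reproduce the problem or error?",
    "steps": "If so, how do you reproduce the problem?",
    "appearance": "What does the problem or error look like?",
    "other_changes": "Describe any changes coinciding with the problem.",
}


@dataclass
class IntakeForm:
    """Notes taken while listening to the user (hypothetical structure)."""
    user: str
    taken_on: date
    answers: dict = field(default_factory=dict)

    def record(self, key, answer):
        """Store the user's answer to one of the standard questions."""
        if key not in STANDARD_QUESTIONS:
            raise KeyError(f"unknown question: {key}")
        self.answers[key] = answer

    def unanswered(self):
        """Return the keys of questions that still need to be asked."""
        return [k for k in STANDARD_QUESTIONS if k not in self.answers]


form = IntakeForm(user="example user", taken_on=date(2000, 1, 1))
form.record("moved", "Yes, to a new desk last Friday")
for key in form.unanswered():
    print(STANDARD_QUESTIONS[key])
```

Listing the unanswered questions makes it harder to skip one when the user is in a hurry.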

Phase II: Zero In on the Cause

The next step is to isolate the problem. There is no single correct approach to follow, and there is no substitute for experience. The best you can do is to eliminate any obvious problems and work from the simplest problems to the more complex. The purpose is to narrow your search down to one or two general categories. The following table provides 14 possible categories you can use to narrow your search.

| Category | Subcategory | Symptom |
|---|---|---|
| Electrical Power | Electric utility; Fuse box; Wiring; Plugs/cords; Power supply; Power connectors | Dead computer. Intermittent errors on POST. Intermittent lockups. Device not working/not found. |
| Connectivity | External cables; Internal cables; Properly seated cards (chip/boards); SCSI chain; Front panel wires (lights and buttons) | Device not working. Device not found. Intermittent errors on a device. |
| Boot | Boot ROM; CMOS (chip and settings); CMOS battery; Flash ROM | Dead computer. Consistent errors on POST. Beep errors. CMOS text errors. RAM, hard disk drive, floppy disk drive, video errors. |
| Memory | DRAM: proper type and setup; DRAM CMOS settings; SRAM: proper type and setup; SRAM CMOS settings; Motherboard jumpers | Dead computer. Parity errors. GPF with consistent addresses. HIMEM.SYS errors. |
| Mass storage | Hard disk drives, floppy disk drives, CD-ROM drives, Zip drives, tape drives; Partitions; File structure; FATs; Directory structure; Filenames and attributes | Error messages: "Missing operating system"; "File not found"; "No boot device"; "Abort, Retry, Fail" |
| Input/output | IRQ settings; I/O addresses; DMA settings; Serial port settings; Parallel port settings; SCSI settings; Card jumper settings | System locks up. Device not responding. Bizarre behavior from a device. |
| Operating system | BUFFERS; FILES; FCBs (File Control Blocks); Stacks; IO.SYS/MSDOS.SYS; Set statements; Paths and prompts; External MS-DOS commands; Multiboot CONFIG.SYS | Error messages: "Missing operating system"; "Bad or missing command interpreter"; "Insert disk with COMMAND.COM"; "Stack overflow"; "Insufficient File Handles" |
| Applications | Proper installation; Proper configuration; Knowledge of capabilities; Knowledge of bugs, incompatibilities, work-arounds | Application doesn't work properly. Application-specific errors. Application-specific GPFs. Lock-up only in specific application. |
| Device drivers | All devices in CONFIG.SYS, SYSTEM.INI, or Registry; Proper versions; Proper configuration | Device lockups on access. Intermittent lockups. Computer runs in safe mode only. |
| Memory management | HIMEM.SYS settings; EMM386.EXE settings; MSDOS.SYS options (Win95); SYSTEM.INI/WIN.INI; Virtual memory; Windows resource usage; UMB management | "Not enough memory" error. Missing XMS, EMS memory. Device lockups. GPFs at KRNL386.EXE. GPFs at USER.EXE or GDI.EXE. |
| Configuration/setup | Files used for initialization; Basic layout of initialized files | Programs refuse to do something they should. Missing options in program. Missing program or device. |
| Viruses | Virus-management procedures; Knowledge of virus symptoms; Virus-removal procedures | Computer runs slow. Intermittent lockups. Storage problems. Operating-system problems. Mysterious symptoms. |
| Operator Interface | Lack of training/understanding; Fear of the computer; Poor attitude | "I didn't touch it!" "It always does that!" Multiple users. |
| Network | Logon errors; Communication errors | User forgets password. Expired password. Cable or NIC card problems. |

Be sure to observe the failure yourself. If possible, have someone demonstrate the failure to you. If it is an operator-induced problem, it is important to observe how it is created, as well as the results.
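Narrowing the search by category amounts to a lookup from an observed symptom to the categories that list it. The sketch below illustrates the idea in Python with only a few rows transcribed from the table. The dictionary name and the `candidate_categories` function are hypothetical, and an exact text match is a simplifying assumption, since real symptoms rarely arrive worded exactly as in the table.

```python
# A few rows transcribed from the category table (not the complete table).
SYMPTOMS_BY_CATEGORY = {
    "Electrical Power": [
        "Dead computer",
        "Intermittent errors on POST",
        "Intermittent lockups",
        "Device not working/not found",
    ],
    "Boot": [
        "Dead computer",
        "Consistent errors on POST",
        "Beep errors",
        "CMOS text errors",
    ],
    "Memory": [
        "Dead computer",
        "Parity errors",
        "GPF with consistent addresses",
        "HIMEM.SYS errors",
    ],
    "Device drivers": [
        "Device lockups on access",
        "Intermittent lockups",
        "Computer runs in safe mode only",
    ],
    "Viruses": [
        "Computer runs slow",
        "Intermittent lockups",
        "Storage problems",
    ],
}


def candidate_categories(symptom):
    """Return every category whose symptom list contains the given symptom.

    Matching is exact apart from letter case and surrounding whitespace.
    """
    wanted = symptom.strip().lower()
    return [
        category
        for category, symptoms in SYMPTOMS_BY_CATEGORY.items()
        if any(wanted == s.lower() for s in symptoms)
    ]


print(candidate_categories("Dead computer"))
# ['Electrical Power', 'Boot', 'Memory']
```

A symptom that maps to several categories, as "Dead computer" does here, shows why the simplest category should be checked first.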

Intermittent problems are the most difficult ones to isolate. They never seem to occur when you are present. The only way to resolve them is to re-create the set of circumstances that causes the failure. Sometimes, moving step-by-step to eliminate the possible causes is all you can do. This takes time and patience. The user will have to keep a detailed record of what was being done before and at the moment the failure occurs. In such cases, tell the user not to do anything with the computer when the problem recurs, except to call you. That way, the "evidence" will not be disturbed.
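The user's record of each failure becomes useful once the entries are compared for circumstances they share. A minimal Python sketch of that comparison follows. The function name and the sample log entries are hypothetical, and it assumes each log entry is written as a set of short circumstance labels.

```python
def common_circumstances(failure_log):
    """Return the circumstances that were present at every logged failure.

    failure_log is a list of sets, one set of circumstance labels per failure.
    """
    if not failure_log:
        return set()
    common = set(failure_log[0])
    for entry in failure_log[1:]:
        common &= set(entry)  # keep only what this failure also had
    return common


# Hypothetical log kept by the user, one entry per failure.
log = [
    {"printing", "e-mail open", "after lunch"},
    {"printing", "spreadsheet open"},
    {"printing", "e-mail open"},
]
print(common_circumstances(log))
# {'printing'}
```

Whatever survives the comparison is the first set of circumstances to try re-creating.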

TIP
For a totally random, intermittent problem, always suspect the power supply.

Phase III: Conduct the Repair

After you have zeroed in on a few categories, the process of elimination begins.

Make a Plan

Create a planned approach to isolating the problem based on your knowledge at this point. Your plan should start with the most obvious or easiest possibility to eliminate and work forward from there. Put the plan in writing!

The first step of any plan should be to document and back up.

If possible, make no assumptions. If you must make any assumptions, write them down. You might need to refer back to them later.

Follow the Plan from Beginning to End

Once a plan is created, it is important to follow it through. Jumping around and randomly trying things can often lead to more serious problems.

Document every action you take and its results.

If the first plan is not successful (it won't always be), create a new plan based on what you discovered with the previous plan. Be sure to refer to any assumptions you might have made.
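A written plan of the kind described above needs little more than an ordered list of steps, a list of assumptions, and a place to record each result. The Python sketch below is one hypothetical way to hold that information. The class names, the numeric `effort` scale, and the sample steps are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    effort: int       # hypothetical scale: lower numbers are easier to check
    result: str = ""  # filled in after the step has been carried out


@dataclass
class RepairPlan:
    assumptions: list = field(default_factory=list)
    steps: list = field(default_factory=list)

    def ordered_steps(self):
        """Return the steps from easiest to hardest."""
        return sorted(self.steps, key=lambda s: s.effort)

    def next_step(self):
        """Return the easiest step with no recorded result, or None."""
        for step in self.ordered_steps():
            if not step.result:
                return step
        return None


plan = RepairPlan(
    assumptions=["The wall outlet is good"],
    steps=[
        Step("Swap in a known-good power supply", effort=3),
        Step("Document settings and back up data", effort=0),
        Step("Reseat the power connectors", effort=1),
    ],
)

step = plan.next_step()
print(step.description)  # Document settings and back up data
step.result = "Settings recorded; backup verified"
print(plan.next_step().description)  # Reseat the power connectors
```

Because every step carries its own result, the finished plan doubles as the record of actions taken, and the assumptions stay attached for review if a second plan is needed.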

Repair or Replace

After locating the problem, either repair or replace the defective component. If the problem is software-related, be sure to record the "before" and "after" state of anything you change.
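For text-based configuration files, a line-by-line diff is one convenient way to capture the "before" and "after" record. The Python sketch below uses the standard-library `difflib` module. The `change_record` function name and the sample CONFIG.SYS contents are illustrative assumptions, not values taken from a real system.

```python
import difflib


def change_record(before, after, name="CONFIG.SYS"):
    """Return a unified diff of two versions of a text file as a list of lines."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile=f"{name} (before)",
            tofile=f"{name} (after)",
            lineterm="",
        )
    )


# Illustrative file contents only.
before = "FILES=20\nBUFFERS=20\n"
after = "FILES=40\nBUFFERS=20\n"

for line in change_record(before, after):
    print(line)
```

Lines beginning with `-` show the old setting and lines beginning with `+` show the new one, so the record can be filed with the repair notes or used to undo the change.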

Phase IV: Confirm the Results

No repair is complete without confirmation that the job is done. Confirmation involves two steps:

  • Make sure that the problem no longer exists. Ask the user to test the solution and confirm client satisfaction.
  • Make sure that the fix did not create other problems. You have not done a professional job if the repair has been completed at the expense of something else.

Phase V: Document the Results

Finally, document the problem and the repair. There is no substitute for experience in troubleshooting. Every new problem presents you with an opportunity to expand that experience. Keeping a copy of the repair procedure in your technical library will come in handy in a year or two when the problem (or one like it) occurs again. This is one way to build, maintain, and share experience.
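A technical library of past repairs is most useful when it can be searched. The following Python sketch shows one hypothetical way to store and look up repair records. The record fields, function names, and sample entries are assumptions made for illustration, and JSON is used only as one convenient format for saving and sharing the records.

```python
import json


def make_record(problem, cause, fix, keywords):
    """Build one repair record; keywords are stored in lowercase."""
    return {
        "problem": problem,
        "cause": cause,
        "fix": fix,
        "keywords": [k.lower() for k in keywords],
    }


def search(library, term):
    """Return records whose problem text or keyword list mentions the term."""
    term = term.lower()
    return [
        r for r in library
        if term in r["problem"].lower() or term in r["keywords"]
    ]


# Hypothetical entries.
library = [
    make_record(
        "Random lockups several times a day",
        "Failing power supply",
        "Replaced the power supply",
        ["lockup", "power supply", "intermittent"],
    ),
    make_record(
        '"Insufficient File Handles" at startup',
        "FILES value too low in CONFIG.SYS",
        "Raised the FILES setting",
        ["CONFIG.SYS", "FILES"],
    ),
]

for record in search(library, "lockup"):
    print(record["problem"], "->", record["fix"])

# The whole library can be written out as text for safekeeping or sharing.
saved = json.dumps(library, indent=2)
```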

Lesson Summary

The following points summarize the main elements of this lesson:

  • Learning doesn't stop with certification. To stay at the top of your profession, you must keep learning.
  • Staying connected with your peers is an important part of learning.
  • Maintain a proper set of the tools of the trade.
  • Know where and how to get technical support.
  • Good troubleshooting requires a plan. To be successful, you must stick to your plan.