Troubleshooting is more of an art form than an exact science. However, to be an efficient and effective troubleshooter, you must approach the problem in an organized and methodical manner. Remember, you are looking for the cause, not the symptom. As a troubleshooter, you must be able to quickly and confidently eliminate as many alternatives as possible so that you can focus on the things that might be the cause of the problem. In order to do this, you must be organized.
Understanding the following five phases of troubleshooting will help you focus on the cause of the problem and lead you to a permanent fix.
Phase I: Define the Problem
The first phase is the most critical and, often, the most ignored. Without a complete understanding of the entire problem, you can spend a great deal of time working on the symptoms instead of the cause. The only tools required for this phase are a pad of paper, a pen (or pencil), and good listening skills.
The client or coworker (the computer user) is your best source of information, so listen carefully. Don't assume that, just because you are the expert, the user doesn't know what caused the problem. Remember, you might know how the computer works and be able to find the technical cause of the failure, but the users were there before and after the problem started and are likely to recall the events that led up to the failure.
Ask a few specific questions to help identify the problem and list the events that led up to the failure. You might want to create a form that contains the standard questions that follow (and other questions specific to the situation) for taking notes.
- When did you first notice the problem or error?
- Has the computer been moved recently?
- Have you made any changes to software or hardware?
- Has anything happened to the computer? Was it dropped or was something dropped on it? Was coffee or soda spilled on the keyboard?
- When exactly does the problem or error occur? During the startup process? After lunch? Only on Monday mornings? After using e-mail?
- Can you reproduce the problem or error?
- If so, how do you reproduce the problem?
- What does the problem or error look like?
- Describe any changes in the computer coinciding with the problem (such as noise, screen changes, lights, and so forth).
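The note-taking form suggested above could be sketched in Python as follows. This is only an illustration under assumptions: the `IntakeForm` class, its method names, and the short question keys are hypothetical and not part of the lesson; the question wording is taken from the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Standard questions from the list above, keyed by short hypothetical names.
STANDARD_QUESTIONS = {
    "first_noticed": "When did you first notice the problem or error?",
    "moved": "Has the computer been moved recently?",
    "changes": "Have you made any changes to software or hardware?",
    "incidents": "Has anything happened to the computer?",
    "when_occurs": "When exactly does the problem or error occur?",
    "reproducible": "Can you reproduce the problem or error?",
    "how_to_reproduce": "If so, how do you reproduce the problem?",
    "appearance": "What does the problem or error look like?",
    "other_changes": "Describe any changes in the computer coinciding with the problem.",
}


@dataclass
class IntakeForm:
    """Notes taken while interviewing the user (hypothetical structure)."""
    user: str
    taken_at: datetime = field(default_factory=datetime.now)
    answers: dict = field(default_factory=dict)

    def record(self, key, answer):
        # Reject keys that are not on the form so typos do not go unnoticed.
        if key not in STANDARD_QUESTIONS:
            raise KeyError(f"unknown question: {key}")
        self.answers[key] = answer.strip()

    def unanswered(self):
        # Questions still to ask, in the order they appear on the form.
        return [key for key in STANDARD_QUESTIONS if key not in self.answers]
```

Calling `unanswered()` before leaving the user's desk is one way to make sure nothing on the form was skipped. Situation-specific questions could be added to the dictionary in the same way.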
Phase II: Zero In on the Cause
The next step is to isolate the problem. There is no single correct approach to follow, and there is no substitute for experience. The best you can do is to eliminate any obvious problems and work from the simplest problems to the more complex. The purpose is to narrow your search down to one or two general categories. The following table provides 14 possible categories you can use to narrow your search.
Category | Subcategory | Symptom |
---|---|---|
Electrical Power | Electric utility; Fuse box; Wiring; Plugs/cords; Power supply; Power connectors | Dead computer. Intermittent errors on POST. Intermittent lockups. Device not working/not found. |
Connectivity | External cables; Internal cables; Properly seated cards (chip/boards); SCSI chain; Front panel wires (lights and buttons) | Device not working. Device not found. Intermittent errors on a device. |
Boot | Boot ROM; CMOS (chip and settings); CMOS battery; Flash ROM | Dead computer. Consistent errors on POST. Beep errors. CMOS text errors. RAM, hard disk drive, floppy disk drive, video errors. |
Memory | DRAM: proper type and setup; DRAM CMOS settings; SRAM: proper type and setup; SRAM CMOS settings; Motherboard jumpers | Dead computer. Parity errors. GPF with consistent addresses. HIMEM.SYS errors. |
Mass storage | Hard disk drives, floppy disk drives, CD-ROM drives, Zip drives, tape drives; Partitions; File structure; FATs; Directory structure; Filenames and attributes | Error messages: "Missing operating system"; "File not found"; "No boot device"; "Abort, Retry, Fail" |
Input/output | IRQ settings; I/O addresses; DMA settings; Serial port settings; Parallel port settings; SCSI settings; Card jumper settings | System locks up. Device not responding. Bizarre behavior from a device. |
Operating system | BUFFERS; FILES; FCBs (File Control Blocks); Stacks; IO.SYS/MSDOS.SYS; Set statements; Paths and prompts; External MS-DOS commands; Multiboot CONFIG.SYS | Error messages: "Missing operating system"; "Bad or missing command interpreter"; "Insert disk with COMMAND.COM"; "Stack overflow"; "Insufficient File Handles" |
Applications | Proper installation; Proper configuration; Knowledge of capabilities; Knowledge of bugs, incompatibilities, work-arounds | Application doesn't work properly. Application-specific errors. Application-specific GPFs. Lock-up only in specific application. |
Device drivers | All devices in CONFIG.SYS, SYSTEM.INI, or Registry; Proper versions; Proper configuration | Device lockups on access. Intermittent lockups. Computer runs in safe mode only. |
Memory management | HIMEM.SYS settings; EMM386.EXE settings; MSDOS.SYS options (Win95); SYSTEM.INI/WIN.INI; Virtual memory; Windows resource usage; UMB management | "Not enough memory" error. Missing XMS, EMS memory. Device lockups. GPFs at KRNL386.EXE. GPFs at USER.EXE or GDI.EXE. |
Configuration/setup | Files used for initialization; Basic layout of initialized files | Programs refuse to do something they should. Missing options in program. Missing program or device. |
Viruses | Virus-management procedures; Knowledge of virus symptoms; Virus-removal procedures | Computer runs slow. Intermittent lockups. Storage problems. Operating-system problems. Mysterious symptoms. |
Operator Interface | Lack of training/understanding; Fear of the computer; Poor attitude | "I didn't touch it!" "It always does that!" Multiple users. |
Network | Logon errors; Communication errors | User forgets password. Expired password. Cable or NIC card problems. |

Be sure to observe the failure yourself. If possible, have someone demonstrate the failure to you. If it is an operator-induced problem, it is important to observe how it is created, as well as the results.
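One way to use the category table when narrowing the search is as a simple lookup from an observed symptom to the categories that list it. The sketch below is illustrative only: it copies a handful of rows from the table (the rest would be added the same way), and the names `CATEGORY_SYMPTOMS` and `candidate_categories`, along with the lowercase exact-match rule, are assumptions made for this example.

```python
# A few rows from the category table, with symptoms lowercased for matching.
# The remaining categories are omitted here for brevity.
CATEGORY_SYMPTOMS = {
    "Electrical Power": [
        "dead computer",
        "intermittent errors on post",
        "intermittent lockups",
        "device not working/not found",
    ],
    "Connectivity": [
        "device not working",
        "device not found",
        "intermittent errors on a device",
    ],
    "Boot": [
        "dead computer",
        "consistent errors on post",
        "beep errors",
        "cmos text errors",
    ],
    "Memory": [
        "dead computer",
        "parity errors",
        "gpf with consistent addresses",
        "himem.sys errors",
    ],
    "Device drivers": [
        "device lockups on access",
        "intermittent lockups",
        "computer runs in safe mode only",
    ],
    "Viruses": [
        "computer runs slow",
        "intermittent lockups",
        "storage problems",
        "operating-system problems",
        "mysterious symptoms",
    ],
}


def candidate_categories(observed):
    """Return the categories that list the observed symptom, in table order."""
    # Normalize case, surrounding whitespace, and a trailing period.
    needle = observed.strip().lower().rstrip(".")
    return [
        category
        for category, symptoms in CATEGORY_SYMPTOMS.items()
        if needle in symptoms
    ]
```

With these rows, `candidate_categories("Dead computer.")` returns Electrical Power, Boot, and Memory, which shows why a single symptom rarely identifies the cause by itself. The lookup only suggests where to start; observing the failure and eliminating categories is still up to the troubleshooter.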
Intermittent problems are the most difficult ones to isolate. They never seem to occur when you are present. The only way to resolve them is to re-create the set of circumstances that causes the failure. Sometimes, moving step-by-step to eliminate the possible causes is all you can do. This takes time and patience. The user will have to keep a detailed record of what is being done before and when the failure occurs. In such cases, tell the user not to do anything with the computer when the problem recurs, except to call you. That way, the "evidence" will not be disturbed.
TIP
For a totally random, intermittent problem, always suspect the power supply.
Phase III: Conduct the Repair
After you have zeroed in on a few categories, the process of elimination begins.
Make a Plan
Create a planned approach to isolating the problem based on your knowledge at this point. Your plan should start with the most obvious or easiest solution to eliminate and move forward. Put the plan in writing!
The first step of any plan should be to document and back up.
If possible, make no assumptions. If you must make any assumptions, write them down. You might need to refer back to them later.
Follow the Plan from Beginning to End
Once a plan is created, it is important to follow it through. Jumping around and randomly trying things can often lead to more serious problems.
Document every action you take and its results.
If the first plan is not successful (it won't always be), create a new plan based on what you discovered with the previous plan. Be sure to refer to any assumptions you might have made.
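The planning advice above (write the plan down, back up first, start with the easiest checks, follow the steps in order, and document every result) could be captured in a small structure such as the following. This is a sketch under assumptions: the `Step` and `RepairPlan` classes and the numeric `effort` rating are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    effort: int          # hypothetical rating: lower means easier to check
    result: str = None   # filled in once the step has been carried out


class RepairPlan:
    """A written plan: assumptions plus steps ordered easiest first."""

    def __init__(self, assumptions=None):
        # Written-down assumptions, kept for later reference.
        self.assumptions = list(assumptions or [])
        # The first step of any plan is to document and back up.
        self.steps = [Step("Document current settings and back up data", effort=0)]

    def add_step(self, description, effort):
        self.steps.append(Step(description, effort))
        # Stable sort keeps the backup step (effort 0) at the front.
        self.steps.sort(key=lambda step: step.effort)

    def next_step(self):
        # Follow the plan in order: the first step without a result is next.
        for step in self.steps:
            if step.result is None:
                return step
        return None

    def record(self, result):
        # Document the outcome of the current step before moving on.
        step = self.next_step()
        if step is None:
            raise ValueError("plan is already complete")
        step.result = result
        return step
```

Because `record` always applies to the next unfinished step, the structure discourages jumping around. If the plan runs out without a fix, the recorded results and the `assumptions` list are the starting point for the next plan.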
Repair or Replace
After locating the problem, either repair or replace the defect. If the problem is software-oriented, be sure to record the "before" and "after" changes.
Phase IV: Confirm the Results
No repair is complete without confirmation that the job is done. Confirmation involves two steps:
- Make sure that the problem no longer exists. Ask the user to test the solution and confirm client satisfaction.
- Make sure that the fix did not create other problems. You have not done a professional job if the repair has been completed at the expense of something else.
Phase V: Document the Results
Finally, document the problem and the repair. There is no substitute for experience in troubleshooting. Every new problem presents you with an opportunity to expand that experience. Keeping a copy of the repair procedure in your technical library will come in handy in a year or two when the problem (or one like it) occurs again. This is one way to build, maintain, and share experience.
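A technical library of past repairs can be as simple as a text file with one record per line. The sketch below assumes a JSON Lines file and hypothetical function and field names (`save_record`, `find_records`, `problem`, `cause`, `fix`); it is one possible way to keep such notes searchable, not a prescribed format.

```python
import json


def save_record(path, problem, cause, fix):
    """Append one repair record to the library file (one JSON object per line)."""
    record = {"problem": problem, "cause": cause, "fix": fix}
    with open(path, "a", encoding="utf-8") as library:
        library.write(json.dumps(record) + "\n")
    return record


def find_records(path, keyword):
    """Return every stored record that mentions the keyword in any field."""
    keyword = keyword.lower()
    matches = []
    with open(path, encoding="utf-8") as library:
        for line in library:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            if any(keyword in str(value).lower() for value in record.values()):
                matches.append(record)
    return matches
```

When a similar problem turns up a year or two later, a call such as `find_records("repairs.jsonl", "parity")` (the file name is illustrative) would bring back the earlier notes. A plain text format also makes the library easy to share with coworkers.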
Lesson Summary
The following points summarize the main elements of this lesson:
- Learning doesn't stop with certification. To stay at the top of your profession, you must keep learning.
- Staying connected with your peers is an important part of learning.
- Maintain a proper set of the tools of the trade.
- Know where and how to get technical support.
- Good troubleshooting requires a plan. To be successful, you must stick to your plan.