Visual Basic

Evil Type Coercion

A programmer on my team had a surprise when writing Visual Basic code to extract information from a SQL Server database. Having retrieved a recordset, he wrote the following code:

  Dim vntFirstValue As Variant, vntSecondValue As Variant
  Dim nResultValue1 As Integer, nResultValue2 As Integer
  vntFirstValue = Trim(rsMyRecordset!first_value)
  vntSecondValue = Trim(rsMyRecordset!second_value)
  nResultValue1 = vntFirstValue + vntSecondValue
  nResultValue2 = vntFirstValue + vntSecondValue + 1

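He was rather upset when he found that the "+" operator not only concatenated the two variants but also added the final numeric value. If vntFirstValue contained "1" and vntSecondValue contained "2", nResultValue1 had the value 12 and nResultValue2 had the value 13. Because both variants held strings, "+" concatenated them to give "12"; only when the numeric constant 1 was added did Visual Basic coerce "12" to a number and perform arithmetic.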
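
One way to avoid the surprise, sketched here on the assumption that both fields always hold numeric text, is to convert explicitly before adding and to reserve "&" for concatenation. The names nSum and sJoined are invented for the illustration.

  Dim nSum As Integer, sJoined As String
  ' Convert explicitly so that "+" can only mean addition.
  nSum = CInt(vntFirstValue) + CInt(vntSecondValue)   ' "1" and "2" give 3
  ' Use "&" when concatenation is what you really want.
  sJoined = vntFirstValue & vntSecondValue            ' "1" and "2" give "12"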

To understand exactly what's going on here, we have to look at how Visual Basic handles type coercion. Up until Visual Basic 3, type coercion was relatively rare. Although you could write Visual Basic 3 code like this:

  txtBox.Text = 20

and find that it worked without giving any error, almost every other type of conversion had to be done explicitly by using functions such as CStr and CInt. Starting with Visual Basic 4, and continuing in Visual Basic 5 and 6, performance reasons dictated that automatic type coercion be introduced. Visual Basic no longer has to convert an assigned value to a Variant and then unpack it back into whatever data type is receiving the assignment. It can instead invoke a set of hard-coded coercion rules to perform direct coercion without ever involving the overhead of a Variant. Although this is often convenient and also achieves the laudable aim of good performance, it can produce some rather unexpected results. Consider the following code:

  Sub Test()
      Dim sString As String, nInteger As Integer
      sString = "1"
      nInteger = 2
      ArgTest sString, nInteger
  End Sub
  Sub ArgTest(ByVal inArgument1 As Integer, _
              ByVal isArgument2 As String)
      ' Some code here
  End Sub

In Visual Basic 3, this code would give you an immediate error at compile time because the arguments are in the wrong order. In Visual Basic 4 or later, you won't get any error because Visual Basic will attempt to coerce the string variable into the integer parameter and vice versa. This is not a pleasant change. If inArgument1 is passed a string that holds a numeric value, everything looks and performs as expected. As soon as a non-numeric value or a null string is passed, however, a "Type mismatch" run-time error occurs. This means that the detection of certain classes of bugs has been moved from compile time to run time, which is definitely not a major contribution to road safety.
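
If the misordered call cannot be caught at compile time, the caller can at least check the data before the coercion happens. The following is a minimal sketch that reuses ArgTest from the listing above; SafeTest is a hypothetical name. Note that IsNumeric is only a partial guard: a string such as "99999" is numeric but would still overflow an Integer.

  Sub SafeTest()
      Dim sString As String
      sString = "abc"
      ' Reject non-numeric text before the implicit
      ' String-to-Integer coercion can raise a run-time error.
      If IsNumeric(sString) Then
          ArgTest sString, 2
      Else
          Debug.Print "Not numeric: " & sString
      End If
  End Sub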

The following table shows Visual Basic 6's automatic type coercion rules.

  Source Type   Coerced To     Apply This Rule
  Integer       Boolean        0=False, nonzero=True
  Boolean       Byte           False=0, True=255
  Boolean       Any numeric    False=0, True=-1 (except Byte)
  String        Date           String is analyzed for MM/dd/yy and so on
  Date          Numeric type   Coerce to Double and use DateSerial(Double)
  Numeric       Date           Use number as serial date, check valid date range
  Numeric       Byte           Error if negative
  String        Numeric type   Treat as Double when representing a number
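
A few of these rules are easy to try in the Immediate window. The sketch below uses a hypothetical ShowCoercion routine with invented variable names; the comments give the values I would expect from the rules as Visual Basic 6 applies them.

  Sub ShowCoercion()
      Dim bFlag As Boolean, nValue As Integer, bytValue As Byte
      Dim dtValue As Date
      bFlag = 5          ' Integer to Boolean: nonzero gives True
      nValue = True      ' Boolean to Integer: True gives -1
      bytValue = True    ' Boolean to Byte: True gives 255
      dtValue = 1.5      ' Numeric to Date: noon on 31 December 1899
      Debug.Print bFlag, nValue, bytValue, dtValue
  End Sub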

Some final thoughts

Any Visual Basic developer with aspirations to competence should learn the automatic type coercion rules and understand the most common situations in which type coercion's bite can be dangerous.

Arguing safely

In Visual Basic 3, passing arguments was relatively easy to understand. You passed an argument either by value (ByVal) or by reference (ByRef). Passing ByVal was safer because the argument consisted only of its value, not of the argument itself. Therefore, any change to that argument would have no effect outside the procedure receiving the argument. Passing ByRef meant that a direct reference to the argument was passed. This allowed you to change the argument if you needed to do so.

With the introduction of objects, the picture has become more complicated. The meaning of ByVal and ByRef when passing an object variable is slightly different from their meaning when passing a nonobject variable. An object variable holds a reference to an object rather than the object itself, so passing it ByVal copies only the reference. The procedure can still change the properties of the object that the variable refers to, and the caller will see those changes. What the procedure cannot do is make the caller's variable refer to a different object; that requires ByRef. This rule can confuse some programmers when they first encounter it and can be a source of bugs if certain invalid assumptions are made.
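
A minimal sketch of how object arguments behave in practice, using a Collection and three hypothetical routines:

  Sub ChangeByVal(ByVal oColl As Collection)
      oColl.Add "added"            ' Seen by the caller: same object
      Set oColl = New Collection   ' Not seen: only the local copy is rebound
  End Sub
  Sub ChangeByRef(ByRef oColl As Collection)
      Set oColl = New Collection   ' Caller's variable now refers to a new object
  End Sub
  Sub TestObjectArgs()
      Dim colItems As Collection
      Set colItems = New Collection
      ChangeByVal colItems
      Debug.Print colItems.Count   ' 1
      ChangeByRef colItems
      Debug.Print colItems.Count   ' 0
  End Sub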

Type coercion introduces another wrinkle to passing arguments. The use of ByVal has become more dangerous because Visual Basic will no longer trigger certain compile-time errors. In Visual Basic 3, you could never pass arguments to a procedure that expected arguments of a different type. Using ByVal in Visual Basic 6 means that an attempt will be made to coerce each ByVal argument into the argument type expected. For example, passing a string variable ByVal into a numeric argument type will not show any problem unless the string variable actually contains non-numeric data at run time. The error check is therefore delayed until run time. See the earlier section called "Evil Type Coercion" for an example and more details.

If you don't specify an argument method, the default is that arguments are passed ByRef. Indeed, many Visual Basic programmers use the language for a while before they realize they are using the default ByRef and that ByVal is often the better argument method. For the sake of clarity, I suggest defining the method being used every time rather than relying on the default. I'm also a firm believer in being very precise about exactly which arguments are being used for input, which for output, and which for both input and output. A good naming scheme should do something like prefix every input argument with "i" and every output argument with "o" and then perhaps use the uglier "io" to discourage programmers from using arguments for both input and output. Input arguments should be passed ByVal, whereas all other arguments obviously have to be passed ByRef. Being precise about the nature and use of procedure arguments can make the maintenance programmer's job much easier. It can even make your job easier by forcing you to think clearly about the exact purpose of each argument.
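
As an illustration of the naming scheme, here is a sketch of a hypothetical routine in which the prefixes and the explicit ByVal and ByRef keywords document the direction of every argument:

  ' Hypothetical routine: "i" marks input, "o" marks output.
  Sub SplitName(ByVal isFullName As String, _
                ByRef osFirstName As String, _
                ByRef osLastName As String)
      Dim nSpace As Long
      nSpace = InStr(isFullName, " ")
      If nSpace > 0 Then
          osFirstName = Left$(isFullName, nSpace - 1)
          osLastName = Mid$(isFullName, nSpace + 1)
      Else
          osFirstName = isFullName
          osLastName = vbNullString
      End If
  End Sub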

One problem you might run into when converting from previous versions of Visual Basic to Visual Basic 6 is that you are no longer allowed to pass a control to a DLL or OCX using ByRef. Previously, you might have written your function declaration like this:

  Declare Function CheckControlStatus Lib "MY.OCX" _
            (ctlMyControl As Control) As Integer

You are now required to specify ByVal rather than the default ByRef. Your function declaration must look like this:

  Declare Function CheckControlStatus Lib "MY.OCX" _
            (ByVal ctlMyControl As Control) As Integer

This change is necessary because DLL functions now expect to receive the Windows handle of any control passed as a parameter. Omitting ByVal causes a pointer to the control handle to be passed rather than the control handle itself, which will result in undefined behavior and possibly a GPF.

The meaning of zero

Null, IsNull, Nothing, vbNullString, "", vbNullChar, vbNull, Empty, vbEmpty… Visual Basic 6 has enough representations of nothing and zero to confuse the most careful programmer. To prevent bugs, programmers must understand what each of these Visual Basic keywords represents and how to use each in its proper context. Let's start with the interesting stuff.

  Private sNotInitString As String
  Private sEmptyString As String
  Private sNullString As String
  sEmptyString = ""
  sNullString = 0&

Looking at the three variable declarations above, a couple of questions spring to mind. What are the differences between sNotInitString, sEmptyString, and sNullString? When is it appropriate to use each declaration, and when is it dangerous? The answers to these questions are not simple, and we need to delve into the murky depths of Visual Basic's internal string representation system to understand the answers.

After some research and experimentation, the answer to the first question becomes clear but at first sight is not very illuminating. The variable sNotInitString is a null pointer string, held internally as a pointer that doesn't point to any memory location and that holds an internal value of 0. sEmptyString is a pointer to an empty string, a pointer that does point to a valid memory location. Finally, sNullString is neither a null string pointer nor an empty string but is just a string containing 0.

Why does sNotInitString contain the internal value 0? In earlier versions of Visual Basic, uninitialized variable-length strings were set internally to an empty string. Ever since the release of Visual Basic 4, however, all variables have been set to 0 internally until initialized. Developers don't normally notice the difference because, inside Visual Basic, this initial zero value of uninitialized strings always behaves as if it were an empty string. It's only when you go outside Visual Basic and start using the Windows APIs that you receive a shock. Try passing either sNotInitString or sEmptyString to any Windows API function that accepts a null pointer. Passing sNotInitString will work fine because it really is a null pointer, whereas passing sEmptyString will cause the function to fail. Of such apparently trivial differences are the really nasty bugs created.
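
The internal difference can be made visible with StrPtr, a hidden and undocumented function in Visual Basic 5 and 6 that returns the address of a string's data. The routine name below is hypothetical, and the sketch is meant only as a diagnostic aid.

  Sub ShowStringPointers()
      Dim sNotInitString As String
      Dim sEmptyString As String
      sEmptyString = ""
      ' A null pointer string has no data, so StrPtr returns 0.
      Debug.Print StrPtr(sNotInitString)              ' 0
      ' An empty string points to real memory.
      Debug.Print StrPtr(sEmptyString)                ' Nonzero
      ' Inside Visual Basic the two are indistinguishable.
      Debug.Print (sNotInitString = sEmptyString)     ' True
      Debug.Print Len(sNotInitString), Len(sEmptyString)
  End Sub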

The following code snippet demonstrates what can happen if you're not careful.

  Private Declare Function WinFindWindow Lib "user32" Alias _
      "FindWindowA" (ByVal lpClassName As Any, _
                     ByVal lpWindowName As Any) As Long

  Sub FindCalculator()
      Dim sNotInitString As String
      Dim sEmptyString As String
      Dim sNullString As String
      Dim x As Long
      sEmptyString = ""
      sNullString = 0&
      ' Start Calculator and give it time to create its window.
      Shell "Calc.exe", 1
      DoEvents
      ' This will work.
      x = WinFindWindow(sNotInitString, "Calculator")
      ' This won't work.
      x = WinFindWindow(sEmptyString, "Calculator")
      ' This will work.
      x = WinFindWindow(sNullString, "Calculator")
  End Sub

Now that we've understood one nasty trap and why it occurs, the difference between the next two variable assignments becomes clearer.

  sNullPointer = vbNullString
  sEmptyString = ""

It's a good idea to use the former assignment rather than the latter, for two reasons. The first reason is safety. Assigning sNullPointer as shown here is the equivalent of sNotInitString in the above example. In other words, it can be passed to a DLL argument directly. However, sEmptyString must be assigned the value of 0& before it can be used safely in the same way. The second reason is economy. Using "" will result in lots of empty strings being scattered throughout your program, whereas using the built-in Visual Basic constant vbNullString will mean no superfluous use of memory.
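
In practice this means vbNullString can be passed straight to an API call. The fragment below is a sketch that reuses the WinFindWindow declaration from the earlier snippet; hWndCalc is an invented name.

  Dim hWndCalc As Long
  ' vbNullString is passed as a null pointer, so any window class matches.
  hWndCalc = WinFindWindow(vbNullString, "Calculator")
  If hWndCalc = 0 Then
      Debug.Print "Calculator window not found"
  End If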

Null and IsNull are fairly clear. Null is a Variant value, reported by VarType as vbNull, that means no valid data and typically indicates a database field with no value. The only hazard here is a temptation to compare something with Null directly. Null propagates through any expression that you use, so the comparison itself evaluates to Null rather than True, and the If statement treats it as false. Resist the temptation and use IsNull instead.

  ' This test never succeeds: the comparison yields Null, not True.
  If sString = Null Then
      ' Some code here
  End If
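
The safe form of the test, sketched with the recordset from the first example and a Variant to receive the field, looks like this. The default of 0 is an arbitrary choice for the illustration.

  Dim vntValue As Variant
  vntValue = rsMyRecordset!first_value
  If IsNull(vntValue) Then
      ' The field has no value, so choose a default explicitly.
      vntValue = 0
  End If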

Continuing through Visual Basic 6's representations of nothing, vbNullChar is the next stop on our travels. This constant is relatively benign, simply Chr$(0). When you receive a string back from a Windows API function, it is normally null-terminated because that is the way the C language expects strings to look. Searching for vbNullChar is one way of determining the real length of the string. Beware of using any API string without doing this first, because null-terminated strings can cause some unexpected results in Visual Basic, especially when displayed or concatenated.
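
A small helper along the following lines does the job. TrimAtNull is a hypothetical name, and the sketch assumes the buffer came back from an ANSI API call as an ordinary Visual Basic string.

  ' Hypothetical helper: returns the text before the first null character.
  Function TrimAtNull(ByVal isBuffer As String) As String
      Dim nPos As Long
      nPos = InStr(isBuffer, vbNullChar)
      If nPos > 0 Then
          TrimAtNull = Left$(isBuffer, nPos - 1)
      Else
          ' No terminator found, so return the buffer unchanged.
          TrimAtNull = isBuffer
      End If
  End Function

For example, TrimAtNull("abc" & vbNullChar & "   ") returns "abc".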

Finally, two constants are built into Visual Basic for use with the VarType function. vbNull is a value returned by the VarType function for a variable that contains no valid data. vbEmpty is returned by VarType for a variable that is uninitialized. Better people than I have argued that calling these two constants vbTypeNull and vbTypeEmpty would better describe their correct purpose. The important point from the perspective of safety is that vbEmpty can be very useful for performing such tasks as ensuring that the properties of your classes have been initialized properly.
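
The difference between the two constants is easiest to see with a Variant that passes through each state in turn. ShowVarTypes is a hypothetical routine written for this sketch.

  Sub ShowVarTypes()
      Dim vntValue As Variant
      ' Never assigned: the Variant is Empty.
      Debug.Print (VarType(vntValue) = vbEmpty)    ' True
      Debug.Print IsEmpty(vntValue)                ' True
      ' Explicitly holds no valid data.
      vntValue = Null
      Debug.Print (VarType(vntValue) = vbNull)     ' True
      ' Holds a real, if empty, string.
      vntValue = ""
      Debug.Print (VarType(vntValue) = vbString)   ' True
  End Sub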