C Language

Hassle-free Arrays and Strings

When you learn that, in C, the first element of an array is referenced as x[0], you appreciate C's reputation for being both efficient and hard to read. The natural way to number the first element in a series, of course, is with 1. But, as a replacement for assembler, C was designed to start arrays with 0 to improve performance.

Arrays that use 0 instead of 1 for the first element permit simpler and faster calculation of memory addresses for subscripted references. The rules for using C arrays are simple, but don't let that lull you into thinking you won't encounter problems.
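To make the address arithmetic concrete, here is a minimal sketch of the calculation behind a subscripted reference. The helper names (offset_zero_based, offset_one_based, zero_based_address_matches) are hypothetical and exist only for this illustration; the point is that a 0-based subscript can be scaled directly, whereas a 1-based subscript needs an extra subtraction first.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element i in an int array whose first element is x[0]. */
size_t offset_zero_based( size_t i )
{
    return i * sizeof( int );
}

/* The same offset if the first element were numbered 1:
   the subscript must be adjusted before it is scaled. */
size_t offset_one_based( size_t i )
{
    assert( i >= 1 );
    return ( i - 1 ) * sizeof( int );
}

/* Returns 1 if base address + offset equals &x[i] for every element. */
int zero_based_address_matches( void )
{
    int    x[ 4 ] = { 10, 20, 30, 40 };
    size_t i;

    for ( i = 0; i < 4; i ++ ) {
        char *computed = ( char * ) x + offset_zero_based( i );
        if ( computed != ( char * ) &x[ i ] )
            return 0;
    }
    return 1;
}
```

Whether the extra subtraction costs anything measurable depends on the compiler and the machine, so treat this as an illustration of the arithmetic rather than a performance claim.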

Suppose you need to store the number of orders for each month (1 to 12). A "good" C programmer might declare an array and reference an array element as in the following code:

 int month;               /* 1 to 12 */
 int orders[12] = { 0 };  /* orders[0] holds month 1 */
 ++orders[month - 1];

This works fine, so long as you never forget to subtract 1 from month when you use month as a subscript. You also need to be careful when you code for loops:

 for (month = 0; month < 12; month++) {
   printf("Total for month %d is %d\n",
       (month + 1), orders[month]);
 }

This example suggests I should clarify my previous caution to remember to subtract 1. When you use month as a loop variable that runs across the array's range (0 to 11), you shouldn't make the adjustment in subscripted array references, but rather in printing the loop variable. Also remember that, to cover the array's range, the for loop must start at 0 and run to 11, not 12, so use < instead of <= in the limit test.

There, see how simple arrays that start at 0 are! There are at least a dozen other ways to code this example, but not one of them overcomes the conflict between natural counting systems, which begin with 1, and C's arrays, which begin with 0. This "impedance mismatch" between the natural world and C increases the likelihood of "off by 1" array and loop errors.

An easy solution exists, however: Declare C arrays with one extra element, and don't use the element with subscript 0. Look how much simpler the code becomes:

 int month;
 int orders[12 + 1];
 for (month = 1; month <= 12; month++) {
   printf("Total for month %d is %d\n",
       month, orders[month]);
 }

Now you don't have to selectively adjust subscripts, and for loops can have their range expressed clearly. You might wonder at my profligate waste of memory for the unused array element; but each subscript that no longer needs a subtract-1 adjustment can save an instruction or two, so in many cases the program as a whole ends up no larger, and sometimes smaller.
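As a fuller sketch of the extra-element approach, the functions below tally and total orders using month numbers 1 to 12 directly as subscripts. The names MONTHS, count_order, and total_orders are my own inventions for this illustration, and the code assumes the caller declares the array with MONTHS + 1 elements and zeroes it first.

```c
#include <assert.h>

#define MONTHS 12

/* Add one order to a month's count. The array must have MONTHS + 1
   elements; element 0 is deliberately left unused. */
void count_order( int orders[], int month )
{
    assert( month >= 1 && month <= MONTHS );
    ++ orders[ month ];
}

/* Sum the counts for months 1 through MONTHS. */
int total_orders( const int orders[] )
{
    int month;
    int total = 0;

    for ( month = 1; month <= MONTHS; month ++ ) {
        total += orders[ month ];
    }
    return total;
}
```

A caller would declare int orders[MONTHS + 1] = { 0 }; and then call count_order( orders, 12 ) for a December order, with no adjustment anywhere.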

If you want to refine this approach further by expressing your array declarations using the highest valid subscript (i.e., the upper bound), rather than one greater, you can call on our old friends, source macros. Figure 4.1 shows one way to simplify C array declarations by using a source macro. (In this and other macros in this chapter, I use the term "table" to emphasize the distinction between 0-based and 1-based arrays.) You can build similar macros for tables of two or more dimensions.

Figure 4.1 Table Definition Macros

 #define TABLE( tname, ttype, ttop )      \
      int tname##_upper_bound = ( ttop ); \
      ttype tname[ ( ttop ) + 1 ]

 #define upper_bound( tname ) tname##_upper_bound

Note: ## is the macro concatenation (or "token-pasting") operator. For example, if you use the macro upper_bound( orders ), the macro preprocessor will paste orders to _upper_bound to generate orders_upper_bound.
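To see what the token pasting produces, the sketch below repeats the macros from Figure 4.1 and uses them inside a small function. The function table_facts is hypothetical, added only to expose the generated variable and the real size of the array; the comments show the expansion I would expect from the preprocessor.

```c
#include <assert.h>

#define TABLE( tname, ttype, ttop )      \
     int tname##_upper_bound = ( ttop ); \
     ttype tname[ ( ttop ) + 1 ]

#define upper_bound( tname ) tname##_upper_bound

/* Reports the highest valid subscript and the actual element count. */
void table_facts( int *top, int *count )
{
    /* Expands to:
         int orders_upper_bound = ( 12 );
         int orders[ ( 12 ) + 1 ];          */
    TABLE( orders, int, 12 );

    /* upper_bound( orders ) expands to orders_upper_bound. */
    *top   = upper_bound( orders );
    *count = ( int ) ( sizeof orders / sizeof orders[ 0 ] );

    assert( *count == *top + 1 );
}
```

The table has 13 elements, but the only bound the rest of the program ever sees is 12.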

Now we can write our first example as in Figure 4.2. And because we often want to do for loops across the entire range of a table, the macros in Figure 4.3 are handy. Using these macros, we can simplify printing the monthly counts to the code in Figure 4.4.

Figure 4.2 Using Table Macros

 int  month;
 TABLE( orders, int, 12 );
 ++ orders[ month ];

 for ( month = 1; month <= upper_bound( orders ); month ++ ) {
     printf( "Total for month %d is %d\n",
           month, orders[ month ] );
 }
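The same idea extends to more than one dimension, as mentioned earlier. The following is one possible sketch, not a definitive design: the names TABLE2, upper_bound_1, upper_bound_2, and sum_region_orders are assumptions of mine, chosen to parallel Figure 4.1. Each dimension gets its own upper-bound variable and its own unused row or column 0.

```c
#include <assert.h>

/* Hypothetical two-dimensional counterpart of TABLE. */
#define TABLE2( tname, ttype, ttop1, ttop2 )       \
     int tname##_upper_bound_1 = ( ttop1 );        \
     int tname##_upper_bound_2 = ( ttop2 );        \
     ttype tname[ ( ttop1 ) + 1 ][ ( ttop2 ) + 1 ]

#define upper_bound_1( tname ) tname##_upper_bound_1
#define upper_bound_2( tname ) tname##_upper_bound_2

/* Fills a 3-region by 12-month table and returns the grand total. */
int sum_region_orders( void )
{
    TABLE2( orders, int, 3, 12 );
    int region;
    int month;
    int total = 0;

    assert( upper_bound_1( orders ) == 3 );

    /* Give every month in region r exactly r orders. */
    for ( region = 1; region <= upper_bound_1( orders ); region ++ ) {
        for ( month = 1; month <= upper_bound_2( orders ); month ++ ) {
            orders[ region ][ month ] = region;
        }
    }

    for ( region = 1; region <= upper_bound_1( orders ); region ++ ) {
        for ( month = 1; month <= upper_bound_2( orders ); month ++ ) {
            total += orders[ region ][ month ];
        }
    }
    return total;
}
```

The memory cost grows with each dimension (here 4 x 13 elements instead of 3 x 12), so this trade is worth weighing for large tables.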

Figure 4.3 Table Loop Macros

 #define OVER_TABLE( tname, idx )                            \
      { int idx;                                             \
        for ( idx = 1; idx <= upper_bound( tname ); idx ++ ) {

 #define ENDOVER }}

Figure 4.4 Using Table Loop Macros

 OVER_TABLE( orders, month )
   printf( "Total for month %d is %d\n",
        month, orders[ month ] );
 ENDOVER
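Putting the pieces together, here is a sketch that combines the macros from Figures 4.1 and 4.3 in one function. The function print_monthly_totals and the choice of month 3 are hypothetical, for illustration only. Note that OVER_TABLE declares the loop variable itself and opens two braces, so every OVER_TABLE needs a matching ENDOVER to close them.

```c
#include <assert.h>
#include <stdio.h>

#define TABLE( tname, ttype, ttop )      \
     int tname##_upper_bound = ( ttop ); \
     ttype tname[ ( ttop ) + 1 ]

#define upper_bound( tname ) tname##_upper_bound

#define OVER_TABLE( tname, idx )                            \
     { int idx;                                             \
       for ( idx = 1; idx <= upper_bound( tname ); idx ++ ) {

#define ENDOVER }}

/* Prints one line per month and returns the number of lines printed. */
int print_monthly_totals( void )
{
    TABLE( orders, int, 12 );
    int printed = 0;

    /* Clear the counts before use; the table is not initialized. */
    OVER_TABLE( orders, m )
        orders[ m ] = 0;
    ENDOVER

    ++ orders[ 3 ];          /* record one order for month 3 */
    assert( orders[ 3 ] == 1 );

    OVER_TABLE( orders, month )
        printf( "Total for month %d is %d\n",
             month, orders[ month ] );
        ++ printed;
    ENDOVER

    return printed;
}
```

Because the loop variable lives inside the braces that OVER_TABLE opens, it is not visible after ENDOVER, which keeps one loop's subscript from leaking into the next.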

By now, you might reasonably ask, "Why bother creating all these macros to make C look like some other language; why not just use another language?" Good question, and if you have a good alternative, such as Pascal or Modula-2, you should use it instead of C. But if you're stuck with C, well-designed macros can add substantial safety and clarity to your programs. And, I should add, well-written macros don't hurt runtime performance, because they are translated into ordinary C code before compilation.