Thursday, May 29, 2008

C++ puzzles #9

Namespace Members Have Static Storage

Variables and objects defined within a namespace have static storage duration. As such, namespace members without an explicit initializer are zero-initialized before the program starts. Likewise, namespace members are destroyed when the program terminates, after main() returns:

 
namespace mine
{
  int n; // automatically initialized to 0 before the program starts
  std::string str("abc"); // str will be destroyed when the program terminates
}
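
The same rule can be checked with a few more types. The following is a minimal sketch; the extra members ratio and label and the helper starts_zeroed() are hypothetical names added for illustration only:

```cpp
#include <string>

namespace mine
{
  int n;                  // no initializer: zero-initialized before main() runs
  double ratio;           // hypothetical member, likewise starts as 0.0
  const char * label;     // hypothetical member, likewise starts as a null pointer
  std::string str("abc"); // constructed at startup, destroyed at program exit
}

// Hypothetical helper: reports whether the namespace members
// hold the values they are guaranteed to start with.
bool starts_zeroed()
{
  return mine::n == 0 && mine::ratio == 0.0 &&
         mine::label == 0 && mine::str == "abc";
}
```

Calling starts_zeroed() from main(), before anything assigns to these variables, should return true.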

Accessing Members of a Class in a Static Member Function

Unlike ordinary member functions, a static member function doesn't take an implicit this argument. Therefore, it can't access the non-static members of its class directly. Sometimes you have no choice but to use a static member function, especially when you want to run it in a separate thread, yet you still need to access other members of the class from that function. There are two solutions. The first is to declare these members static, so that the static member function can access them directly:

 
class Singleton
{
 public:
  static Singleton * instance();
private:
  static Singleton * p; // must be static so that instance() can access it
  static Lock lock;
};
 
Singleton * Singleton::instance()
{
 lock.get_lock(); // OK, lock is a static member
 if (p==0)
  p=new Singleton;
 lock.unlock();
 return p;
}

Alternatively, pass a reference to the object in question as an argument of the static member function, so that it can access the object's members through that reference:

 
class C
{
public:
  static void func(C & obj);
  int get_x() const;
private:
 int x;
};
 
void C::func( C & obj)
{
 int n = obj.get_x(); // access a member through reference 
}
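
The second approach is what makes a static member function usable as a thread entry point or C-style callback. Here is a minimal sketch; Worker, entry() and run_callback() are hypothetical names, and run_callback() merely stands in for a real thread-creation API that accepts a function pointer and a void* argument:

```cpp
class Worker
{
public:
  Worker() : count(0) {}

  // Static entry point: its signature matches a plain C callback.
  static void entry(void * arg)
  {
    Worker * self = static_cast<Worker*>(arg); // recover the object
    self->count++; // private members are accessible through the pointer
  }

  int get_count() const { return count; }
private:
  int count;
};

// Hypothetical stand-in for a thread-creation API: it simply
// calls the callback synchronously with the supplied argument.
void run_callback(void (*fn)(void *), void * arg)
{
  fn(arg);
}

int run_worker_twice()
{
  Worker w;
  run_callback(&Worker::entry, &w);
  run_callback(&Worker::entry, &w);
  return w.get_count(); // each call incremented count once
}
```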

Accessing a Private Data Member from a Different Object


Different objects of the same class can access each other's members, even if these members are private. For example:

 
class A
{
 int n;
public:
 void f(A* p)  {p->n=0;}// another object's private member!
};
 
int main()
{
A a1,a2;
a1.f(&a2); // a1 changes a2's n
}


Typically, this coding style should be avoided. However, you should be aware that private members of an object can be changed by another object of the same type. Therefore, in certain special conditions, this coding style may be useful.

Why Class String Doesn't Have an Implicit Conversion to char *

The standard class string (unlike MFC CString for example) doesn't have a char * conversion operator for two reasons. First, implicit conversions can cause undesirable surprises when you least expect them. Legacy C code is combined with new C++ code in many systems. In its pre-standardized form, C used char * as generic pointers (void* was added much later). You can imagine what chaos an implicit conversion to char* can inflict in such systems. The other reason is that C strings are null terminated, whereas the underlying representation of a string object is implementation-dependent. An implicit conversion of a string object in a context requiring a null-terminated array of characters can be disastrous. For these reasons, the C++ standardization committee did not include a conversion operator in class string. When you need to do such a conversion, you can call string::c_str() explicitly.
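
As a small illustration of the explicit conversion, the sketch below passes a string object to a C function that expects a null-terminated array. The wrapper name c_length() is hypothetical:

```cpp
#include <cstring>
#include <string>

// std::strlen() expects a null-terminated const char*, so the
// conversion from std::string is requested explicitly with c_str().
std::size_t c_length(const std::string & s)
{
  return std::strlen(s.c_str());
}
```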

Returning a Value from a Function that Throws

A reader asked me the following question: "Can a function that returns something throw an exception and still return a value?" Let's look at a concrete example:

 
int f()
{
 if (something)
  throw X();
 else
  return 0;
}

f() has two exit points: the throw statement and the return statement. On each invocation, f() can exit from only one of them. Thus, syntactically, there is no way that f() can both throw an exception and return a value. At the logical level, there is another reason why it is impossible to return and throw at the same time. When an exception is thrown, it means that the function encountered an error from which it cannot proceed normally and return a meaningful value. On the other hand, if the function can return a meaningful result, there's no reason why an exception should be thrown. The execution paths of a return statement and a throw statement are always mutually exclusive.
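
The two exit paths can be observed from the caller's side. This is a minimal sketch; checked_divide() and divide_or() are hypothetical functions chosen only to make the example self-contained:

```cpp
#include <stdexcept>

int checked_divide(int a, int b)
{
  if (b == 0)
    throw std::invalid_argument("division by zero"); // exit 1: nothing is returned
  return a / b;                                       // exit 2: nothing is thrown
}

// Returns the quotient, or fallback when checked_divide() throws.
int divide_or(int a, int b, int fallback)
{
  try
  {
    return checked_divide(a, b);
  }
  catch (const std::invalid_argument &)
  {
    return fallback; // the call produced no value, so the caller supplies one
  }
}
```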

Guidelines for Overloading the + Operator

The built-in + operator is a binary operator that takes two arguments of the same type and returns the sum of its arguments without changing their values. In addition, + is a commutative operator. This means that you can swap the operands' positions and still get the same result. Likewise, an overloaded version of operator + should reflect all these characteristics.

When overloading +, you can either declare it as a member function of its class or as a friend function. For example:

 
class Date
{
public:
  Date operator +(const Date& other); //member function
};
class Year
{
  friend Year operator+ (const Year y1, const Year y2); //friend
};
Year operator+ (const Year y1, const Year y2);  

The friend version is preferred because it reflects symmetry between the two operands. Since built-in + does not modify any of its operands, the parameters of the overloaded + are declared const. Finally, overloaded + should return the result of its operation by value, not by reference.
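
To see why symmetry matters, here is a minimal sketch of the friend version. The int constructor and the get() accessor are assumptions added for illustration; the parameters are passed by const reference rather than by value:

```cpp
class Year
{
public:
  Year(int v) : value(v) {} // non-explicit: an int may appear on either side of +
  int get() const { return value; }
  friend Year operator+(const Year & y1, const Year & y2);
private:
  int value;
};

// Neither operand is modified; the result is returned by value.
Year operator+(const Year & y1, const Year & y2)
{
  return Year(y1.value + y2.value);
}
```

With this definition, both Year(2000) + 8 and 8 + Year(2000) compile. A member operator+ would accept only the first form, because the left operand of a member operator is never implicitly converted.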

Avoiding Buffer Overflows

Buffer overflows are a fertile source of bugs and malicious attacks. They occur when a program attempts to write data past the end of a buffer. Consider this example:

 
#include <stdio.h>
int main()
{
  char buff[15] = {0};  /*zero initialize all elements*/
  printf("enter your name: ");
  scanf("%s", buff); /*dangerous, length unchecked*/
}

The program reads a string from the standard input (the keyboard). The problem is it doesn't check the string's length. If the string has more than 14 characters, it causes a buffer overflow as scanf() tries to write the remaining characters past buff's end (remember that one character is always reserved for a null terminator). The result is most likely a runtime crash. On some systems, the users will receive a shell's prompt after the crash. Even if the shell has restricted privileges, the users can still examine the values of environment variables, list the current directory files or detect the network with the "ping" command.

That's not the worst thing that can happen, though. A more dangerous situation is when the program doesn't crash due to a buffer overrun. Experts who are familiar with the system's internals can craft a string that is just long enough to overwrite the program's IP (instruction pointer, a pointer to the program's next instruction). If the last four bytes of such a string contain a valid memory address, the program's flow can be altered. For instance, instead of executing the next instruction, the program will execute the code to which the new IP points: it might call another routine, skip code that performs security checks, etc.

What can you do to avert buffer overruns? Always check the bounds of an array before writing it to a buffer. If this is impossible, e.g., when the input is coming from a CGI script, use functions that explicitly limit the number of input characters, e.g., instead of using scanf(), use the fgets() function which reads characters up to a specified limit:

 
#include <stdio.h>
int main()
{
 char buff[15] = {0};
 fgets(buff, sizeof(buff), stdin); /*read at most 14 chars*/
}

Additionally, the standard string functions have versions that take an explicit size limit. Thus, instead of strcpy(), strcmp(), and sprintf(), use strncpy(), strncmp(), and snprintf(), respectively.
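
A bounded copy with snprintf() might look like the sketch below; the function name bounded_copy() is hypothetical. Note that strncpy() does not append a terminator when the source fills the whole buffer, so snprintf(), which always terminates a non-empty buffer, is used here instead:

```cpp
#include <cstdio>
#include <cstring>

// Copies src into a fixed 15-byte buffer, truncating if necessary,
// and returns the number of characters actually stored.
std::size_t bounded_copy(const char * src)
{
  char buff[15];
  std::snprintf(buff, sizeof(buff), "%s", src); // at most 14 chars plus '\0'
  return std::strlen(buff);
}
```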

Return type of an overriding virtual member function

Once declared virtual, a member function can be overridden at any level of derivation, as long as the overriding version has the same signature as the original declaration. This is quite straightforward, but sometimes a virtual function has to return an object of a type that differs from the return type declared in its base class. The C++ Standard allows that only when the original return type is a pointer or reference to a class, and the new return type is a pointer or reference to a class publicly derived from it. For example:

 
class B {
public:
  virtual B* clone () const { return new B; }
};

class D : public B {
public:
  // void* clone () const { return new D; } // error: return type differs
  D* clone () const { return new D; } // OK, D is publicly derived from B
};
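
A complete version that can be compiled and exercised through a base reference is sketched below. The id() member, the virtual destructors and the helper cloned_id() are additions made for illustration:

```cpp
class B
{
public:
  virtual ~B() {}
  virtual B * clone() const { return new B; }
  virtual char id() const { return 'B'; }
};

class D : public B
{
public:
  virtual D * clone() const { return new D; } // covariant return type
  virtual char id() const { return 'D'; }
};

// Clones through the base interface and reports the dynamic type of the copy.
char cloned_id(const B & original)
{
  B * copy = original.clone(); // calls D::clone() when original is a D
  char result = copy->id();
  delete copy;
  return result;
}
```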

Overloading Methods


Suppose you are writing a method in a class that accepts a parameter of a given type. Such a method can also be called with an argument of a different type—as long as an implicit conversion exists between the two types (for example, short to int).

class example
{
public:
    void method(int parameter);
    // ...
};

int main()
{
    example eg;
    short pants = 42;
    eg.method(pants); // short to int conversion here
    // ...
    return 0;
}


It is possible to overload such methods, and by making the overloaded method private, unwanted conversions can be turned into compile time errors. For example:

 
class example
{
public:
    void method(int parameter);
    // ...
private: // reject unwanted conversions
    void method(short);
    // ...
};

int main()
{
    example eg;
    short pants = 42;
    eg.method(pants); // compile-time error: method(short) is private
    // ...
    return 0;
}


You can even use this technique to overload on different signedness of integers. For example:

 
namespace non_std
{
    class string
    {
    public:
        char & operator[](size_t index);
        const char & operator[](size_t index) const;
        // ...
    private: // reject unwanted conversions
        void operator[](signed int);
        void operator[](signed int) const;
        // ...
    };
}

Returning Objects by Value

For efficiency reasons, large objects should usually be passed to or returned from a function by reference or by their address (using a pointer). There are, however, a few circumstances in which the best choice is to return an object by value. A good example is an overloaded operator +. It has to return a result object, yet it may not modify any of its operands. The seemingly natural choice is to allocate the result object on the heap (using operator new) and return its address. But this is not such a good idea: dynamic memory allocation is significantly slower than local storage. It may also fail and throw an exception, which has to be caught and handled. Worse, it can lead to memory leaks, since it is unclear who is responsible for deleting the object: the creator or the user? Another solution is to use a static object and return it by reference. This is also problematic, since on each invocation of the overloaded operator the same instance of the static object is modified and returned to the caller, resulting in aliasing. Consequently, the safest, least error-prone, and most efficient solution is to return the result object by value:

 
class Date {
  int d,m,y;
public:
  Date& operator += (const Date& other); // assumed to be defined elsewhere
  Date operator + (const Date& other) const { Date temp = *this; temp += other; return temp; }
};

Initializing a Bit Struct

To initialize a struct that contains bit fields, simply use the = {0} partial initialization list:

 
int main()
{
  struct MP3_HEADER 
  {  
     unsigned Sync:11;
     unsigned Version:2;
     unsigned Layer:2;
     unsigned Protection:1;
     unsigned Bitrate:4;
     unsigned Frequency:2;
     unsigned Padding:1;
     unsigned Private:1;
     unsigned ChannelMode:2;
     unsigned ModeExtension:2;
     unsigned Copyright:1;
     unsigned Original:1;
     unsigned Emphasis:2;
   };
  // create an instance and initialize it
  MP3_HEADER header  = {0}; /*set all members to zero*/
}

When you use a {0} partial initialization list, the first member is set to zero explicitly, and both C and C++ guarantee that all the remaining members in the struct, regardless of their size and type, are automatically initialized to zero.
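
The guarantee also covers structs that mix bit fields with other member types. The following sketch uses a hypothetical Flags struct to check it:

```cpp
struct Flags
{
  unsigned ready:1;
  unsigned mode:3;
  int count;
  double ratio;
  const char * label;
};

bool zeroed_by_brace_init()
{
  Flags f = {0}; // first member set to 0, the rest zero-initialized
  return f.ready == 0 && f.mode == 0 && f.count == 0 &&
         f.ratio == 0.0 && f.label == 0;
}
```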

Initializing Array Class Members

In C++98 and C++03, you cannot initialize an array member with a value in a member-initialization list of a class (C++11 later relaxed this with brace initialization). For this reason, the following code will not compile:

 
  class A
  {
  private:
    char buff[100]; 
  public:
    A() : buff("")   //ill-formed
    {}
  };
 

The following forms won't compile either:

 
  A::A() : buff('\0')  {}  //ill-formed
  A::A() : buff(NULL)  {}  //ill-formed
 

Instead, you should initialize arrays inside the constructor body, as follows:

   
  A::A() 
  { 
    memset(buff, '\0', sizeof(buff)); 
  }

Assigning a Zero Value to All Members of a Struct

Sometimes you need to clear all the members of a struct after it was used. With large structs, you usually use memset() for that purpose. However, for smaller structs that occupy 2, 4 or 8 bytes of memory, calling memset() is rather expensive in terms of performance:

 
  struct Date
  {
    char day;
    char month;
    short year;
  }; // Date occupies 4 bytes
 
  Date d;
  d.day = 13; d.month = 4; d.year = 2000; // use d
  memset(&d, 0, sizeof (d)); // now clear it

You can avoid the overhead of calling memset() by using the following technique instead:

 
    // convert a pointer to d to a pointer to int
  int * pfake = reinterpret_cast <int*> (&d); // 1
    // assigns all d's members to zero through fake ptr
  *pfake = 0; // 2

The first line of code creates a pointer to int (assuming 32-bit int size) that actually points to a Date object, d. Of course, cheating is necessary to convince the compiler to accept this conversion. This is why reinterpret_cast is used. The second line dereferences that fake int pointer and writes the value 0 to its memory block. This ensures that all the bits in d are set to zero.

You can perform this two-step operation in a single statement like this:

 
  *( reinterpret_cast <int*> (&d) ) = 0;

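However, this is less readable and more error prone. Both forms also rely on sizeof(Date) being equal to sizeof(int), and accessing a Date object through an int pointer violates the language's aliasing rules, so the behavior is formally undefined even where it appears to work.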

Zero Sized Arrays?

In standard C++, declaring arrays with zero elements is illegal:

         
  int n[0]; //illegal

However, certain compilers do support arrays of zero size as a non-standard extension.

In contrast, dynamic allocation of zero sized arrays is valid C++:

         
  int * n = new int[0]; 

The standard requires that in this case, new allocate an array with no elements. The pointer returned by new is non-null and it is distinct from a pointer to any other object. Similarly, deleting such a pointer is a legal operation.

While zero-sized dynamic arrays may seem like another piece of C++ trivia that no one will ever need, this feature is chiefly important when implementing custom memory allocators: a custom allocation function may take any non-negative (i.e., unsigned) argument without worrying whether it's zero.

         
  void * allocate_mem(unsigned int size)
  {
    //...no need to check whether size equals zero
    return new char[size];
  }
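
The guarantees described above can be verified directly. A minimal sketch, with a hypothetical function name:

```cpp
bool zero_sized_new_is_usable()
{
  int * p = new int[0];     // valid: an array with no elements
  bool non_null = (p != 0); // the standard requires a non-null pointer
  delete [] p;              // deleting it is legal, too
  return non_null;
}
```

The returned pointer must not be dereferenced, since there is no element for it to point to.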

Guidelines for Writing Portable Code

Contrary to what many people believe, portability does not guarantee that you can compile and run the same code on every platform without any modifications. Although it is sometimes possible to write 100% portable code, in practice such code tends to be too complex and inefficient. A better approach is to separate platform-specific modules from the platform-independent parts of the application. GUI components, for instance, tend to be very platform specific, and should be encapsulated in dedicated classes. Conversely, business logic and numeric computations are platform-independent. Therefore, when the application is to be ported, only the GUI modules need to be rewritten while the rest of the modules can be reused.

Assigning Integers to enum Variables

C and C++ differ in their handling of enum types. While C allows you to assign a plain int to an enum variable, C++ doesn't. Therefore, a C compiler will accept the following code while a standard compliant C++ compiler won't:

 
enum Direction {West, North, East, South};
Direction d;
d = 1; /* OK in C, d equals 'North' */

C++ has stricter type safety rules. A standard-compliant C++ compiler will reject the assignment of 1 to d. You have to use an explicit cast for this to work, or better still, always assign an enumerator to an enum variable:

 
d = static_cast < Direction > (1);  // fine
d = East;
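
The asymmetry between the two directions of conversion is sketched below. The helper names from_int() and to_int() are hypothetical:

```cpp
enum Direction { West, North, East, South };

// int to enum: C++ requires an explicit cast. The caller is
// expected to pass a value in the enumerator range 0..3.
Direction from_int(int value)
{
  return static_cast<Direction>(value);
}

// enum to int: this conversion is implicit, so no cast is needed.
int to_int(Direction d)
{
  return d;
}
```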

Variable Length Arrays

In C89 and C++, array dimensions must be declared using integer constant expressions. This allows the compiler to compute the array's size at compile time. In C99, this rule was relaxed: you can now use any integer expression to declare array dimensions. The size of such an array is known only at runtime. For example:

 
void func(int dim)
{
  int arr[dim]; // possible only in C99
}

The number of elements in arr may change every time func() is called. For example:

 
int main()
{
 int dim;
 printf("enter array's size : ");
 scanf("%d",&dim);
 func(dim);
}

Because the size of a variable length array can't be determined at compile-time, C99 also changed the sizeof operator. In general, sizeof calculates the size of an object at compile time. However, when applied to variable length arrays, sizeof calculates the array's size at runtime. Remember that variable length arrays aren't dynamic arrays. That is, they can't change their size during their lifetime. They differ from ordinary arrays in that they may have a different size every time their declaration is encountered. Variable arrays can only be local. Note that C++ doesn't support variable length arrays. However, because many C++ compilers are also C compilers, certain C++ compilers may support this feature as a non-standard extension.
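
In portable C++, a local array whose size is known only at runtime is usually written with std::vector rather than a variable length array. This is a different technique, not the C99 feature itself, and the function name below is hypothetical:

```cpp
#include <cstddef>
#include <vector>

std::size_t make_and_measure(std::size_t dim)
{
  std::vector<int> arr(dim); // dim elements, each initialized to 0
  return arr.size();         // size is determined at runtime
}
```

Unlike a variable length array, the vector's storage comes from the free store and is released automatically when arr goes out of scope.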

Avoid Excessive use of Fully Qualified Names

To some extent, the use of fully qualified names is the recommended way of referring to namespace members, because a qualified name uniquely identifies its member and avoids name conflicts. For example:

 
  #include <string>
  int main()
  {
    std::string str;
    std::string str2;
  }

In the example above, the fully qualified name std::string is used twice. In real world code, however, dozens of components of the Standard Library are used. Repeating their qualified names over and over again is laborious and error-prone. Worse yet, it renders your code unreadable. When you need to use qualified names more than 2-3 times, prefer a using-declaration or even a using-directive:

 
  #include <string>
  #include <vector>
  int main()
  {
    using std::string; // using declaration; needed only once
    using std::vector; // using declaration
    string str; // non-qualified name 
    string str2;
    vector <int> vi;
  }

How To Know Which Process Has Loaded Your DLL


It’s possible that a DLL that you’ve written will be used by more than one process. Sometimes you may need to know which process has called a function exposed by the DLL.

Adding the following lines to the DLL function will retrieve the process that made the call:

 
char buf[MAX_PATH]="";
GetModuleFileName(NULL,buf,sizeof(buf));

Structs and Unions


It is common knowledge that the only difference between a struct and a union in C is that all elements of a union share the same memory location but the elements of struct do not. In C++, a struct is very similar to a class, the only difference being that the default access specifier for a struct is public and for a class it’s private. This feature of the struct construct broadens the difference between a struct and a union in C++.

Firstly, a struct can be part of an inheritance hierarchy, while a union cannot. In other words, a union can neither inherit nor be the parent of another union, struct, or class. This gives rise to another difference: a struct can have virtual members, while a union cannot. In addition, a union cannot have static data members, nor members whose class overloads the = operator. Finally, no object can be a member of a union if the object has a constructor or a destructor function, but no such restrictions apply to a struct.

Uses of an anonymous union

Unions are used to minimize memory waste. For example, consider the following database-transaction class used to update a person's data. The key can be either a unique ID number or a person's last name, but never both at once:

 
class UpdateDetails {
private:
  enum keytype{ keystring, keyID} key;
  char *name;
  long ID;
  //...
public:
  UpdateDetails(const char *n) : key(keystring), name(new char [strlen(n) + 1])
    { strcpy(name, n); }

  UpdateDetails(long id) : key(keyID), ID(id) {}
};

Clearly, memory is wasted here since only one of the keys can be used in each transaction. An anonymous union (that is, an embedded union having neither a tag name nor an instance name) can be used in this case to avoid memory waste:

 
class UpdateDetails {
  enum keytype{ keystring, keyID} key;
  union {  // anonymous union: 1. has no tag name,
    char *name;
    long ID;
  };       // 2. and no instance name.
public:
  UpdateDetails(const char *n) : key(keystring), name(new char [strlen(n) + 1])
    { strcpy(name, n); }

  UpdateDetails(long id) : key(keyID), ID(id) {}
  //...
};

The advantage over a named union member is that the members of an anonymous union are accessed directly, without an instance name:

 
long UpdateDetails::GetID() const 
{ 
  if (key == keyID) 
   return ID; // anonymous union member, accessed like an ordinary member
 
  return 0L;  // indicate that a string key is used 
}
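
Putting the pieces together, a compilable version of the class might look like the sketch below. The destructor, the GetName() accessor, the explicit constructors and the disabled copy operations are assumptions added so that the example manages its memory correctly; they are not part of the original listing:

```cpp
#include <cstring>

class UpdateDetails
{
  enum keytype { keystring, keyID } key;
  union // anonymous union: name and ID share the same storage
  {
    char * name;
    long ID;
  };
public:
  explicit UpdateDetails(const char * n)
    : key(keystring), name(new char[std::strlen(n) + 1])
  { std::strcpy(name, n); }

  explicit UpdateDetails(long id) : key(keyID), ID(id) {}

  // Only the string key owns memory, so check the tag first.
  ~UpdateDetails() { if (key == keystring) delete [] name; }

  long GetID() const { return key == keyID ? ID : 0L; }
  const char * GetName() const { return key == keystring ? name : ""; }
private:
  UpdateDetails(const UpdateDetails &);             // copying disabled
  UpdateDetails & operator=(const UpdateDetails &); // (declared, not defined)
};
```

The key member acts as a tag that records which union member is currently valid; reading the other member would be an error.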

 
