Thursday, May 29, 2008

C++ puzzles #8

Where Are Exception Objects Stored?

Consider the following program:

 
  class Exception {};
  int main()
  {
    try 
    {
      throw Exception(); // where is the exception stored?
    }
    catch( Exception & ex) // catch by reference
    {}
  }

The handler catches the exception by reference, not by value. Catching exceptions by reference is recommended because it's more efficient and it avoids slicing when handling a derived exception object. However, a reader who came across this recommendation was intrigued: "If the exception object is created on the stack, why doesn’t it get destroyed in the process of unwinding the stack, as do other automatic (non-exception) objects?"

Although the C++ standard doesn't specify where exceptions are stored in memory, the general approach among compiler vendors is to use a special stack for exceptions. This stack is not affected by the stack unwinding process so when an exception is thrown from a function and that function exits, the exception object remains alive until the handler that caught it terminates.

The Memory of an Exception Object

The memory for the temporary copy of an exception that is being thrown is allocated in a platform-defined way. Note, however, that it may not be allocated on the free store, and in general, such objects are allocated on a special exception stack. The temporary persists as long as a handler for that exception is executing. If a handler exits by executing a throw statement (i.e., the handler re-throws the exception), control passes to another handler for the same exception, so the temporary remains. Only when the last handler for that exception has terminated is the temporary object destroyed, and the implementation may deallocate its memory. For example:

         
  #include <iostream>

  class X {};
  int main()
  {
    try
    {
      throw X();
    }
    catch (X x) //catch by value
    {
      std::cout << "an exception has occurred";
      return 0;
    }//x is destroyed at this point
  }
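
The lifetime rules described above can be observed directly. The following is a minimal sketch, not a statement about how any particular compiler stores exceptions; the names Tracked, Local, thrower and demonstrate_exception_lifetime are invented for this illustration. It checks that an automatic object is destroyed during unwinding while the exception object survives, and that a re-throw hands the very same object to the next handler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical exception type that counts how many instances are alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    Tracked(const Tracked&) { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Hypothetical automatic object; its destructor records that unwinding ran.
struct Local {
    static bool destroyed;
    ~Local() { destroyed = true; }
};
bool Local::destroyed = false;

void thrower()
{
    Local local;      // destroyed while the stack is unwound
    throw Tracked();  // the exception object must outlive 'local'
}

void demonstrate_exception_lifetime()
{
    const Tracked* seen = NULL;
    try {
        try {
            thrower();
        }
        catch (Tracked& ex) {
            assert(Local::destroyed);     // unwinding already destroyed 'local'
            assert(Tracked::alive >= 1);  // the exception object is still alive
            seen = &ex;
            throw;                        // re-throw the same exception object
        }
    }
    catch (Tracked& ex) {
        assert(&ex == seen);              // no new exception object was created
        assert(Tracked::alive >= 1);
    }
    // The last handler has exited, so the exception object is gone.
    assert(Tracked::alive == 0);
}
```

Calling demonstrate_exception_lifetime() from main() runs the checks. The counter is compared with >= 1 rather than == 1 inside the handlers because an older compiler is allowed to copy the thrown temporary instead of eliding the copy.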

Detecting Your Machine's Endianness

The term endianness refers to the way a computer architecture stores the bytes of a multi-byte number in memory. If bytes at lower addresses have lower significance (Intel microprocessors, for instance), this is called little endian ordering. Conversely, big endian ordering describes a computer architecture in which the most significant byte has the lowest memory address. The following program detects the endianness of the machine on which it is executed:

 
  int main()
  {
    union probe{
      unsigned int num;
      unsigned char bytes[sizeof(unsigned int)];
    };

    probe p = { 1U }; //initialize first member of p with unsigned 1
    bool little_endian = (p.bytes[0] == 1U); //in a big endian architecture, p.bytes[0] equals 0
  }
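
Reading a union member other than the one most recently written is something the C++ standard does not strictly bless, even though compilers generally accept it. As a hedged alternative, the sketch below obtains the same answer by copying the first byte with std::memcpy. The function name is_little_endian is an assumption made for this example.

```cpp
#include <cassert>
#include <cstring>

// Returns true when the least significant byte of an unsigned int
// is stored at the lowest address.
bool is_little_endian()
{
    // Byte order is only meaningful for multi-byte types.
    assert(sizeof(unsigned int) > 1);

    const unsigned int num = 1U;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &num, 1);  // copy the byte at the lowest address
    return first_byte == 1U;            // 1 on little endian, 0 on big endian
}
```

The result depends on the machine, so the function reports it rather than assuming it.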

When Are Pointers Equal?

Pointers to objects or functions of the same type are equal if they are both NULL:

 
  int *p1 = NULL, *p2 = NULL;
  bool equal = (p1==p2); //true

Or if they point to the same object:

 
  char c;
  char * pc1 = &c;
  char * pc2 = &c;
  equal = (pc1 == pc2); // true

Additionally, pointers are equal if they point one position past the end of the same array.
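
The three cases can be checked in one place. This is a small sketch; the function name demonstrate_pointer_equality and the array used for the one-past-the-end case are made up for the illustration.

```cpp
#include <cassert>
#include <cstddef>

void demonstrate_pointer_equality()
{
    int* p1 = NULL;
    int* p2 = NULL;
    assert(p1 == p2);         // both are null

    char c = 'a';
    char* pc1 = &c;
    char* pc2 = &c;
    assert(pc1 == pc2);       // both point to the same object

    int arr[3] = { 1, 2, 3 };
    int* end1 = arr + 3;      // one position past the end of arr
    int* end2 = &arr[2] + 1;  // the same position, reached differently
    assert(end1 == end2);

    int other = 0;
    int* p3 = &arr[0];
    int* p4 = &other;
    assert(p3 != p4);         // pointers to distinct objects are not equal
}
```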

Prefer Enums Over #define Macros When You Need a Fixed Set of Values

Instead of using #define macros to create a set of values, as in this (deprecated) example:

 
#define JAN           1
#define FEB           2
//...
#define DEC          12

use enum types, which are a significantly better choice:

 //file enums.h
enum Months {
               //a list of enumerators: 
               Jan = 1,    //start at 1 to match the macro values; the rest follow automatically
               Feb,
               //...
               Dec };
 
enum Days {
               Sun,
               Mon,
               //...
               };

There are several benefits to the use of enums.

1. They are safe, because the compiler checks that an enum is always assigned a valid value (one of the enumerators in the enum's definition and no other). Each enum is also a distinct type, so you can use enums to overload functions:

 
#include "enums.h"
bool func(Months month);  //1
bool func(Days day);      //2
 
int main() {
 
               Days day = Sun;  //Type safety. Mind that 'day = 1' is illegal
               Months month = Feb;
 
               bool b = func(day); // func() #2 is called
               b = func(month); //now func() #1 is called
}

This feature also eliminates silly mistakes like this:

 
bool b= func(50); //If we used #defines, this would pass unnoticed.
                               //Enums ensure that this mistake is detected at 
                               //compile time

2. C++ enums are very efficient because the compiler represents them as plain integral values. Furthermore, an enum in C++ needn't have the same size as int. As a result, the compiler may optimize memory usage by storing an enum in a unit smaller than int, e.g., short or char. The enum's value may also be kept in a machine register, which can improve performance even more.

3. C/C++ enums are easy to maintain. Their enumerators' list can be extended without having to manually recalculate the enumerators' values, because the compiler takes care of that.
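
The points above can be tried out in a self-contained form. The sketch below uses shortened enumerator lists and a hypothetical overloaded function, describe(), that simply reports which overload was chosen; none of these names come from a real library.

```cpp
#include <cassert>
#include <string>

enum Months { Jan = 1, Feb, Mar /* ... */ };
enum Days { Sun, Mon, Tue /* ... */ };

// One overload per enum type; the compiler picks by the argument's type.
std::string describe(Months) { return "month"; }
std::string describe(Days)   { return "day"; }

void demonstrate_enums()
{
    // The compiler computes the values that were not spelled out.
    assert(Feb == 2 && Mar == 3);
    assert(Sun == 0 && Mon == 1);

    assert(describe(Feb) == "month");
    assert(describe(Mon) == "day");

    // describe(50);  // would not compile: an int does not convert to an enum
}
```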

Prefer References Over Pointers

Even experienced C++ programmers who have prior experience with C tend to use pointers excessively where references would be a better choice. Pointers may result in bugs like the following:

 
bool isValid( const Date *pDate);
 
void f()
{
               Date *p = new Date(); //default: current date
 
               //...many lines of code
 
               delete p;                 //p is now a "dangling pointer"
               bool valid = isValid(p); //oops! undefined behavior
               p = NULL;
               valid = isValid(p);   //oops! null pointer dereferencing;
                                     //most likely will lead to a crash
}

The use of references eliminates these notorious pointer-related bugs, null pointer dereferencing and dangling pointer dereferencing, because a reference must be bound to an object when it is initialized and cannot be null:

 
bool isValid( const Date& date);  //reference version
 
void f()
{
               Date date; //default: current date
 
               //...many lines of code
 
               bool valid = isValid(date); //always safe
               date += 100; //add 100 days
               valid = isValid(date);      //always safe
}
 

One More Reason to Avoid Macros

Even macros that look harmless can have detrimental effects. For example:

 
#define twice(x) ((x)+(x))

The macro, twice, is well parenthesized and performs a simple addition operation. Despite its simplicity, there can still be problems. When twice takes an expression with side effects as an argument, it yields unexpected results:

 
int n = 1;
int sum;
sum = twice(++n);  //guess what?

Since ++n equals 2, you might assume (rather naively) that sum would be 4, but there is no such guarantee. The expression twice(++n) is expanded as ((++n)+(++n)), which increments n twice within a single expression; the behavior is undefined. However, if you had used an ordinary function instead of a macro, like this:

 
inline int  twice(int x) { return x+x; }

the result would have been 4, as expected.
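
The double evaluation can be shown without stepping into undefined behavior by passing a function call instead of ++n. In this sketch, TWICE_MACRO, twice_function and next_value are names chosen for the example; next_value() has a side effect (it bumps a counter), so it is easy to see how many times the argument was evaluated.

```cpp
#include <cassert>

#define TWICE_MACRO(x) ((x) + (x))

inline int twice_function(int x) { return x + x; }

static int counter = 0;

// Side effect: every call increments the counter and returns the new value.
int next_value() { return ++counter; }

void demonstrate_double_evaluation()
{
    counter = 1;
    // Expands to ((next_value()) + (next_value())): two calls.
    int from_macro = TWICE_MACRO(next_value());
    assert(counter == 3);        // the argument was evaluated twice
    assert(from_macro == 5);     // 2 + 3, whichever call runs first

    counter = 1;
    // A real function evaluates its argument exactly once.
    int from_function = twice_function(next_value());
    assert(counter == 2);
    assert(from_function == 4);  // 2 + 2
}
```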

Force an Object to Destroy Itself

Sometimes, you need to force an object to destroy itself because its destructor performs an operation needed immediately. For example, when you want to release a mutex or close a file:

 
  void func(Modem& modem)
  {
      Mutex mtx(modem); // lock modem
      Dial("1234567");
      /* at this point, you want to release the modem lock by invoking Mutex's dtor */  
      do_other_stuff(); //modem is still locked
  } //Mutex dtor called here

After the function Dial() has finished, the modem no longer needs to be locked. However, the Mutex object will release it only when func() exits, and in the meantime, other users will not be able to use the modem. Before you suggest invoking the destructor explicitly, remember that the destructor will be called once again when func() exits, with undefined behavior:

 
void func(Modem& modem)
{
    Mutex mtx(modem); // lock modem
    Dial("1234567");
    mtx.~Mutex(); // very bad idea!
    do_other_stuff(); 
} //Mutex dtor called here for the second time

There is a simple and safe solution to this. You wrap the critical code in braces:

 
void func(Modem& modem)
{
    {
      Mutex mtx(modem); // lock modem
      Dial("1234567");
    } //mtx destroyed here
    //modem is not locked anymore
    do_other_stuff(); 
}
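
Since Mutex and Modem are not defined in this tip, here is a self-contained sketch of the same idea with stand-in types. Modem is reduced to a single flag and ScopedLock plays the role of Mutex; both are assumptions for the illustration, not a real locking API.

```cpp
#include <cassert>

// Stand-in for the modem: just remembers whether it is locked.
struct Modem {
    bool locked;
    Modem() : locked(false) {}
};

// Stand-in for Mutex: locks in the constructor, unlocks in the destructor.
class ScopedLock {
public:
    explicit ScopedLock(Modem& modem) : modem_(modem) { modem_.locked = true; }
    ~ScopedLock() { modem_.locked = false; }
private:
    ScopedLock(const ScopedLock&);             // copying is disabled
    ScopedLock& operator=(const ScopedLock&);
    Modem& modem_;
};

void dial_then_continue(Modem& modem)
{
    {
        ScopedLock lock(modem);  // lock modem
        assert(modem.locked);    // the critical code runs here
    }                            // lock destroyed here, exactly once
    assert(!modem.locked);       // the rest of the function runs unlocked
}
```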

Avoid Deleting a Pointer More Than Once

The results of deleting an object more than once are undefined. However, if a proper fix is not possible right away (when third-party code is used, for example), a temporary workaround for this bug is to assign NULL to a pointer right after it has been deleted. Deleting a NULL pointer is guaranteed to be harmless.

 
String * ps = new String;
//...use ps
if ( TrueCondition ) {
               delete ps; 
               ps = NULL; //safety-guard: further deletions of ps will be harmless
}
//...many lines of code
delete ps; //a bug. ps is deleted for the second time. However, it's harmless

Please note that this hack is not meant to replace a thorough code review and debugging; it should be used as a transitory band-aid.
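
If you do resort to this band-aid, the delete and the assignment can be bundled so that the second step is never forgotten. The helper below, delete_and_reset(), is a hypothetical name for a sketch of that idea; Counted is a made-up class that counts destructor calls so the effect is visible.

```cpp
#include <cassert>
#include <cstddef>

// Made-up class that counts how many times its destructor runs.
struct Counted {
    static int destroyed;
    ~Counted() { ++destroyed; }
};
int Counted::destroyed = 0;

// Deletes the object and nulls the caller's pointer in one step.
template <class T>
void delete_and_reset(T*& p)
{
    delete p;   // harmless when p is already NULL
    p = NULL;
}

void demonstrate_delete_and_reset()
{
    Counted* p = new Counted;
    delete_and_reset(p);
    assert(p == NULL);
    delete_and_reset(p);              // second deletion: nothing happens
    assert(Counted::destroyed == 1);  // the destructor ran only once
}
```

Note that the helper only protects the pointer passed to it; other copies of the same pointer still dangle.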

What is the Role of an Implicitly-Declared Constructor?

If there is no user-declared constructor in a class, and the class does not contain const or reference data members, the implementation implicitly declares a default constructor for it. An implicitly-declared default constructor is an inline public member of its class. It performs the initialization operations that are needed by the implementation to create an object instance. Note, however, that these operations do not involve initialization of user-declared data members or allocation of memory from the free store. For example:

 
class C
{
private:
  int n;
  char *p;
public:
  virtual ~C() {}
};
 
void f()
{
  C obj;  // implicitly-defined constructor is invoked
}

The programmer did not declare a constructor in class C. Therefore, an implicit default constructor was declared and defined by the implementation in order to create an instance of class C. The synthesized constructor does not initialize the data members n and p, nor does it allocate memory for the latter. These data members have an indeterminate value after obj has been constructed.

The Exception Specification of an Implicitly-Declared Default Constructor

An implicitly-declared default constructor has an exception specification. It contains all the exceptions allowed by the exception specifications of the functions that the constructor invokes directly (for example, the constructors of base classes and embedded objects). To demonstrate that, consider these classes:

 
  struct A 
  {
    A(); //can throw any type of exception
  };
 
  struct B 
  {
    B() throw(); //empty exception specification; not allowed to throw any exceptions
  };

Here, classes C and D have no user-declared constructors and consequently, the implementation implicitly declares constructors for them:

 
  struct C : public B
  {
    //implicitly-declared constructor
    // public: inline C::C() throw();
  };
 
  struct D: public A, public B 
  {
    //implicitly-declared constructor
    // public: inline D::D(); 
  };

The implicitly-declared constructor in class C is not allowed to throw any exception because it directly invokes the constructor of class B, which is not allowed to throw any exception either. On the other hand, the implicitly-declared constructor in class D is allowed to throw any type of exception because it directly invokes the constructors of the classes A and B. Since the constructor of class A is allowed to throw any type of exception, D's implicitly-declared constructor has a matching exception specification. In other words, an implicitly-declared constructor allows all exceptions if any function it directly invokes allows all exceptions; it allows no exceptions if every function it directly invokes allows no exceptions.
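
With a C++11 or later compiler, the same rule can be verified at compile time. The sketch below is an adaptation, not the original code: it assumes a C++11 compiler and spells the empty exception specification as noexcept, the modern counterpart of throw(). The constructors are given empty bodies so the example links.

```cpp
#include <cassert>
#include <type_traits>

struct A
{
    A() {}            // can throw any type of exception
};

struct B
{
    B() noexcept {}   // modern spelling of an empty exception specification
};

struct C : public B {};            // implicit constructor: no exceptions allowed
struct D : public A, public B {};  // implicit constructor: all exceptions allowed

// Compile-time checks of the implicitly-declared constructors.
static_assert(std::is_nothrow_default_constructible<C>::value,
              "C() inherits B's empty exception specification");
static_assert(!std::is_nothrow_default_constructible<D>::value,
              "D() may throw because A() may throw");

void demonstrate_exception_specifications()
{
    // The noexcept operator reports the same information.
    assert(noexcept(C()));
    assert(!noexcept(D()));
}
```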

Static Class Members may not be Initialized in a Constructor

A common mistake is to initialize static class members inside the constructor body or a member-initialization list like this:

 
class File
{
  private: 
    static bool locked;
  public: 
    File();
  //…
};
File::File(): locked(false) {} //error, static initialization in a member initialization list

Although compilers flag these ill-formed initializations as errors, programmers often wonder why this is an error. Bear in mind that a constructor is called as many times as the number of objects created, whereas a static data member may be initialized only once because it is shared by all the class objects. Therefore, you should initialize static members outside the class, as in this example:

 
class File
{
  private: 
    static bool locked;
  public: 
    File() { /*..*/}
  //…
};
 
bool File::locked = false; //correct initialization of a non-const static member

Alternatively, for const static members of an integral type, the Standard now allows in-class initialization:

 
class Screen
{
private:
  const static int pixels = 768*1024; //in-class initialization of const static integral types
public:
  Screen() {/*..*/}
//…
};
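
Both forms of initialization can be combined into one compilable sketch. The accessors is_locked() and lock() are hypothetical additions that make the private flag observable; the rest follows the classes shown above.

```cpp
#include <cassert>

class File
{
private:
    static bool locked;   // declaration only
public:
    File() {}
    static bool is_locked() { return locked; }
    static void lock() { locked = true; }
};

bool File::locked = false;  // the one and only initialization

class Screen
{
public:
    static const int pixels = 768 * 1024;  // in-class initialization
};

// Definition without an initializer; needed if pixels is ever
// used as an object (for example, bound to a reference).
const int Screen::pixels;

void demonstrate_static_members()
{
    assert(!File::is_locked());
    File::lock();
    assert(File::is_locked());         // one flag shared by every File
    assert(Screen::pixels == 786432);  // 768 * 1024
}
```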

Creating Classes Dynamically

One reader posted the following question: "I have two classes that have the same member functions and data members. However, the member functions perform different operations. Why can't I do something like this:"

                     
  void* pCls;
                     
  if (cond == true)
    pCls =(Class1 *) new Class1;
  else
    pCls = (Class2 *) new Class2;
  pCls->CommonFunc(); //compiler error

On the face of it, there are many reasons why this code snippet refuses to compile (and even if it did compile, it would probably manifest undefined behavior at runtime). Notwithstanding that, dynamic creation of objects is a fundamental feature of object-oriented programming. How can you achieve the desired effect in well-formed C++?

First, the fact that the two classes have member functions with identical names but different functionality cries out for inheritance and virtual member functions. This is done by deriving one class from the other or by deriving both of them from an abstract base class. Secondly, void* should be replaced with a pointer to the base class. Not only is this safer but it also eliminates the need for brute-force casts. The result should look like this:

    
class Base 
{
public: 
  virtual ~Base() {} //lets objects be deleted safely through a Base pointer
  virtual void CommonFunc() = 0;
};
class Class1 : public Base
{
public: 
  void CommonFunc(); //implementation 
};
class Class2 : public Base 
{
public: 
  void CommonFunc();//implementation
};
 
Base * pb;
if (cond == true)
    pb = new Class1;
else
    pb = new Class2;
pb->CommonFunc(); //now fine
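
To see the whole thing run, here is a self-contained sketch. It departs from the snippet above in two labeled ways: CommonFunc() returns a string so the call that was dispatched can be observed, and a hypothetical helper, create(), wraps the if-else.

```cpp
#include <cassert>
#include <string>

class Base
{
public:
    virtual ~Base() {}
    virtual std::string CommonFunc() const = 0;
};

class Class1 : public Base
{
public:
    std::string CommonFunc() const { return "Class1"; }
};

class Class2 : public Base
{
public:
    std::string CommonFunc() const { return "Class2"; }
};

// Hypothetical helper: picks the concrete class at run time.
Base* create(bool cond)
{
    if (cond)
        return new Class1;
    return new Class2;
}

void demonstrate_dynamic_creation()
{
    Base* pb = create(true);
    assert(pb->CommonFunc() == "Class1");  // dispatched to Class1
    delete pb;                             // safe: virtual destructor

    pb = create(false);
    assert(pb->CommonFunc() == "Class2");  // dispatched to Class2
    delete pb;
}
```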

 
