Chapter 3: A First Impression Of C++

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.
Please state the document version you're referring to, as found in the title (in this document: 8.3.1) and please state chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except in incidental cases I will normally not acknowledge receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

In this chapter C++ is further explored. The possibility of declaring functions in structs is illustrated in various examples; the concept of a class is introduced; casting is covered in detail; many new types are introduced and several important notational extensions to C are discussed.

3.1: Extensions to C

Before we continue with the `real' object-oriented approach to programming, we first introduce some extensions to the C programming language: not mere differences between C and C++, but syntactic constructs and keywords not found in C.

3.1.1: Namespaces

C++ introduces the notion of a namespace: all symbols are defined in a larger context, called a namespace. Namespaces are used to avoid name conflicts that could arise when a programmer would like to define a function like sin operating on degrees, but does not want to lose the capability of using the standard sin function, operating on radians.
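The degrees-versus-radians situation can be sketched as follows. This is a minimal illustration under our own assumptions: the namespace name Degrees and the constant pi are hypothetical names chosen for this example, while std::sin is the standard function (operating on radians) declared in the header cmath.

```cpp
#include <cmath>

namespace Degrees                       // hypothetical namespace for this example
{
    double const pi = 3.14159265358979323846;

    double sin(double degrees)          // a sin operating on degrees
    {
                                        // delegate to the standard sin,
                                        // which expects radians
        return std::sin(degrees * pi / 180);
    }
}
```

Since the degrees-based sin lives in its own namespace, both functions remain available: Degrees::sin(90.0) and std::sin(Degrees::pi / 2) both yield (approximately) 1.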

Namespaces are covered extensively in chapter 4. For now it should be noted that the symbols of the standard library are defined in the standard namespace std, and that compilers require this namespace to be specified explicitly. So, unless otherwise indicated, all examples in the Annotations implicitly use the

        using namespace std;

directive. So, if you actually intend to compile examples given in the C++ Annotations, make sure that the sources contain the above using directive, following their #include directives.

3.1.2: The scope resolution operator ::

C++ introduces several new operators, among which the scope resolution operator (::). This operator can be used in situations where a global variable exists having the same name as a local variable:

    #include <stdio.h>

    int counter = 50;                   // global variable

    int main()
    {
        for (int counter = 1;           // this refers to the
             counter < 10;              // local variable
             counter++)
        {
            printf("%d\n",
                    ::counter           // global variable
                    /                   // divided by
                    counter);           // local variable
        }
    }

In the above program the scope operator is used to address a global variable instead of the local variable having the same name. In C++ the scope operator is used extensively, but it is seldom used to reach a global variable shadowed by an identically named local variable. Its main purpose is described in chapter 7.

3.1.3: Using the keyword `const'

Even though the keyword const is part of the C grammar, its use is more important and much more common in C++ than it is in C.

The const keyword is a modifier stating that the value of a variable or of an argument may not be modified. In the following example the intent is to change the value of a variable ival, which fails:

    int main()
    {
        int const ival = 3;     // a constant int
                                // initialized to 3

        ival = 4;               // assignment produces
                                // an error message
    }

This example shows how ival may be initialized to a given value in its definition; attempts to change the value later (in an assignment) are not permitted.

Variables that are declared const can, in contrast to C, be used to specify the size of an array, as in the following example:

    int const size = 20;
    char buf[size];             // 20 chars big

Another use of the keyword const is seen in the declaration of pointers, e.g., in pointer-arguments. In the declaration

    char const *buf;

buf is a pointer variable pointing to chars. Whatever is pointed to by buf may not be changed through buf: the chars are declared as const. The pointer buf itself however may be changed. A statement like *buf = 'a'; is therefore not allowed, while ++buf is.

In the declaration

    char *const buf;

buf itself is a const pointer which may not be changed. Whatever chars are pointed to by buf may be changed at will.

Finally, the declaration

    char const *const buf;

is also possible; here, neither the pointer nor what it points to may be changed.
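The three declarations can be compared in a small compilable sketch (pointerDemo is a hypothetical name). Statements the compiler rejects are shown as comments, so the function compiles as given:

```cpp
#include <string>

std::string pointerDemo()
{
    char text[] = "hello";

    char const *p1 = text;          // pointer to const chars
    ++p1;                           // OK: the pointer itself may change
    // *p1 = 'E';                   // error: chars may not be changed via p1

    char *const p2 = text;          // const pointer to (modifiable) chars
    *p2 = 'H';                      // OK: the chars may be changed
    // ++p2;                        // error: the pointer itself is const

    char const *const p3 = text;    // const pointer to const chars
    // ++p3; *p3 = 'x';             // both rejected

    return std::string(p3) + ' ' + p1;  // p3 sees "Hello", p1 sees "ello"
}
```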

The rule of thumb for the placement of the keyword const is the following: whatever occurs to the left of the keyword may not be changed.

Although simple, this rule of thumb is not often used. For example, Bjarne Stroustrup states (in http://www.research.att.com/~bs/bs_faq2.html#constplacement):

Should I put "const" before or after the type?
I put it before, but that's a matter of taste. "const T" and "T const" were always (both) allowed and equivalent. For example:
    const int a = 1;        // OK
    int const b = 2;        // also OK
My guess is that using the first version will confuse fewer programmers (``is more idiomatic'').

But we've already seen an example where the simple `before' placement of the keyword const produces unexpected (i.e., unwanted) results, as will be shown in more detail below. Furthermore, the `idiomatic' before-placement also conflicts with the notion of const functions, which we will encounter in section 7.5. With const functions the keyword const is placed behind the function's parameter list rather than before the name of the function.

The definition or declaration (whether or not it contains const) should always be read from the variable or function identifier back to the type identifier:

``Buf is a const pointer to const characters''

This rule of thumb is especially useful in cases where confusion may occur. In examples of C++ code published in other places one often encounters the reverse: const preceding what should not be altered. That this may result in sloppy code is indicated by our second example above:

    char const *buf;

What must remain constant here? According to the sloppy interpretation, the pointer cannot be altered (as const precedes the pointer). In fact, the char values are the constant entities here, as becomes clear when we try to compile the following program:

    int main()
    {
        char const *buf = "hello";

        ++buf;                  // accepted by the compiler
        *buf = 'u';             // rejected by the compiler
    }

Compilation fails on the statement *buf = 'u'; and not on the statement ++buf.

Marshall Cline's C++ FAQ gives the same rule (paragraph 18.5), in a similar context:

[18.5] What's the difference between "const Fred* p", "Fred* const p" and "const Fred* const p"?
You have to read pointer declarations right-to-left.

Marshall Cline's advice might be improved, though: you should start to read pointer definitions (and declarations) at the variable name, reading as far as possible to the definition's end. Once you see a closing parenthesis, read backwards (right to left) from the initial point, until you find the matching open parenthesis or the very beginning of the definition. For example, consider the following complex declaration:

    char const *(* const (*ip)[])[]

Here, we see:

the variable ip, being a
(reading backwards) modifiable pointer to an
(reading forward) array of
(reading backward) constant pointers to an
(reading forward) array of
(reading backward) modifiable pointers to constant characters
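The reading of this declaration can be verified by the compiler. The following sketch (assuming a compiler supporting the C++0x features static_assert and decltype, and using hypothetical typedef names) builds the same type step by step from the inside out and asks the compiler to confirm that both types are identical:

```cpp
#include <type_traits>

// array of (modifiable) pointers to constant characters
typedef char const *CharPtrArray[];

// array of constant pointers to such arrays
typedef CharPtrArray *const ConstPtrArray[];

// the declaration from the text
char const *(* const (*ip)[])[] = 0;

// ip must be a (modifiable) pointer to a ConstPtrArray
static_assert(std::is_same<decltype(ip), ConstPtrArray *>::value,
              "ip: pointer to array of const pointers to arrays "
              "of pointers to const chars");
```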

3.1.4: `cout', `cin', and `cerr'

Analogous to C, C++ defines standard input- and output streams which are available when a program is executed. The streams are:

cout, analogous to stdout,
cin, analogous to stdin,
cerr, analogous to stderr.

Syntactically these streams are not used as functions: instead, data are written to streams or read from them using the operators <<, called the insertion operator and >>, called the extraction operator. This is illustrated in the next example:

    #include <iostream>

    using namespace std;

    int main()
    {
        int     ival;
        char    sval[30];

        cout << "Enter a number:\n";
        cin >> ival;
        cout << "And now a string:\n";
        cin >> sval;

        cout << "The number is: " << ival << "\n"
                "And the string is: " << sval << '\n';
    }

This program reads a number and a string from the cin stream (usually the keyboard) and prints these data to cout. With respect to streams, please note:

The standard streams are declared in the header file iostream. In the examples in the C++ Annotations this header file is often not mentioned explicitly. Nonetheless, it must be included (either directly or indirectly) when these streams are used. Comparable to the use of the using namespace std; clause, the reader is expected to #include <iostream> with all the examples in which the standard streams are used.
The streams cout, cin and cerr are variables of so-called class-types. Such variables are commonly called objects. Classes are discussed in detail in chapter 7 and are used extensively in C++.
The stream cin extracts data from a stream and copies the extracted information to variables (e.g., ival in the above example) using the extraction operator (two consecutive > characters: >>). We will describe later how operators in C++ can be made to perform actions quite different from those the language defines for them, as is the case here: >> normally is the right-shift operator. Function overloading has already been mentioned. In C++ operators can also have multiple definitions, which is called operator overloading.
The operators which manipulate cin, cout and cerr (i.e., >> and <<) also manipulate variables of different types. In the above example cout << ival results in the printing of an integer value, whereas cout << "Enter a number" results in the printing of a string. The actions of the operators therefore depend on the types of supplied variables.
The extraction operator (>>) performs a so-called type-safe assignment to a variable by `extracting' its value from a text stream. Normally, the extraction operator skips all white space characters preceding the values to be extracted.
Special symbolic constants are used for special situations. Normally a line is terminated by inserting "\n" or '\n'. When the endl manipulator is inserted instead, the line is terminated and the stream's internal buffer is flushed as well. Since flushing is rarely needed, '\n' can usually be used instead of endl, resulting in somewhat more efficient code.
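The type-dependent behavior of << and >>, and the skipping of leading white space by >>, can be illustrated without a keyboard by using string streams. This sketch assumes std::istringstream and std::ostringstream (header sstream) as stand-ins for cin and cout, and a std::string instead of the char array of the earlier example; the function name readAndReport is hypothetical:

```cpp
#include <sstream>
#include <string>

std::string readAndReport(std::string const &input)
{
    std::istringstream in(input);   // reads like cin, but from a string
    int ival = 0;
    std::string word;

    in >> ival >> word;             // >> skips leading white space; the
                                    // type of the target selects the
                                    // conversion that is performed

    std::ostringstream out;         // writes like cout, but to a string
    out << "The number is: " << ival << '\n'        // an int is inserted
        << "And the string is: " << word << '\n';   // a string is inserted
    return out.str();
}
```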

The stream objects cin, cout and cerr are not part of the C++ grammar proper. The streams are part of the definitions in the header file iostream. This is comparable to functions like printf that are not part of the C grammar, but were originally written by people who considered such functions important and collected them in a run-time library.

A program may still use the old-style functions like printf and scanf rather than the new-style streams. The two styles can even be mixed. But streams offer several clear advantages and in many C++ programs have completely replaced the old-style C functions. Some advantages of using streams are:

Using insertion and extraction operators is type-safe. The format strings which are used with printf and scanf can define wrong format specifiers for their arguments, for which the compiler sometimes can't warn. In contrast, argument checking with cin, cout and cerr is performed by the compiler. Consequently it isn't possible to err by providing an int argument in places where, according to the format string, a string argument should appear. With streams there are no format strings.
The functions printf and scanf (and other functions using format strings) in fact implement a mini-language which is interpreted at run-time. In contrast, with streams the C++ compiler knows exactly which in- or output action to perform given the arguments used. No mini-language here.
In addition, the insertion and extraction operators may be extended, allowing objects of classes that didn't exist when the streams were originally designed to be inserted into or extracted from streams. Mini-languages as used with printf cannot be extended.
The usage of the left-shift and right-shift operators in the context of the streams illustrates yet another capability of C++: operator overloading, allowing us to redefine the actions an operator performs in certain contexts. Coming from C, operator overloading requires some getting used to, but after a little while these overloaded operators feel rather comfortable.
Streams are independent of the media they operate upon. This (at this point somewhat abstract) notion means that the same code can be used without any modification at all to interface your code to any kind of device. The code using streams can be used when the device is a file on disk; an Internet connection; a digital camera; a DVD device; a satellite link; and much more: you name it. Streams allow your code to be decoupled from (independent of) the devices your code is supposed to operate on, which eases maintenance and allows reuse of the same code in new situations.
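The last two advantages can be sketched together. The sketch below defines a hypothetical class Point, unknown to the designers of the stream library, and an overloaded insertion operator for it; the function report is written in terms of std::ostream only, so it works unchanged for cout, cerr, a file stream or, as toText shows, a string stream. All names besides the standard stream types are illustrative assumptions:

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Point                    // hypothetical class, unknown to the
{                               // designers of the stream library
    int x;
    int y;
};

// Overloaded insertion operator: Points can now be inserted into any
// ostream. Returning the stream allows further insertions to be chained.
std::ostream &operator<<(std::ostream &out, Point const &point)
{
    return out << '(' << point.x << ", " << point.y << ')';
}

// Device independent: only std::ostream is used here
void report(std::ostream &out, Point const &point)
{
    out << "Point: " << point << '\n';
}

// The same report, now writing to memory instead of to a device
std::string toText(Point const &point)
{
    std::ostringstream out;
    report(out, point);
    return out.str();
}
```

Calling report(std::cout, point) sends the very same text to the standard output stream.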

The iostream library has a lot more to offer than just cin, cout and cerr. In chapter 6 iostreams will be covered in greater detail. Even though printf and friends can still be used in C++ programs, streams have practically replaced the old-style C I/O functions like printf. If you think you still need to use printf and related functions, think again: in that case you've probably not yet completely grasped the possibilities of stream objects.

3.2: Functions as part of structs

Earlier it was mentioned that functions can be part of structs (see section 2.5.13). Such functions are called member functions. This section briefly discusses how to define such functions.

The code fragment below shows a struct having data fields for a person's name and address. A function print is included in the struct's definition:

    struct Person
    {
        char name[80];
        char address[80];

        void print();
    };

When defining the member function print the structure's name (Person) and the scope resolution operator (::) are used:

    void Person::print()
    {
        cout << "Name:      " << name << "\n"
                "Address:   " << address << '\n';
    }

The implementation of Person::print shows how the fields of the struct can be accessed without explicitly mentioning an object (as in p.name). Here the function Person::print prints a variable name. Since Person::print is itself a part of struct Person, the variable name implicitly refers to the name field of the object for which print is called.

This struct Person could be used as follows:

    Person person;

    strcpy(person.name, "Karel");
    strcpy(person.address, "Marskramerstraat 33");
    person.print();

The advantage of member functions is that the called function automatically accesses the data fields of the structure for which it was invoked. In the statement person.print() the object person is the `substrate': the variables name and address that are used in the code of print refer to the data stored in the person object.

3.2.1: Data hiding: public, private and class

As mentioned before (see section 2.3), C++ contains specialized syntactic possibilities to implement data hiding. Data hiding is the capability of sections of a program to hide their data from other sections. This results in very clean data definitions. It also allows these sections to enforce the integrity of their data.

C++ has three keywords that are related to data hiding: private, protected and public. These keywords can be used in the definition of structs. The keyword public allows all subsequent fields of a structure to be accessed by all code; the keyword private only allows code that is part of the struct itself to access subsequent fields. The keyword protected is discussed in chapter 13, and is somewhat outside of the scope of the current discussion.

In a struct all fields are public, unless explicitly stated otherwise. Using this knowledge we can expand the struct Person:

    struct Person
    {
        private:
            char d_name[80];
            char d_address[80];
        public:
            void setName(char const *n);
            void setAddress(char const *a);
            void print();
            char const *name();
            char const *address();
    };

As the data fields d_name and d_address are in a private section, they are only accessible to the member functions which are defined in the struct: these are the functions setName, setAddress etc. As an illustration consider the following code:

    Person fbb;

    fbb.setName("Frank");         // OK, setName is public
    strcpy(fbb.d_name, "Knarf");  // error, fbb.d_name is private

Data integrity is implemented as follows: the actual data of a struct Person are mentioned in the structure definition. The data are accessed by the outside world using special functions that are also part of the definition. These member functions control all traffic between the data fields and other parts of the program and are therefore also called `interface' functions. The thus implemented data hiding is illustrated in Figure 2.

Figure 2: Private data and public interface functions of the class Person.

The members setName and setAddress are declared with char const * parameters. This indicates that the functions will not alter the strings which are supplied as their arguments. Analogously, the members name and address return char const *s: the compiler will prevent callers of those members from modifying the information made accessible through the return values of those members.

Two examples of member functions of the struct Person are shown below:

    void Person::setName(char const *n)
    {
        strncpy(d_name, n, 79);
        d_name[79] = 0;
    }

    char const *Person::name()
    {
        return d_name;
    }

The power of member functions and of the concept of data hiding results from the abilities of member functions to perform special tasks, e.g., checking the validity of the data. In the above example setName copies only up to 79 characters from its argument to the data member d_name, and always 0-terminates it, thereby avoiding a buffer overflow.
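For completeness, here is a sketch of the full struct Person. The text shows setName and name; the bodies of setAddress, print and address below are our own assumptions, written in the same style. Note that strncpy is declared in the header cstring:

```cpp
#include <cstring>
#include <iostream>

struct Person
{
    private:
        char d_name[80];
        char d_address[80];
    public:
        void setName(char const *n);
        void setAddress(char const *a);
        void print();
        char const *name();
        char const *address();
};

// Copies at most 79 characters and always 0-terminates: no overflow
void Person::setName(char const *n)
{
    std::strncpy(d_name, n, 79);
    d_name[79] = 0;
}

void Person::setAddress(char const *a)  // assumed: same check as setName
{
    std::strncpy(d_address, a, 79);
    d_address[79] = 0;
}

void Person::print()
{
    std::cout << "Name:      " << d_name << "\n"
                 "Address:   " << d_address << '\n';
}

char const *Person::name()          // callers can't modify d_name
{                                   // through the returned pointer
    return d_name;
}

char const *Person::address()
{
    return d_address;
}
```

Since d_name and d_address are private, they can only be changed through setName and setAddress, so once these members have been called a Person holds 0-terminated strings of at most 79 characters.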

Another illustration of the concept of data hiding is the following. As an alternative to member functions that keep their data in memory, a library could be developed featuring member functions storing data on file. To convert a program which stores Person structures in memory to one that stores the data on disk no special modifications would be required. After recompilation and linking the program to a new library it will have been converted from storage in memory to storage on disk. This example illustrates a broader concept than data hiding; it illustrates encapsulation. Data hiding is a kind of encapsulation. Encapsulation in general results in reduced coupling of different sections of a program. This in turn greatly enhances reusability and maintainability of the resulting software. By having the structure encapsulate the actual storage medium the program using the structure becomes independent of the actual storage medium that is used.

Though data hiding can be implemented using structs, more often (almost always) classes are used instead. A class is a kind of struct, except that a class uses private access by default, whereas structs use public access by default. The definition of a class Person is therefore identical to the one shown above, except for the fact that the keyword class has replaced struct while the initial private: clause can be omitted. Our typographic suggestion for class names (and other type names defined by the programmer) is to start with a capital character to be followed by the remainder of the type name using lower case letters (e.g., Person).

3.2.2: Structs in C vs. structs in C++

In this section we'll discuss an important difference between C and C++ structs and (member) functions. In C it is common to define several functions to process a struct, which then require a pointer to the struct as one of their arguments. An imaginary C header file showing this concept is:

    /* definition of a struct PERSON    This is C   */
    typedef struct
    {
        char name[80];
        char address[80];
    } PERSON;

    /* some functions to manipulate PERSON structs */

    /* initialize fields with a name and address    */
    void initialize(PERSON *p, char const *nm,
                       char const *adr);

    /* print information    */
    void print(PERSON const *p);

    /* etc..    */

In C++, the declarations of the involved functions are put inside the definition of the struct or class. The argument denoting which struct is involved is no longer needed.

    class Person
    {
        char d_name[80];
        char d_address[80];

        public:
            void initialize(char const *nm, char const *adr);
            void print();
            // etc..
    };

In C++ the struct parameter is not used. A C function call such as:

    PERSON x;

    initialize(&x, "some name", "some address");

becomes in C++:

    Person x;

    x.initialize("some name", "some address");
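The two styles can be placed side by side in one C++ source. In the sketch below the function bodies, as well as the accessor name() that is added to Person for comparing the results, are illustrative assumptions:

```cpp
#include <cstring>

// C style (also valid C++): the struct is passed explicitly
typedef struct
{
    char name[80];
    char address[80];
} PERSON;

void initialize(PERSON *p, char const *nm, char const *adr)
{
    std::strncpy(p->name, nm, 79);
    p->name[79] = 0;
    std::strncpy(p->address, adr, 79);
    p->address[79] = 0;
}

// C++ style: the function is a member; the object is implied by the call
class Person
{
    char d_name[80];
    char d_address[80];

    public:
        void initialize(char const *nm, char const *adr);
        char const *name();             // added for comparison
};

void Person::initialize(char const *nm, char const *adr)
{
    std::strncpy(d_name, nm, 79);       // the d_name of the object used
    d_name[79] = 0;                     // in the call, e.g., x
    std::strncpy(d_address, adr, 79);
    d_address[79] = 0;
}

char const *Person::name()
{
    return d_name;
}
```

In the member function the data members are accessed directly: the object (x in x.initialize(...)) is implied by the call, whereas the C function needs the explicit PERSON pointer.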

3.3: More extensions to C

3.3.1: References

In addition to the common ways to define variables (plain variables or pointers) C++ introduces references defining synonyms for variables. A reference to a variable is like an alias; the variable and the reference can both be used in statements involving the variable:

    int int_value;
    int &ref = int_value;

In the above example a variable int_value is defined. Subsequently a reference ref is defined, which (due to its initialization) refers to the same memory location as int_value. In the definition of ref, the reference operator & indicates that ref is not itself an int but a reference to one. The two statements

    ++int_value;
    ++ref;

have the same effect: they increment int_value's value. Whether that location is called int_value or ref does not matter.

References serve an important function in C++ as a means to pass modifiable arguments to functions. E.g., in standard C, a function that increases the value of its argument by five and returns nothing needs a pointer parameter:

    void increase(int *valp)    // expects a pointer
    {                           // to an int
        *valp += 5;
    }

    int main()
    {
        int x = 0;

        increase(&x);           // pass x's address
    }

This construction can also be used in C++ but the same effect is also achieved using a reference:

    void increase(int &valr)    // expects a reference
    {                           // to an int
        valr += 5;
    }

    int main()
    {
        int x = 0;

        increase(x);            // passed as reference
    }

It is arguable whether code such as the above should be preferred over C's method, though. The statement increase(x) suggests that not x itself but a copy is passed. Yet the value of x changes because of the way increase() is defined. However, references can also be used to pass objects that are only inspected (without the need for a copy or a const *) or to pass objects whose modification is an accepted side-effect of their use. In those cases using references is strongly preferred over existing alternatives like copy by value or passing pointers.

Behind the scenes references are usually implemented using pointers. So, as far as the compiler is concerned, references in C++ are essentially const pointers that are automatically dereferenced. With references, however, the programmer does not need to know or to bother about levels of indirection. An important distinction between plain pointers and references is of course that with references no explicit indirection (dereferencing) is required. For example:

    extern int *ip;
    extern int &ir;

    ip = 0;     // reassigns ip, now a 0-pointer
    ir = 0;     // ir unchanged, the int variable it refers to
                // is now 0.
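Using defined (rather than extern) variables, the difference can be checked in a small sketch (assignBoth and the variable names are hypothetical):

```cpp
int value = 5;

int *ip = &value;       // ip points to value
int &ir = value;        // ir is another name for value

void assignBoth()
{
    ip = 0;             // reassigns ip, now a 0-pointer; value unaffected
    ir = 0;             // ir itself is unchanged: value becomes 0
}
```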

In order to prevent confusion, we suggest to adhere to the following:

In those situations where a function does not alter its parameters of a built-in or pointer type, value parameters can be used:

        void some_func(int val)
        {
            cout << val << '\n';
        }

        int main()
        {
            int x = 0;

            some_func(x);       // a copy is passed
        }

When a function explicitly must change the values of its arguments, a pointer parameter is preferred. These pointer parameters should preferably be the function's initial parameters. This is called return by argument.
        void by_pointer(int *valp)
        {
            *valp += 5;
        }

When a function doesn't change the value of its class- or struct-type arguments, or if the modification of the argument is a trivial side-effect (e.g., the argument is a stream) references can be used. Const-references should be used if the function does not modify the argument:

        void by_reference(string const &str)
        {
            cout << str;    // no modification of str
        }

        int main()
        {
            int x = 7;
            by_pointer(&x);         // a pointer is passed
                                    // x might be changed
            string str("hello");
            by_reference(str);      // str is not altered
        }

References play an important role in cases where the argument is not changed by the function but where it is undesirable to copy the argument to initialize the parameter. Such a situation occurs when a large object is passed as argument, or is returned by the function. In these cases the copying operation tends to become a significant factor, as the entire object must be copied, and references are preferred.

If the argument isn't modified by the function, or if the caller shouldn't modify the returned information, the const keyword should be used. Consider the following example:

    struct Person                   // some large structure
    {
        char    name[80];
        char    address[90];
        double  salary;
    };

    Person persons[50];         // database of persons

                                // printperson expects a
                                // reference to a structure
                                // but won't change it
    void printperson(Person const &p)
    {
        cout << "Name: " << p.name << '\n' <<
                "Address: " << p.address << '\n';
    }
                                // get a person by index value
    Person const &person(int index)
    {
        return persons[index];  // a reference is returned,
    }                           // not a copy of persons[index]

    int main()
    {
        Person boss = {};       // all fields zero-initialized

        printperson(boss);      // passed by const reference:
                                // boss is not copied and won't
                                // be altered by the function
        printperson(person(5));
                                // references, not copies
                                // are passed here
    }

Furthermore, note that there is yet another reason for using references when passing objects as function arguments. When passing a reference to an object, the activation of a so-called copy constructor is avoided. Copy constructors will be covered in chapter 8.

References can result in extremely `ugly' code. A function may return a reference to a variable, as in the following example:

    int &func()
    {
        static int value;
        return value;
    }

This allows the use of the following constructions:

    func() = 20;
    func() += func();

It is probably superfluous to note that such constructions should normally not be used. Nonetheless, there are situations where it is useful to return a reference. We have actually already seen an example of this phenomenon in our previous discussion of streams. In a statement like cout << "Hello" << '\n'; the insertion operator returns a reference to cout. So, in this statement first the "Hello" is inserted into cout, producing a reference to cout. Through this reference the '\n' is then inserted in the cout object, again producing a reference to cout, which is then ignored.
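The chaining described above can be made explicit by splitting the insertions into two steps. The sketch uses a std::ostringstream instead of cout so that the result can be inspected; insertInSteps is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Inserts "Hello" and '\n' in two explicit steps, recording whether
// each step produced a reference to the very same stream
std::string insertInSteps(bool &sameStream)
{
    std::ostringstream out;

    std::ostream &first = out << "Hello";   // (out << "Hello") yields out
    std::ostream &second = first << '\n';   // and so does the next step

    sameStream = &first == &out && &second == &out;
    return out.str();
}
```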

Several differences between pointers and references are pointed out in the list below:

A reference cannot exist by itself, i.e., without something to refer to. A declaration of a reference like int &ref; is not allowed; what would ref refer to?
References can be declared as external. Such references are initialized elsewhere.
References may exist as parameters of functions: they are initialized when the function is called.
References may be used in the return types of functions. In those cases the function determines what the return value will refer to.
References may be used as data members of classes. We will return to this usage later.
Pointers are variables by themselves. They point at something concrete or just ``at nothing''.
References are aliases for other variables and cannot be re-aliased to another variable. Once a reference is defined, it refers to its particular variable.
Pointers (except for const pointers) can be reassigned to point to different variables.
When an address-of operator & is used with a reference, the expression yields the address of the variable to which the reference applies. In contrast, ordinary pointers are variables themselves, so the address of a pointer variable has nothing to do with the address of the variable pointed to.
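Two items of this list, the impossibility to re-alias a reference and the address-of behavior, are illustrated by the following sketch (the names Result and referenceDemo are hypothetical):

```cpp
struct Result
{
    bool sameAddress;       // does &ref equal &a?
    int a;
    int b;
};

Result referenceDemo()
{
    int a = 1;
    int b = 2;

    int &ref = a;           // ref refers to a for good
    ref = b;                // NOT re-aliasing: assigns b's value (2) to a

    int *ptr = &a;          // a pointer, in contrast, ...
    ptr = &b;               // ... can be redirected to another variable
    *ptr = 3;               // b becomes 3

    Result result = { &ref == &a, a, b };
    return result;
}
```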

3.3.2: Rvalue References (C++0x)

In C++ (before C++0x), temporary (rvalue) values cannot be distinguished from arguments bound to const & parameters. The C++0x standard adds a new reference type called an rvalue reference, defined as typename &&.

The name rvalue reference is derived from assignment statements, where the variable to the left of the assignment operator is called an lvalue and the expression to the right of the assignment operator is called an rvalue. Rvalues are often temporary (or anonymous) values, like values returned by functions.

In this parlance the C++ reference should be considered an lvalue reference (using the notation typename &). It can be contrasted with the rvalue reference (using the notation typename &&).

The key to understanding rvalue references is the anonymous variable. An anonymous variable has no name, and this is the distinguishing feature the compiler uses to associate it automatically with an rvalue reference if it has a choice. Before introducing some interesting new constructions that weren't available before C++0x, let's first have a look at some distinguishing applications of lvalue references. The following function returns a temporary (anonymous) value:

    int intVal()
    {
        return 5;
    }

Although the return value of intVal can be assigned to an int variable, this requires a copying operation, which might become prohibitive when a function does not return an int but instead some large object. A (non-const) reference or a pointer cannot be used to collect the anonymous return value either, as the return value won't survive beyond the statement in which it is produced. Only a const reference may be bound to it, which extends the temporary's lifetime to that of the reference. So the first and last of the following definitions are illegal (as noted by the compiler):

    int &ir = intVal();         // fails: refers to a temporary
    int const &ic = intVal();   // OK: immutable temporary
    int *ip = &intVal();        // fails: no lvalue available

Apparently it is not possible to modify the temporary returned by intVal. But now consider the following overloaded functions:

    void receive(int &value)
    {
        cout << "int value parameter\n";
    }
    void receive(int &&value)
    {
        cout << "int R-value parameter\n";
    }

and let's call this function from main:

    int main()
    {
        receive(18);
        int value = 5;
        receive(value);
        receive(intVal());
    }

This program produces the following output:

    int R-value parameter
    int value parameter
    int R-value parameter

It shows the compiler selecting receive(int &&value) in all cases where it receives an anonymous int as its argument. Note that this includes receive(18): the value 18 has no name and thus receive(int &&value) is called. Internally, a temporary variable is used to store the 18, as shown by the following example, which modifies receive(int &&value):

    void receive(int &&value)
    {
        ++value;
        cout << "int R-value parameter, now: " << value << '\n';
            // displays 19 and 6, respectively.
    }

Contrasting receive(int &value) with receive(int &&value) has nothing to do with int &value not being a const reference. If receive(int const &value) is used the same results are obtained. Bottom line: the compiler selects the overloaded function using the rvalue reference if the function is passed an anonymous value.
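The overload selection described above can be checked with a small sketch. The hypothetical function kind returns a string instead of writing to cout, so the selected overload is easy to inspect, and it pairs an int const & overload with an int && overload to show that constness plays no role in the choice. The hypothetical function increment shows that the temporary bound to an rvalue reference can be modified.

```cpp
    #include <string>

    // Hypothetical helpers: each overload reports which one was selected.
    std::string kind(int const &)
    {
        return "lvalue";        // selected for named (lvalue) arguments
    }

    std::string kind(int &&)
    {
        return "rvalue";        // selected for anonymous (rvalue) arguments
    }

    int intVal()
    {
        return 5;               // returns a temporary
    }

    // Modifies the temporary bound to the rvalue reference and returns it.
    int increment(int &&value)
    {
        return ++value;
    }
```

Calling kind(18) or kind(intVal()) selects the rvalue overload, kind(value) for an int variable value selects the lvalue overload, and increment(18) returns 19.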

The compiler runs into problems if void receive(int &value) is replaced by void receive(int value), though. When confronted with the choice between a value parameter and a reference parameter (either lvalue or rvalue) it cannot make a decision and reports an ambiguity. In practical contexts this is not a problem. Rvalue references were added to the language in order to be able to distinguish the two forms of references: named values (for which lvalue references are used) and anonymous values (for which rvalue references are used).

It is this distinction that allows the implementation of move semantics and perfect forwarding. At this point the concept of move semantics cannot yet fully be discussed (but see section 8.6 for a more thorough discussion), but it is very well possible to illustrate the underlying ideas.

Consider the situation where a function returns a struct Data containing a pointer to dynamically allocated characters. Moreover, the struct defines a member function copy(Data const &other) that takes another Data object and copies the other's data into the current object. The (partial) definition of the struct Data might look like this (To the observant reader: in this example the memory leak that results from using Data::copy() should be ignored):

    struct Data
    {
        char *text;
        size_t size;
        void copy(Data const &other)
        {
            text = strdup(other.text);
            size = strlen(text);
        }
    };

Next, functions dataFactory and main are defined as follows:

    Data dataFactory(char const *txt)
    {
        Data ret = {strdup(txt), strlen(txt)};
        return ret;
    }

    int main()
    {
        Data d1 = {strdup("hello"), strlen("hello")};

        Data d2;
        d2.copy(d1);                        // 1 (see text)

        Data d3;
        d3.copy(dataFactory("hello"));      // 2
    }

At (1) d2 appropriately receives a copy of d1's text. But at (2) d3 receives a copy of the text stored in the temporary returned by the dataFactory function. As the temporary ceases to exist after the call to copy(), two related and unpleasant consequences are observed:

The return value is a temporary object: its only reason for existence is to pass its data on to d3. Now d3 copies the temporary's data which clearly is somewhat overdone.
The temporary Data object is lost following the call to copy(). Unfortunately its dynamically allocated data is lost as well resulting in a memory leak.

In cases like these rvalue references should be used. By overloading the copy member with a member copy(Data &&other) the compiler is able to distinguish situations (1) and (2). It now calls the initial copy() member in situation (1) and the newly defined overloaded copy() member in situation (2):

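    struct Data
    {
        char *text;
        size_t size;
        void copy(Data const &other)    // deep copy: used for named objects
        {
            text = strdup(other.text);
            size = other.size;
        }
        void copy(Data &&other)         // move: used for anonymous objects
        {
            text = other.text;
            size = other.size;
            other.text = 0;
        }
    };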

Note that the overloaded copy() function merely moves the other.text pointer to the current object's text pointer followed by reassigning 0 to other.text. Struct Data suddenly has become move-aware and implements move semantics, removing the drawbacks of the previously shown approach:

Instead of making a deep copy (which is required in situation (1)), the pointer value is simply moved to its new owner;
Since d3 now owns the memory that was allocated for the temporary, that memory is no longer lost when the temporary ceases to exist: the memory leak is prevented.
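The move-aware Data struct can be tried out in a self-contained sketch. It assumes a hypothetical helper duplicate (using new[]) in place of the POSIX function strdup, so that only standard headers are needed and the memory can be released with delete[]. Like the struct above, it still has no destructor: the owners must release the text themselves.

```cpp
    #include <cstddef>
    #include <cstring>

    // Hypothetical replacement for strdup, allocating with new[].
    char *duplicate(char const *txt)
    {
        char *ret = new char[std::strlen(txt) + 1];
        std::strcpy(ret, txt);
        return ret;
    }

    struct Data
    {
        char *text;
        std::size_t size;

        void copy(Data const &other)    // deep copy: used for named objects
        {
            text = duplicate(other.text);
            size = other.size;
        }
        void copy(Data &&other)         // move: used for temporaries
        {
            text = other.text;          // take over the pointer...
            size = other.size;
            other.text = 0;             // ...and leave other without data
            other.size = 0;
        }
    };

    Data dataFactory(char const *txt)
    {
        Data ret = {duplicate(txt), std::strlen(txt)};
        return ret;
    }
```

Here d3.copy(dataFactory("hello")) selects the moving overload, while d2.copy(d1) selects the copying one. To pass a named object to the moving overload it must be turned into an rvalue, e.g., using std::move (declared in <utility>).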

Rvalue references for *this and initialization of class objects by rvalues are not yet supported by the g++ compiler.

3.3.3: Raw String Literals (C++0x, 4.5)

Standard ASCII-C strings are delimited by double quotes, supporting escape sequences like \n, \\ and \". In some cases it is useful to avoid escaping strings (e.g., in the context of XML). To this end, the C++0x standard offers raw string literals.

Raw string literals start with an R, followed by a double quote, followed by a label (which is an arbitrary sequence of characters not containing [), followed by [. The raw string ends at the closing bracket ], followed by the label, which is in turn followed by a double quote. (In the final C++11 standard the square brackets were replaced by parentheses, so current compilers expect R"label( ... )label".) Example:

    R"[A Raw \ "String"]"
    R"delimiter[Another \ Raw "[String]]delimiter"

In the first case, everything between "[ and ]" is part of the string. Escape sequences aren't supported so \ " defines three characters: a backslash, a blank character and a double quote. The second example shows a raw string defined between the markers "delimiter[ and ]delimiter".
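The same two strings can be written in the syntax finally adopted by the C++11 standard, which uses parentheses instead of square brackets. This sketch (variable names are illustrative only) also shows the escaped equivalent of the first string.

```cpp
    #include <string>

    // C++11 raw string syntax: R"label( ... )label".
    // Backslashes and double quotes between the delimiters are taken literally.
    std::string const plain = R"(A Raw \ "String")";
    std::string const labeled = R"delimiter(Another \ Raw "(String))delimiter";

    // The same text as plain, written with escape sequences.
    std::string const escaped = "A Raw \\ \"String\"";
```

The first string consists of 16 characters; the second one ends at the first )delimiter" sequence, so its content is Another \ Raw "(String).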

3.3.4: Strongly typed enumerations (C++0x)

Enumeration values in C++ are in fact int values, thereby bypassing type safety. E.g., values of different enumeration types may be compared for (in)equality, albeit through a (static) type cast.

Another problem with the current enum type is that its values are not restricted to the enum type name itself, but to the scope where the enumeration is defined. As a consequence, two enumerations defined in the same scope cannot have enumeration values with identical names.

In the C++0x standard these problems are solved by defining enum classes. An enum class can be defined as in the following example:

    enum class SafeEnum
    {
        NOT_OK,     // 0, by implication
        OK          = 10,
        MAYBE_OK    // 11, by implication
    };

Enum classes use int values by default, but the used value type can easily be changed using the : type notation, as in:

    enum class CharEnum: unsigned char
    {
        NOT_OK,
        OK
    };

To use a value defined in an enum class its enumeration name must be provided as well. E.g., OK is not defined, CharEnum::OK is.

Using the data type specification (noting that it defaults to int) it is possible to use enum class forward declarations. E.g.,

    enum Enum1;                 // Illegal: no size available
    enum Enum2: unsigned int;   // Legal in C++0x: explicitly declared type

    enum class Enum3;           // Legal in C++0x: default int type is used
    enum class Enum4: char;     // Legal in C++0x: explicitly declared type
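A small sketch combining the features discussed above: scoped enumeration values, an explicitly specified underlying type, a forward declaration, and the static_cast that is required because enum class values are not implicitly converted to int. The names Forward and toInt are hypothetical.

```cpp
    enum class SafeEnum
    {
        NOT_OK,     // 0
        OK = 10,
        MAYBE_OK    // 11
    };

    enum class CharEnum: unsigned char
    {
        NOT_OK,     // no conflict with SafeEnum::NOT_OK
        OK
    };

    enum class Forward;             // forward declaration: underlying type int

    enum class Forward
    {
        VALUE = 3
    };

    // No implicit conversion to int: a static_cast is required.
    int toInt(SafeEnum value)
    {
        return static_cast<int>(value);
    }
```

Here toInt(SafeEnum::MAYBE_OK) returns 11, and sizeof(CharEnum) equals 1.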

3.3.5: Initializer lists (C++0x)

The C language defines the initializer list as a list of values enclosed by curly braces, possibly themselves containing initializer lists. In C these initializer lists are commonly used to initialize arrays and structs.

C++ extends this concept in the C++0x standard by introducing the type initializer_list<Type> where Type is replaced by the type name of the values used in the initializer list. Initializer lists in C++ are, like their counterparts in C, recursive, so they can also be used with multi-dimensional arrays, structs and classes.

Like in C, initializer lists consist of a list of values surrounded by curly braces. But unlike C, functions can define initializer list parameters. E.g.,

    void values(std::initializer_list<int> iniValues)
    {
    }

The function values could be called as follows:

    values({2, 3, 5, 7, 11, 13});

The initializer list appears as an argument which is a list of values surrounded by curly braces. Due to the recursive nature of initializer lists a two-dimensional series of values can also be passed, as shown in the next example:

        // each element of iniValues is itself an initializer_list<int>
    void values2(std::initializer_list<std::initializer_list<int>> iniValues)
    {
    }

    values2({{1, 2}, {2, 3}, {3, 5}, {4, 7}, {5, 11}, {6, 13}});

The elements of initializer lists are constant and cannot be modified. However, their size and values may be retrieved using their size, begin, and end members as follows:

    void values(initializer_list<int> iniValues)
    {
        cout << "Initializer list having " << iniValues.size() << " values\n";
        for
        (
            initializer_list<int>::const_iterator begin = iniValues.begin();
                begin != iniValues.end();
                    ++begin
        )
            cout << "Value: " << *begin << '\n';
    }
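A sketch using the size, begin and end members to compute something from an initializer list, and a second (hypothetical) function accepting the two-dimensional form, whose parameter is an initializer list of initializer lists.

```cpp
    #include <cstddef>
    #include <initializer_list>

    // Sums the values of a one-dimensional initializer list.
    int sum(std::initializer_list<int> iniValues)
    {
        int total = 0;
        for (std::initializer_list<int>::const_iterator it = iniValues.begin();
                it != iniValues.end(); ++it)
            total += *it;
        return total;
    }

    // Nested braces require a list of lists: counts all values.
    std::size_t countAll(std::initializer_list<std::initializer_list<int>> rows)
    {
        std::size_t count = 0;
        for (std::initializer_list<std::initializer_list<int>>::const_iterator
                row = rows.begin(); row != rows.end(); ++row)
            count += row->size();
        return count;
    }
```

E.g., sum({2, 3, 5, 7, 11, 13}) returns 41, and countAll({{1, 2}, {2, 3}, {3, 5}}) returns 6.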

Initializer lists can also be used to initialize objects of classes (cf. section 7.3).

3.3.6: Type inference using `auto' (C++0x)

A special use of the keyword auto is defined by the C++0x standard allowing the compiler to determine the type of a variable automatically rather than requiring the software engineer to define a variable's type explicitly.

In parallel, the use of auto as a storage class specifier is no longer supported in the C++0x standard. According to that standard a variable definition like auto int var results in a compilation error.

Type inference can be very useful in situations where it is very hard to determine a variable's type in advance. These situations occur, e.g., in the context of templates, topics covered in chapters 18 through 22.

At this point in the Annotations only simple examples can be given, and some hints will be provided about more general uses of the auto keyword.

When defining and initializing a variable int variable = 5 the type of the initializing expression is well known: it's an int, and unless the programmer's intentions are different this could be used to define variable's type (although it shouldn't in normal circumstances as it reduces rather than improves the clarity of the code):

    auto variable = 5;

Here are some examples where using auto is useful. In chapter 5 the iterator concept is introduced (see also chapters 12 and 18). Iterators sometimes have long type definitions, like

    std::vector<std::string>::const_reverse_iterator

Functions may return types like this. Since the compiler knows the types returned by functions we may exploit this knowledge using auto. Assuming that a function begin() is declared as follows:

    std::vector<std::string>::const_reverse_iterator begin();

Rather than writing the verbose variable definition (at // 1) a much shorter definition (at // 2) may be used:

    std::vector<std::string>::const_reverse_iterator iter = begin();    // 1
    auto iter = begin();                                                // 2

It's easy to define additional variables of this type. When initializing those variables using iter the auto keyword can be used again:

    auto start = iter;

If start can't be initialized immediately using an existing variable, the type of a well-known variable or function can be used in combination with the decltype keyword, as in:

    decltype(iter) start;
    decltype(begin()) spare;

The keyword decltype may also receive an expression as its argument. This feature is already available in the C++0x standard implementation in g++ 4.3. E.g., decltype(3 + 5) represents an int, decltype(3 / double(3)) represents double.
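The types deduced by auto and decltype can be verified at compile time using static_assert and std::is_same (from <type_traits>). In this sketch the hypothetical function lastOf plays the role of begin() in the text: its return type is the verbose reverse iterator type.

```cpp
    #include <string>
    #include <type_traits>
    #include <vector>

    // Hypothetical function having a verbose return type.
    std::vector<std::string>::const_reverse_iterator lastOf(
                                        std::vector<std::string> const &vs)
    {
        return vs.rbegin();
    }

    static_assert(std::is_same<decltype(3 + 5), int>::value,
                  "decltype(3 + 5) is int");
    static_assert(std::is_same<decltype(3 / double(3)), double>::value,
                  "decltype(3 / double(3)) is double");
    static_assert(std::is_same<decltype(lastOf(std::vector<std::string>())),
                      std::vector<std::string>::const_reverse_iterator>::value,
                  "decltype of a call is the function's return type");
```

In a function, auto iter = lastOf(vs) defines iter using that long type, and decltype(iter) start = iter defines another variable of the same type.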

The auto keyword can also be used to postpone the definition of a function's return type. The declaration of a function intArrPtr returning a pointer to an array of 10 ints looks like this:

    int (*intArrPtr())[10];

Such a declaration is fairly complex. E.g., among other complexities it requires `protection of the pointer' using parentheses in combination with the function's parameter list. In situations like these the specification of the return type can be postponed using the auto return type, followed by the specification of the function's return type after any other specification the function might receive (e.g., as a const member (cf. section 7.5) or following its exception throw list (cf. section 9.6)).

Using auto to declare the above function, the declaration becomes:

    auto intArrPtr() -> int (*)[10];

A return type specification using auto is called a late-specified return type.
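Both forms declare the same function type, so they may even appear together. In this sketch the hypothetical global array storage gives intArrPtr something to point to.

```cpp
    // Two declarations of the same function: a function returning
    // a pointer to an array of 10 ints.
    int (*intArrPtr())[10];
    auto intArrPtr() -> int (*)[10];

    int storage[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};

    // Definition using the late-specified return type.
    auto intArrPtr() -> int (*)[10]
    {
        return &storage;        // the address of the whole array
    }
```

The returned pointer is used as (*intArrPtr())[idx] to reach an element; sizeof(*intArrPtr()) equals the size of the whole array.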

The auto keyword can also be used to define types that are related to the actual auto-associated type. Here are some examples:

    vector<int> vi;
    auto iter = vi.begin();     // auto: vector<int>::iterator
    auto &&rref = vi.begin();   // rref: rvalue reference to the iterator type
    auto *ptr = &iter;          // ptr: pointer to the iterator type
    auto *ptr2 = &rref;         // same (rref itself is a named lvalue)

3.3.7: Range-based for-loops (C++0x, ?)

The C++ for-statement is identical to C's for-statement:

    for (init; cond; inc)
        statement

In many cases, however, the initialization, condition and increment parts are fairly obvious as in situations where all elements of an array or vector must be processed. Many languages offer the foreach statement for that and C++ offers the std::for_each generic algorithm (cf. section 19.1.17).

The C++0x standard adds a new for statement syntax to this. The new syntax can be used to process each element of a range. Three types of ranges are distinguished:

Plain arrays (e.g., int array[10]);
Standard containers (or comparable) (cf. chapter 12);
Ranges contained in a std::pair (cf. section 12.2, e.g., std::pair subRange(array + 1, array + 8)).

In these cases the C++0x standard offers the following additional for-statement syntax:

    // assume int array[30]
    for (int &element: array)
        statement

Here an int &element is defined whose lifetime and scope are restricted to the for-statement. It refers to each of the subsequent elements of array at each new iteration of the for-statement, starting with the first element of the range.
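A sketch of the range-based for-statement applied to a plain array and to a standard container. The function names doubleAll and sum are hypothetical. When element is a reference the array itself is modified; when the elements are only read, a copy (int value) suffices.

```cpp
    #include <vector>

    // element refers to each array element in turn: the array is modified.
    void doubleAll(int (&array)[5])
    {
        for (int &element: array)
            element *= 2;
    }

    // The elements are only read, so each one is copied into value.
    int sum(std::vector<int> const &values)
    {
        int total = 0;
        for (int value: values)
            total += value;
        return total;
    }
```

After int array[5] = {1, 2, 3, 4, 5}; doubleAll(array); the array contains 2, 4, 6, 8 and 10.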

3.4: New language-defined data types

In C the following built-in data types are available: void, char, short, int, long, float and double. C++ extends these built-in types with several additional built-in types: the types bool, wchar_t, long long and long double (Cf. ANSI/ISO draft (1995), par. 27.6.2.4.1 for examples of these very long types). The type long long provides at least 64 bits, and the type long double offers at least the range and precision of double. These built-in types as well as pointer variables are called primitive types in the C++ Annotations.

There is a subtle issue to be aware of when converting applications developed for 32-bit architectures to 64-bit architectures. On common 64-bit Unix-like systems (using the LP64 data model) only long types and pointer types change in size from 32 bits to 64 bits; integers of type int remain at their size of 32 bits. This may cause data truncation when assigning pointer or long values to int variables. Also, problems with sign extension can occur when assigning expressions using types shorter than the size of an int to an unsigned long or to a pointer.

Apart from these built-in types, the class-type string is available for handling character strings. The datatypes bool and wchar_t are covered in the following sections, the datatype string is covered in chapter 5. Note that recent versions of C may also have adopted some of these newer data types (notably bool and wchar_t). Traditionally, however, C doesn't support them, hence they are mentioned here.

Now that these new types are introduced, let's refresh your memory about letters that can be used in literal constants of various types. They are:

b or B: in addition to its use as a hexadecimal digit, it can also be used to define a binary constant (binary literals were standardized in C++14; g++ offered them earlier as an extension). E.g., 0b101 equals the decimal value 5.
E or e: the exponentiation character in floating point literal values. For example: 1.23E+3. Here, E should be pronounced (and interpreted) as: times 10 to the power. Therefore, 1.23E+3 represents the value 1230.
F can be used as postfix to a non-integral numeric constant to indicate a value of type float, rather than double, which is the default. For example: 12.F (the dot transforms 12 into a floating point value); 1.23E+3F (see the previous example. 1.23E+3 is a double value, whereas 1.23E+3F is a float value).
L can be used as prefix to indicate a character string whose elements are wchar_t-type characters. For example: L"hello world".
L can be used as postfix to an integral value to indicate a value of type long, rather than int, which is the default. Note that there is no letter indicating a short type. For that a static_cast<short>() must be used.
p, to specify the power in hexadecimal floating point numbers (standardized in C++17). E.g. 0x10p4. The exponent itself is read as a decimal constant and can therefore not start with 0x. The exponent part is interpreted as a power of 2. So 0x10p2 is (decimal) equal to 64: 16 * 2^2.
U can be used as postfix to an integral value to indicate an unsigned value, rather than an int. It may also be combined with the postfix L to produce an unsigned long int value.

And, of course: the x and a until f characters can be used to specify hexadecimal constants (optionally using capital letters).
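The literal letters listed above can be checked at compile time. This sketch assumes a C++17 compiler, since binary literals (0b...) are a C++14 feature and hexadecimal floating point literals (0x...p...) a C++17 feature.

```cpp
    #include <type_traits>

    static_assert(0b101 == 5, "binary constant");
    static_assert(0x10p2 == 64.0, "16 * 2^2");
    static_assert(1.23E+3 == 1230.0, "1.23 times 10 to the power 3");

    // Suffixes select the literal's type.
    static_assert(std::is_same<decltype(12.F), float>::value, "F: float");
    static_assert(std::is_same<decltype(1.23E+3), double>::value,
                  "no suffix: double");
    static_assert(std::is_same<decltype(5L), long>::value, "L: long");
    static_assert(std::is_same<decltype(5U), unsigned>::value, "U: unsigned");
    static_assert(std::is_same<decltype(5UL), unsigned long>::value,
                  "UL: unsigned long");

    // The L prefix defines a wchar_t string (a string literal is an lvalue,
    // so decltype yields a reference to the array).
    static_assert(std::is_same<decltype(L"hello"),
                               wchar_t const (&)[6]>::value,
                  "L prefix: wchar_t string");
```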

3.4.1: The data type `bool'

The type bool represents boolean (logical) values, for which the (now reserved) constants true and false may be used. Except for these reserved values, integral values may also be assigned to variables of type bool, which are then implicitly converted to true and false according to the following conversion rules (assume intValue is an int-variable, and boolValue is a bool-variable):

        // from int to bool:
    boolValue = intValue ? true : false;

        // from bool to int:
    intValue = boolValue ? 1 : 0;

Furthermore, when bool values are inserted into streams then true is represented by 1, and false is represented by 0. Consider the following example:

    cout << "A true value: "  << true << "\n"
            "A false value: " << false << '\n';

The bool data type is found in other programming languages as well. Pascal has its type Boolean; Java has a boolean type. Different from these languages, C++'s type bool acts like a kind of int type. It is primarily a documentation-improving type, having just two values true and false. Actually, these values can be interpreted as enum values for 1 and 0. Doing so would ignore the philosophy behind the bool data type, but nevertheless: assigning true to an int variable neither produces warnings nor errors.
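A sketch of the conversions and stream insertion described above. It also uses the standard manipulator std::boolalpha (not discussed above), which makes streams insert true and false as words rather than as 1 and 0. The function names are hypothetical.

```cpp
    #include <sstream>
    #include <string>

    // Inserts bool values into a stream: first as 1/0, then using boolalpha.
    std::string showBools()
    {
        std::ostringstream out;
        out << true << ' ' << false << ' '
            << std::boolalpha << true << ' ' << false;
        return out.str();
    }

    bool toBool(int intValue)
    {
        return intValue;            // nonzero: true, zero: false
    }

    int toInt(bool boolValue)
    {
        return boolValue;           // true: 1, false: 0
    }
```

Here showBools() returns "1 0 true false".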

Using the bool-type is usually clearer than using int. Consider the following prototypes:

    bool exists(char const *fileName);  // (1)
    int  exists(char const *fileName);  // (2)

For the first prototype, readers will expect the function to return true if the given filename is the name of an existing file. However, using the second prototype some ambiguity arises: intuitively the return value 1 is appealing, as it allows constructions like

    if (exists("myfile"))
        cout << "myfile exists";

On the other hand, many system functions (like access, stat, and many others) return 0 to indicate a successful operation, reserving other values to indicate various types of errors.

As a rule of thumb I suggest the following: if a function should inform its caller about the success or failure of its task, let the function return a bool value. If the function should return success or various types of errors, let the function return enum values, documenting the situation by its various symbolic constants. Only when the function returns a conceptually meaningful integral value (like the sum of two int values), let the function return an int value.

3.4.2: The data type `wchar_t'

The wchar_t type is an extension of the char built-in type, to accommodate wide character values (but see also the next section). The g++ compiler (e.g., on Linux) reports sizeof(wchar_t) as 4, which easily accommodates all Unicode code points (U+0000 through U+10FFFF).

Note that Java's char data type is somewhat comparable to C++'s wchar_t type. Java's char type is 2 bytes wide, though. On the other hand, Java's byte data type is comparable to C++'s char type: one byte. Confusing?

3.4.3: Unicode encoding (C++0x)

In C++ string literals can be defined as ASCII-Z C strings. Prepending an ASCII-Z string by L (e.g., L"hello") defines a wchar_t string literal.

The new C++0x standard adds to this support for 8, 16 and 32 bit Unicode encoded strings. Furthermore, two new data types are introduced: char16_t and char32_t storing, respectively, a UTF-16 and UTF-32 unicode value.

In addition, char remains large enough to contain any UTF-8 code unit (i.e., it remains an 8-bit value); a single Unicode character may require up to four such code units.

String literals for the various types of unicode encodings (and associated variables) can be defined as follows:

        // until C++17: since C++20 u8 literals consist of char8_t values
    char     utf_8[] = u8"This is UTF-8 encoded.";
    char16_t utf16[] = u"This is UTF-16 encoded.";
    char32_t utf32[] = U"This is UTF-32 encoded.";

Alternatively, unicode constants may be defined using the \u escape sequence, followed by four hexadecimal digits (or the \U escape sequence, followed by eight hexadecimal digits). Depending on the type of the unicode variable (or constant) the character is encoded as a UTF-8, UTF-16 or UTF-32 value. E.g.,

        // until C++17: since C++20 u8 literals consist of char8_t values
    char     utf_8[] = u8"\u2018";
    char16_t utf16[] = u"\u2018";
    char32_t utf32[] = U"\u2018";

Unicode strings can be delimited by double quotes but raw string literals can also be used.
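The difference between the UTF-16 and UTF-32 encodings shows up for characters outside the Basic Multilingual Plane. In this sketch U+1F600 (an arbitrarily chosen example character) needs a surrogate pair in a char16_t string but a single char32_t value, while U+2018 needs one unit in both. The helper units is hypothetical; it returns the number of array elements, including the terminating 0.

```cpp
    #include <cstddef>

    char16_t const utf16[] = u"\U0001F600";     // surrogate pair + 0
    char32_t const utf32[] = U"\U0001F600";     // one code unit + 0

    char16_t const quote16[] = u"\u2018";       // one code unit + 0
    char32_t const quote32[] = U"\u2018";       // one code unit + 0

    // Number of elements of an array, including the terminating 0.
    template <typename Type, std::size_t Size>
    constexpr std::size_t units(Type const (&)[Size])
    {
        return Size;
    }
```

Here units(utf16) returns 3, units(utf32) returns 2, and utf16 holds the surrogate pair 0xD83D, 0xDE00.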

3.4.4: The data type `long long int' (C++0x)

The C++0x standard adds the type long long int to the set of standard types. It has at least 64 bits, also on 32 bit systems. Some compilers already supported long long int as an extension, but C++0x officially adds it to C++.

3.4.5: The data type `size_t'

The size_t type is not really a built-in primitive data type, but a data type that is promoted by POSIX as a typename to be used for non-negative integral values answering questions like `how much' and `how many', in which case it should be used instead of unsigned int. It is not a specific C++ type, but also available in, e.g., C. Usually it is defined implicitly when a (any) system header file is included. The header file `officially' defining size_t in the context of C++ is cstddef.

Using size_t has the advantage of being a conceptual type, rather than a standard type that is then modified by a modifier. Thus, it improves the self-documenting value of source code.

Sometimes functions explicitly require unsigned int to be used. E.g., on amd-architectures the X-windows function XQueryPointer explicitly requires a pointer to an unsigned int variable as one of its arguments. In such situations a pointer to a size_t variable can't be used, but the address of an unsigned int must be provided. Such situations are exceptional, though.

Other useful bit-represented types also exist. E.g., uint32_t is guaranteed to hold 32-bits unsigned values. Analogously, int32_t holds 32-bits signed values. Corresponding types exist for 8, 16 and 64 bits values. These types are defined in the header file cstdint.
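The widths of these types can be checked using std::numeric_limits (from <limits>), whose digits member counts the value bits (excluding the sign bit of signed types). The function length is a hypothetical example of size_t answering a `how many' question. Note that the exact-width types are optional in the standard, although they are available on common platforms.

```cpp
    #include <cstddef>
    #include <cstdint>
    #include <limits>

    static_assert(std::numeric_limits<std::uint8_t>::digits == 8, "uint8_t");
    static_assert(std::numeric_limits<std::uint32_t>::digits == 32, "uint32_t");
    static_assert(std::numeric_limits<std::int32_t>::digits == 31,
                  "int32_t: 31 value bits plus a sign bit");
    static_assert(std::numeric_limits<std::uint64_t>::digits == 64, "uint64_t");
    static_assert(!std::numeric_limits<std::size_t>::is_signed,
                  "size_t is unsigned");

    // Answers `how many characters': a natural use of size_t.
    std::size_t length(char const *text)
    {
        std::size_t count = 0;
        while (text[count] != '\0')
            ++count;
        return count;
    }
```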

3.5: A new syntax for casts

Traditionally, C offers the following cast syntax:

        (typename)expression

Here typename is the name of a valid type, and expression is an expression.

C-style casts remain available, but in C++ their use is discouraged. Although C++ offers a function call notation using the following syntax:

    typename(expression)

the function call notation does not in fact represent a cast, but a request to the compiler to construct an (anonymous) variable having type typename from expression. Although this form is very often used in C++, it should not be used for casting. Instead, there are now four new-style casts available, which are introduced in the following sections.

The C++0x standard defines the shared_ptr type (cf. section 18.4). To cast shared pointers specialized casts should be used. These are discussed in section 18.4.5.

3.5.1: The `static_cast'-operator

The cast converting conceptually comparable types to each other is:

        static_cast<type>(expression)

This type of cast is used to convert, e.g., a double to an int: both are numbers, but as the int has no fractional part, precision is potentially reduced. But the converse also holds true. When the quotient of two int values must be assigned to a double the fraction part of the division will get lost unless a cast is used.

Here is an example of such a cast (assuming quotient is of type double and lhs and rhs are int-typed variables):

        quotient = static_cast<double>(lhs) / rhs;

If the cast is omitted, the division operator will ignore the remainder as its operands are int expressions. Note that the division should be put outside of the cast expression. If the division is put inside (as in static_cast<double>(lhs / rhs)) an integer division will have been performed before the cast has had a chance to convert the type of an operand to double.
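The two placements of the cast can be compared in a short sketch; the function names quotient and tooLate are hypothetical.

```cpp
    // The cast converts lhs before the division: the fraction is kept.
    double quotient(int lhs, int rhs)
    {
        return static_cast<double>(lhs) / rhs;
    }

    // The integer division is performed first: the fraction is already lost.
    double tooLate(int lhs, int rhs)
    {
        return static_cast<double>(lhs / rhs);
    }
```

With lhs == 7 and rhs == 2, quotient returns 3.5 while tooLate returns 3.0.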

Another nice example of code in which it is a good idea to use the static_cast<>()-operator is in situations where the arithmetic assignment operators are used in mixed-typed expressions. Consider the following expression (assume doubleVar is a variable of type double):

        intVar += doubleVar;

This statement actually evaluates to:

        intVar = static_cast<int>(static_cast<double>(intVar) +
                                  doubleVar);

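Here intVar is first promoted to a double, and is then added as a double value to doubleVar. Next, the sum is cast back to an int. These two casts are a bit overdone. As long as intVar and doubleVar do not have opposite signs, the same result is obtained by explicitly casting doubleVar to an int, thus obtaining an int-value for the right-hand side of the expression (with opposite signs the results may differ: for intVar == 2 and doubleVar == -0.5 the original statement assigns 1, the statement below assigns 2):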

        intVar += static_cast<int>(doubleVar);
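The difference between the two forms comes from the moment at which truncation toward zero happens. This sketch (hypothetical function names) makes both forms easy to compare.

```cpp
    // The sum is computed as a double and then truncated.
    int addDouble(int intVar, double doubleVar)
    {
        intVar += doubleVar;    // int(double(intVar) + doubleVar)
        return intVar;
    }

    // doubleVar is truncated first, then added as an int.
    int addTruncated(int intVar, double doubleVar)
    {
        intVar += static_cast<int>(doubleVar);
        return intVar;
    }
```

Both functions return 3 for the arguments 1 and 2.7, but for 2 and -0.5 addDouble returns 1 (1.5 truncated) while addTruncated returns 2 (-0.5 truncated to 0).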

A static_cast can also be used to undo or introduce the signed-modifier of an int-typed variable. The C function tolower requires an int representing the value of an unsigned char (or EOF). But whether plain char is a signed type is implementation-defined, and on many systems (e.g., g++ on x86) it is signed. To call tolower using an available char ch we should use:

        tolower(static_cast<unsigned char>(ch))

Casts like these provide information to the compiler about how to handle the provided data. Very often (especially with data types differing only in size but not in representation) the cast won't require any additional code. Additional code will be required, however, to convert one representation to another, e.g., when converting double to int.
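A sketch applying this cast to every character of a string; the function name lowerCase is hypothetical. Passing a negative value other than EOF to tolower is undefined behavior, which is what the cast to unsigned char prevents when char is signed.

```cpp
    #include <cctype>
    #include <string>

    // Lower-cases all characters of text.
    std::string lowerCase(std::string text)
    {
        for (std::string::size_type idx = 0; idx != text.size(); ++idx)
            text[idx] = static_cast<char>(
                        std::tolower(static_cast<unsigned char>(text[idx])));
        return text;
    }
```

E.g., lowerCase("Hello, World!") returns "hello, world!" (in the default "C" locale).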

3.5.2: The `const_cast'-operator

The const keyword has been given a special place in casting. Normally anything const is const for a good reason. Nonetheless situations may be encountered where the const can be ignored. For these special situations the const_cast should be used. Its syntax is:

        const_cast<type>(expression)

A const_cast<type>(expression) expression is used to undo the const attribute of a (pointer) type.

The need for a const_cast may occur in combination with functions from the standard C library which traditionally weren't always as const-aware as they should be. A function strfun(char *s) might be available, performing some operation on its char *s parameter without actually modifying the characters pointed to by s. Passing char const hello[] = "hello"; to strfun is not accepted by the compiler, which produces a diagnostic like

        passing `const char *' as argument 1 of `strfun(char *)' discards const

A const_cast is the appropriate way to prevent this diagnostic:

        strfun(const_cast<char *>(hello));
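A complete sketch of this situation. The body of strfun is a hypothetical stand-in for a function that is not const-aware: it merely counts the characters of s without modifying them, which is what makes the const_cast safe here.

```cpp
    #include <cstddef>

    // Hypothetical C-style function: reads, but does not modify, *s.
    std::size_t strfun(char *s)
    {
        std::size_t count = 0;
        while (s[count] != 0)
            ++count;
        return count;
    }

    // The const_cast removes the const attribute so strfun can be called.
    std::size_t callStrfun(char const *text)
    {
        return strfun(const_cast<char *>(text));
    }
```

Had strfun modified the characters of a truly const array like hello, the program's behavior would have been undefined: the const_cast only silences the compiler.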

3.5.3: The `reinterpret_cast'-operator

The third new-style cast is used to change the interpretation of information: the reinterpret_cast. It is somewhat reminiscent of the static_cast, but reinterpret_cast should be used when it is known that the information as defined in fact is or can be interpreted as something completely different. Its syntax is:

        reinterpret_cast<pointer type>(pointer expression)

A reinterpret_cast<type>(expression) operator is appropriately used to reinterpret a void * to a pointer of a well-known type. Void pointers are encountered with functions from the C library like qsort. The qsort function expects a pointer to a (comparison) function having two void const * parameters. In fact, the void const *s point to data elements of the array to sort, and so the comparison function may cast the void const * parameters to pointers to the elements of the array to be sorted. E.g., if the array is an int array[] and the compare function's parameters are void const *p1, void const *p2 then the compare function may obtain the address of the int pointed to by p1 by using:

        reinterpret_cast<int const *>(p1)

Another example of a reinterpret_cast is found in combination with the write functions that are available for files and streams. In C++ streams are the preferred interface to, e.g., files. Output streams (like cout) offer write members having the prototype

        write(char const *buffer, std::streamsize length)

To write a double to a stream using write a reinterpret_cast is needed as well. E.g., to write the raw bytes of a variable double value to cout we would use:

    cout.write(reinterpret_cast<char const *>(&value), sizeof(double));

All casts are potentially dangerous, but the reinterpret_cast is the most dangerous of all casts. Effectively we tell the compiler: back off, we know what we're doing, so stop fussing. All bets are off, and we'd better know what we're doing in situations like these. As a case in point consider the following code:

    int value = 0x12345678;     // assume a 32-bits int

    cout << "Value's first byte has value: " << hex <<
            static_cast<int>(
                *reinterpret_cast<unsigned char *>(&value)
                            );

The above code will show different results on little and big endian computers. Little endian computers will show the value 78, big endian computers the value 12. Also note that the different representations used by little and big endian computers renders the previous example (cout.write(...)) non-portable over computers of different architectures.
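A sketch combining both reinterpret_cast examples, writing into a std::ostringstream instead of cout so the bytes can be inspected. The function names are hypothetical. Reading the bytes back uses std::memcpy rather than another reinterpret_cast, avoiding alignment and aliasing problems; the round trip works on one machine, but the byte sequence itself depends on the architecture.

```cpp
    #include <cstring>
    #include <sstream>
    #include <string>

    // Writes the raw bytes of a double using write.
    std::string rawBytes(double value)
    {
        std::ostringstream out;
        out.write(reinterpret_cast<char const *>(&value), sizeof(double));
        return out.str();
    }

    // Rebuilds a double from (at least) sizeof(double) raw bytes.
    double fromRawBytes(std::string const &bytes)
    {
        double value;
        std::memcpy(&value, bytes.data(), sizeof(double));
        return value;
    }

    // First byte of an int (assuming a 32-bit int): 0x78 on little endian,
    // 0x12 on big endian computers.
    int firstByte()
    {
        int value = 0x12345678;
        return *reinterpret_cast<unsigned char *>(&value);
    }
```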

As a rule of thumb: if circumstances arise in which casts have to be used, clearly document the reasons for their use in your code, making double sure that the cast will not eventually cause a program to misbehave.

3.5.4: The `dynamic_cast'-operator

Finally there is a new style cast that is used in combination with polymorphism (see chapter 14). Its syntax is:

        dynamic_cast<type>(expression)

It is used at run-time to convert a pointer to an object of a class to a pointer to an object of a class that is found further down its so-called class hierarchy (which is also called a downcast). At this point in the Annotations a dynamic_cast cannot yet be discussed extensively, but we will return to this topic in section 14.5.1.
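As a preview, here is a minimal sketch using hypothetical classes Base and Derived. A dynamic_cast requires the base class to be polymorphic (having at least one virtual member function); when the pointed-to object is not a Derived, the cast returns a null pointer.

```cpp
    struct Base
    {
        virtual ~Base() {}      // makes Base polymorphic
    };

    struct Derived: public Base
    {
        int value = 42;
    };

    // Returns the Derived's value if base points to a Derived, -1 otherwise.
    int valueOf(Base *base)
    {
        Derived *derived = dynamic_cast<Derived *>(base);  // 0 if no Derived
        return derived ? derived->value : -1;
    }
```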

3.6: Keywords and reserved names in C++

C++'s keywords are a superset of C's keywords. Here is a list of all keywords of the language:

alignof compl        explicit namespace        return      typeid
and     concept      export   new              short       typename
and_eq  const        extern   not              signed      union
asm     const_cast   false    not_eq           sizeof      unsigned
auto    constexpr    float    nullptr          static      using
axiom   continue     for      operator         static_cast virtual
bitand  decltype     friend   or               struct      void
bitor   default      goto     or_eq            switch      volatile
bool    delete       if       private          template    wchar_t
break   do           import   protected        this        while
case    double       inline   public           throw       xor
catch   dynamic_cast int      register         true        xor_eq
char    else         long     reinterpret_cast try
class   enum         mutable  requires         typedef

Notes:

The export keyword is removed from the language under the C++0x standard, but remains a keyword, reserved for future use.
The nullptr keyword is defined in the C++0x standard (not yet supported by the g++ compiler).
The operator keywords and, and_eq, bitand, bitor, compl, not, not_eq, or, or_eq, xor and xor_eq are symbolic alternatives for, respectively, &&, &=, &, |, ~, !, !=, ||, |=, ^ and ^=.
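A short sketch (hypothetical function names) showing that the operator keywords are merely alternative spellings; g++ and clang++ accept them as keywords without any header.

```cpp
    // Identical functions, written with operator keywords and with symbols.
    bool inRangeWords(int value, int low, int high)
    {
        return value >= low and value <= high and not (value == 0);
    }

    bool inRangeSymbols(int value, int low, int high)
    {
        return value >= low && value <= high && !(value == 0);
    }

    unsigned combine(unsigned lhs, unsigned rhs)
    {
        lhs or_eq rhs;          // lhs |= rhs
        return lhs xor 1u;      // lhs ^ 1u
    }
```

E.g., combine(4u, 2u) returns 7u: 4 | 2 equals 6, and 6 ^ 1 equals 7.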

Keywords can only be used for their intended purpose and cannot be used as names for other entities (e.g., variables, functions, class-names, etc.). In addition to keywords, some identifiers are reserved in the sense that their use is a prerogative of the implementor: identifiers starting with an underscore and living in the global namespace (i.e., not using any explicit namespace or using the mere :: namespace specification), identifiers containing a double underscore or starting with an underscore followed by a capital letter (in any scope), and identifiers living in the std namespace.