Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.
Please state the document version you're referring to, as found in the title (in this document: 8.3.1), and please state the chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.
In this chapter C++ is further explored. The possibility to
declare functions in struct
s is illustrated in various examples; the
concept of a class
is introduced; casting is covered in detail; many new
types are introduced and several important notational extensions to C are
discussed.
sin
operating on degrees, but does not want to lose
the capability of using the standard sin
function, operating on
radians.
Namespaces are covered extensively in chapter 4. For now it
should be noted that most compilers require the explicit declaration of a
standard namespace: std
. So, unless otherwise indicated, it is
stressed that all examples in the Annotations now implicitly use the
using namespace std; declaration. So, if you actually intend to compile examples given in the C++ Annotations, make sure that the sources start with the above using declaration.
::
). This operator can be
used in situations where a global variable exists having the same name as a
local variable:
    #include <stdio.h>

    int counter = 50;                   // global variable

    int main()
    {
        for (int counter = 1;           // this refers to the
             counter < 10;              // local variable
             counter++)
        {
            printf("%d\n",
                    ::counter           // global variable
                    /                   // divided by
                    counter);           // local variable
        }
    }
In the above program the scope operator is used to address a global variable instead of the local variable having the same name. In C++ the scope operator is used extensively, but it is seldom used to reach a global variable shadowed by an identically named local variable. Its main purpose is described in chapter 7.
const
is part of the C grammar, its use is
more important and much more common in C++ than it is in C.
The const
keyword is a modifier stating that the value of a variable
or of an argument may not be modified. In the following example the intent is
to change the value of a variable ival
, which fails:
    int main()
    {
        int const ival = 3;     // a constant int
                                // initialized to 3

        ival = 4;               // assignment produces
                                // an error message
    }
This example shows how
ival
may be initialized to a given value in its
definition; attempts to change the value later (in an assignment) are not
permitted.
Variables that are declared const
can, in contrast to C, be used to
specify the size of an array, as in the following example:
    int const size = 20;
    char buf[size];             // 20 chars big
Another use of the keyword
const
is seen in the declaration of
pointers, e.g., in pointer-arguments. In the declaration
char const *buf;
buf
is a pointer variable pointing to char
s. Whatever is pointed
to by buf
may not be changed through buf
: the char
s are declared
as const
. The pointer buf
itself however may be changed. A statement
like *buf = 'a';
is therefore not allowed, while ++buf
is.
In the declaration
char *const buf;
buf
itself is a const
pointer which may not be changed. Whatever
char
s are pointed to by buf
may be changed at will.
Finally, the declaration
char const *const buf;
is also possible; here, neither the pointer nor what it points to may be changed.
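The three declaration forms can be tried out in a small sketch. The function names skipBlanks, fill and first are hypothetical and serve only as illustration; statements that the compiler would reject are left in as comments.

```cpp
#include <cstddef>

// The chars are const, the pointer is not: buf may be advanced,
// but nothing may be written through it.
char const *skipBlanks(char const *buf)
{
    while (*buf == ' ')
        ++buf;                  // OK: the pointer itself is modified
    // *buf = 'a';              // error: the chars are const
    return buf;
}

// The pointer is const, the chars are not: buf cannot be made to
// point elsewhere, but the chars it points to may be changed.
void fill(char *const buf, char ch, std::size_t count)
{
    for (std::size_t idx = 0; idx != count; ++idx)
        buf[idx] = ch;          // OK: the chars are modified
    // ++buf;                   // error: the pointer is const
}

// Neither the pointer nor the chars may be changed.
char first(char const *const buf)
{
    return *buf;                // only reading is possible
}
```

Removing the comment markers from either of the rejected statements should make compilation fail at exactly that statement.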
The
rule of thumb for the placement of the keyword const
is the
following: whatever occurs to the left of the keyword may not be changed.
Although this rule of thumb is simple, a different `before' placement rule is often used instead. For example, Bjarne Stroustrup states (in http://www.research.att.com/~bs/bs_faq2.html#constplacement):
    Should I put "const" before or after the type?

    I put it before, but that's a matter of taste. "const T" and "T const" were always (both) allowed and equivalent. For example:

        const int a = 1;    // OK
        int const b = 2;    // also OK

    My guess is that using the first version will confuse fewer programmers (``is more idiomatic'').

But we've already seen an example where applying this simple `before' placement rule for the keyword
const
produces unexpected (i.e., unwanted)
results as we will shortly see below. Furthermore, the `idiomatic'
before-placement also conflicts with the notion of
const functions, which
we will encounter in section 7.5. With const functions the
keyword const
is also placed behind rather than before the name of the
function.
The definition or declaration (either or not containing const
) should
always be read from the variable or function identifier back to the type
identifier:
    ``Buf is a const pointer to const characters''
This rule of thumb is especially useful in cases where confusion may occur. In examples of C++ code published in other places one often encounters the reverse:
const
preceding what should not be
altered. That this may result in sloppy code is indicated by our second
example above:
    char const *buf;
What must remain constant here? According to the sloppy interpretation, the pointer cannot be altered (as
const
precedes the pointer). In fact,
the char values are the constant entities here, as becomes clear when we try
to compile the following program:
    int main()
    {
        char const *buf = "hello";

        ++buf;                  // accepted by the compiler
        *buf = 'u';             // rejected by the compiler
    }
Compilation fails on the statement
*buf = 'u';
and not on the
statement ++buf
.
Marshall Cline's C++ FAQ gives the same rule (paragraph 18.5), in a similar context:
    [18.5] What's the difference between "const Fred* p", "Fred* const p" and "const Fred* const p"?

    You have to read pointer declarations right-to-left.

Marshall Cline's advice might be improved, though: you should start to read pointer definitions (and declarations) at the variable name, reading as far as possible to the definition's end. Once you see a closing parenthesis, read backwards (right to left) from the initial point, until you find the matching open-parenthesis or the very beginning of the definition. For example, consider the following complex declaration:
    char const *(* const (*ip)[])[]
Here, we see:
ip
, being a
    #include <iostream>
    using namespace std;

    int main()
    {
        int     ival;
        char    sval[30];

        cout << "Enter a number:\n";
        cin >> ival;
        cout << "And now a string:\n";
        cin >> sval;

        cout << "The number is: " << ival << "\n"
                "And the string is: " << sval << '\n';
    }
This program reads a number and a string from the
cin
stream (usually
the keyboard) and prints these data to cout
. With respect to streams,
please note:
iostream
. In the examples in the
C++ Annotations this header file is often not mentioned explicitly. Nonetheless,
it must be included (either directly or indirectly) when these streams are
used. Comparable to the use of the using namespace std;
clause, the reader
is expected to #include <iostream>
with all the examples in which the
standard streams are used.
cout
, cin
and cerr
are variables of so-called
class-types. Such variables are commonly called objects. Classes
are discussed in detail in chapter 7 and are used extensively in
C++.
cin
extracts data from a stream and copies the
extracted information to variables (e.g., ival
in the above example) using
the extraction operator (two consecutive >
characters: >>). We will
describe later how operators in C++ can perform quite different actions
than what they are defined to do by the language, as is the case
here. Function overloading has already been mentioned. In C++
operators can also have multiple definitions, which is called operator
overloading.
cin
, cout
and cerr
(i.e.,
>> and <<) also manipulate variables of different types. In the
above example cout
<< ival
results in the printing of an integer
value, whereas cout
<< "Enter a number"
results in the printing
of a string. The actions of the operators therefore depend on the types of
supplied variables.
"\n"
or
'\n'
. But when inserting the
endl
symbol the line is terminated
followed by the flushing of the stream's internal buffer. Thus, endl
can
usually be avoided in favor of '\n'
resulting in somewhat more efficient
code.
cin
, cout
and cerr
are not part of the
C++ grammar proper. The streams are part of the definitions in the header
file iostream
. This is comparable to functions like printf
that are
not part of the C grammar, but were originally written by people who
considered such functions important and collected them in a run-time library.
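The points about insertion operators and line termination can be combined in a short sketch. The function names report and reportAsString are hypothetical. The sketch also assumes the standard std::ostringstream class, which is not discussed in this section: it is used here only to collect the output in a string.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Inserts values of different types into any output stream. The
// compiler selects the matching insertion operator for each operand.
void report(std::ostream &out, int number, char const *text)
{
    out << "The number is: " << number << '\n'      // int, no flush
        << "And the string is: " << text << '\n';   // C-string, no flush
}

// The same insertions, collected in a string instead of shown on cout.
std::string reportAsString(int number, char const *text)
{
    std::ostringstream out;
    report(out, number, text);
    return out.str();
}
```

A call like report(std::cout, 12, "hello") writes both lines to the standard output stream. Since only '\n' is inserted, the stream's buffer is not flushed after each line.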
A program may still use the old-style functions like printf
and scanf
rather than the new-style streams. The two styles can even be mixed. But
streams offer several clear advantages and in many C++ programs have
completely replaced the old-style C functions. Some advantages of using
streams are:
printf
and
scanf
can define wrong format specifiers for their arguments, for which
the compiler sometimes can't warn. In contrast, argument checking with
cin
, cout
and cerr
is performed by the compiler. Consequently it
isn't possible to err by providing an int
argument in places where,
according to the format string, a string argument should appear. With streams
there are no format strings.
printf
and scanf
(and other functions using
format strings) in fact implement a mini-language which is interpreted at
run-time. In contrast, with streams the C++ compiler knows exactly which
in- or output action to perform given the arguments used. No mini-language
here.
printf
cannot be extended.
cin, cout
and cerr
. In chapter 6 iostreams will be covered in
greater detail. Even though
printf
and friends can still be used in
C++ programs, streams have practically replaced the old-style C
I/O
functions like printf
. If you think you still need to use
printf
and related functions, think again: in that case you've probably
not yet completely grasped the possibilities of stream objects.
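As a minimal sketch of what extending stream insertion may look like: printf offers no way to add a format specifier for a new type, whereas an insertion operator can be defined for it. The type Point and the function pointText are hypothetical, and operator overloading itself is covered later in the Annotations.

```cpp
#include <ostream>
#include <sstream>
#include <string>

struct Point            // hypothetical type, for illustration only
{
    int x;
    int y;
};

// Teaches output streams how to insert a Point. Returning the stream
// allows further insertions in the same statement.
std::ostream &operator<<(std::ostream &out, Point const &point)
{
    return out << '(' << point.x << ", " << point.y << ')';
}

std::string pointText(Point const &point)
{
    std::ostringstream out;
    out << "at " << point << '\n';      // Point is used like a built-in type
    return out.str();
}
```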
struct
s (see
section 2.5.13). Such functions are called
member functions.
This section briefly discusses how to define such functions.
The code fragment below shows a struct
having data fields for a person's
name and address. A function print
is included in the
struct
's definition:
    struct Person
    {
        char name[80];
        char address[80];

        void print();
    };
When defining the member function
print
the structure's name
(Person
) and the scope resolution operator (::
) are used:
    void Person::print()
    {
        cout << "Name:      " << name << "\n"
                "Address:   " << address << '\n';
    }
The implementation of
Person::print
shows how the fields of the
struct
can be accessed without using the structure's type name. Here the
function Person::print
prints a variable name
. Since Person::print
is itself a part of struct
Person
, the variable name
implicitly
refers to the same type.
This struct Person
could be used as follows:
    Person person;

    strcpy(person.name, "Karel");
    strcpy(person.address, "Marskramerstraat 33");
    person.print();
The advantage of member functions is that the called function automatically accesses the data fields of the structure for which it was invoked. In the statement
person.print()
the object person
is the
`substrate': the variables name
and address
that are used in the code
of print
refer to the data stored in the person
object.
C++ has three keywords that are related to data hiding:
private
,
protected
and
public
. These keywords can be used in the definition of
struct
s. The keyword public
allows all subsequent fields of a
structure to be accessed by all code; the keyword private
only allows code
that is part of the struct
itself to access subsequent fields. The keyword
protected
is discussed in chapter 13, and is somewhat
outside of the scope of the current discussion.
In a struct
all fields are public
, unless explicitly stated otherwise.
Using this knowledge we can expand the struct
Person
:
    struct Person
    {
        private:
            char d_name[80];
            char d_address[80];
        public:
            void setName(char const *n);
            void setAddress(char const *a);
            void print();
            char const *name();
            char const *address();
    };
As the data fields
d_name
and d_address
are in a private
section they are only accessible to the member functions which are defined in
the struct
: these are the functions setName
, setAddress
etc.. As
an illustration consider the following code:
    Person fbb;

    fbb.setName("Frank");           // OK, setName is public
    strcpy(fbb.d_name, "Knarf");    // error, fbb.d_name is private
Data integrity is implemented as follows: the actual data of a
struct
Person
are mentioned in the structure definition. The data are accessed by
the outside world using special functions that are also part of the
definition. These member functions control all traffic between the data fields
and other parts of the program and are therefore also called `interface'
functions. The thus implemented data hiding is illustrated in
Figure 2.
The members setName
and setAddress
are declared with char const
*
parameters. This indicates that the functions will not alter the strings
which are supplied as their arguments. Analogously, the members name
and address
return char const *
s: the compiler will prevent callers of
those members from modifying the information made accessible through the
return values of those members.
Two examples of member functions of the struct
Person
are shown
below:
    void Person::setName(char const *n)
    {
        strncpy(d_name, n, 79);
        d_name[79] = 0;
    }

    char const *Person::name()
    {
        return d_name;
    }
The power of member functions and of the concept of data hiding results from the abilities of member functions to perform special tasks, e.g., checking the validity of the data. In the above example
setName
copies
only up to 79 characters from its argument to the data member d_name
,
thereby avoiding a
buffer overflow.
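The protective effect of setName can be checked with a condensed, self-contained sketch of the struct. Only the name-related members are kept, and the helper storedLength is hypothetical.

```cpp
#include <cstddef>
#include <cstring>

struct Person
{
    private:
        char d_name[80];

    public:
        void setName(char const *n);
        char const *name();
};

void Person::setName(char const *n)
{
    std::strncpy(d_name, n, 79);    // copy at most 79 characters
    d_name[79] = 0;                 // always end in a 0-byte
}

char const *Person::name()
{
    return d_name;
}

// Hypothetical helper: the length of the name that is actually stored
// when input is passed to setName.
std::size_t storedLength(char const *input)
{
    Person person;
    person.setName(input);
    return std::strlen(person.name());
}
```

Whatever the length of the argument, the stored name never exceeds 79 characters, so d_name cannot overflow.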
Another illustration of the concept of data hiding is the following. As an
alternative to member functions that keep their data in memory a library could
be developed featuring member functions storing data on file. To convert a
program which stores Person
structures in memory to one that stores the
data on disk no special modifications would be required. After recompilation
and linking the program to a new library it will have converted from storage
in memory to storage on disk. This example illustrates a broader concept than
data hiding; it illustrates
encapsulation. Data hiding is a kind of
encapsulation. Encapsulation in general results in reduced coupling of
different sections of a program. This in turn greatly enhances reusability and
maintainability of the resulting software. By having the structure encapsulate
the actual storage medium the program using the structure becomes independent
of the actual storage medium that is used.
Though data hiding can be implemented using struct
s, more often (almost
always) classes are used instead. A class is a kind of struct, except that
a class uses private access by default, whereas structs use public access by
default. The definition of a class
Person
is therefore identical to
the one shown above, except for the fact that the keyword class
has
replaced struct
while the initial private:
clause can be omitted. Our
typographic suggestion for class names (and other type names defined by the
programmer) is to start with a capital character to be followed by the
remainder of the type name using lower case letters (e.g., Person
).
struct
, which then require a pointer to the
struct
as one of their arguments. An imaginary C header file showing
this concept is:
    /* definition of a struct PERSON    This is C   */
    typedef struct
    {
        char name[80];
        char address[80];
    } PERSON;

    /* some functions to manipulate PERSON structs */

    /* initialize fields with a name and address    */
    void initialize(PERSON *p, char const *nm,
                    char const *adr);

    /* print information    */
    void print(PERSON const *p);

    /* etc..    */
In C++, the declarations of the involved functions are put inside
the definition of the struct
or class
. The argument denoting
which struct
is involved is no longer needed.
    class Person
    {
        char d_name[80];
        char d_address[80];

        public:
            void initialize(char const *nm, char const *adr);
            void print();
            // etc..
    };
In C++ the
struct
parameter is not used. A C function call
such as:
    PERSON x;

    initialize(&x, "some name", "some address");
becomes in C++:
    Person x;

    x.initialize("some name", "some address");
    int int_value;
    int &ref = int_value;
In the above example a variable
int_value
is defined. Subsequently a
reference ref
is defined, which (due to its initialization) refers to the
same memory location as int_value
. In the definition of ref
, the
reference operator
&
indicates that ref
is not
itself an int
but a reference to one. The two statements
    ++int_value;
    ++ref;
have the same effect: they increment
int_value
's value. Whether that
location is called int_value
or ref
does not matter.
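That both names denote one and the same location can be verified with a minimal sketch (the function name sameVariable is hypothetical):

```cpp
// Returns true if a reference and the variable it refers to share
// one memory location and one value.
bool sameVariable()
{
    int int_value = 10;
    int &ref = int_value;       // ref is another name for int_value

    ++int_value;                // int_value is now 11
    ++ref;                      // int_value is now 12

    return &ref == &int_value   // both names have the same address
           && ref == 12 && int_value == 12;
}
```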
References serve an important function in C++ as a means to pass modifiable arguments to functions. E.g., in standard C, a function that increases the value of its argument by five and returns nothing needs a pointer parameter:
    void increase(int *valp)    // expects a pointer
    {                           // to an int
        *valp += 5;
    }

    int main()
    {
        int x = 0;

        increase(&x);           // pass x's address
    }
This construction can also be used in C++ but the same effect is also achieved using a reference:
    void increase(int &valr)    // expects a reference
    {                           // to an int
        valr += 5;
    }

    int main()
    {
        int x = 0;

        increase(x);            // passed as reference
    }
It is arguable whether code such as the above should be preferred over C's method, though. The statement
increase
(x)
suggests that not
x
itself but a copy is passed. Yet the value of x
changes because
of the way increase()
is defined. However, references can also be used to
pass objects that are only inspected (without the need for a copy or a const
*) or to pass objects whose modification is an accepted side-effect of their
use. In those cases using references is strongly preferred over existing
alternatives like copy by value or passing pointers.
Behind the scenes references are implemented using pointers. So, as far as the compiler is concerned references in C++ are just const pointers. With references, however, the programmer does not need to know or to bother about levels of indirection. An important distinction between plain pointers and references is of course that with references no indirection takes place. For example:
    extern int *ip;
    extern int &ir;

    ip = 0;     // reassigns ip, now a 0-pointer
    ir = 0;     // ir unchanged, the int variable it refers to
                // is now 0.
In order to prevent confusion, we suggest to adhere to the following:
    void some_func(int val)
    {
        cout << val << '\n';
    }

    int main()
    {
        int x = 7;

        some_func(x);       // a copy is passed
    }

    void by_pointer(int *valp)
    {
        *valp += 5;
    }

    void by_reference(string const &str)
    {
        cout << str;        // no modification of str
    }

    int main ()
    {
        int x = 7;
        by_pointer(&x);     // a pointer is passed
                            // x might be changed
        string str("hello");
        by_reference(str);  // str is not altered
    }
References play an important role in cases where the argument is not changed by the function but where it is undesirable to copy the argument to initialize the parameter. Such a situation occurs when a large object is passed as argument, or is returned by the function. In these cases the copying operation tends to become a significant factor, as the entire object must be copied. In these cases references are preferred.
If the argument isn't modified by the function, or if the caller shouldn't
modify the returned information, the const
keyword should be
used. Consider the following example:
    struct Person               // some large structure
    {
        char    name[80];
        char    address[90];
        double  salary;
    };

    Person personArray[50];     // database of persons

                                // printperson expects a
                                // reference to a structure
                                // but won't change it
    void printperson (Person const &p)
    {
        cout << "Name: " << p.name << '\n' <<
                "Address: " << p.address << '\n';
    }
                                // get a person by indexvalue
    Person const &person(int index)
    {
        return personArray[index];  // a reference is returned,
    }                               // not a copy of personArray[index]

    int main()
    {
        Person boss = {};       // all fields zero-initialized

        printperson (boss);     // no pointer is passed,
                                // so variable won't be
                                // altered by the function
        printperson(person(5));
                                // references, not copies
                                // are passed here
    }
References could result in extremely `ugly' code. A function may return a reference to a variable, as in the following example:
    int &func()
    {
        static int value;
        return value;
    }
This allows the use of the following constructions:
    func() = 20;
    func() += func();
It is probably superfluous to note that such constructions should normally not be used. Nonetheless, there are situations where it is useful to return a reference. We have actually already seen an example of this phenomenon in our previous discussion of streams. In a statement like
cout
<<
"Hello"
<< '\n';
the insertion operator returns a reference to
cout
. So, in this statement first the "Hello"
is inserted into
cout
, producing a reference to cout
. Through this reference the
'\n'
is then inserted in the cout
object, again producing a reference
to cout
, which is then ignored.
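The same chaining mechanism can be sketched with plain int values. The functions addTo and chained are hypothetical: addTo returns a reference to its first argument, so its return value can immediately serve as the argument of the next call, just like the reference to cout returned by the insertion operator.

```cpp
// Adds value to total and returns a reference to total itself,
// not a copy of it.
int &addTo(int &total, int value)
{
    total += value;
    return total;
}

int chained()
{
    int total = 0;

    // Each call receives the reference returned by the previous
    // call, so all three calls modify the same variable.
    addTo(addTo(addTo(total, 1), 2), 3);

    return total;
}
```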
Several differences between pointers and references are pointed out in the next list below:
int &ref;
ref
refer to?
external
. These references were
initialized elsewhere.
&
is used with a
reference, the expression yields the address of the variable to which the
reference applies. In contrast, ordinary pointers are variables themselves, so
the address of a pointer variable has nothing to do with the address of the
variable pointed to.
const &
types.
The C++0x standard adds a new reference type called an
rvalue reference, defined as
typename &&
.
The name rvalue reference is derived from assignment statements, where the variable to the left of the assignment operator is called an lvalue and the expression to the right of the assignment operator is called an rvalue. Rvalues are often temporary (or anonymous) values, like values returned by functions.
In this parlance the C++ reference should be considered an
lvalue reference (using the notation typename &
). They can be
contrasted to rvalue references (using the notation typename &&
).
The key to understanding rvalue references is the concept of an anonymous variable. An anonymous variable has no name, and this is the distinguishing feature for the compiler to associate it automatically with an rvalue reference if it has a choice. Before introducing some interesting and new constructions that weren't available before C++0x let's first have a look at some distinguishing applications of lvalue references. The following function returns a temporary (anonymous) value:
    int intVal()
    {
        return 5;
    }
Although the return value of
intVal
can be assigned to an int
variable it requires a copying operation, which might become prohibitive when
a function does not return an int
but instead some large object. A
reference or pointer cannot be used either to collect the anonymous
return value as the return value won't survive beyond that. So the following
is illegal (as noted by the compiler):
    int &ir = intVal();         // fails: refers to a temporary
    int const &ic = intVal();   // OK: immutable temporary
    int *ip = &intVal();        // fails: no lvalue available
Apparently it is not possible to modify the temporary returned by
intVal
. But now consider the next function:
    void receive(int &value)
    {
        cout << "int value parameter\n";
    }
    void receive(int &&value)
    {
        cout << "int R-value parameter\n";
    }
and let's call this function from
main
:
    int main()
    {
        receive(18);
        int value = 5;
        receive(value);
        receive(intVal());
    }
This program produces the following output:
    int R-value parameter
    int value parameter
    int R-value parameter
It shows the compiler selecting
receive(int &&value)
in all cases
where it receives an anonymous int
as its argument. Note that this
includes receive(18)
: a value 18 has no name and thus receive(int
&&value)
is called. Internally, it actually uses a temporary variable to
store the 18, as is shown by the following example which modifies receive
:
    void receive(int &&value)
    {
        ++value;
        cout << "int R-value parameter, now: " << value << '\n';
            // displays 19 and 6, respectively.
    }
Contrasting
receive(int &value)
with receive(int &&value)
has
nothing to do with int &value
not being a const reference. If
receive(int const &value)
is used the same results are obtained. Bottom
line: the compiler selects the overloaded function using the rvalue reference
if the function is passed an anonymous value.
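The selection rule can be made checkable by letting each overload report which one was called. This is a minimal sketch: the overloads return a label instead of printing, and the function name selections is hypothetical.

```cpp
#include <string>

int intVal()
{
    return 5;
}

std::string receive(int &)      // selected for named variables
{
    return "lvalue";
}

std::string receive(int &&)     // selected for anonymous values
{
    return "rvalue";
}

// Collects which overload is selected for a literal, a named
// variable, and a function's return value.
std::string selections()
{
    int value = 5;
    return receive(18) + ' ' + receive(value) + ' ' + receive(intVal());
}
```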
The compiler runs into problems if void receive(int &value)
is
replaced by void receive(int value)
, though. When confronted with the
choice between a value parameter and a reference parameter (either lvalue or
rvalue) it cannot make a decision and reports an ambiguity. In practical
contexts this is not a problem. Rvalue references were added to the language in
order to be able to distinguish the two forms of references: named values
(for which lvalue references are used) and anonymous values (for which
rvalue references are used).
It is this distinction that allows the implementation of move semantics and perfect forwarding. At this point the concept of move semantics cannot yet be fully discussed (but see section 8.6 for a more thorough discussion), but it is very well possible to illustrate the underlying ideas.
Consider the situation where a function returns a struct Data
containing a
pointer to dynamically allocated characters. Moreover, the struct defines a
member function copy(Data const &other)
that takes another Data
object
and copies the other's data into the current object. The (partial) definition
of the struct Data
might look like this (To the observant reader:
in this example the memory leak that results from using Data::copy()
should be ignored):
    struct Data
    {
        char *text;
        size_t size;
        void copy(Data const &other)
        {
            text = strdup(other.text);
            size = strlen(text);
        }
    };
Next, functions
dataFactory
and main
are defined as follows:
    Data dataFactory(char const *txt)
    {
        Data ret = {strdup(txt), strlen(txt)};
        return ret;
    }

    int main()
    {
        Data d1 = {strdup("hello"), strlen("hello")};

        Data d2;
        d2.copy(d1);                        // 1 (see text)

        Data d3;
        d3.copy(dataFactory("hello"));      // 2
    }
At (1)
d2
appropriately receives a copy of d1
's text. But at (2)
d3
receives a copy of the text stored in the temporary returned by the
dataFactory
function. As the temporary ceases to exist after the call to
copy()
two related and unpleasant consequences are observed:
d3
. Now d3
copies the
temporary's data which clearly is somewhat overdone.
Data
object is lost following the call to
copy()
. Unfortunately its dynamically allocated data is lost as well
resulting in a memory leak.
copy
member with a member copy(Data &&other)
the compiler is able
to distinguish situations (1) and (2). It now calls the initial copy()
member in situation (1) and the newly defined overloaded copy()
member in
situation (2):
    struct Data
    {
        char *text;
        size_t size;
        void copy(Data const &other)
        {
            text = strdup(other.text);
        }
        void copy(Data &&other)
        {
            text = other.text;
            other.text = 0;
        }
    };
Note that the overloaded
copy()
function merely moves the
other.text
pointer to the current object's text
pointer followed by
reassigning 0 to other.text
. Struct Data
suddenly has become
move-aware and implements move semantics, removing the drawbacks of
the previously shown approach:
other.text
doesn't point to dynamically allocated
memory anymore the memory leak is prevented.
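A self-contained variant of the move-aware struct might look as follows. It is a sketch under stated assumptions rather than the Annotations' own implementation: the helper duplicate is hypothetical and replaces strdup (which is not part of standard C++), text is assumed to be either 0 or allocated by new[], and an object is assumed never to be copied onto itself. Unlike the example above, both copy members release the memory previously owned by the current object and keep size up to date.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: returns a new[]-allocated copy of txt.
char *duplicate(char const *txt)
{
    char *ret = new char[std::strlen(txt) + 1];
    std::strcpy(ret, txt);
    return ret;
}

struct Data
{
    char *text;
    std::size_t size;

    void copy(Data const &other)        // named objects: duplicate
    {
        delete[] text;                  // release the old text
        text = duplicate(other.text);
        size = other.size;
    }
    void copy(Data &&other)             // anonymous objects: move
    {
        delete[] text;                  // release the old text
        text = other.text;              // take over the pointer
        size = other.size;
        other.text = 0;                 // the temporary no longer owns it
        other.size = 0;
    }
};

Data dataFactory(char const *txt)
{
    Data ret = {duplicate(txt), std::strlen(txt)};
    return ret;
}
```

As the struct has no destructor, the caller remains responsible for eventually calling delete[] on the text member of objects that are no longer needed.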
Rvalue references for *this
and initialization of class
objects by rvalues are not yet supported by the g++
compiler.
\n, \\
and \"
. In some
cases it is useful to avoid escaping strings (e.g., in the context of XML). To
this end, the C++0x standard offers
raw string
literals.
Raw string literals start with an R
, followed by a double quote, followed
by a label (which is an arbitrary sequence of characters not equal to [
),
followed by [
. The raw string ends at the closing bracket ]
, followed
by the label which is in turn followed by a double quote. Example:
    R"[A Raw \ "String"]"
    R"delimiter[Another \ Raw "[String]]delimiter"
In the first case, everything between
"[
and ]"
is part of the
string. Escape sequences aren't supported so \ "
defines three characters:
a backslash, a blank character and a double quote. The second example shows a
raw string defined between the markers "delimiter[
and ]delimiter"
.
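The square-bracket notation shown above follows the C++0x draft. In the C++11 standard as eventually published, parentheses take the place of the square brackets. A minimal sketch of the two examples in that final syntax (the function names plainRaw and labeledRaw are hypothetical):

```cpp
#include <string>

// Final C++11 syntax: R"label( ... )label", where the label may be empty.
std::string plainRaw()
{
    return R"(A Raw \ "String")";
}

// With a label, a closing parenthesis only ends the raw string when
// it is followed by the label and a double quote.
std::string labeledRaw()
{
    return R"delimiter(Another \ Raw "(String))delimiter";
}
```

Written as ordinary string literals, the same texts require escape sequences: "A Raw \\ \"String\"" and "Another \\ Raw \"(String)".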
int
values, thereby bypassing
type safety. E.g., values of different enumeration types may be
compared for (in)equality, albeit through a (static) type cast.
Another problem with the current enum
type is that the names of its values are not
restricted to the enum type name itself, but to the scope where the
enumeration is defined. As a consequence, two enumerations having the same
scope cannot define values having identical names.
In the C++0x standard these problems are solved by defining enum classes. An enum class can be defined as in the following example:
    enum class SafeEnum
    {
        NOT_OK,     // 0, by implication
        OK = 10,
        MAYBE_OK    // 11, by implication
    };
Enum classes use
int
values by default, but the used value type can
easily be changed using the : type
notation, as in:
    enum class CharEnum: unsigned char
    {
        NOT_OK,
        OK
    };
To use a value defined in an enum class its enumeration name must be provided as well. E.g.,
OK
is not defined, CharEnum::OK
is.
Using the data type specification (noting that it defaults to int
) it
is possible to use enum class forward declarations.
E.g.,
    enum Enum1;                 // Illegal: no size available
    enum Enum2: unsigned int;   // Legal in C++0x: explicitly declared type

    enum class Enum3;           // Legal in C++0x: default int type is used
    enum class Enum4: char;     // Legal in C++0x: explicitly declared type
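The properties of enum classes described in this section can be combined in a short sketch. The helper functions asInt and isOk are hypothetical; the enum definitions are the ones used above.

```cpp
enum class CharEnum: unsigned char;     // forward declaration: type is known

enum class SafeEnum
{
    NOT_OK,             // 0, by implication
    OK = 10,
    MAYBE_OK            // 11, by implication
};

enum class CharEnum: unsigned char      // definition matching the declaration
{
    NOT_OK,
    OK
};

// Both enums define NOT_OK and OK without clashing. An explicit
// cast is required to obtain the underlying value.
int asInt(SafeEnum value)
{
    return static_cast<int>(value);
}

bool isOk(CharEnum value)
{
    return value == CharEnum::OK;       // plain OK is not defined
}
```

Comparing a SafeEnum value with a CharEnum value, or assigning either to an int without a cast, is rejected by the compiler.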
C++ extends this concept in the C++0x standard by introducing the
type
initializer_list<Type>
where Type
is replaced by the type name of
the values used in the initializer list. Initializer lists in C++ are,
like their counterparts in C, recursive, so they can also be used with
multi-dimensional arrays, structs and classes.
Like in C, initializer lists consist of a list of values surrounded by curly braces. But unlike C, functions can define initializer list parameters. E.g.,
    void values(std::initializer_list<int> iniValues)
    {
    }
A function like values could be called as follows:
    values({2, 3, 5, 7, 11, 13});
The initializer list appears as an argument which is a list of values surrounded by curly braces. Due to the recursive nature of initializer lists a two-dimensional series of values can also be passed, as shown in the next example:
    void values2(std::initializer_list<std::initializer_list<int>> iniValues)
    {
    }

    values2({{1, 2}, {2, 3}, {3, 5}, {4, 7}, {5, 11}, {6, 13}});
Initializer lists are constant expressions and cannot be modified. However, their size and values may be retrieved using their
size, begin
, and end
members as follows:
    void values(initializer_list<int> iniValues)
    {
        cout << "Initializer list having " << iniValues.size() << " values\n";
        for
        (
            initializer_list<int>::const_iterator begin = iniValues.begin();
                begin != iniValues.end();
                    ++begin
        )
            cout << "Value: " << *begin << '\n';
    }
Initializer lists can also be used to initialize objects of classes (cf. section 7.3).
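As a minimal sketch of functions that actually use their initializer list parameters, the hypothetical functions sum and sumRows add up the values they receive. The nested variant assumes a parameter whose elements are themselves initializer lists.

```cpp
#include <initializer_list>

// Adds all values of a one-dimensional initializer list.
int sum(std::initializer_list<int> iniValues)
{
    int total = 0;
    for
    (
        std::initializer_list<int>::const_iterator it = iniValues.begin();
            it != iniValues.end();
                ++it
    )
        total += *it;
    return total;
}

// Nested lists: every element is itself an initializer_list<int>.
int sumRows(std::initializer_list<std::initializer_list<int>> rows)
{
    int total = 0;
    for
    (
        std::initializer_list<std::initializer_list<int>>::const_iterator
            row = rows.begin();
                row != rows.end();
                    ++row
    )
        total += sum(*row);
    return total;
}
```

With these definitions sum({2, 3, 5, 7, 11, 13}) adds six values, and sumRows({{1, 2}, {2, 3}, {3, 5}}) adds the values of three rows.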
auto
is defined by the C++0x standard
allowing the compiler to determine the type of a variable automatically rather
than requiring the software engineer to define a variable's type explicitly.
In parallel, the use of auto
as a storage class specifier is no longer
supported in the C++0x standard. According to that standard a variable
definition like auto int var
results in a compilation error.
This can be very useful in situations where it is very hard to determine the variable's type in advance. These situations occur, e.g., in the context of templates, topics covered in chapters 18 through 22.
At this point in the Annotations only simple examples can be given, and some
hints will be provided about more general uses of the auto
keyword.
When defining and initializing a variable int variable = 5
the type of the
initializing expression is well known: it's an int
, and unless the
programmer's intentions are different this could be used to define
variable
's type (although it shouldn't in normal circumstances as it
reduces rather than improves the clarity of the code):
auto variable = 5;
Here are some examples where using auto
is useful.
In chapter 5 the
iterator concept is introduced (see also
chapters 12 and 18). Iterators sometimes have long type
definitions, like
    std::vector<std::string>::const_reverse_iterator

Functions may return types like this. Since the compiler knows the types returned by functions we may exploit this knowledge using
auto
. Assuming that a function begin()
is declared as follows:
    std::vector<std::string>::const_reverse_iterator begin();

Rather than writing the verbose variable definition (at // 1) a much shorter definition (at // 2) may be used:

    std::vector<std::string>::const_reverse_iterator iter = begin();    // 1
    auto iter = begin();                                                // 2

It's easy to define additional variables of this type. When initializing those variables using
iter
the auto
keyword can be used again:
auto start = iter;
If start
can't be initialized immediately using an existing
variable, the type of a well-known variable or function can be used in
combination with the
decltype
keyword, as in:
    decltype(iter) start;
    decltype(begin()) spare;

The keyword
decltype
may also receive an expression as its
argument. This feature is already available in the C++0x standard
implementation in g++ 4.3. E.g., decltype(3 + 5)
represents an int,
decltype(3 / double(3))
represents double
.
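As an illustration, here is a small sketch combining auto and decltype. The functions lastFirst and lastName are hypothetical names introduced for this example only, and lastName assumes that it receives a non-empty vector.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical helper returning an iterator having a verbose type name.
std::vector<std::string>::const_reverse_iterator
lastFirst(std::vector<std::string> const &names)
{
    return names.crbegin();
}

// Returns the last name; names is assumed not to be empty.
std::string lastName(std::vector<std::string> const &names)
{
    auto iter = lastFirst(names);   // type deduced from the function's return type
    decltype(iter) start;           // same type, not initialized from a variable
    start = iter;
    return *start;
}

// decltype also accepts expressions:
static_assert(std::is_same<decltype(3 + 5), int>::value,
              "int + int is an int");
static_assert(std::is_same<decltype(3 / double(3)), double>::value,
              "int / double is a double");
```

The static_assert statements are checked at compile time, so the claims about the expressions' types are verified by the compiler itself.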
The auto
keyword can also be used to postpone the definition of a
function's return type. The declaration of a function intArrPtr
returning
a pointer to an array of 10 int
s looks like this:
    int (*intArrPtr())[10];

Such a declaration is fairly complex. E.g., among other complexities it requires `protection of the pointer' using parentheses in combination with the function's parameter list. In situations like these the specification of the return type can be postponed using the
auto
return type, followed by the specification of the function's return type after
any other specification the function might receive (e.g., as a const member
(cf. section 7.5) or following its exception throw list
(cf. section 9.6)).
Using auto
to declare the above function, the declaration becomes:
    auto intArrPtr() -> int (*)[10];

A return type specification using
auto
is called a
late-specified return type.
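Both notations can be compared in the following sketch. The array table and the function intArrPtrClassic are hypothetical names, added only to have something to point to and to compare with.

```cpp
int table[10] = {0, 1, 4, 9, 16, 25, 36, 49, 64, 81};

// Classic notation: a function returning a pointer to an array of 10 ints.
int (*intArrPtrClassic())[10]
{
    return &table;
}

// The same function type, now using a late-specified return type.
auto intArrPtr() -> int (*)[10]
{
    return &table;
}
```

Both functions return the same pointer; an element is reached using an expression like (*intArrPtr())[3].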
The auto
keyword can also be used to define types that are related to
the actual auto
associated type. Here are some examples:
    vector<int> vi;
    auto iter = vi.begin();     // standard: auto is vector<int>::iterator
    auto &&rref = vi.begin();   // rref is an rvalue ref. to the iterator type
    auto *ptr = &iter;          // ptr is a pointer to the iterator type
    auto *ptr2 = &rref;         // same
    for (init; cond; inc)
        statement

In many cases, however, the initialization, condition and increment parts are fairly obvious as in situations where all elements of an array or vector must be processed. Many languages offer the
foreach
statement for that and
C++ offers the std::for_each
generic algorithm (cf. section
19.1.17).
The C++0x standard adds a new for
statement syntax to this. The new
syntax can be used to process each element of a
range. Three types of
ranges are distinguished:
int array[10]
);
std::pair
(cf. section 12.2, e.g.,
std::pair subRange(array + 1, array + 8)
).
    // assume int array[30]
    for (int &element: array)
        statement

Here an
int &element
is defined whose lifetime and scope is restricted
to the lifetime of the for-statement. It refers to each of the subsequent
elements of array
at each new iteration of the for-statement, starting
with the first element of the range.
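The range-based for-statement might be used as in the next sketch, processing a plain array and a standard container. The function names doubleAll and total are hypothetical.

```cpp
#include <vector>

// Doubles every element of a plain array in place and returns the new sum.
int doubleAll(int (&array)[5])
{
    int sum = 0;
    for (int &element: array)   // element refers to each array element in turn
    {
        element *= 2;
        sum += element;
    }
    return sum;
}

// Sums the elements of a standard container without modifying them.
int total(std::vector<int> const &values)
{
    int sum = 0;
    for (int value: values)     // value is a copy of each element
        sum += value;
    return sum;
}
```

Since doubleAll uses a reference, the array's elements themselves are modified; total uses a plain int and so only reads them.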
void, char,
short, int, long, float
and double
. C++ extends these built-in types
with several additional built-in types: the types
bool
,
wchar_t
,
long long
and
long double
(Cf.
ANSI/ISO draft (1995),
par. 27.6.2.4.1 for examples of these very long types). The type
long long
is an integral type at least as wide as long, holding at least 64 bits. The type
long double
is a floating point type offering at least the precision of double. These built-in
types as well as pointer variables are called
primitive types
in the C++ Annotations.
There is a subtle issue to be aware of when converting applications developed
for 32-bit architectures to 64-bit architectures. When converting 32-bit
programs to 64-bit programs, only long
types and pointer types change in
size from 32 bits to 64 bits; integers of type int
remain at their size of
32 bits. This may cause data truncation when assigning pointer or long
types to int
types. Also, problems with sign extension can occur when
assigning expressions using types shorter than the size of an int
to an
unsigned long
or to a pointer. More information about this issue can be
found
here.
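The effects of truncation and sign extension can be made visible using fixed-width types, which behave identically on 32-bit and 64-bit architectures. The following is merely a sketch; narrow and widen are hypothetical function names.

```cpp
#include <cstdint>

// Narrowing a 64-bit unsigned value to 32 bits keeps only the low 32 bits.
std::uint32_t narrow(std::uint64_t wide)
{
    return static_cast<std::uint32_t>(wide);
}

// A negative value of a small signed type is sign-extended when it is
// converted to a wider unsigned type: all higher bits become 1.
std::uint64_t widen(signed char small)
{
    return static_cast<std::uint64_t>(small);
}
```

Here narrow(0x100000001ULL) loses the high bit and returns 1, while widen(-1) returns a value having all of its 64 bits set.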
Except for these built-in types the class-type string
is available
for handling character strings. The datatypes bool
, and wchar_t
are
covered in the following sections, the datatype string
is covered in
chapter 5. Note that recent versions of C may also have adopted
some of these newer data types (notably bool
and wchar_t
).
Traditionally, however, C doesn't support them, hence they are mentioned
here.
Now that these new types are introduced, let's refresh your memory about letters that can be used in literal constants of various types. They are:
b
or B
: in addition to its use to indicate a hexadecimal
value, it can also be used to define a
binary constant. E.g., 0b101
equals the decimal value 5.
E
or e
:
the
exponentiation character in floating point literal values. For example:
1.23E+3
. Here, E
should be pronounced (and interpreted) as: times 10
to the power. Therefore, 1.23E+3
represents the value 1230
.
F
can be used as postfix to a
non-integral numeric constant to indicate a value of type float
, rather
than double
, which is the default. For example: 12.F
(the dot
transforms 12 into a floating point value); 1.23E+3F
(see the previous
example. 1.23E+3
is a double
value, whereas 1.23E+3F
is a
float
value).
L
can be used as prefix to
indicate a character string whose elements are wchar_t
-type
characters. For example: L"hello world"
.
L
can be used as postfix to an
integral value to indicate a value of type long
, rather than int
,
which is the default. Note that there is no letter indicating a short
type. For that a static_cast<short>()
must be used.
p
, to specify the power
in
hexadecimal floating point numbers. E.g. 0x10p4
. The exponent itself is
read as a decimal constant and can therefore not start with 0x. The exponent
part is interpreted as a power of 2. So 0x10p2
is (decimal) equal to 64:
16 * 2^2
.
U
can be used as postfix to an
integral value to indicate an unsigned
value, rather than an int
.
It may also be combined with the postfix L
to produce an unsigned long
int
value.
x
and a
through f
characters can be used to
specify hexadecimal constants (optionally using capital letters).
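The types and values resulting from these letters can be checked by the compiler, as in the following sketch. It assumes a compiler accepting binary constants (standard since C++14). The functions thousands and sixtyFour are hypothetical; the latter computes the value that 0x10p2 denotes using std::ldexp, so that the sketch does not depend on compiler support for hexadecimal floating point constants.

```cpp
#include <cmath>
#include <type_traits>

static_assert(std::is_same<decltype(12.F), float>::value, "F: float");
static_assert(std::is_same<decltype(1.23E+3), double>::value, "default: double");
static_assert(std::is_same<decltype(10L), long>::value, "L postfix: long");
static_assert(std::is_same<decltype(10U), unsigned>::value, "U: unsigned int");
static_assert(std::is_same<decltype(10UL), unsigned long>::value,
              "UL: unsigned long");
static_assert(std::is_same<decltype(L'x'), wchar_t>::value, "L prefix: wchar_t");
static_assert(0b101 == 5, "binary constant");
static_assert(0x1F == 31, "hexadecimal constant");

// 1.23 times 10 to the power 3
double thousands()
{
    return 1.23E+3;
}

// 16 times 2 to the power 2: the value written as 0x10p2
double sixtyFour()
{
    return std::ldexp(16.0, 2);
}
```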
bool
represents boolean (logical) values, for which the (now
reserved) constants
true
and
false
may be used. Except for these
reserved values, integral values may also be assigned to variables of type
bool
, which are then implicitly converted to true
and false
according to the following
conversion rules (assume intValue
is an
int
-variable, and boolValue
is a bool
-variable):
    // from int to bool:
    boolValue = intValue ? true : false;

    // from bool to int:
    intValue = boolValue ? 1 : 0;

Furthermore, when
bool
values are inserted into streams then true
is represented by 1
, and false
is represented by 0
. Consider the
following example:
    cout << "A true value: "  << true << "\n"
            "A false value: " << false << '\n';

The
bool
data type is found in other programming languages as
well. Pascal has its type Boolean
; Java has a boolean
type. Different from these languages, C++'s type bool
acts like a kind
of int
type. It is primarily a documentation-improving type, having just
two values true
and false
. Actually, these values can be interpreted
as enum
values for 1
and 0
. Doing so would ignore the philosophy
behind the bool
data type, but nevertheless: assigning true
to an
int
variable neither produces warnings nor errors.
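The conversions and the stream representation of bool values are illustrated by the next sketch. The function names show and roundTrip are hypothetical. The std::boolalpha manipulator is not discussed in this section; it is used here only to show that the textual representations true and false can be requested as well.

```cpp
#include <sstream>
#include <string>

// Returns value as inserted into a stream: first the default
// representation, then the representation requested by boolalpha.
std::string show(bool value)
{
    std::ostringstream out;
    out << value << ' ' << std::boolalpha << value;
    return out.str();
}

// Any non-zero int converts to true; true converts back to 1.
int roundTrip(int intValue)
{
    bool boolValue = intValue;  // same as: intValue ? true : false
    return boolValue;           // same as: boolValue ? 1 : 0
}
```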
Using the bool
-type is usually clearer than using
int
. Consider the following prototypes:
    bool exists(char const *fileName);  // (1)
    int  exists(char const *fileName);  // (2)

For the first prototype, readers will expect the function to return
true
if the given filename is the name of an existing
file. However, using the second prototype some ambiguity arises: intuitively
the return value 1 is appealing, as it allows constructions like
    if (exists("myfile"))
        cout << "myfile exists";

On the other hand, many system functions (like
access
,
stat
, and
many others) return 0 to indicate a successful operation, reserving other
values to indicate various types of errors.
As a rule of thumb I suggest the following: if a function should inform
its caller about the success or failure of its task, let the function return a
bool
value. If the function should return success or various types of
errors, let the function return enum values, documenting the situation by
its various symbolic constants. Only when the function returns a conceptually
meaningful integral value (like the sum of two int
values), let the
function return an int
value.
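This rule of thumb might be applied as in the following sketch. The enum OpenResult, its values and the function tryOpen are hypothetical; exists is implemented here by simply trying to open the file, which means that an existing but unreadable file is reported as not existing.

```cpp
#include <fstream>

// Plain success or failure: a bool suffices.
bool exists(char const *fileName)
{
    std::ifstream in(fileName);
    return in.good();
}

// Success or one of several kinds of failure: enum values document
// the situation by their symbolic names.
enum OpenResult
{
    OPENED,
    EMPTY_NAME,
    NOT_FOUND
};

OpenResult tryOpen(char const *fileName)
{
    if (fileName == 0 || *fileName == '\0')
        return EMPTY_NAME;

    return exists(fileName) ? OPENED : NOT_FOUND;
}
```

A caller of tryOpen can use a switch over the symbolic constants, where a caller of an int-returning function would have to know what values like 1 or 2 stand for.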
The wchar_t
type is an extension of the char
built-in type, to accommodate
wide character values (but see also the next section). The g++
compiler reports sizeof(wchar_t)
as 4, which is large enough to hold any Unicode code point (Unicode
defines code points up to 0x10FFFF, well beyond 65,536 values).
Note that Java's char
data type is somewhat comparable to C++'s
wchar_t
type. Java's char
type is 2 bytes wide, though. On the
other hand, Java's byte
data type is comparable to C++'s char
type: one byte. Confusing?
L
(e.g., L"hello"
) defines a
wchar_t
string literal.
The new C++0x standard adds to this support for 8, 16 and 32 bit
Unicode encoded strings. Furthermore, two new data types are introduced:
char16_t
and char32_t
storing, respectively, a
UTF-16
and
UTF-32
unicode value.
In addition, char
will be large enough to contain any
UTF-8
code unit as well (i.e., it will remain an 8-bit value; characters outside
of the ASCII range occupy several such units).
String literals for the various types of unicode encodings (and associated variables) can be defined as follows:
    char     utf_8[] = u8"This is UTF-8 encoded.";
    char16_t utf16[] = u"This is UTF-16 encoded.";
    char32_t utf32[] = U"This is UTF-32 encoded.";

Alternatively, unicode constants may be defined using the
\u
escape
sequence, followed by a hexadecimal value. Depending on the type of the
unicode variable (or constant) a UTF-8, UTF-16
or UTF-32
value is
used. E.g.,
    char     utf_8[] = u8"\u2018";
    char16_t utf16[] = u"\u2018";
    char32_t utf32[] = U"\u2018";

Unicode strings can be delimited by double quotes but raw string literals can also be used.
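The difference between the UTF-16 and UTF-32 encodings shows in the number of elements a literal requires. The following sketch uses a hypothetical function template units, returning the number of elements of a string literal, not counting its terminating zero. The \U escape sequence (followed by eight hexadecimal digits) is used for a character outside of the 16-bit range.

```cpp
#include <cstddef>

// Number of elements in a string literal, excluding the terminating zero.
template <typename Char, std::size_t N>
constexpr std::size_t units(Char const (&)[N])
{
    return N - 1;
}

// U+2018 fits in a single UTF-16 element and in a single UTF-32 element.
static_assert(units(u"\u2018") == 1, "one UTF-16 element");
static_assert(units(U"\u2018") == 1, "one UTF-32 element");

// U+1F600 lies outside of the 16-bit range: UTF-16 uses a surrogate pair.
static_assert(units(u"\U0001F600") == 2, "two UTF-16 elements");
static_assert(units(U"\U0001F600") == 1, "one UTF-32 element");
```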
long long int
to the set of standard
types. On 32 bit systems it will have at least 64 usable bits. Some compilers
already supported long long int
as an extension, but C++0x officially
adds it to C++.
size_t
type is not really a built-in primitive data type, but a data
type that is promoted by
POSIX as a typename to be used for non-negative
integral values answering questions like `how much' and `how many', in which
case it should be used instead of
unsigned int
. It is not a specific
C++ type, but also available in, e.g., C. Usually it is defined
implicitly when any system header file is included. The header file
`officially' defining size_t
in the context of C++ is
cstddef
.
Using size_t
has the advantage of being a conceptual type, rather than
a standard type that is then modified by a modifier. Thus, it improves
the self-documenting value of source code.
Sometimes functions explicitly require unsigned int
to be used. E.g., on
amd
-architectures the
X-windows function
XQueryPointer
explicitly
requires a pointer to an unsigned int
variable as one of its arguments. In
such situations a pointer to a size_t
variable can't be used, but the
address of an unsigned int
must be provided. Such situations are
exceptional, though.
Other useful bit-represented types also exist. E.g.,
uint32_t
is
guaranteed to hold 32-bit unsigned values. Analogously,
int32_t
holds
32-bit signed values. Corresponding types exist for 8, 16 and 64 bit
values. These types are defined in the header file
cstdint
.
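The following sketch shows size_t answering a `how many' question, and a fixed-width type whose behavior does not depend on the architecture. The function names countChar and wrap are hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <cstring>

// `How many' questions are answered using size_t.
std::size_t countChar(char const *text, char wanted)
{
    std::size_t count = 0;
    for (std::size_t idx = 0, len = std::strlen(text); idx != len; ++idx)
    {
        if (text[idx] == wanted)
            ++count;
    }
    return count;
}

// uint32_t has exactly 32 bits on every platform offering it.
static_assert(sizeof(std::uint32_t) * CHAR_BIT == 32, "32 bits");

// An 8-bit unsigned value wraps around to 0 beyond its maximum of 255.
std::uint8_t wrap(std::uint8_t value)
{
    return static_cast<std::uint8_t>(value + 1);
}
```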
    (typename)expression

Here
typename
is the name of a valid type, and expression
is an
expression.
C style casts are now deprecated. Although C++ offers function call notations using the following syntax:
    typename(expression)

the function call notation does not in fact represent a cast, but a request to the compiler to construct an (anonymous) variable having type
typename
from expression
. Although this form is very often used in C++, it
should not be used for casting. Instead, there are now four
new-style casts available, that are introduced in the following
sections.
The C++0x standard defines the shared_ptr type (cf. section 18.4). To cast shared pointers specialized casts should be used. These are discussed in section 18.4.5.
    static_cast<type>(expression)

This type of cast is used to convert, e.g., a
double
to an int
:
both are numbers, but as the int
has no fractions precision is potentially
reduced. But the converse also holds true. When the quotient of
two int
values must be assigned to a double
the fraction part of the
division will get lost unless a cast is used.
Here is an example of such a cast (assuming quotient
is of type
double
and lhs
and rhs
are int
-typed variables):
    quotient = static_cast<double>(lhs) / rhs;

If the cast is omitted, the division operator will ignore the remainder as its operands are
int
expressions. Note that the division should be put
outside of the cast expression. If the division is put inside (as in
static_cast<double>(lhs / rhs)
) an integer division will have been
performed before the cast has had a chance to convert the type of an
operand to double
.
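The difference between both placements of the division is shown in the next sketch, using two hypothetical functions.

```cpp
// lhs is converted first: a floating point division is performed.
double quotient(int lhs, int rhs)
{
    return static_cast<double>(lhs) / rhs;
}

// The integer division is performed first: its fraction is already
// lost by the time the cast converts the result to a double.
double truncatedQuotient(int lhs, int rhs)
{
    return static_cast<double>(lhs / rhs);
}
```

With arguments 7 and 2 the first function returns 3.5, the second one 3.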
Another nice example of code in which it is a good idea to use the
static_cast<>()
-operator is in situations where the arithmetic assignment
operators are used in mixed-typed expressions. Consider the following
expression (assume doubleVar
is a variable of type double
):
    intVar += doubleVar;

This statement actually evaluates to:

    intVar = static_cast<int>(static_cast<double>(intVar) + doubleVar);

Here
intVar
is first promoted to a double
, and is then added as a
double
value to doubleVar
. Next, the sum is cast back to an int
.
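These two casts are a bit overdone. Often a single cast suffices:
explicitly casting doubleVar
to an int
yields an
int
-value for the right-hand side of the expression. The results may differ, though, when intVar and doubleVar have opposite signs, as doubleVar is then truncated before rather than after the addition: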
intVar += static_cast<int>(doubleVar);
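Both forms can be compared using the following sketch (the function names are hypothetical). Note that the forms do not always produce identical values: when the operands have opposite signs and doubleVar has a fraction, truncating before the addition differs from truncating after it.

```cpp
// intVar += doubleVar: the addition uses doubles, then the sum is truncated.
int addThenTruncate(int intVar, double doubleVar)
{
    return static_cast<int>(static_cast<double>(intVar) + doubleVar);
}

// intVar += static_cast<int>(doubleVar): doubleVar is truncated first.
int truncateThenAdd(int intVar, double doubleVar)
{
    intVar += static_cast<int>(doubleVar);
    return intVar;
}
```

With arguments 3 and 2.75 both functions return 5. With arguments -1 and 0.5, however, the first function returns 0 (-0.5 is truncated to 0) whereas the second one returns -1.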
A static_cast
can also be used to undo or introduce the
signed-modifier of an int
-typed variable. The C function tolower
requires an int
representing the value of an unsigned char
. But
char
is often a signed type (whether it is signed depends on the implementation). To call tolower
using an available
char ch
we should use:
    tolower(static_cast<unsigned char>(ch))

Casts like these provide information to the compiler about how to handle the provided data. Very often (especially with data types differing only in size but not in representation) the cast won't require any additional code. Additional code will be required, however, to convert one representation to another, e.g., when converting
double
to int
.
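Here is a sketch of a function applying the tolower call shown above to all characters of a string. The function name lowercase is hypothetical, and the string type is covered in chapter 5.

```cpp
#include <cctype>
#include <string>

// Returns a copy of text in which all letters are lowercase letters.
std::string lowercase(std::string text)
{
    for (std::string::size_type idx = 0; idx != text.size(); ++idx)
    {
        char ch = text[idx];
        // tolower expects the value of an unsigned char and returns an int
        text[idx] = static_cast<char>(
                        std::tolower(static_cast<unsigned char>(ch)));
    }
    return text;
}
```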
The const
keyword has been given a special place in casting. Normally
anything const
is const
for a good reason. Nonetheless situations
may be encountered where the const
can be ignored. For these special
situations the const_cast
should be used. Its syntax is:
    const_cast<type>(expression)

A
const_cast<type>(expression)
expression is used to undo the
const
attribute of a (pointer) type.
The need for a const_cast
may occur in combination with functions from
the standard C library which traditionally weren't always as const-aware
as they should. A function strfun(char *s)
might be available, performing
some operation on its char *s
parameter without actually modifying the
characters pointed to by s
. Passing char const hello[] = "hello";
to
strfun
will produce the warning
    passing `const char *' as argument 1 of `strfun(char *)' discards const

A
const_cast
is the appropriate way to prevent the warning:
strfun(const_cast<char *>(hello));
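A complete sketch might look as follows. Here strfun is a hypothetical stand-in for a function that isn't const-aware: it merely counts the characters of its argument, and helloLength is a hypothetical caller.

```cpp
#include <cstddef>

// Hypothetical function that is not const-aware although it only
// reads the characters pointed to by s.
std::size_t strfun(char *s)
{
    std::size_t length = 0;
    while (s[length] != '\0')
        ++length;
    return length;
}

std::size_t helloLength()
{
    char const hello[] = "hello";

    // Safe only because strfun is known not to modify the characters.
    return strfun(const_cast<char *>(hello));
}
```

If strfun did modify the characters of the const array the program's behavior would be undefined: the cast silences the compiler, but does not make the data modifiable.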
reinterpret_cast
. It is somewhat reminiscent of the
static_cast
, but reinterpret_cast
should be used when it is known
that the information as defined in fact is or can be interpreted as something
completely different. Its syntax is:
reinterpret_cast<pointer type>(pointer expression)
A
reinterpret_cast<type>(expression)
operator is appropriately used to
reinterpret a void *
to a pointer of a well-known type. Void pointers are
encountered with functions from the C library like qsort
. The
qsort
function expects a pointer to a (comparison) function having two
void const *
parameters. In fact, the void const *
s point to data
elements of the array to sort, and so the comparison function may cast the
void const *
parameters to pointers to the elements of the array to be
sorted. E.g., if the array is an int array[]
and the compare function's
parameters are void const *p1, void const *p2
then the compare function may obtain the address of the int
pointed to by
p1
by using:
reinterpret_cast<int const *>(p1)
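Such a comparison function could be written as in the next sketch, in which compareInts and sortInts are hypothetical names.

```cpp
#include <cstdlib>

// Comparison function of the form expected by qsort: returns a negative,
// zero or positive value if the first int is smaller than, equal to or
// larger than the second int.
int compareInts(void const *p1, void const *p2)
{
    int lhs = *reinterpret_cast<int const *>(p1);
    int rhs = *reinterpret_cast<int const *>(p2);
    return (lhs > rhs) - (lhs < rhs);
}

// Sorts count ints, starting at array, in ascending order.
void sortInts(int *array, std::size_t count)
{
    std::qsort(array, count, sizeof(int), compareInts);
}
```

The comparison avoids returning lhs - rhs, as that subtraction could overflow for large values having opposite signs.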
Another example of a reinterpret_cast
is found in combination with the
write
functions that are available for files and streams. In C++
streams are the preferred interface to, e.g., files. Output
streams (like cout
) offer write
members having the prototype
    write(char const *buffer, int length)

To write a
double
to a stream using write
a reinterpret_cast
is
needed as well. E.g., to write the raw bytes of a variable double value
to
cout
we would use:
    cout.write(reinterpret_cast<char const *>(&value), sizeof(double));

All casts are potentially dangerous, but the
reinterpret_cast
is the
most dangerous of all casts. Effectively we tell the compiler: back off, we
know what we're doing, so stop fussing. All bets are off, and we'd better
know what we're doing in situations like these. As a case in point
consider the following code:
    int value = 0x12345678;     // assume a 32-bit int
    cout << "Value's first byte has value: " << hex <<
            static_cast<int>(
                *reinterpret_cast<unsigned char *>(&value)
            );

The above code will show different results on little and big endian computers. Little endian computers will show the value 78, big endian computers the value 12. Also note that the different representations used by little and big endian computers render the previous example (
cout.write(...)
) non-portable over computers of different architectures.
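The following sketch writes the raw bytes of a double to a stream and restores the value from those bytes. The function names are hypothetical, and an ostringstream is used instead of cout so that the written bytes can be inspected. The bytes are only meaningful on the architecture that produced them.

```cpp
#include <cstring>
#include <sstream>
#include <string>

// Writes the raw bytes of value to a stream; returns the written bytes.
std::string rawBytes(double value)
{
    std::ostringstream out;
    out.write(reinterpret_cast<char const *>(&value), sizeof(double));
    return out.str();
}

// Restores a double from bytes produced on the same architecture.
// Returns 0 if fewer than sizeof(double) bytes are available.
double fromRawBytes(std::string const &bytes)
{
    double value = 0;
    if (bytes.size() >= sizeof(double))
        std::memcpy(&value, bytes.data(), sizeof(double));
    return value;
}

// Returns the byte stored at value's lowest address: 0x78 on little
// endian, 0x12 on big endian computers when passed 0x12345678
// (assuming a 32-bit int).
unsigned firstByte(int value)
{
    return *reinterpret_cast<unsigned char *>(&value);
}
```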
As a rule of thumb: if circumstances arise in which casts have to be used, clearly document the reasons for their use in your code, making double sure that the cast will not eventually cause a program to misbehave.
    dynamic_cast<type>(expression)

It is used at run-time to convert a pointer to an object of a class to a pointer to an object of a class that is found further down its so-called class hierarchy (which is also called a downcast). At this point in the Annotations a
dynamic_cast
cannot yet be discussed extensively,
but we will return to this topic in section 14.5.1.
    alignof   compl          explicit   namespace          return        typeid
    and       concept        export     new                short         typename
    and_eq    const          extern     not                signed        union
    asm       const_cast     false      not_eq             sizeof        unsigned
    auto      constexpr      float      nullptr            static        using
    axiom     continue       for        operator           static_cast   virtual
    bitand    decltype       friend     or                 struct        void
    bitor     default        goto       or_eq              switch        volatile
    bool      delete         if         private            template      wchar_t
    break     do             import     protected          this          while
    case      double         inline     public             throw         xor
    catch     dynamic_cast   int        register           true          xor_eq
    char      else           long       reinterpret_cast   try
    class     enum           mutable    requires           typedef
Notes:
export
keyword is removed from the language under the C++0x
standard, but remains a keyword, reserved for future use.
nullptr
keyword is defined in the C++0x standard
(not yet supported by the g++
compiler).
and, and_eq, bitand, bitor, compl,
not, not_eq, or, or_eq, xor
and xor_eq
are symbolic alternatives for,
respectively, &&, &=, &, |, ~, !, !=, ||, |=, ^
and ^=
.
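The symbolic alternatives can be used wherever the corresponding operators can be used, as in this sketch (the function names are hypothetical):

```cpp
// Both functions compute the same value; only the spelling differs.
bool inRangeSymbols(int value, int low, int high)
{
    return !(value < low) && !(value > high);
}

bool inRangeWords(int value, int low, int high)
{
    return not (value < low) and not (value > high);
}

// Clears the bits of bits that are set in mask.
unsigned maskWords(unsigned bits, unsigned mask)
{
    return bits bitand compl mask;      // same as: bits & ~mask
}
```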
Keywords can only be used for their intended purpose and cannot be used as
names for other entities (e.g., variables, functions, class-names, etc.). In
addition to keywords
identifiers starting with an underscore and living in
the
global namespace (i.e., not using any explicit namespace or using the
mere ::
namespace specification) or living in the std namespace are
reserved identifiers in the sense that their use is a prerogative of the
implementor.