Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken. Please state the document version you're referring to, as found in the title (in this document: 8.3.1) and please state the chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.
In contrast to the set of functions that handle
memory allocation in C
(i.e.,
malloc
etc.), memory allocation in C++ is handled by
the operators
new
and
delete
.
Important differences between malloc
and new
are:
malloc
doesn't `know' what the allocated memory
will be used for. E.g., when memory for int
s is allocated, the programmer
must supply the correct expression using a multiplication by
sizeof(int)
. In contrast, new
requires a type to be specified; the
sizeof
expression is implicitly handled by the compiler. Using new
is
therefore
type safe.
Memory allocated by malloc can only be initialized by using calloc instead,
which sets all allocated bytes to zero. This is not very useful when objects
are available. As operator new knows about the type of the allocated entity it
may (and will) call the constructor of an allocated class type object. This
constructor may also be supplied with arguments.
The return values of malloc and friends must always be inspected for
NULL-returns. This is not required anymore when new is used. In fact, new's
behavior when confronted with failing memory allocation is configurable
through the use of a new_handler (cf. section 8.2.2).
A comparable relationship exists between free and delete: delete makes sure
that when an object is deallocated, its destructor is automatically called.
The automatic calling of constructors and destructors when objects are created and destroyed has consequences which we shall discuss in this chapter. Many problems encountered during C program development are caused by incorrect memory allocation or memory leaks: memory is not allocated, not freed, not initialized, boundaries are overwritten, etc. C++ does not `magically' solve these problems, but it does provide us with tools to prevent these kinds of problems.
As malloc and friends are best avoided in C++, the frequently used str...
functions that are malloc based (like strdup) should be avoided in C++
programs as well. Instead, the facilities of the string class and operators
new and delete should be used.
Memory allocation procedures influence the way classes dynamically allocating
their own memory should be designed. Therefore, in this chapter these topics
are discussed in addition to discussions about operators new
and
delete
. We'll first cover the peculiarities of operators new
and
delete
, followed by a discussion about:
this
pointer, allowing explicit references to the object for
which a member function was called;
new
and
delete
.
Here is a simple example illustrating their use. An int
pointer variable
points to memory allocated by operator new
. This memory is later released
by operator delete
.
    int *ip = new int;
    delete ip;
Here are some characteristics of operators new
and delete
:
new
and delete
are operators and therefore do not
require parentheses, as required for functions like malloc
and
free
;
new
returns a pointer to the kind of memory that's asked for by
its operand (e.g., it returns a pointer to an int
);
new
uses a type as its operand, which has the important
benefit that the correct amount of memory, given the type of the object to be
allocated, is made available;
new is a type safe operator as it always returns a pointer to the type that
was mentioned as its operand. In addition, the type of the receiving pointer
must match the type specified with operator new;
new
may fail, but this is normally of no concern to the
programmer. In particular, the program does not have to test the success
of the memory allocation, as is required for malloc
and
friends. Section 8.2.2 delves into this aspect of new
;
delete
returns void
;
for each call to new a matching delete should eventually be executed, lest a
memory leak occur;
delete
can safely operate on a
0-pointer (doing nothing);
delete
must only be used to return memory allocated
by new
. It should not be used to return memory allocated by
malloc
and friends.
Operator new
can be used to
allocate primitive types but also to
allocate objects. When a primitive type
or a struct
type without a constructor is allocated the allocated
memory is not guaranteed to be initialized to 0, but an
initialization expression may be provided:
    int *v1 = new int;          // not guaranteed to be initialized to 0
    int *v2 = new int();        // initialized to 0
    int *v3 = new int(3);       // initialized to 3
    int *v4 = new int(3 * *v3); // initialized to 9

When a class-type object is allocated, the arguments of its constructor (if any) are specified immediately following the type specification in the
new
expression and the object will be initialized according to the thus
specified constructor. For example, to allocate string
objects the
following statements could be used:
    string *s1 = new string;            // uses the default constructor
    string *s2 = new string();          // same
    string *s3 = new string(4, ' ');    // initializes to 4 blanks.
In addition to using new
to allocate memory for a single entity or an
array of entities there is also a variant that allocates
raw memory:
operator new(sizeInBytes)
. Raw memory is returned as a void *
. Here
new
allocates a block of memory for unspecified purpose. Although raw
memory may consist of multiple characters it should not be interpreted as an
array of characters. Since raw memory returned by new
is returned as a
void *
its return value can be assigned to a void *
variable. More
often it is assigned to a char *
variable, using a cast. Here is an
example:
    char *chPtr = static_cast<char *>(operator new(numberOfBytes));

The use of raw memory is frequently encountered in combination with the placement new operator, discussed in section 8.1.5.
new[]
is used to
allocate arrays. The generic notation
new[]
is used in the C++ Annotations. Actually, the number of elements to be
allocated must be specified between the square brackets and it must, in turn,
be prefixed by the type of the entities that must be allocated. Example:
    int *intarr = new int[20];          // allocates 20 ints
    string *stringarr = new string[10]; // allocates 10 strings.

Operator
new
is a different operator than operator new[]
. A
consequence of this difference is discussed in the next section
(8.1.2).
Arrays allocated by operator new[]
are called
dynamic arrays. They are constructed during the
execution of a program, and their lifetime may exceed the lifetime of the
function in which they were created. Dynamically allocated arrays may last for
as long as the program runs.
When new[]
is used to allocate an array of primitive values or an
array of objects, new[]
must be specified with a type and an (unsigned)
expression between its square brackets. The type and expression together are
used by the compiler to determine the required size of the block of memory to
make available. When new[]
is used the array's elements are stored
consecutively in memory. An array index expression may thereafter be used to
access the array's individual elements: intarr[0]
represents the first
int
value, immediately followed by intarr[1]
, and so on until the last
element (intarr[19]
).
With non-class types (primitive types, struct
types without constructors) the block of memory returned by operator new[]
is not guaranteed to be initialized to 0.
When operator new[]
is used to allocate arrays of objects their
constructors are automatically used. Consequently new string[20]
results
in a block of 20 initialized string
objects. When allocating arrays of
objects the class's
default constructor is used to initialize each individual object in
turn. A non-default constructor cannot be called, but often it is possible to
work around that as discussed in section 13.8.
The expression between brackets of operator new[]
represents the
number of elements of the array to allocate. The C++ standard allows
allocation of
0-sized arrays. The statement
new int[0]
is correct C++. However, it is also
pointless and confusing and should be avoided. It is pointless as it doesn't
refer to any element at all, it is confusing as the returned pointer has a
useless non-0 value. A pointer intending to point to an array of values should
be initialized (like any pointer that isn't yet pointing to memory) to 0,
allowing for expressions like if (ptr) ...
Without using operator new[], arrays of variable sizes can also be
constructed as local arrays (a compiler extension offered by, e.g., g++,
rather than standard C++). Such arrays are not dynamic arrays and their
lifetimes are restricted to the lifetime of the block in which they were
defined.
Once allocated, all arrays
have fixed sizes. There is no
simple way to enlarge or shrink arrays. C++ has no operator
`
renew
'. Section 8.1.3 illustrates how to
enlarge arrays.
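Dynamically allocated arrays are deleted using operator delete[]. It expects
a pointer to a block of memory, previously allocated by operator new[].
When operator delete[]'s operand is a pointer to an array of objects two
actions will be performed: first, the class's destructor is called for each
of the objects in the array; then the memory pointed at by the pointer is
returned to the common pool. Example:
    std::string *sp = new std::string[10];
    delete[] sp;

No special action is performed if a dynamically allocated array of primitive typed values is deleted. Following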
int *it = new int[10] the statement delete[] it simply returns the memory
pointed at by it. Realize that, as a pointer is a primitive type, deleting a
dynamically allocated array of pointers to objects will not result in the
proper destruction of the objects the array's elements point at. So, the
following example results in a
memory leak:
    string **sp = new string *[5];
    for (size_t idx = 0; idx != 5; ++idx)
        sp[idx] = new string;
    delete[] sp;            // MEMORY LEAK !

In this example the only action performed by
delete[]
is to return an
area the size of five pointers to strings to the common pool.
Here's how the destruction in such cases should be performed: first call
delete for each of the array's elements; then delete the array itself using
delete[]:
    for (size_t idx = 0; idx != 5; ++idx)
        delete sp[idx];
    delete[] sp;

One of the consequences is of course that by the time the memory is going to be returned not only the pointer must be available but also the number of elements it contains. This can easily be accomplished by storing pointer and number of elements in a simple class and then using an object of that class.
Operator delete[]
is a different operator than operator
delete
. The
rule of thumb is: if
new[]
was used, also use
delete[]
.
C++ offers no renew operator. The basic steps to take when enlarging an array
are the following: allocate a new, larger array; copy the old array's
elements to the new array; delete the old array; let the pointer to the array
point to the new array. Example:
    #include <string>
    using namespace std;

    string *enlarge(string *old, unsigned oldsize, unsigned newsize)
    {
        string *tmp = new string[newsize];  // allocate larger array

        for (size_t idx = 0; idx != oldsize; ++idx)
            tmp[idx] = old[idx];            // copy old to tmp

        delete[] old;                       // delete the old array
        return tmp;                         // return new array
    }

    int main()
    {
        string *arr = new string[4];        // initially: array of 4 strings
        arr = enlarge(arr, 4, 6);           // enlarge arr to 6 elements.
        delete[] arr;                       // return the enlarged array
    }
The procedure to enlarge shown in the example has several drawbacks:
the new array requires newsize constructors to be called;
having initialized the strings in the new array, oldsize of them are
immediately reassigned to the corresponding values in the original array;
As we've seen, operator new allocates the memory for an object and
subsequently initializes that object by calling one of its constructors.
Likewise, operator delete calls an object's destructor and subsequently
returns the memory allocated by operator new to the common pool.
In the next section we'll encounter another use of new
, allowing us to
initialize objects in so-called
raw memory: memory merely consisting of
bytes that have been made available either by static or dynamic
allocation.
Raw memory is made available by
operator new(sizeInBytes)
. This should not be
interpreted as an array of any kind but just a series of memory locations that
were dynamically made available. operator new
returns a void *
so a
(static) cast is required to use it as memory of some type. Here are two
examples:
    // room for 5 ints
    int *ip = static_cast<int *>(operator new(5 * sizeof(int)));
    // room for 5 strings
    string *sp = static_cast<string *>(operator new(5 * sizeof(string)));

As
operator new
has no concept of data types the size of the intended
data type must be specified when allocating raw memory for a certain number of
objects of an intended type. The use of operator new
therefore somewhat
resembles the use of
malloc
.
The counterpart of operator new is operator delete. Operator delete expects a
void * (so a pointer to any type can be passed to it). The
pointer is interpreted as a pointer to raw memory and is returned to the
common pool. Operator delete
does not call a destructor. The use of
operator delete
therefore resembles the use of
free
. To return the
memory pointed at by the abovementioned variables ip
and sp
operator delete
should be used:
    // delete raw memory allocated by operator new
    operator delete(ip);
    operator delete(sp);
A special form of operator new is called the placement new operator. Before
using placement new the <new> header file must have been included.
With placement new
operator new
is provided with an existing block of
memory in which an object or value is initialized. The block of memory should
of course be large enough to contain the object and it must be suitably
aligned for the object's type (memory returned by operator new always is). It
is easy to determine how much memory is used by an entity (object or
variable) of type Type: the sizeof operator returns the number of bytes
required by a Type entity. Entities may of course dynamically allocate memory
for their own use.
Dynamically allocated memory, however, is not part of the entity's memory
`footprint' but it is always made available externally to the entity
itself. This is why sizeof
returns the same value when applied to
different string
objects returning different length and capacity values.
The placement new
operator uses the following syntax (using Type
to
indicate the used data type):
    Type *new(void *memory) Type(arguments);

Here,
memory
is a block of memory of at least sizeof(Type)
bytes
and Type(arguments)
is any constructor of the class Type
.
The placement new
operator is useful in situations where classes set
aside memory to be used later. This is used, e.g., by std::string
to
change its capacity. Calling string::reserve
may enlarge that capacity
without making memory beyond the string's length immediately available. But
the object itself may access its additional memory and so when information
is added to a string
object it can draw memory from its capacity rather
than having to perform a reallocation for each single addition of information.
Let's apply that philosophy to a class Strings
storing std::string
objects. The class defines a std::string *d_memory accessing the memory holding
accessing the memory holding
its d_size
string objects as well as d_capacity - d_size
reserved
memory. Assuming that a default constructor initializes d_capacity
to 1,
doubling d_capacity
whenever an additional string
must be stored, the
class must support the following essential operations:
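doubling its capacity when all its spare memory (e.g., made available by
reserve) has been consumed;
adding another string object;
properly deleting the installed strings and memory when a Strings object
ceases to exist.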
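Here is the implementation of the private member void Strings::reserve,
assuming d_capacity has already been given its proper value:
    void Strings::reserve()
    {
        using std::string;

        // allocate raw memory for d_capacity strings and copy the bytes of
        // the existing strings to it (memcpy is declared in <cstring>; the
        // bytewise copy assumes that string objects can be relocated that way)
        string *newMemory = static_cast<string *>(
                                memcpy(
                                    operator new(d_capacity * sizeof(string)),
                                    d_memory,
                                    d_size * sizeof(string)
                                ));

        operator delete(d_memory);  // return the old raw memory: no destructors
        d_memory = newMemory;
    }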
The member append
adds another string
object to a Strings
object. A (public) member reserve(request)
ensures that the Strings object's capacity is sufficient. Then the placement
new operator is used to install the next string into the raw memory's
appropriate location:
    void Strings::append(std::string const &next)
    {
        reserve(d_size + 1);
        new (d_memory + d_size) std::string(next);
        ++d_size;
    }
At the end of the Strings
object's lifetime all its dynamically
allocated memory must be returned. This is the responsibility of the
destructor, as explained in the next section. The
destructor's full definition is postponed to that section, but its actions
when placement new
is involved can be discussed here.
With placement new
an interesting situation is encountered. Objects,
possibly themselves allocating memory, are installed in memory that may or may
not have been allocated dynamically, but that is definitely not
completely filled with such objects. So a simple delete[]
can't be used,
but a delete
for each of the objects that are available can't be used
either, since that would also delete the memory of the objects themselves,
which wasn't dynamically allocated.
This peculiar situation is solved in a peculiar way, only
encountered in cases where the placement new
operator has been used:
memory allocated by objects initialized using placement new
is returned by
explicitly calling the object's destructor.
The destructor is declared as a member having the class preceded by a tilde as
its name, not using any arguments. So, std::string
's destructor is named
~string
. The memory allocated by our class Strings
is therefore
properly destroyed as follows (in the example assume that using namespace
std
was specified):
    for (string *sp = d_memory + d_size; sp-- != d_memory; )
        sp->~string();
    operator delete(d_memory);
So far, so good. All is well as long as we're using but one object. What
about allocating an array of objects? Initialization is performed as usual.
But as with delete
, delete[]
cannot be called when the buffer was
allocated statically. Instead, when multiple objects were initialized using
the placement new
operator in combination with a statically allocated
buffer all the objects' destructors must be called explicitly, as in the
following example:
    char buffer[3 * sizeof(string)];
    string *sp = new(buffer) string [3];

    for (size_t idx = 0; idx < 3; ++idx)
        sp[idx].~string();
When a program is terminated by an exit call, only the destructors of already
initialized global objects are called. In that situation destructors of
objects defined locally by functions are not called either. This is one
(good) reason for avoiding exit in C++ programs.
Destructors obey the following syntactical requirements:
    class Strings
    {
        public:
            Strings();
            ~Strings();     // the destructor
    };

By convention the constructors are declared first. The destructor is declared next, to be followed by other member functions.
A destructor's
main task is to ensure that
memory allocated by an object is properly returned when the object ceases to
exist. Consider the following interface of the class Strings
:
    class Strings
    {
        std::string *d_string;
        size_t d_size;

        public:
            Strings();
            Strings(char const *const *cStrings, size_t n);
            ~Strings();

            std::string const &at(size_t idx) const;
            size_t size() const;
    };

The constructor's task is to initialize the data fields of the object. E.g., its constructors are defined as follows:
    Strings::Strings()
    :
        d_string(0),
        d_size(0)
    {}

    Strings::Strings(char const *const *cStrings, size_t size)
    :
        d_string(new string[size]),
        d_size(size)
    {
        for (size_t idx = 0; idx != size; ++idx)
            d_string[idx] = cStrings[idx];
    }
As objects of the class Strings
allocate memory a destructor is
clearly required. Destructors may or may not be called automatically. Here are
the rules:
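the destructor of a dynamically allocated object is activated by delete using
the object's address as its operand;
the destructors of a dynamically allocated array of objects are activated by
delete[] using the address of the array's first element as its operand;
the destructor of an object initialized by placement new is activated by
explicitly calling the object's destructor.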
The task of Strings's destructor would therefore be to delete the memory to
which d_string points. Its implementation is:
    Strings::~Strings()
    {
        delete[] d_string;
    }
The next example shows Strings
at work. In process
a Strings store
is created, and its data are displayed. It returns a
dynamically allocated Strings
object to main
. A
Strings *
receives the address of the allocated object and deletes the
object again. Another Strings
object is then created in a block of
memory made available locally in main
, and an
explicit call to ~Strings
is required
to return the memory allocated by that object. In the example a Strings
object is automatically destroyed only once: the local Strings object defined
by process. The other two Strings objects require explicit actions to prevent
memory leaks.
    #include "strings.h"
    #include <iostream>

    using namespace std;

    void display(Strings const &store)
    {
        for (size_t idx = 0; idx != store.size(); ++idx)
            cout << store.at(idx) << '\n';
    }

    Strings *process(char *argv[], int argc)
    {
        Strings store(argv, argc);
        display(store);
        return new Strings(argv, argc);
    }

    int main(int argc, char *argv[])
    {
        Strings *sp = process(argv, argc);
        delete sp;

        char buffer[sizeof(Strings)];
        sp = new (buffer) Strings(argv, argc);
        sp->~Strings();
    }
new
and delete
are used when an object or variable is
allocated. One of the advantages of the operators new
and
delete
over functions like
malloc
and
free
is that new
and
delete
call the corresponding object constructors and destructors.
The allocation of an object by operator new
is a two-step
process. First the memory for the object itself is allocated. Then its
constructor is called, initializing the object. Analogously to the
construction of an object, the destruction is also a two-step process: first,
the destructor of the class is called deleting the memory controlled by the
object. Then the memory used by the object itself is freed.
Dynamically allocated arrays of objects can also be handled by new
and
delete
. When allocating an array of objects using operator new
the
default constructor is called for each object in the array. In cases like this
operator
delete[]
must be used to ensure that the destructor is called for
each of the objects in the array.
However, the addresses returned by new Type
and new Type[size]
are of identical types, in both cases a Type *
. Consequently it cannot be
determined by the type of the pointer whether a pointer to dynamically
allocated memory points to a single entity or to an array of entities.
What happens if delete
rather than delete[]
is used? Consider the
following situation, in which the destructor ~Strings
is modified so
that it tells us that it is called. In a main
function an array of two
Strings
objects is allocated by new
, to be deleted by delete
[]
. Next, the same actions are repeated, albeit that the delete
operator
is called without []
:
    #include <iostream>
    #include "strings.h"
    using namespace std;

    Strings::~Strings()
    {
        cout << "Strings destructor called" << '\n';
    }

    int main()
    {
        Strings *a = new Strings[2];

        cout << "Destruction with []'s" << '\n';
        delete[] a;

        a = new Strings[2];

        cout << "Destruction without []'s" << '\n';
        delete a;
    }
    /*
        Generated output:
    Destruction with []'s
    Strings destructor called
    Strings destructor called
    Destruction without []'s
    Strings destructor called
    */

From the generated output, we see that the destructors of the individual
Strings
objects are called when delete[]
is used, while only the
first object's destructor is called if the []
is omitted.
Conversely, if delete[]
is called in a situation where delete
should have been called the results are unpredictable, and will most likely
cause the program to crash. This problematic behavior is caused by the way the
run-time system stores information about the size of the allocated array
(usually right before the array's first element). If a single object is
allocated the array-specific information is not available, but it is
nevertheless assumed present by delete[]
. This latter operator will
interpret bogus values before the array's first element as size information,
thus usually causing the program to fail.
If no destructor is defined, a trivial destructor is defined by the compiler. The trivial destructor ensures that the destructors of composed objects (as well as the destructors of base classes if a class is a derived class, cf. chapter 13) are called. This has serious implications: objects allocating memory will cause a memory leak unless precautionary measures are taken (by defining an appropriate destructor). Consider the following program:
    #include <iostream>
    #include "strings.h"
    using namespace std;

    Strings::~Strings()
    {
        cout << "Strings destructor called" << '\n';
    }

    int main()
    {
        Strings **ptr = new Strings* [2];

        ptr[0] = new Strings[2];
        ptr[1] = new Strings[2];

        delete[] ptr;
    }

This program produces no output at all. Why is this? The variable
ptr
is defined as a pointer to a pointer. The dynamically allocated array
therefore consists of pointer variables and pointers are of a primitive type.
No destructors exist for primitive typed variables. Consequently only the
array itself is returned, and no Strings
destructor is called.
Of course, we don't want this, but require the Strings
objects
pointed to by the elements of ptr
to be deleted too. In this case we have
two options:
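In a for-statement visit all the elements of the ptr array, calling delete[]
for each of the array's elements (each element points to an array of Strings
objects). Deleting the objects pointed at by an array's elements was
demonstrated in the previous section.
A wrapper class is designed around a pointer (to, e.g., an object of some
class, like Strings). Rather than using a pointer to a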
pointer to Strings
objects a pointer to an array of wrapper-class
objects is used. As a result delete[] ptr
calls the destructor of each of
the wrapper class objects, in turn calling the Strings
destructor for
their d_strings
members. Example:
    #include <iostream>
    using namespace std;

    class Strings                   // partially implemented
    {
        public:
            ~Strings();
    };

    inline Strings::~Strings()
    {
        cout << "destructor called\n";
    }

    class Wrapper
    {
        Strings *d_strings;

        public:
            Wrapper();
            ~Wrapper();
    };

    inline Wrapper::Wrapper()
    :
        d_strings(new Strings())
    {}

    inline Wrapper::~Wrapper()
    {
        delete d_strings;
    }

    int main()
    {
        delete[] new Strings *[4];  // memory leak: no destructor called
        cout << "===========\n";
        delete[] new Wrapper[4];    // OK: 4 x destructor called
    }
    /*
        Generated output:
    ===========
    destructor called
    destructor called
    destructor called
    destructor called
    */
new
. Operator new
's default behavior may
be modified in various ways. One way to modify its behavior is to redefine the
function that's called when memory allocation fails. Such a function
must comply with the following requirements:
void
.
A redefined error function might, e.g., print a message and terminate
the program. The user-written error function becomes part of the allocation
system through the function
set_new_handler
.
Such an error function is illustrated below (this implementation applies to the Gnu C/C++ requirements. Actually using the program given in the next example is not advised, as it will probably slow down your computer enormously due to the resulting use of the operating system's swap area):
    #include <iostream>
    #include <string>
    #include <cstring>
    #include <cstdlib>              // exit
    #include <new>                  // set_new_handler

    using namespace std;

    void outOfMemory()
    {
        cout << "Memory exhausted. Program terminates." << '\n';
        exit(1);
    }

    int main()
    {
        long allocated = 0;

        set_new_handler(outOfMemory);   // install error function

        while (true)                    // eat up all memory
        {
            memset(new int [100000], 0, 100000 * sizeof(int));
            allocated += 100000 * sizeof(int);
            cout << "Allocated " << allocated << " bytes\n";
        }
    }

Once the new error function has been installed it is automatically invoked when memory allocation fails, and the program is terminated. Memory allocation may fail in indirectly called code as well, e.g., when constructing or using streams or when strings are duplicated by low-level functions.
So much for the theory. On some systems the `out of memory' condition may actually never be reached, as the operating system may interfere before the run-time support system gets a chance to stop the program.
The standard C functions allocating memory (like
strdup
,
malloc
,
realloc
etc.) do not trigger the new
handler when memory allocation
fails and should be avoided in C++ programs.
Person
:
    class Person
    {
        char *d_name;
        char *d_address;
        char *d_phone;

        public:
            Person();
            Person(char const *name, char const *addr, char const *phone);
            ~Person();

        private:
            char *strdupnew(char const *src);   // returns a copy of src.
    };
Person
's data members are initialized to zeroes or to copies of the
ASCII-Z strings passed to Person
's constructor, using some variant of
strdup
. Its destructor will return the allocated memory again.
Now consider the consequences of using Person
objects in the following
example:
    void tmpPerson(Person const &person)
    {
        Person tmp;

        tmp = person;
    }

Here's what happens when
tmpPerson
is called:
The function expects a reference to a Person as its parameter person.
It defines a local object tmp, whose data members are initialized to zeroes.
The object referenced by person is copied to tmp: sizeof(Person) number of
bytes are copied from person to tmp.
The data members of person are pointers, pointing to allocated memory. After
the assignment this memory is addressed by two objects: person and tmp.
Problems arise when tmpPerson terminates: tmp is
destroyed. The destructor of the class Person
releases the memory pointed
to by the fields d_name
, d_address
and d_phone
: unfortunately,
this memory is also pointed at by person
....
Having executed tmpPerson
, the object referenced by
person
now contains
pointers to deleted memory.
This is undoubtedly not a desired effect of using a function like
tmpPerson
. The deleted memory will likely be reused by subsequent
allocations. The pointer members of person
have effectively become
wild pointers, as they don't point to allocated
memory anymore. In general it can be concluded that
the right way to assign one Person object to another is not to copy the
contents of the object bytewise. A better way is to make an equivalent
object: one having its own allocated memory containing copies of the original
strings.
The way to
assign a Person
object to another is
illustrated in Figure 5.
There are several ways to assign a Person
object to another. One way
would be to define a special member function to handle the assignment. The
purpose of this member function would be to create a copy of an object having
its own name
, address
and phone
strings. Such a member function
could be:
    void Person::assign(Person const &other)
    {
            // delete our own previously used memory
        delete[] d_name;
        delete[] d_address;
        delete[] d_phone;

            // copy the other Person's data
        d_name    = strdupnew(other.d_name);
        d_address = strdupnew(other.d_address);
        d_phone   = strdupnew(other.d_phone);
    }

Using
assign
we could rewrite the offending function tmpPerson
:
    void tmpPerson(Person const &person)
    {
        Person tmp;

            // tmp (having its own memory) holds a copy of person
        tmp.assign(person);

            // now it doesn't matter that tmp is destroyed..
    }

This solution is valid, although it only tackles a symptom. It requires the programmer to use a specific member function instead of the assignment operator. The original problem (assignment produces wild pointers) is still not solved. Since it is hard to `strictly adhere to a rule' a way to solve the original problem is of course preferred.
Fortunately a solution exists using operator overloading: the
possibility C++ offers to redefine the actions of an operator in a given
context. Operator overloading was briefly mentioned earlier, when the
operators << and >> were redefined to be used with streams (like
cin
, cout
and cerr
), see section 3.1.4.
Overloading the assignment operator is probably the most common form of operator overloading in C++. A word of warning is appropriate, though. The fact that C++ allows operator overloading does not mean that this feature should indiscriminately be used. Here's what you should keep in mind:
Person
.
std::string: assigning one string object to another provides the destination
string with a copy of the contents of the source string. No surprises here.
int
s do. The way operators behave when applied to int
s is what is
expected, all other implementations probably cause surprises and confusion.
Therefore, overloading the insertion (<<) and extraction (>>)
operators in the context of streams is probably ill-chosen: the stream
operations have nothing in common with bitwise shift operations.
To overload the assignment operator =, a member operator=(Class const &rhs) is added to the class interface. Note that the function name consists of two parts: the keyword operator, followed by the operator itself. When we augment a class interface with a member function operator=, then that operator is redefined for the class, which prevents the default operator from being used. In the previous section the function assign was provided to solve the problems resulting from using the default assignment operator. Rather than using an ordinary member function C++ commonly uses a dedicated operator generalizing the operator's default behavior to the class in which it is defined.
The assign member mentioned before may be redefined as follows (the member operator= presented below is a first, rather unsophisticated, version of the overloaded assignment operator. It will shortly be improved):

    class Person
    {
        public:                     // extension of the class Person
                                    // earlier members are assumed.
            void operator=(Person const &other);
    };

Its implementation could be

    void Person::operator=(Person const &other)
    {
        delete[] d_name;                        // delete old data
        delete[] d_address;
        delete[] d_phone;

        d_name    = strdupnew(other.d_name);    // duplicate other's data
        d_address = strdupnew(other.d_address);
        d_phone   = strdupnew(other.d_phone);
    }

This member's actions are similar to those of the previously mentioned member assign, but this member is automatically called when the assignment operator = is used. Actually there are two ways to call overloaded operators as shown in the next example:

    void tmpPerson(Person const &person)
    {
        Person tmp;

        tmp = person;
        tmp.operator=(person);      // the same thing
    }

Overloaded operators are seldom called explicitly, but explicit calls must be used (rather than using the plain operator syntax) when you explicitly want to call the overloaded operator from a pointer to an object (it is also possible to dereference the pointer first and then use the plain operator syntax, see the next example):

    void tmpPerson(Person const &person)
    {
        Person *tmp = new Person;

        tmp->operator=(person);
        *tmp = person;              // yes, also possible...

        delete tmp;
    }
this
, to reach this substrate.
The this keyword is a pointer variable that always contains the address of the object for which the member function was called. The this pointer is implicitly declared by each member function (whether public, protected, or private). The this pointer is a constant pointer to an object of the member function's class. For example, the members of the class Person implicitly declare:

    extern Person *const this;

A member function like Person::name could be implemented in two ways: with or without using the this pointer:

    char const *Person::name() const    // implicitly using `this'
    {
        return d_name;
    }

    char const *Person::name() const    // explicitly using `this'
    {
        return this->d_name;
    }

The this pointer is seldom explicitly used, but situations do exist where the this pointer is actually required (cf. chapter 16).
    a = b = c;

the expression b = c is evaluated first, and its result in turn is assigned to a.
The implementation of the overloaded assignment operator we've encountered thus far does not permit such constructions, as it returns void.
This imperfection can easily be remedied using the this pointer. The overloaded assignment operator expects a reference to an object of its class. It can also return a reference to an object of its class. This reference can then be used as an argument in sequential assignments.
The overloaded assignment operator commonly returns a reference to the current object (i.e., *this). The next version of the overloaded assignment operator for the class Person thus becomes:

    Person &Person::operator=(Person const &other)
    {
        delete[] d_address;
        delete[] d_name;
        delete[] d_phone;

        d_address = strdupnew(other.d_address);
        d_name    = strdupnew(other.d_name);
        d_phone   = strdupnew(other.d_phone);

            // return current object as a reference
        return *this;
    }

Overloaded operators may themselves be overloaded. Consider the string class, having overloaded assignment operators operator=(std::string const &rhs), operator=(char const *rhs), and several more overloaded versions. These additional overloaded versions are there to handle different situations which are, as usual, recognized by their argument types. These overloaded versions all follow the same mold: when necessary dynamically allocated memory controlled by the object is deleted; new values are assigned using the overloaded operator's parameter values and *this is returned.
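The effect of returning *this can be tried out with a small stand-in class. The following is merely a sketch: the class Name and the function chained are hypothetical (they are not part of the Annotations' Person class) and a single std::string member is used so that memory management doesn't distract from the operator's return type:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Person: one data member suffices to
// illustrate the return type of the assignment operator.
class Name
{
    std::string d_name;

    public:
        explicit Name(std::string const &name = "")
        :
            d_name(name)
        {}

            // returning a reference to the current object
            // allows sequential assignments like a = b = c
        Name &operator=(Name const &other)
        {
            d_name = other.d_name;
            return *this;
        }

        std::string const &name() const
        {
            return d_name;
        }
};

// Assigns c to b, and then the result of that assignment to a.
std::string chained()
{
    Name a("a");
    Name b("b");
    Name c("c");

    a = b = c;                  // evaluated as a = (b = c)

    assert(b.name() == "c");
    return a.name();
}
```

Had operator= returned void, the statement a = b = c would not have compiled, as there would have been no value to assign to a.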
Strings, introduced in section 8.2, once again. As it contains several primitive type data members as well as a pointer to dynamically allocated memory it needs a constructor, a destructor, and an overloaded assignment operator. In fact the class offers two constructors: in addition to the default constructor it offers a constructor expecting a char const *const * and a size_t.
Now consider the following code fragment. The statement references are discussed following the example:
    int main(int argc, char **argv)
    {
        Strings s1(argv, argc);     // (1)
        Strings s2;                 // (2)
        Strings s3(s1);             // (3)

        s2 = s1;                    // (4)
    }

At 1 s1 is initialized using main's parameters: Strings's second constructor is used.
At 2 Strings's default constructor is used, initializing an empty Strings object.
At 3 yet another Strings object is created, using a constructor accepting an existing Strings object. This form of initialization has not yet been discussed. It is called a copy construction and the constructor performing the initialization is called the copy constructor. Copy constructions are also encountered in the following form:

    Strings s3 = s1;

This is a construction and therefore an initialization. It is not an assignment as an assignment needs a left-hand operand that has already been defined. C++ allows the assignment syntax to be used for constructors having only one parameter. The constructor syntax used at statement 3 makes it more obvious that no assignment takes place, though.
The copy constructor encountered here is new. It does not result in a compilation error even though it hasn't been declared in the class interface. This takes us to the following rule:
    A copy constructor is always available, even if it isn't declared in the class's interface.

The copy constructor made available by the compiler is also called the trivial copy constructor. Starting with the C++0x standard it can easily be suppressed (using the = delete idiom). The trivial copy constructor performs a byte-wise copy operation of the existing object's primitive data to the newly created object, calls copy constructors to initialize the object's class data members from their counterparts in the existing object and, when inheritance is used, calls the copy constructors of the base class(es) to initialize the new object's base classes.
Consequently, in the above example the trivial copy constructor is used. As it performs a byte-by-byte copy operation of the object's primitive type data members that is exactly what happens at statement 3. By the time s3 ceases to exist its destructor will delete its array of strings. Unfortunately d_string is of a primitive data type and so it also deletes s1's data. Once again we encounter wild pointers as a result of an object going out of scope.
The remedy is easy: instead of using the trivial copy constructor a copy constructor must explicitly be added to the class's interface and its definition must prevent the wild pointers, comparably to the way this was realized in the overloaded assignment operator. An object's dynamically allocated memory is duplicated, so that it will contain its own allocated data. The copy constructor is simpler than the overloaded assignment operator in that it doesn't have to delete previously allocated memory. Since the object is going to be created no previously allocated memory already exists.
Strings's copy constructor can be implemented as follows:

    Strings::Strings(Strings const &other)
    :
        d_string(new string[other.d_size]),
        d_size(other.d_size)
    {
        for (size_t idx = 0; idx != d_size; ++idx)
            d_string[idx] = other.d_string[idx];
    }
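That such a copy constructor indeed prevents two objects from sharing the same memory can be checked with a reduced class. The sketch below assumes a hypothetical class Words, a stripped-down variant of Strings offering only the members needed here; its assignment operator is explicitly suppressed to keep the example short:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, stripped-down variant of Strings
class Words
{
    std::string *d_string;
    std::size_t d_size;

    public:
        explicit Words(std::size_t size)
        :
            d_string(new std::string[size]),
            d_size(size)
        {}

            // deep copy: the new object allocates its own array
        Words(Words const &other)
        :
            d_string(new std::string[other.d_size]),
            d_size(other.d_size)
        {
            for (std::size_t idx = 0; idx != d_size; ++idx)
                d_string[idx] = other.d_string[idx];
        }

        Words &operator=(Words const &other) = delete;  // not needed here

        ~Words()
        {
            delete[] d_string;
        }

        std::string &at(std::size_t idx)
        {
            assert(idx < d_size);
            return d_string[idx];
        }
};

// Returns true if modifying a copy leaves the original untouched.
bool copyIsIndependent()
{
    Words original(2);
    original.at(0) = "hello";

    Words duplicate(original);          // copy construction
    duplicate.at(0) = "modified";

    return original.at(0) == "hello" && duplicate.at(0) == "modified";
}
```

When both objects go out of scope each destructor deletes its own array, so no memory is deleted twice.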
The copy constructor is always called when an object is initialized using another object of its class. Apart from the plain copy construction that we encountered thus far, here are other situations where the copy constructor is used:
when a function defines a value parameter (no pointer, no reference), which is initialized from the function's argument:

    void process(Strings store)         // no pointer, no reference
    {
        store.at(3) = "modified";       // doesn't modify `outer'
    }

    int main(int argc, char **argv)
    {
        Strings outer(argv, argc);
        process(outer);
    }

when a function returns an object by value:

    Strings copy(Strings const &store)
    {
        return store;
    }

Here store is used to initialize copy's return value. The returned Strings object is a temporary, anonymous object that may be immediately used by code calling copy but no assumptions can be made about its lifetime thereafter.
As we've seen in our discussion of the destructor (section 8.2) the destructor can explicitly be called, but that doesn't hold true for the (copy) constructor. But let's briefly summarize what an overloaded assignment operator is supposed to do:
    Strings &Strings::operator=(Strings const &other)
    {
        Strings tmp(other);
        // more to follow
        return *this;
    }

The optimization operator=(Strings tmp) is enticing, but let's postpone that for a little while (at least until section 8.6).
Now that we've done the copying part, what about the deleting part? And isn't there another slight problem as well? After all we copied all right, but not into our intended (current, *this) object.
At this point it's time to introduce swapping. Swapping two variables means that the two variables exchange their values. Many classes (e.g., std::string) offer swap members allowing us to swap two of their objects. The Standard Template Library (STL, cf. chapter 18) offers various functions related to swapping. There is even a swap generic algorithm (cf. section 19.1.61). That latter algorithm, however, begs the current question, as it is customarily implemented using the assignment operator, so it's somewhat problematic to use it when implementing the assignment operator.
As we've seen with the placement new operator objects can be constructed in blocks of memory of sizeof(Class) bytes large. And so, two objects of the same class each occupy sizeof(Class) bytes. To swap these objects we merely have to swap the contents of those sizeof(Class) bytes. This procedure may be applied to classes whose objects may be swapped using a member-by-member swapping operation and can also be used for classes having reference data members. Here is its implementation for a hypothetical class Class, resulting in very fast swapping:

    #include <cstring>

    void Class::swap(Class &other)
    {
        char buffer[sizeof(Class)];
        memcpy(buffer, &other, sizeof(Class));
        memcpy(&other, this,   sizeof(Class));
        memcpy(this,   buffer, sizeof(Class));
    }

Let's add void swap(Strings &other) to the class Strings and complete its operator= implementation:

    Strings &Strings::operator=(Strings const &other)
    {
        Strings tmp(other);
        swap(tmp);
        return *this;
    }

This operator= implementation is generic: it can be applied to every class whose objects are directly swappable. How does it work?
The other object is used to initialize a local tmp object. This takes care of the copying part of the assignment operator.
swap ensures that the current object receives its new values.
When operator= terminates its local tmp object ceases to exist and its destructor is called. But by now it contains the data previously owned by the current object, so those data are now returned. Which takes care of the destruction part of the assignment operation.
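The three steps can be followed in a complete, compilable sketch. It assumes a hypothetical class Words (a stripped-down variant of Strings) and, to stay within what the language guarantees for classes that aren't trivially copyable, it swaps the data members one by one using std::swap instead of the memcpy technique shown earlier:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, stripped-down variant of Strings
class Words
{
    std::string *d_string;
    std::size_t d_size;

    public:
        explicit Words(std::size_t size = 0)
        :
            d_string(new std::string[size]),
            d_size(size)
        {}

        Words(Words const &other)
        :
            d_string(new std::string[other.d_size]),
            d_size(other.d_size)
        {
            for (std::size_t idx = 0; idx != d_size; ++idx)
                d_string[idx] = other.d_string[idx];
        }

        ~Words()
        {
            delete[] d_string;
        }

            // exchange the data members one by one
        void swap(Words &other)
        {
            std::swap(d_string, other.d_string);
            std::swap(d_size, other.d_size);
        }

        Words &operator=(Words const &other)
        {
            Words tmp(other);   // the copying part
            swap(tmp);          // the current object receives the copy
            return *this;       // tmp's destructor releases the old data
        }

        std::size_t size() const        { return d_size; }
        std::string &at(std::size_t idx) { return d_string[idx]; }
};

// Returns true if lhs holds a copy of rhs's data after the assignment.
bool assignWorks()
{
    Words lhs(1);
    lhs.at(0) = "old";

    Words rhs(2);
    rhs.at(1) = "new";

    lhs = rhs;

    Words &alias = lhs;
    lhs = alias;                    // self-assignment is harmless as well

    assert(rhs.at(1) == "new");     // the source is left untouched
    return lhs.size() == 2 && lhs.at(1) == "new";
}
```

A useful side effect of this design is that the current object is only modified after the copy has been made: if the copy construction fails, the current object still holds its original data.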
Moving information is based on the concept of anonymous (temporary) data. Temporary values are returned by functions like operator-() and operator+(Type const &lhs, Type const &rhs), and in general by functions returning their results `by value' instead of returning references or pointers.
Anonymous values are always short-lived. When the returned values are primitive types (int, double, etc.) nothing special happens, but if a class-type object is returned by value then its destructor can be called immediately following the function call that produced the value. In any case, the value itself becomes inaccessible immediately after the call. Of course, a temporary return value may be bound to a reference (lvalue or rvalue), but as far as the compiler is concerned the value now has a name, which by itself ends its status as a temporary value.
In this section we concentrate on anonymous temporary values and show how they can be used to improve the efficiency of object construction and assignment. These special construction and assignment methods are known as move construction and move assignment. Classes supporting move operations are called move aware.
Classes allocating their own memory usually benefit from becoming move-aware. But a class does not have to use dynamic memory allocation before it can benefit from move operations. Most classes using composition (or inheritance where the base class uses composition) can benefit from move operations as well.
Movable parameters for class Class take the form Class const &&tmp. The parameter is an rvalue reference, and an rvalue reference only binds to an anonymous temporary value. The compiler is required to call functions offering movable parameters whenever possible. This happens when the class defines functions supporting Class const && parameters and an anonymous temporary value is passed to these functions. Once a temporary value has a name (e.g., binding it to a const & or const &&) it is no longer an anonymous temporary value, and the compiler will call the function defining a Class const & parameter instead.
Note that it is pointless to define a function having an rvalue reference return type. The compiler decides whether or not to use an overloaded member expecting an rvalue reference on the basis of the provided argument. If it is an anonymous temporary it will call the function defining the rvalue reference parameter, if such a function is available.
The compiler, when selecting a function to call, applies a fairly simple algorithm, and also considers copy elision. This is covered shortly (section 8.7).
Strings has, among other members, a data member string *d_string. Clearly, Strings should define a copy constructor, a destructor and an overloaded assignment operator.
Now consider the following function loadStrings(std::istream &in) extracting the strings for a Strings object from in. Next, the Strings object filled by loadStrings is returned by value. The function loadStrings returns a temporary object, which can then be used to initialize an external Strings object:
    Strings loadStrings(std::istream &in)
    {
        Strings ret;
        // load the strings into 'ret'
        return ret;
    }
    // usage:
    Strings store(loadStrings(cin));

In this example two full copies of a Strings object are required:
initializing loadStrings's value return type from its local Strings ret object;
initializing store from loadStrings's return value
Strings class move constructor:

    Strings(Strings const &&tmp);
Move constructors of classes using dynamic memory allocation are allowed to assign the values of pointer data members to their own pointer data members without requiring them to make a copy of the source's data. Next, the temporary's pointer value is set to zero to prevent its destructor from destroying data now owned by the just constructed object. The move constructor has grabbed or stolen the data from the temporary object. This is OK as the temporary object cannot be referred to again (as it is anonymous, it cannot be accessed by other code) and the temporary objects will cease to exist shortly after the constructor's call. Here is the implementation of Strings's move constructor:

    Strings::Strings(Strings const &&tmp)
    :
        d_memory(tmp.d_memory),
        d_size(tmp.d_size),
        d_capacity(tmp.d_capacity)
    {
        const_cast<Strings &>(tmp).d_memory = 0;
    }
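The grabbing of a temporary's data can be observed in a small test. The following sketch makes several assumptions: it uses a hypothetical class Words instead of Strings, it declares the move constructor's parameter as Words && (the non-const form, which avoids the const_cast), and it uses std::move (declared in the header utility) merely to turn a named object into an rvalue so that the move constructor is certainly called:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, stripped-down variant of Strings
class Words
{
    std::string *d_string;
    std::size_t d_size;

    public:
        explicit Words(std::size_t size)
        :
            d_string(new std::string[size]),
            d_size(size)
        {}

        Words(Words const &other) = delete;             // keep the sketch short
        Words &operator=(Words const &other) = delete;

            // grab the source's data, then make sure the source's
            // destructor won't delete the data we now own
        Words(Words &&tmp)
        :
            d_string(tmp.d_string),
            d_size(tmp.d_size)
        {
            tmp.d_string = 0;
            tmp.d_size = 0;
        }

        ~Words()
        {
            delete[] d_string;      // deleting a 0-pointer does nothing
        }

        std::size_t size() const         { return d_size; }
        std::string &at(std::size_t idx) { return d_string[idx]; }
};

// Returns the number of strings that ended up in the target object.
std::size_t grab()
{
    Words source(3);
    source.at(0) = "stolen";

    Words target(std::move(source));    // no strings are copied

    assert(source.size() == 0);         // source was emptied
    assert(target.at(0) == "stolen");
    return target.size();
}
```

No array is allocated and no string is copied by the move constructor: only a pointer and a size_t change hands.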
Move operations cannot be implemented if the class type of a composed data member does not support moving or copying. Currently, stream classes fall into this category.
An example of a move-aware class is the class std::string. A class Person could use composition by defining std::string d_name and std::string d_address. Its move constructor would then have the following prototype:

    Person(Person const &&tmp);
However, the following implementation of this move constructor is incorrect:
    Person::Person(Person const &&tmp)
    :
        d_name(tmp.d_name),
        d_address(tmp.d_address)
    {}

It is incorrect as it will call string's copy constructors rather than string's move constructors. If you're wondering why this happens then remember that move operations are only performed for anonymous objects. To the compiler anything having a name isn't anonymous. And so, by implication, having available an rvalue reference does not mean that we're referring to an anonymous object. But we know that the move constructor is only called for anonymous arguments. To use the corresponding string move operations we have to inform the compiler that we're talking about anonymous data members as well. For this a cast could be used (e.g., const_cast<Person &>(tmp)), but the C++-0x standard provides the function std::move to anonymize a named object. The correct implementation of Person's move construction is, therefore:
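    Person::Person(Person const &&tmp)
    :
        d_name( std::move(const_cast<Person &>(tmp).d_name) ),
        d_address( std::move(const_cast<Person &>(tmp).d_address) )
    {}

Since tmp is declared const its const-ness is cast away before std::move is applied: moving from a const std::string would still select string's copy constructor. The function std::move is (indirectly) declared by many header files. If no header is already declaring std::move then include utility.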
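Whether a composed member is copied or moved can be made visible by counting constructor calls. In the following sketch all names (Tracker, Copying, Moving and the two test functions) are hypothetical, and the move constructors use non-const rvalue reference parameters so that no cast is needed:

```cpp
#include <cassert>
#include <utility>

// Hypothetical member type counting how its objects are constructed
struct Tracker
{
    static int s_copies;
    static int s_moves;

    Tracker()                   {}
    Tracker(Tracker const &)    { ++s_copies; }
    Tracker(Tracker &&)         { ++s_moves; }
};
int Tracker::s_copies = 0;
int Tracker::s_moves = 0;

struct Copying
{
    Tracker d_member;

    Copying() {}
    Copying(Copying &&tmp)
    :
        d_member(tmp.d_member)              // tmp has a name: copy constructor
    {}
};

struct Moving
{
    Tracker d_member;

    Moving() {}
    Moving(Moving &&tmp)
    :
        d_member(std::move(tmp.d_member))   // anonymized: move constructor
    {}
};

// Number of Tracker copies made by Copying's move constructor
int copiesWithoutMove()
{
    Tracker::s_copies = Tracker::s_moves = 0;
    Copying source;
    Copying target(std::move(source));
    assert(Tracker::s_moves == 0);
    return Tracker::s_copies;
}

// Number of Tracker moves made by Moving's move constructor
int movesWithMove()
{
    Tracker::s_copies = Tracker::s_moves = 0;
    Moving source;
    Moving target(std::move(source));
    assert(Tracker::s_copies == 0);
    return Tracker::s_moves;
}
```

Both classes offer a move constructor, but only the one applying std::move to its member actually moves the member.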
When a class using composition not only contains class type data members but also other types of data (pointers, references, primitive data types), then these other data types can be initialized as usual. Primitive data type members can simply be copied; references can be initialized as usual and pointers may use move operations as discussed in the previous section.
The compiler will not call move operations for variables having names. Let's consider the implications of this by looking at the next example, assuming Class offers a move constructor and a copy constructor:

    Class factory();

    void fun(Class const &other);   // a
    void fun(Class &&tmp);          // b

    void callee(Class &&tmp)
    {
        fun(tmp);                   // 1
    }

    int main()
    {
        callee(factory());
    }
At 1 fun(Class const &) is called: fun's argument is not an anonymous temporary object but a named temporary object.
Realizing that fun(tmp) might be called twice, the compiler's choice is understandable. If tmp's data would have been grabbed at the first call, the second call would receive tmp without any data. But at the last call we might know that tmp is never used again and so we might like to ensure that fun(Class &&) is called. For this, once again, std::move is used:

    fun(std::move(tmp));            // last call!
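The compiler's choices can be verified by letting each overload report its own label. The sketch below assumes an empty class Class and two hypothetical functions, firstCall and lastCall, playing the role of callee:

```cpp
#include <cassert>
#include <string>
#include <utility>

class Class
{};

std::string fun(Class const &)  { return "a"; }     // a
std::string fun(Class &&)       { return "b"; }     // b

// tmp has a name, so overload a is selected
std::string firstCall(Class &&tmp)
{
    return fun(tmp);
}

// std::move anonymizes tmp, so overload b is selected
std::string lastCall(Class &&tmp)
{
    return fun(std::move(tmp));
}

void demo()
{
    assert(firstCall(Class()) == "a");
    assert(lastCall(Class()) == "b");
}
```

After fun(std::move(tmp)) the contents of tmp may have been grabbed, so tmp should not be used anymore except for assigning a new value to it.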
    Class &operator=(Class const &&tmp)
    {
        swap(const_cast<Class &>(tmp));
        return *this;
    }

If swapping is not supported then the assignment can be performed for each of the data members in turn, using std::move as shown in the previous section with a class Person. Here is an example showing how to do this with that class Person:

    Person &operator=(Person const &&tmp)
    {
            // cast away const: moving from const members would copy them
        Person &source = const_cast<Person &>(tmp);

        d_name = std::move(source.d_name);
        d_address = std::move(source.d_address);
        return *this;
    }
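A swap-based move assignment can be tried out as follows. This is a sketch under a few assumptions: Words is a hypothetical, reduced variant of Strings, copying is suppressed to keep the class short, and the parameter is declared as Words && so that swap can be called without a cast:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, stripped-down variant of Strings
class Words
{
    std::string *d_string;
    std::size_t d_size;

    public:
        explicit Words(std::size_t size)
        :
            d_string(new std::string[size]),
            d_size(size)
        {}

        Words(Words const &other) = delete;
        Words &operator=(Words const &other) = delete;

        ~Words()
        {
            delete[] d_string;
        }

        void swap(Words &other)
        {
            std::swap(d_string, other.d_string);
            std::swap(d_size, other.d_size);
        }

            // the temporary receives our old data and will
            // delete them once it ceases to exist
        Words &operator=(Words &&tmp)
        {
            swap(tmp);
            return *this;
        }

        std::size_t size() const    { return d_size; }
};

// Returns lhs's size after assigning an anonymous temporary to it.
std::size_t assignTemporary()
{
    Words lhs(1);
    lhs = Words(3);         // anonymous temporary: move assignment
    assert(lhs.size() == 3);
    return lhs.size();
}
```

As with the copy assignment operator, the destruction part is handled by a destructor: here it is the destructor of the temporary object itself.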
If a class defines pointers to pointer data members there will usually not only be a pointer that is moved, but also a size_t defining the number of elements in the array of pointers.
Once again, consider the class Strings. Its destructor is implemented like this:

    Strings::~Strings()
    {
        for (string **end = d_string + d_size; end-- != d_string; )
            delete *end;
        delete[] d_string;
    }

The move constructor (and other move operations!) must realize that the destructor not only deletes d_string, but also considers d_size. A member implementing move operations should therefore not only set d_string to zero but also d_size. The previously shown move constructor for Strings is therefore incorrect. Its improved implementation is:

    Strings::Strings(Strings const &&tmp)
    :
        d_memory(tmp.d_memory),
        d_size(tmp.d_size),
        d_capacity(tmp.d_capacity)
    {
        const_cast<Strings &>(tmp).d_memory = 0;
        const_cast<Strings &>(tmp).d_size = 0;
    }

If operations by the destructor all depend on d_string having a non-zero value then variations of the above approach are of course possible. The move operations could merely decide to set d_memory to 0, and then test whether d_memory == 0 in the destructor (and if so, end the destructor's actions), saving some d_size assignments.
    // assume char *filename
    ifstream inStream(openIstream(filename));

For this example to work ifstream must offer a move constructor. This way there will at any time be only one object referring to the open istream.
Once classes offer move semantics their objects can also safely be stored in standard containers. When such containers perform reallocation (e.g., when their sizes are enlarged) they will use the objects' move constructors rather than their copy constructors. As move-only classes suppress copy semantics, containers storing objects of move-only classes implement the correct behavior in that it is impossible to assign such containers to each other.
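Storing move-only objects in a container can be illustrated with a std::vector. The sketch below assumes a hypothetical move-only class Handle owning a single dynamically allocated string; the function lastName is hypothetical as well:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical move-only class: it can be moved, but not copied
class Handle
{
    std::string *d_resource;

    public:
        explicit Handle(std::string const &name)
        :
            d_resource(new std::string(name))
        {}

        Handle(Handle const &other) = delete;
        Handle &operator=(Handle const &other) = delete;

        Handle(Handle &&tmp) noexcept
        :
            d_resource(tmp.d_resource)
        {
            tmp.d_resource = 0;
        }

        ~Handle()
        {
            delete d_resource;
        }

        std::string const &name() const
        {
            return *d_resource;
        }
};

// Stores 100 Handle objects, forcing the vector to reallocate
// several times, and returns the name of the last one.
std::string lastName()
{
    std::vector<Handle> handles;

    for (int idx = 0; idx != 100; ++idx)
        handles.push_back(Handle("handle " + std::to_string(idx)));

    // std::vector<Handle> copy(handles);   // won't compile: no copying

    assert(handles.size() == 100);
    return handles.back().name();
}
```

Each reallocation moves the existing Handle objects to the vector's new memory, so at any time exactly one object owns each string.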
Class const & parameters a corresponding move-aware function expecting Class const && parameters should be considered.
The const keyword in Class const && parameters is there to allow the function to be called from arguments defining Class const return types. Such const-aware functions are commonly encountered when binary operators are overloaded (e.g., Class const operator+(Class const &lhs, Class const &rhs)). Since the returned value is a temporary value the function receiving the anonymous Class const object may modify it, ignoring its const-ness, when performing a move-operation.
Below two tables are provided. The first table should be used in cases where a function argument has a name, the second table should be used in cases where the argument is anonymous. In each table select the const or non-const column and then use the topmost overloaded function that is available having the specified parameter type.
The tables do not handle functions defining value parameters. If a function has overloads expecting, respectively, a value parameter and some form of reference parameter the compiler reports an ambiguity when such a function is called. In the following selection procedure we may assume, without loss of generality, that this ambiguity does not occur and that all parameter types are reference parameters.
Parameter types matching a function's argument of type T if the argument is named:

    non-const   | const       |
    (T &)       |             |
    (T const &) | (T const &) |
For an int x argument a function fun(int &) is selected rather than a function fun(int const &). If no fun(int &) is available the fun(int const &) function is used. If neither is available the compiler reports an error.
Parameter types matching a function's argument of type T if the argument is anonymous:

    non-const    | const        |
    (T &&)       |              |
    (T const &&) | (T const &&) |
    (T const &)  | (T const &)  |
For an int arg() argument a function fun(int &&) is selected rather than a function fun(int const &&). If both functions are unavailable but a fun(int const &) is available, that function is used. If none of these functions is available the compiler reports an error.
T const & parameter. For anonymous arguments a similar catch all is available having a higher priority: T const && matches all anonymous arguments. Thus, if named and anonymous arguments are to be distinguished a T const && overloaded function will catch all temporaries.
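The topmost entries of both tables can be checked by defining all four overloads and letting each one report its own parameter type. The class T and the functions below are hypothetical and only serve this illustration:

```cpp
#include <cassert>
#include <string>

struct T
{};

std::string fun(T &)            { return "T &"; }
std::string fun(T const &)      { return "T const &"; }
std::string fun(T &&)           { return "T &&"; }
std::string fun(T const &&)     { return "T const &&"; }

T factory()                     // returns a non-const anonymous object
{
    return T();
}

T const constFactory()          // returns a const anonymous object
{
    return T();
}

std::string named()
{
    T object;
    return fun(object);
}

std::string namedConst()
{
    T const object = T();
    return fun(object);
}

std::string anonymous()
{
    return fun(factory());
}

std::string anonymousConst()
{
    return fun(constFactory());
}

void demo()
{
    assert(named() == "T &");
    assert(namedConst() == "T const &");
    assert(anonymous() == "T &&");
    assert(anonymousConst() == "T const &&");
}
```

Removing an overload from the set makes the compiler fall back to the next entry in the matching column of the table.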
As we've seen the move constructor grabs the information from a temporary for its own use. That is OK as the temporary is going to be destroyed after that anyway. It also means that the temporary's data members are modified. This modification can safely be considered a non-mutating operation on the temporary. It may thus be modified even if it was passed to a function specifying a T const && parameter. In cases like these consider using a const_cast to cast away the const-ness of the rvalue reference. The Strings move constructor encountered before might therefore also have been implemented as follows, handling both Strings and Strings const anonymous temporaries:

    Strings::Strings(Strings const &&tmp)
    :
        d_string(tmp.d_string),
        d_size(tmp.d_size)
    {
        const_cast<Strings &>(tmp).d_string = 0;
    }

Having defined appropriate copy and/or move constructors it may be somewhat surprising to learn that the compiler may decide to stay clear of a copy or move operation. After all making no copy and not moving is more efficient than copying or moving.
The option the compiler has to avoid making copies (or perform move operations) is called copy elision or return value optimization. In all situations where copy or move constructions are appropriate the compiler may apply copy elision. Here are the rules. In sequence the compiler considers the following options, stopping once an option can be selected:
    class Elide;

    Elide fun()             // 1
    {
        Elide ret;
        return ret;
    }

    void gun(Elide par);

    Elide elide(fun());     // 2

    gun(fun());             // 3

At 1 ret may never exist. Instead of using ret and copying ret eventually to fun's return value it may directly use the area used to contain fun's return value.
At 2 fun's return value may never exist. Instead of defining an area containing fun's return value and copying that return value to elide the compiler may decide to use elide to create fun's return value in.
At 3 the same may happen for gun's par parameter: fun's return value is directly created in par's area, thus eliding the copy operation from fun's return value to par.
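Whether a compiler actually elides these operations can be found out by counting the calls of the copy and move constructors. The sketch below gives Elide a hypothetical static counter s_transfers; no particular count is promised here, since copy elision is an option and not an obligation, but the count can never exceed the two operations mentioned at 1 and 2:

```cpp
#include <cassert>

class Elide
{
    public:
        static int s_transfers;     // counts copy and move constructions

        Elide()                 {}
        Elide(Elide const &)    { ++s_transfers; }
        Elide(Elide &&)         { ++s_transfers; }
};
int Elide::s_transfers = 0;

Elide fun()                 // 1
{
    Elide ret;
    return ret;
}

// Returns the number of copy/move constructions the compiler
// actually performed when initializing elide from fun().
int transfers()
{
    Elide::s_transfers = 0;

    Elide elide(fun());     // 2

    assert(Elide::s_transfers <= 2);
    return Elide::s_transfers;
}
```

With compilers offering an option to switch off copy elision (e.g., g++'s -fno-elide-constructors) the returned value can be compared with and without elision.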
double, bool and std::string these three different data types may be aggregated using a struct that merely exists to pass along values. Data protection and functionality is hardly ever an issue. For such cases C and C++ use structs. But as a C++ struct is just a class with special access rights some members (constructors, destructor, overloaded assignment operator) may implicitly be defined. The plain old data capitalizes on this concept by requiring that its definition remains as simple as possible. Specifically the C++0x standard considers POD to be a class or struct having the following characteristics:
A standard-layout class or struct
Furthermore, in the context of class derivation (cf. chapters 14 and 13), a standard-layout class or struct:
Classes having pointer data members, pointing to dynamically allocated memory controlled by the objects of those classes, are potential sources of memory leaks. The extensions introduced in this chapter implement the standard defense against such memory leaks.
Encapsulation (data hiding) allows us to ensure that the object's data integrity is maintained. The automatic activation of constructors and destructors greatly enhances our capabilities to ensure the data integrity of objects doing dynamic memory allocation.
A simple conclusion is therefore that classes whose objects allocate memory controlled by themselves must at least implement a destructor, an overloaded assignment operator and a copy constructor. Implementing a move constructor remains optional, but it allows us to use factory functions with classes not allowing copy construction and/or assignment.
In the end, assuming the availability of at least a copy or move constructor, the compiler might avoid them using copy elision. The compiler is free to use copy elision wherever possible; it is, however, never a requirement. The compiler may therefore always decide not to use copy elision. In all situations where otherwise a copy or move constructor would have been used the compiler may consider using copy elision.