Chapter 12: Abstract Containers

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.

Please state the document version you're referring to, as found in the title (in this document: 8.3.1) and please state chapter and paragraph name or number you're referring to.

All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

C++ offers several predefined datatypes, all part of the Standard Template Library, which can be used to implement solutions to frequently occurring problems. The datatypes discussed in this chapter are all containers: you can put stuff inside them, and you can retrieve the stored information from them.

The interesting part is that the kind of data that can be stored inside these containers has been left unspecified at the time the containers were constructed. That's why they are spoken of as abstract containers.

Abstract containers rely heavily on templates, covered in chapter 20 and beyond. To use abstract containers, only a minimal grasp of the template concept is required. In C++ a template is in fact a recipe for constructing a function or a complete class. The recipe tries to abstract the functionality of the class or function as much as possible from the data on which the class or function operates. As the data types on which the templates operate were not known when the template was implemented, the data types are either inferred from the context in which a function template is used, or they are mentioned explicitly when a class template is used (the term that's used here is instantiated). In situations where the types are explicitly mentioned, the angle bracket notation is used to indicate which data types are required. For example, below (in section 12.2) we'll encounter the pair container, which requires the explicit mentioning of two data types. Here is a pair object containing both an int and a string:

    pair<int, string> myPair;
The object myPair is defined as an object holding both an int and a string.

The angle bracket notation is used intensively in the upcoming discussion of abstract containers. Actually, understanding this part of templates is the only real requirement for using abstract containers. Now that we've introduced this notation, we can postpone the more thorough discussion of templates to chapter 20, and concentrate on their use in this chapter.

Most of the abstract containers are sequential containers: they contain data that can be stored and retrieved in some sequential way. Examples are the vector, implementing an extendable array; the list, implementing a data structure that allows for the easy insertion or deletion of data; the queue, also called a FIFO (first in, first out) structure, in which the first element that is entered is the first element that is retrieved; and the stack, which is a first in, last out (FILO or LIFO) structure.

In addition to sequential containers several special containers are available. The pair is a basic container in which a pair of values (of types that are left open for further specification) can be stored, like two strings, two ints, a string and a double, etc. Pairs are often used to return data elements that naturally come in pairs. For example, the map is an abstract container storing keys and their associated values. Elements of these maps are returned as pairs.

A variant of the pair is the complex container, implementing operations that are defined on complex numbers.

All abstract containers described in this chapter as well as the string and stream datatypes (cf. chapters 5 and 6) are part of the Standard Template Library.

All containers support the following operators:

Note that before a user-defined type (usually a class-type) can be stored in a container, the user-defined type should at least support:

With the advent of the C++0x standard sequential containers can also be initialized using initializer lists.

Most containers (exceptions are the stack (section 12.3.10), priority_queue (section 12.3.4), and queue (section 12.3.3) containers) support members to determine their maximum sizes (through their member function max_size).

Closely linked to the standard template library are the generic algorithms. These algorithms may be used to perform frequently occurring tasks or more complex tasks than is possible with the containers themselves, like counting, filling, merging, filtering etc. An overview of generic algorithms and their applications is given in chapter 19. Generic algorithms usually rely on the availability of iterators, representing begin and end-points for processing data stored inside containers. The abstract containers usually support constructors and members expecting iterators, and they often have members returning iterators (comparable to the string::begin and string::end members). In this chapter the iterator concept is not further investigated. Refer to chapter 18 for this.

The url http://www.sgi.com/Technology/STL is worth visiting as it offers more extensive coverage of abstract containers and the standard template library than can be provided by the C++ Annotations.

Containers often collect data during their lifetimes. When a container goes out of scope, its destructor tries to destroy its data elements. This only succeeds if the data elements themselves are stored inside the container. If the data elements of containers are pointers to dynamically allocated memory then the data pointed to by these pointers will not be destroyed, resulting in a memory leak. A consequence of this scheme is that the data stored in a container should often be considered the `property' of the container: the container should be able to destroy its data elements when the container's destructor is called. So, normally containers should not contain pointers to data. Also, a container should not be required to contain const data, as const data prevent the use of many of the container's members, like the assignment operator.

12.1: Notations used in this chapter

In this chapter about containers, the following notational conventions are used: Some containers, e.g., the map container, contain pairs of values, usually called `keys' and `values'. For such containers the following notational convention is used in addition:

12.2: The `pair' container

The pair container is a rather basic container. It is used to store two elements, called first and second, and that's about it. Before using pair containers the header file <utility> must have been included.

The pair's data types are specified when the pair object is defined (or declared) using the template's angle bracket notation (cf. chapter 20). Examples:

    pair<string, string> piper("PA28", "PH-ANI");
    pair<string, string> cessna("C172", "PH-ANG");
Here, the variables piper and cessna are defined as pair variables containing two strings. Both strings can be retrieved using the first and second fields of the pair type:
    cout << piper.first << '\n' <<      // shows 'PA28'
            cessna.second << '\n';      // shows 'PH-ANG'
The first and second members can also be used to reassign values:
    cessna.first = "C152";
    cessna.second = "PH-ANW";
If a pair object must be completely reassigned, an anonymous pair object can be used as the right-hand operand of the assignment. An anonymous variable defines a temporary variable (which receives no name) solely for the purpose of (re)assigning another variable of the same type. Its generic form is
    type(initializer list)
Note that when a pair object is used the type specification is not completed by just mentioning the containername pair. It also requires the specification of the data types which are stored within the pair. For this the (template) angle bracket notation is used again. E.g., the reassignment of the cessna pair variable could have been accomplished as follows:
    cessna = pair<string, string>("C152", "PH-ANW");
In cases like these, the type specification can become quite elaborate, which has caused a revival of interest in the possibilities offered by the typedef keyword. If many pair<type1, type2> clauses are used in a source, the typing effort may be reduced and readability might be improved by first defining a name for the clause, and then using the defined name later. E.g.,
    typedef pair<string, string> pairStrStr;

    cessna = pairStrStr("C152", "PH-ANW");
Apart from this (and the basic set of operations (assignment and comparisons)) the pair offers no further functionality. It is, however, a basic ingredient of the upcoming abstract containers map, multimap and unordered_map (the hash-based map, cf. section 12.3.11).

The C++0x standard offers a generalized pair container: the tuple, covered in section 21.5.5.

12.3: Sequential Containers

12.3.1: The `vector' container

The vector class implements an expandable array. Before using the vector container the <vector> header file must have been included.

The following constructors, operators, and member functions are available:

12.3.2: The `list' container

The list container implements a list data structure. Before using a list container the header file <list> must have been included.

The organization of a list is shown in figure 6.

Figure 6 is shown here.
Figure 6: A list data-structure


Figure 6 shows that a list consists of separate list-elements, connected by pointers. The list can be traversed in two directions: starting at Front the list may be traversed from left to right, until the 0-pointer is reached at the end of the rightmost list-element. The list can also be traversed from right to left: starting at Back, the list is traversed from right to left, until eventually the 0-pointer emanating from the leftmost list-element is reached.

As a subtlety note that the representation given in figure 6 is not necessarily used in actual implementations of the list. For example, consider the following little program:

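    #include <iostream>
    #include <list>
    using namespace std;

    int main()
    {
        list<int> l;
        // front() on an empty list is undefined behavior: illustration only
        cout << "size: " << l.size() << ", first element: " <<
                l.front() << '\n';
    }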
When this program is run it might actually produce the output:
    size: 0, first element: 0
Its front element can even be assigned a value. Formally, calling front on an empty list results in undefined behavior, so programs should not depend on this. In this case the implementor has chosen to provide the list with a hidden element. The list actually is a circular list, where the hidden element serves as terminating element, replacing the 0-pointers in figure 6. As noted, this is a subtlety, which doesn't affect the conceptual notion of a list as a data structure ending in 0-pointers. Note also that it is well known that various implementations of list-structures are possible (cf. Aho, A.V., Hopcroft J.E. and Ullman, J.D., (1983) Data Structures and Algorithms (Addison-Wesley)).

Both lists and vectors are often appropriate data structures in situations where an unknown number of data elements must be stored. However, there are some rules of thumb to follow when selecting the appropriate data structure.

At present lists aren't as useful anymore as they used to be (when computers were much slower and more memory-constrained). Except maybe for some rare cases, a vector should be the preferred container; even when implementing algorithms traditionally using lists.

Other considerations related to the choice between lists and vectors should also be given some thought. Although it is true that the vector is able to grow dynamically, the dynamic growth requires data-copying. Clearly, copying a million large data structures takes a considerable amount of time, even on fast computers. On the other hand, inserting a large number of elements in a list doesn't require us to copy non-involved data. Inserting a new element in a list merely requires us to juggle some pointers. In figure 7 this is shown: a new element is inserted between the second and third element, creating a new list of four elements.

Figure 7 is shown here.
Figure 7: Adding a new element to a list


Removing an element from a list is also fairly easy. Starting again from the situation shown in figure 6, figure 8 shows what happens if element two is removed from our list. Again: only pointers need to be juggled. In this case it's even simpler than adding an element: only two pointers need to be rerouted.

Figure 8 is shown here.
Figure 8: Removing an element from a list


To summarize the comparison between lists and vectors: it's probably best to conclude that there is no clear-cut answer to the question what data structure to prefer. There are rules of thumb, which may be adhered to. But if worse comes to worst, a profiler may be required to find out what's best.

The list container offers the following constructors, operators, and member functions:

12.3.3: The `queue' container

The queue class implements a queue data structure. Before using a queue container the header file <queue> must have been included.

A queue is depicted in figure 9.

Figure 9 is shown here.
Figure 9: A queue data-structure


In figure 9 it is shown that a queue has one point (the back) where items can be added to the queue, and one point (the front) where items can be removed (read) from the queue. A queue is therefore also called a FIFO data structure, for first in, first out. It is most often used in situations where events should be handled in the same order as they are generated.

The following constructors, operators, and member functions are available for the queue container:

Note that the queue does not support iterators or a subscript operator. The only elements that can be accessed are its front and back element. A queue can be emptied by:

12.3.4: The `priority_queue' container

The priority_queue class implements a priority queue data structure. Before using a priority_queue container the <queue> header file must have been included.

A priority queue is identical to a queue, but allows the entry of data elements according to priority rules. A real-life priority queue is found, e.g., at airport check-in terminals. At a terminal the passengers normally stand in line to wait for their turn to check in, but late passengers are usually allowed to jump the queue: they receive a higher priority than other passengers.

The priority queue uses operator< of the data type stored in the priority queue to decide about the priority of the data elements. The smaller the value, the lower the priority. So, the priority queue could be used to sort values while they arrive. A simple example of such a priority queue application is the following program: it reads words from cin and writes a sorted list of words to cout:

#include <iostream>
#include <string>
#include <queue>
using namespace std;

int main()
{
    priority_queue<string> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        cout << q.top() << '\n';
        q.pop();
    }
}

Unfortunately, the words are listed in reversed order: because of the underlying <-operator the words appearing later in the ASCII-sequence appear first in the priority queue. A solution to that problem is to define a wrapper class around the string datatype, reversing string's operator<. Here is the modified program:

#include <iostream>
#include <string>
#include <queue>

class Text
{
    std::string d_s;

    public:
        Text(std::string const &str)
        :
            d_s(str)
        {}
        operator std::string const &() const
        {
            return d_s;
        }
        bool operator<(Text const &right) const
        {
            return d_s > right.d_s;
        }
};

using namespace std;

int main()
{
    priority_queue<Text> q;
    string word;

    while (cin >> word)
        q.push(word);

    while (q.size())
    {
        word = q.top();
        cout << word << '\n';
        q.pop();
    }
}

Other possibilities to achieve the same exist. One would be to store the contents of the priority queue in, e.g., a vector, from which the elements can be read in reversed order.

The following constructors, operators, and member functions are available for the priority_queue container:

Note that the priority queue does not support iterators or a subscript operator. The only element that can be accessed is its top element. A priority queue can be emptied by:

12.3.5: The `deque' container

The deque (pronounce: `deck') class implements a double-ended queue data structure (deque). Before using a deque container the header file <deque> must have been included.

A deque is comparable to a queue, but it allows for reading and writing at both ends. Actually, the deque data type supports a lot more functionality than the queue, as will be clear from the following overview of available member functions. A deque is a combination of a vector and two queues, operating at both ends of the vector. A deque should be considered in situations where random insertions and the addition and/or removal of elements at one or both ends frequently occur.

The following constructors, operators, and member functions are available for deques:

12.3.6: The `map' container

The map class offers a (sorted) associative array. Before using a map container the <map> header file must have been included.

A map is filled with key/value pairs, which may be of any container-accepted type. Since types are associated with both the key and the value, we must specify two types in the angle bracket notation, comparable to the specification we've seen with the pair container (cf. section 12.2). The first type represents the key's type, the second type represents the value's type. For example, a map in which the key is a string and the value is a double can be defined as follows:

    map<string, double> object;
The key is used to access its associated information. That information is called the value. For example, a phone book uses the names of people as the key, and uses the telephone number and maybe other information (e.g., the zip-code, the address, the profession) as value. Since a map sorts its keys, the key's operator< must be defined, and it must be sensible to use it. For example, it is generally a bad idea to use pointers for keys, as sorting pointers is something different than sorting the values pointed at by those pointers.

The two fundamental operations on maps are the storage of key/value combinations, and the retrieval of values, given their keys. The index operator, using a key as the index, can be used for both. If the index operator is used as lvalue, insertion is performed. If it is used as rvalue, the key's associated value is retrieved. Each key can be stored only once in a map. If the same key is entered again using the index operator, the new value replaces the formerly stored value, which is lost.

A specific key/value combination can implicitly or explicitly be inserted into a map. If explicit insertion is required, the key/value combination must be constructed first. For this, every map defines a value_type which may be used to create values that can be stored in the map. For example, a value for a map<string, int> can be constructed as follows:

    map<string, int>::value_type siValue("Hello", 1);
The value_type is associated with the map<string, int>: the type of the key is string, the type of the value is int. Anonymous value_type objects are also often used. E.g.,
    map<string, int>::value_type("Hello", 1);
Instead of using the line map<string, int>::value_type(...) over and over again, a typedef is frequently used to reduce typing and to improve readability:
    typedef map<string, int>::value_type StringIntValue;
Using this typedef, values for the map<string, int> may now be constructed using:
    StringIntValue("Hello", 1);
Alternatively, pairs may be used to represent key/value combinations used by maps:
    pair<string, int>("Hello", 1);

12.3.6.1: The `map' constructors

The following constructors are available for the map container:

12.3.6.2: The `map' operators

The map supports, in addition to the standard operators for containers, the index operator.

The index operator may be used to retrieve or reassign individual elements of the map. The argument of the index operator is called a key.

If the provided key is not available in the map, a new data element is automatically added to the map using the default value or default constructor to initialize the value part of the new element. This default value is returned if the index operator is used as an rvalue.

When initializing a new or reassigning another element of the map, the type of the right-hand side of the assignment operator must be equal to (or promotable to) the type of the map's value part. E.g., to add or change the value of element "two" in a map, the following statement can be used:

    mapsm["two"] = MyClass();

12.3.6.3: The `map' public members

The following member functions are available for the map container:

12.3.6.4: The `map': a simple example

As mentioned at the beginning of section 12.3.6, the map represents a sorted associative array. In a map the keys are sorted. If an application must visit all elements in a map the begin and end iterators must be used.

The following example illustrates how to make a simple table listing all keys and values found in a map:

    #include <iostream>
    #include <iomanip>
    #include <map>
    #include <string>

    using namespace std;

    int main()
    {
        pair<string, int>
            pa[] =
            {
                pair<string,int>("one", 10),
                pair<string,int>("two", 20),
                pair<string,int>("three", 30),
            };
        map<string, int>
            object(&pa[0], &pa[3]);

        for
        (
            map<string, int>::iterator it = object.begin();
                it != object.end();
                    ++it
        )
            cout << setw(5) << it->first.c_str() <<
                    setw(5) << it->second << '\n';
    }
    /*
        Generated output:
      one   10
    three   30
      two   20
    */

12.3.7: The `multimap' container

Like the map, the multimap class implements a (sorted) associative array. Before using a multimap container the header file <map> must have been included.

The main difference between the map and the multimap is that the multimap supports multiple values associated with the same key, whereas the map contains single-valued keys. Note that the multimap also accepts multiple identical values associated with identical keys.

The map and the multimap have the same set of member functions, with the exception of the index operator which is not supported with the multimap. This is understandable: if multiple entries of the same key are allowed, which of the possible values should be returned for object[key]?

Refer to section 12.3.6 for an overview of the multimap member functions. Some member functions, however, deserve additional attention when used in the context of the multimap container. These members are discussed below.

Although the functions lower_bound and upper_bound act identically in the map and multimap containers, their operation in a multimap deserves some additional attention. The next example illustrates lower_bound, upper_bound and equal_range applied to a multimap:
    #include <iostream>
    #include <map>
    #include <string>
    using namespace std;

    int main()
    {
        pair<string, int> pa[] =
        {
            pair<string,int>("alpha", 1),
            pair<string,int>("bravo", 2),
            pair<string,int>("charley", 3),
            pair<string,int>("bravo", 6),   // unordered `bravo' values
            pair<string,int>("delta", 5),
            pair<string,int>("bravo", 4),
        };
        multimap<string, int> object(&pa[0], &pa[6]);

        typedef multimap<string, int>::iterator msiIterator;

        msiIterator it = object.lower_bound("brava");

        cout << "Lower bound for `brava': " <<
                it->first << ", " << it->second << '\n';

        it = object.upper_bound("bravu");

        cout << "Upper bound for `bravu': " <<
                it->first << ", " << it->second << '\n';

        pair<msiIterator, msiIterator>
            itPair = object.equal_range("bravo");

        cout << "Equal range for `bravo':\n";
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << '\n';
        cout << "Upper bound: " << it->first << ", " << it->second << '\n';

        cout << "Equal range for `brav':\n";
        itPair = object.equal_range("brav");
        for (it = itPair.first; it != itPair.second; ++it)
            cout << it->first << ", " << it->second << '\n';
        cout << "Upper bound: " << it->first << ", " << it->second << '\n';
    }
    /*
        Generated output:

        Lower bound for `brava': bravo, 2
        Upper bound for `bravu': charley, 3
        Equal range for `bravo':
        bravo, 2
        bravo, 6
        bravo, 4
        Upper bound: charley, 3
        Equal range for `brav':
        Upper bound: bravo, 2
    */
In particular note the following characteristics:

12.3.8: The `set' container

The set class implements a sorted collection of values. Before using set containers the <set> header file must have been included.

A set contains unique values (of a container-acceptable type). Each value is stored only once.

A specific value can be explicitly created: Every set defines a value_type which may be used to create values that can be stored in the set. For example, a value for a set<string> can be constructed as follows:

    set<string>::value_type setValue("Hello");
The value_type is associated with the set<string>. Anonymous value_type objects are also often used. E.g.,
    set<string>::value_type("Hello");
Instead of using the line set<string>::value_type(...) over and over again, a typedef is often used to reduce typing and to improve readability:
    typedef set<string>::value_type StringSetValue;
Using this typedef, values for the set<string> may be constructed as follows:
    StringSetValue("Hello");
Alternatively, values of the set's type may be used immediately. In that case the value of type Type is implicitly converted to a set<Type>::value_type.

The following constructors, operators, and member functions are available for the set container:

12.3.9: The `multiset' container

Like the set, the multiset class implements a sorted collection of values. Before using multiset containers the header file <set> must have been included.

The main difference between the set and the multiset is that the multiset supports multiple entries of the same value, whereas the set contains unique values.

The set and the multiset have the same set of member functions. Refer to section 12.3.8 for an overview of the multiset member functions. Some member functions, however, behave slightly differently from their counterparts of the set container. Those members are mentioned here.

Although the functions lower_bound and upper_bound act identically in the set and multiset containers, their operation in a multiset deserves some additional attention. With a multiset container lower_bound and upper_bound produce the same result for non-existing keys: they both return the first element having a key exceeding the provided key.

Here is an example showing the use of various member functions of a multiset:

    #include <iostream>
    #include <set>
    #include <string>

    using namespace std;

    int main()
    {
        string
            sa[] =
            {
                "alpha",
                "echo",
                "hotel",
                "mike",
                "romeo"
            };

        multiset<string>
            object(&sa[0], &sa[5]);

        object.insert("echo");
        object.insert("echo");

        multiset<string>::iterator
            it = object.find("echo");

        for (; it != object.end(); ++it)
            cout << *it << " ";
        cout << '\n';

        cout << "Multiset::equal_range(\"ech\")\n";
        pair
        <
            multiset<string>::iterator,
            multiset<string>::iterator
        >
            itpair = object.equal_range("ech");

        if (itpair.first != object.end())
            cout << "lower_bound() points at " << *itpair.first << '\n';
        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("ech") << " occurrences of 'ech'" << '\n';

        cout << "Multiset::equal_range(\"echo\")\n";
        itpair = object.equal_range("echo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("echo") << " occurrences of 'echo'" << '\n';

        cout << "Multiset::equal_range(\"echoo\")\n";
        itpair = object.equal_range("echoo");

        for (; itpair.first != itpair.second; ++itpair.first)
            cout << *itpair.first << " ";

        cout << '\n' <<
                object.count("echoo") << " occurrences of 'echoo'" << '\n';
    }
    /*
        Generated output:

        echo echo echo hotel mike romeo
        Multiset::equal_range("ech")
        lower_bound() points at echo

        0 occurrences of 'ech'
        Multiset::equal_range("echo")
        echo echo echo
        3 occurrences of 'echo'
        Multiset::equal_range("echoo")

        0 occurrences of 'echoo'
    */

12.3.10: The `stack' container

The stack class implements a stack data structure. Before using stack containers the header file <stack> must have been included.

A stack is also called a first in, last out (FILO or LIFO) data structure as the first item to enter the stack is the last item to leave. A stack is an extremely useful data structure in situations where data must temporarily remain available. For example, programs maintain a stack to store local variables of functions: the lifetime of these variables is determined by the time these functions are active, contrary to global (or static local) variables, which live for as long as the program itself lives. Another example is found in calculators using the Reverse Polish Notation (RPN), in which the operands of operators are kept in a stack, whereas operators pop their operands off the stack and push the results of their work back onto the stack.

As an example of the use of a stack, consider figure 10, in which the contents of the stack are shown while the expression (3 + 4) * 2 is evaluated. In the RPN this expression becomes 3 4 + 2 *, and figure 10 shows the stack contents after each token (i.e., the operands and the operators) is read from the input. Notice that each operand is indeed pushed on the stack, while each operator changes the contents of the stack.

Figure 10 is shown here.
Figure 10: The contents of a stack while evaluating 3 4 + 2 *


The expression is evaluated in five steps. The caret between the tokens in the expressions shown on the first line of figure 10 shows what token has just been read. The next line shows the actual stack-contents, and the final line shows the steps for referential purposes. Note that at step 2, two numbers have been pushed on the stack. The first number (3) is now at the bottom of the stack. Next, in step 3, the + operator is read. The operator pops two operands (so that the stack is empty at that moment), calculates their sum, and pushes the resulting value (7) on the stack. Then, in step 4, the number 2 is read, which is dutifully pushed on the stack again. Finally, in step 5 the final operator * is read, which pops the values 2 and 7 from the stack, computes their product, and pushes the result back on the stack. This result (14) could then be popped to be displayed on some medium.

From figure 10 we see that a stack has one location (the top) where items can be pushed onto and popped off the stack. This top element is the stack's only immediately visible element. It may be accessed and modified directly.

Bearing this model of the stack in mind, let's see what we formally can do with the stack container. For the stack, the following constructors, operators, and member functions are available:

The stack does not support iterators or a subscript operator. The only element that can be accessed is its top element. To empty a stack:

12.3.11: Hash Tables (C++0x)

The C++0x standard officially adds hash tables to the language.

Before using hash table containers the header file <unordered_map> must have been included. Variants are hash-based sets and multi-sets. Before using these hash-based set containers the header file <unordered_set> must have been included.

As discussed, the map is a sorted data structure. The keys in maps are sorted using the operator< of the key's data type. Generally, this is not the fastest way to either store or retrieve data: a lookup in a map requires a number of comparisons that grows logarithmically with the number of stored keys. The main benefit of sorting is that a listing of sorted keys appeals more to humans than an unsorted list. A much faster method to store and retrieve data is to use hashing, which on average needs a constant amount of time per operation.

Hashing uses a function (called the hash function) to compute an (unsigned) number from the key. This number is then used as an index into the table in which the keys are stored. Retrieving a key is as simple as computing the hash value of the provided key and looking in the table at the computed index location: if the key is found there, its value can be returned; if it is not found there, the key is not stored in the table.

Collisions occur when a computed index position is already occupied by another element. For these situations the abstract containers have solutions available. A simple solution, adopted by the C++0x standard, is chaining (also called separate chaining), which uses a linked list to store colliding table elements.

In the C++0x standard the term unordered is used rather than hash to avoid name collisions with hash tables developed before the advent of the C++0x standard. Except where unordered is required as part of a type name, the term hash will be used here as it is the term commonly encountered.

Four forms of unordered data structures are supported: unordered_map, unordered_multimap, unordered_set, and unordered_multiset.

Below the unordered_map container is discussed. The other containers also use hashing but provide functionality corresponding to, respectively, the multimap, set and multiset.

Concentrating on the unordered_map, its template arguments specify a key type, a value type, the type of an object computing a hash value for the key, and the type of an object comparing two keys for equality. Predefined hash functions are available for std::string keys, and for all standard scalar numeric types (char, short, int etc.). If another data type is used, a hash function object and an equality function object must be made available (see also section 10.10). Examples follow below.

The class implementing the hash function could be called hash. Its function call operator ( operator()) returns the (size_t) hash value of the key that it received as its argument.

A predefined function object (equal_to, see chapter 19) exists performing tests of equality. It can be used if the key's data type supports the equality operator. Alternatively, an overloaded operator== or specialized function object could be constructed returning true if two keys are equal and false otherwise. Examples follow.

The unordered_map class implements an associative array in which the elements are stored according to some hashing scheme.

Most constructors, operators and member functions available for the map are also available for the unordered_map. The exceptions are facilities relying on the ordering of the keys, like the member lower_bound. On average, storing and retrieving an element takes a constant amount of time for the unordered_map, compared to logarithmic time for the map, so for large numbers of elements the unordered_map can usually be expected to be faster. Comparable conclusions may be drawn for the unordered_set, unordered_multimap and the unordered_multiset.

Compared to the map container, the unordered_map has an additional constructor:

        unordered_map<...> hash(n);
where n is a size_t value. It is used to construct an unordered_map consisting of an initial number of at least n empty slots to put key/value combinations in. This number is automatically extended when needed.

The hashed key type is very often text, so an unordered_map in which the key's data type is a std::string is frequently encountered. Note that although a char * is allowed as key type this is almost always a bad idea: by default the pointer values themselves are hashed and compared, rather than the characters they point to, so two char * variables pointing to equal C-strings stored at different locations will be considered different keys.

The following program defines an unordered_map containing the names of the months of the year and the number of days these months (usually) have. Then, using the subscript operator the days in several months are displayed. Keys are compared for equality using equal_to<string>, which is the default fourth template argument of the unordered_map:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    using namespace std;

    int main()
    {
        unordered_map<string, int> months;

        months["january"] = 31;
        months["february"] = 28;
        months["march"] = 31;
        months["april"] = 30;
        months["may"] = 31;
        months["june"] = 30;
        months["july"] = 31;
        months["august"] = 31;
        months["september"] = 30;
        months["october"] = 31;
        months["november"] = 30;
        months["december"] = 31;

        cout << "september -> " << months["september"] << '\n' <<
                "april     -> " << months["april"] << '\n' <<
                "june      -> " << months["june"] << '\n' <<
                "november  -> " << months["november"] << '\n';
    }
    /*
        Generated output:
    september -> 30
    april     -> 30
    june      -> 30
    november  -> 30
    */

A comparable example, showing the use of explicitly defined hash and equality functions and key-type char const *:

    #include <unordered_map>
    #include <iostream>
    #include <string>
    #include <cstring>
    using namespace std;

    struct EqualCp
    {
        bool operator()(char const *l, char const *r) const
        {
            return strcmp(l, r) == 0;
        }
    };
    struct HashCp
    {
        size_t operator()(char const *str) const
        {
            // hash is not defined for reference types: hash the
            // characters via a temporary std::string instead
            return hash<string>()(str);
        }
    };
    int main()
    {
        unordered_map<char const *, int, HashCp, EqualCp> months;

        months["april"] = 30;
        months["november"] = 30;

        string apr("april");    // different pointers, same string

        cout << "april     -> " << months["april"] << '\n' <<
                "april     -> " << months[apr.c_str()] << '\n';
    }

The unordered_multimap, unordered_set and unordered_multiset containers are used analogously. For these containers equality and hash classes must also be defined if the key's data type is not covered by the predefined ones. The unordered_multimap also requires the unordered_map header file.

12.3.12: Regular Expressions (C++0x, ?)

The C++0x standard adds handling of regular expressions to the language. Before using regular expressions as offered by the C++ standard the header file <regex> must have been included.

Regular expressions were already available in C++ via its C heritage as C has always offered functions like regcomp and regexec that are used by, e.g., the Pattern class of the Bobcat library.

Regular expressions are extensively documented elsewhere (e.g., regex(7); Friedl, J.E.F., Mastering Regular Expressions, O'Reilly) and the reader is referred to these sources for a refresher on the topic of regular expressions.

The C++0x standard adds native object based support for regular expressions by defining several new classes and other facilities. Currently, however, regular expressions are not yet supported by the g++ library and therefore in this section only the basic building blocks the C++0x standard offers to handle regular expressions are mentioned. Once regular expressions actually become available this section will be updated to cover the actually available features.

Eventually, regular expressions will be represented by objects of the class regex. Once a regex object has been defined the function regex_search (a free function, not a member of regex) can be called to process its regular expression. This function expects arguments representing, respectively, the text against which the regular expression will be matched; an object of the class cmatch representing the results of the matching operation and an object of the class regex representing the used regular expression. Furthermore, a function regex_replace will be available to perform textual replacements based on regular expressions.

Regular expressions using the regex class are currently not yet available in the g++ library.

12.4: The `complex' container

The complex container defines the standard operations that can be performed on complex numbers. Before using complex containers the header file <complex> must have been included.

The complex number's real and imaginary types are specified as the container's data type. Examples:

    complex<double>
    complex<int>
    complex<float>
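Note that the real and imaginary parts of a complex number have the same data type. Also note that the standard only specifies the behavior of complex for the floating point types float, double and long double; the effect of using another type (like int) is unspecified.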

When initializing (or assigning) a complex object, the imaginary part may be omitted from the initialization or assignment resulting in its value being 0 (zero). By default, both parts are zero.

Below it is silently assumed that the used complex type is complex<double>. Given this assumption, complex numbers may be initialized as follows:

Anonymous complex values may also be used. In the next example two anonymous complex values are pushed on a stack of complex numbers, to be popped again thereafter:
    #include <iostream>
    #include <complex>
    #include <stack>

    using namespace std;

    int main()
    {
        stack<complex<double>>
            cstack;

        cstack.push(complex<double>(3.14, 2.71));
        cstack.push(complex<double>(-3.14, -2.71));

        while (cstack.size())
        {
            cout << cstack.top().real() << ", " <<
                    cstack.top().imag() << "i" << '\n';
            cstack.pop();
        }
    }
    /*
        Generated output:
    -3.14, -2.71i
    3.14, 2.71i
    */

The following member functions and operators are defined for complex numbers (below, value may be either a primitive scalar type or a complex object):