Chapter 18: Concrete examples of C++

We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.

Please state the document version you're referring to, as found in the title (in this document: 4.4.2).

This chapter presents a number of concrete examples of programming in C++. Items from this document such as virtual functions, static members, etc. are rediscussed. Examples of container classes are shown.

Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.

18.1: Storing objects: Storable and Storage

A reoccurring task of many programs is the storage of data, which are then sorted, selected, etc.. Storing data can be as simple as maintaining an array of ints, but can also be much more complex, such as maintaining file system information by the kernel of an operating system.

In this section we take a closer look at the storage of generic objects in memory (i.e., during the execution of a program). Conforming to the object-oriented recipe we shall develop two classes: a class Storage, which stores objects, and a class Storable, the prototype of objects which can be stored.

18.1.1: The global setup

As far as the functionality of the class Storage is concerned, objects can be added to the storage and objects can be obtained from the storage. Also it must be possible to obtain the number of objects in the storage.

As far as the internal data organization of the storage is concerned, we opt for an approach in which Storage maintains an array which can be reallocated, consisting of pointers to the stored objects.

The internal organization of the class Storage is illustrated in figure 17.

figure 17 is shown here.
figure 17: Internal organization of the class Storage.

18.1.1.1: Interface functions of the class Storage

The usage (interface) of the class Storage is contained in three member functions. The following list describes these member functions and mentions the class Storable, more on this later.

18.1.1.2: To copy or not to copy?

There are two distinct design alternatives for the function add(). These considerations address the choice whether the stored objects (the squares on the right side of figure 17) should be copies of the original objects, or the objects themselves.

In other words, should the function add() of the class Storage:

These considerations are not trivial. Consider the following example:


    Storage
        store;
    Storable
        something;

    store.add(something);           // add to storage

    // let's assume that Storable::modify() is defined
    something.modify();     // modify original object,

    Storable
        *retrieved = store.get(0); // retrieve from storage

    // NOW: is "*retrieved" equal to "something" ?!

If we choose to store (addresses of) the objects themselves, then at the end of the above code fragment, the object pointed to by retrieved will equal something. A manipulation of previously stored objects thereby alters the contents of the storage.

If we choose to store copies of objects, then obviously *retrieved will not equal something but will remain the original, unaltered, object. This approach has a great merit: objects can be placed into storage as a `safeguard', to be retrieved later when an original object was altered or even ceased to exist. In this implementation we therefore choose for this approach.

18.1.1.3: Who makes the copy?

The fact that copies of objects should be stored presents a small problem. If we want to keep the class Storage as universal as possible, then the making of a copy of a Storable object cannot occur here. The reason for this is that the actual type of the objects to store is not known in advance. A simplistic approach, such as the following:


    void Storage::add(Storable const *obj)
    {
        Storable
            *to_store = new Storable(*obj);
        // now add to_store instead of obj
        .
        .
    }

shall not work. This code attempts to make a copy of obj by using the operator new, which in turn calls the copy constructor of Storable. However, if Storable is only a base class, and the class of the object to store is a derived class (say, a Person), how can the copy constructor of the class Storable create a copy of a Person?

The making of a copy therefore must lie with the actual class of the object to store, i.e., with the derived class. Such a class must have the functionality to create a duplicate of the object in question and to return a pointer to this duplicate. If we call this function duplicate() then the code of the adding function becomes:


    void Storage::add(Storable const *obj)
    {
        Storable
            *to_store = obj->duplicate();
        // now add to_store instead of obj
        .
        .
    }

The function duplicate() is called in this example by using a pointer to the original object (this is the pointer obj). The class Storable is in this example only a base class which defines a protocol, and not the class of the actual objects which will be stored. Ergo, the function duplicate() need not be defined in Storable, but must be concretely implemented in derived classes. In other words, duplicate() is a pure virtual function.

18.1.2: The class Storable

Using the above discussed approach we can now define the class Storable. The following questions are of importance:

The class definition and its functions are given below:


    class Storable
    {
        public:
            virtual ~Storable();
            virtual Storable *duplicate() const = 0;
    };

    Storable::~Storable()
    {
    }

18.1.2.1: Converting an existing class to a Storable

To show how (existing) classes can be converted to derivation from a Storable, consider the below class Person from section 6.1. This class is re-created here, conforming to Storable's protocol (only the relevant or new code is shown):


    class Person: public Storable
    {
        public:
            // copy constructor
            Person(Person const &other);
    
            // assignment
            Person const &operator=(Person const &other);
    
            // duplicator function
            Storable *duplicate() const;

        .
        .
    };
 
When implementing the function Person::duplicate() we can use either the copy constructor or the default constructor with the overloaded assignment operator. The implementation of duplicate() is quite simple:

    // first version: 
    Storable *Person::duplicate() const
    {
        // uses default constructor in new Person
        Person
            *dup = new Person;

        // uses overloaded assignment in *dup = *this
        *dup = *this;

        return (dup);
    }

    // second version:
    Storable *Person::duplicate() const
    {
        // uses copy constructor in new Person(*this)
        return (new Person(*this));
    }
 
The above conversion from a class Person to the needs of a Storable supposes that the sources of Person are at hand and can be modified. However, even if the definition of a Person class is not available, but is e.g., contained in a run-time library, the conversion to the Storable format poses no difficulties:

    class StorablePerson: public Person, public Storable
    {
        public:
            // duplicator function
            Storable *duplicate() const;
    };

    Storable *StorablePerson::duplicate() const
    {
        return (new StorablePerson(*this));
    }

18.1.3: The class Storage

We can now implement the class Storage. The class definition is given below:


    class Storage: public Storable
    {
        public:
            // destructors, constructor
            ~Storage();
            Storage();
            Storage(Storage const &other);

            // overloaded assignment
            Storage const &operator=(Storage const &other);

            // functionality to duplicate storages
            Storable *duplicate() const;

            // interface
            void add(Storable *newobj);
            int nstored() const;
            Storable *get(int index);

        private:
            // copy/destroy primitives
            void destroy();
            void copy(Storage const &other);

            // private data
            int n;
            Storable **storage;
    };

Concerning the class definition we remark:

The destructor, constructors and the overloaded assignment function are listed below:


    // default constructor
    Storage::Storage()
    {
        n = 0;
        storage = 0;
    }

    // copy constructor
    Storage::Storage(Storage const &other)
    {
        copy(other);
    }

    // destructor
    Storage::~Storage()
    {
        destroy();
    }

    // overloaded assignment
    Storage const &Storage::operator=(Storage const &other)
    {
        if (this != &other)
        {
            destroy();
            copy(other);
        }
        return (*this);
    }

The primitive functions copy() and destroy() unconditionally copy another Storage object, or destroy the contents of the current one. Note that copy() calls duplicate() to duplicate the other's stored objects:


    void Storage::copy(Storage const &other)
    {
        n = other.n;
        storage = new Storable* [n];
        for (int i = 0; i < n; i++)
            storage [i] = other.storage [i]->duplicate();
    }

    void Storage::destroy()
    {
        for (register int i = 0; i < n; i++)
            delete storage [i];
        delete storage;
    }

The function duplicate(), which is required since Storage itself should be a Storable, uses the copy constructor to duplicate the current object:


    Storable *Storage::duplicate() const
    {
        return (new Storage (*this));
    }

Finally, here are the interface functions which add objects to the storage, return them, or determine the number of stored objects ( Note: the function realloc() that is used in this section should actually not be used. A better procedure would be to create a C++ variant for the realloc() function. A modification is in the pipeline....)


    void Storage::add(Storable const *newobj)
    {
        // reallocate storage array
        storage = (Storable **) realloc(storage,
                    (n + 1) * sizeof(Storable *));
        // put duplicate of newobj in storage
        storage [n] = newobj->duplicate();
        // increase number of obj in storage
        n++;
    }

    Storable *Storage::get(int index)
    {
        // check if index within range
        if (index < 0 || index >= n)
            return (0);
        // return address of stored object
        return (storage [index]);
    }

    int Storage::nstored() const
    {
        return (n);
    }

18.2: A binary tree

This section shows an implementation of a binary tree in C++. Analogously to the classes Storage and Storable (see section 18.1) two separate classes are used: one to represent the tree itself, and one to represent the objects which are stored in the tree. The classes will be appropriately named Tree and Node.

18.2.1: The Node class

The class Node is an abstract (pure virtual) class, which defines the protocol for the usage of derived classes with a Tree. Concerning this protocol we remark the following: The complete definition and declaration of the class Node is given below:
    class Node
    {
        public:
            virtual ~Node()                     // destructor
            {}
            virtual Node* clone() const = 0;    // duplicator
            virtual void process() = 0;         // Node processor
                                                // comparing 2 Nodes
            virtual int compare(Node const &other) const = 0;

            virtual void rejected(Node const &twice)  // called when the node
            {                                   // was found in the tree
            }
    };

18.2.2: The Tree class

The class Tree is responsible for the storage of objects which are derived from a Node. To implement the recursive tree structure, the class Tree has two private pointers as its data, pointing to subtrees: a Tree *left and Tree *right. The information which is contained in a node of the tree is represented as a private field Node *info.

Tree objects may be constructed empty and they may be constructed storing an initial Node object.

To scan a binary tree, the class Tree offers three methods: preorder, inorder and postorder. When scanning in preorder first the left subtree is scanned, then the leaf itself is processed and finally the right subtree is scanned. When scanning in inorder first a leaf in a node is processed, then the left subtree is scanned and finally the right subtree is scanned. When scanning in postorder first the left and right subtrees are scanned and then the leaf itself is processed.

The definition of the class Tree is given below:

    #include "node.h"

    class Tree
    {
        public:
            // destructor, constructors
            ~Tree();
            Tree();
            Tree(Node const &node);
            Tree(Tree const &other);

            // assignment
            Tree const &operator=(Tree const &other);

            // addition of a Node
            void add(Node const &node);

            // processing order in the tree
            void preorder_walk();
            void inorder_walk();
            void postorder_walk();

        private:
            // primitives
            void construct(Tree const &other);
            void destroy();

            // called by add(Node const &node)
            void add(Tree *&branch, Node const &node);

            // data
            Tree 
                *left, 
                *right;
            Node 
                *node;
    };

18.2.2.1: Constructing a tree

There are three constructors defined in the Tree class. The copy constructor is presented in the next section, the other two constructors are:

    #include "tree.h"

    // default constructor: initializes to 0
    Tree::Tree()
    :
        left(0),
        right(0),
        node(0)
    {
    }

    // Node constructor: add a Node object
    Tree::Tree(Node const &node)
    :
        left(0),
        right(0),
        node(node.clone())
    {
    }

18.2.2.2: The `standard' functions

As can be seen from the class definition, Tree contains pointer fields. This means that the class will need a destructor, a copy constructor and an overloaded assignment function to ensure that no allocation problems occur.

The destructor, the copy constructor and the overloaded assignment function are implemented with two primitive operations construct() and destroy() (presented later):

    #include "tree.h"

    // destructor: destroys the tree
    Tree::~Tree()
    {
        destroy();
    }

    // copy constructor: initializes to contents of other object
    Tree::Tree(Tree const &other)
    {
        construct(other);
    }

18.2.2.3: Adding an object to the tree

Adding a new object to the tree is a recursive process. When the function add() is called to insert an object into the tree, there are only three possibilities:

Here are the two add() functions:
    #include "tree.h"

    void Tree::add(Node const &newNode)
    {
        if (!node)
        {
            node = newNode.clone();
            return;
        }

        int
            cmp = node->compare(newNode);

        if (!cmp)               // already stored
        {
            node->rejected(newNode);
            return;
        }

        add
        (
            cmp < 0 ? left : right, newNode
        );
    }

    void Tree::add(Tree *&tree, Node const &newNode)
    {
        if (!tree)
            tree = new Tree(newNode);
        else
            tree->add(newNode);
    }

18.2.2.4: Scanning the tree

The class Tree offers three methods of scanning a binary tree: preorder, inorder and postorder. The three functions defining these actions are recursive:

    #include "tree.h"

    void Tree::preorder_walk()
    {
        if (left)
            left->preorder_walk();
        if (node)
            node->process();
        if (right)
            right->preorder_walk();
    }

    void Tree::inorder_walk()
    {
        if (node)
            node->process();
        if (left)
            left->inorder_walk();
        if (right)
            right->inorder_walk();
    }

    void Tree::postorder_walk()
    {
        if (left)
            left->postorder_walk();
        if (right)
            right->postorder_walk();
        if (node)
            node->process();
    }

18.2.2.5: The primitive operations copy() and destroy()

The functions copy() and destroy() are two private member functions which implement primitive operations of the class Tree: the copying of the contents of another Tree or the destroying of the tree.

    #include "tree.h"

    void Tree::destroy()
    {
        delete node;
        if (left)
            delete left;
        if (right)
            delete right;
    }

    void Tree::construct(Tree const &other)
    {
        node = other.node ? other.node->clone() : 0;
        left = other.left ? new Tree(*other.left) : 0;
        right = other.right ? new Tree(*other.right) : 0;
    }

Concerning this implementation we remark the following:

18.2.3: Using Tree and Node

We illustrate the usage of the classes Tree and Node by a program that counts words in files. Words are defined, rather blandly, as series of characters, separated by white spaces. The program shows which words are present in which file, and how many times.

Below is the listing of a class Strnode. This class is derived from Node and implements the virtual functions. Note how this class implements the counting of words; when a given word occurs more than one time, Tree will call the member function rejected(). This function simply increases the private counter variable times.

#include <fstream>
#include <iomanip>
#include <string>

#include "tree.h"

class Strnode: public Node
{
    public:
        Strnode(string const &s)
        :
            str(s),
            times(1)
        {
        }
        Node* clone() const
        {
            return (new Strnode(*this));
        }
        int compare (Node const &other) const
        {
            return 
            (
                str.compare
                (
                    static_cast<Strnode const &>(other).str
                )
            );
        }
        void process ()
        {
            if (times)
                cout << setw(20) << str.c_str() << ": " << 
                        setw(3) << times << endl;
        }
        void rejected(Node const &node)
        {
            ++times;
        }
    private:
        string
            str;
        unsigned
            times;
};

void countfile(istream &inf)
{
    Tree
        tree;
    string
        word;

    while (inf >> word)
    {
        Strnode
            next(word);
       
        tree.add(next);
    }
    tree.inorder_walk();
}

int main (int argc, char **argv)
{
    int
        exitstatus = 0;
    
    if (argc > 1)
    {
        for (int i = 1; i < argc; ++i)
        {
            ifstream
                inf(argv[i]);

            if (!inf)
            {
                cerr << "can't open " << argv[i] << endl;
                exitstatus++;
                continue;
            }
            countfile(inf);
        }
    }
    else
        countfile (cin);

    return (exitstatus);
}

18.3: Classes to process program options

Programs usually can be given options by which the program can be configured to a particular task. Often programs have sensible default values for their options. Given those defaults, a resource file may be used to overrule the options that were hard-coded into the program. The resource file is normally used to configure the program to the specific needs of a particular computer system. Finally, the program can be given command-line options, by which the program can be configured to its task during one particular run.

In this section we will develop a set of classes starting from the class Configuration, whose objects can process a great variety of options. Actually, we'll start from a small demo program, in which an object of the class Configuration is used. From there, the class Configuration will be developed, working our way down to the auxiliary classes that are used with the Configuration class.

The resulting program will be available as a zip-file containing the sources and (Linux) binary program at our ftp-site. The zip-archive contains all the sources and auxiliary files for creating the program, as well as an icmake build script.

18.3.1: Functionality of the class Configuration

What functionality must a Configuration object have?

Maybe of similar importance as the functionality the object can perform is what the object can not perform:

What if we accidently try to use a copy-constructor or (overloaded) assignment operator? Those situations will be covered by the following trick: we will mention a copy constructor and an overloaded assignment operator in the interface of the class, but will not implement it. The compiler will, where needed, happily generate code calling these two functions, but the program can't be linked, since the copy constructor and the overloaded assignment operator aren't available. Thus we prevent the accidental use of these functions. This approach is used also with other, auxiliary, classes.

Now that we've specified the functionality we're ready to take a look at the interface.

18.3.1.1: The interface of the class Configuration

Here is the full interface of the class Configuration. In the interface, we recognize the functions we required when specifying the functionality of the class: the constructor, destructor, and the (not to be implemented) copy constructor and overloaded assignment operator.

To process the resource file we have loadResourceFile(), the command-line options are processed by loadCommandLineOptions(). Next we see two plain accessors: programName() will return the plain program name, while resourceFile() will return the name of the resource file. To obtain the number of command-line arguments that are available when all command-line options have been processed we have argc(). The arguments themselves are obtained by overloaded index operator, using an unsigned argument. Finally, options can be obtained by name: for this another overloaded index operator is available, this time using a string (char const *) for its argument.

The private section contains data: variables to access argc and argv, using reference-type variables; variables to store the program- and resource filenames, and two Hashtables (the class Hashtable will be covered in section 18.3.6) containing, respectively, the precompiled options and the command-line options.

Here is the interface of the class Configuration:

#ifndef _Configuration_H_
#define _Configuration_H_

#include "../hashtable/hashtable.h"

class Option;

class Configuration
{
    public:
        Configuration(int &argc, char const **&argv, int initialCap = 20,
                    double maxLoadFactor = 0.75);
    
        ~Configuration();

        Configuration(Configuration const &other);            // NI
        Configuration &operator=(Configuration const &right); // NI
    
        void loadResourceFile(char const *fname);
        void loadCommandLineOptions();
        char const *programName();      // name of the program
        char const *resourceFile();     // name of used resourcefile
        unsigned argc() const;          // count beyond [0], c.q. options
                                        // returns argv[index] | 0
                                        // also beyond [0] c.q. options
                                                        // option [name]
        Option const * operator[](char const *name) const;
        char const *operator[](unsigned index) const;   // argument[index]
        
    private:
        int
            argcShift,
            &argC;
        char const
            **&argv;
        char 
            *progName;            
        Hashtable
            optionTable,
            cmdLineOption;
        char
            *resourceFilename;
};

#include <string.h>
#include "../option/option.h"
#include "../string/string.h"
#include "../mem/mem.h"
#include "../ustream/ustream.h"
#include "../stringtokenizer/stringtokenizer.h"

#endif  _Configuration_H_

18.3.1.2: An example of a program using the class Configuration

Below we present the source of the demonstration program. The program sets up the memoryhandler, to make sure that failing memory allocations will be noticed.

Next, a configuration object is created. This object is passed to an auxiliary function showing us interesting aspects of the object (showConfigurationInformation()). Although this function tells us things about the Configuration object, it was not made part of the class, since it was specifically designed in the context of the demonstration program, without adding any real functionality to the Configuration class.

Having displayed the raw information stored in the Configuration object, the resource-file is loaded. This might alter the values of the program-parameters, of which there are four in the demonstration program. Having loaded the resourcefile, the contents of the Configuration object are shown again.

Then, the command-line options (if any) are processed, followed by yet another display of the contents of the Configuration object.

Here is the source of the demonstration program:

#include "demo.h"

int main(int argc, char const **argv)
{
    Mem::installNewHandler();

    Configuration
        config(argc, argv);

    showConfigurationInformation(config, "After constructing 'config'");

    config.loadResourceFile("demo.rc");

    showConfigurationInformation(config, "After reading demo.rc");

    config.loadCommandLineOptions();

    showConfigurationInformation(config, 
            "After processing command-line options");

    return (0);
}

18.3.2: Implementation of the class Configuration

18.3.2.1: The constructor

The constructor of the class Configuration expects argc and argv as reference-type variables. Apart from these two, tho extra parameters are defined, for which the interface defines default values: initialCap defines the initial capacity of the hashtables that are used by the Configuration object, and maxLoadFactor defining the maximum load percentage of the hashtables. So, with the default parameters the hashtables would be enlarged once more than 15 elements are stored in them.

Having initialized the reference variables and the hashtables the options are stored in the hashtables for fast access. The Option-class function nextOptionDefinition() produces a sequence of all options that are defined for the program. Each option's name and value is stored in the optionTable hashtable, and each option's command-line character and name is stored in the cmdLineOption hashtable. Therefore, the values of options can be retrieved immediately, given the name of the option, while the option's command-line character can be used to produce the name of the option, which can then be used in a second step to obtain the value of the option.

Here is the source of the constructor:


#include "configuration.h"

Configuration::Configuration(int &argCount, char const **&argVector, 
                             int initialCap, double maxLoadFactor)
:
    argC(argCount),
    argv(argVector),
    optionTable(initialCap, maxLoadFactor),
    cmdLineOption(initialCap, maxLoadFactor)
{
    resourceFilename = Mem::strdup("");

    Option
        *option;

    while ((option = Option::nextOptionDefinition()))
    {
        String
            *name = new String(option->getName());

        optionTable.put(name, option);

        String const
            *cmdopt =  &(option->getCmdLineOption());
                
        if (strlen(*cmdopt))
            cmdLineOption.put(new String(*cmdopt), new String(*name)); 
    }

    char const 
        *cp = strrchr(argv[0], '/');

    progName = 
        Mem::strdup
        (
            !cp ?
                argv[0]
            :
                cp + 1
        );

    argcShift = 1;
}








18.3.2.2: loadResourceFile()

The function loadResourceFile() processes a unix-style resource-files. In these files, empty lines are ignored, as well as information on a line beyond hash-marks (#) if these hashmarks are preceded by the beginning of the line or white space. Long lines may be stretched out over several lines by adding a continuation character (the backslash (\)) at the end of each line that continues on the next line.

To obtain the remaining lines of the configuration file, loadResourceFile() creates a Ustream object. The class Ustream was specifically designed for the processing of unix-style resource-files. As this class doesn't add much to the understanding of the Configuration-class its interface and implementation is not discussed in the annotations. Rather, interface and implementation is found in the configdemo.zip file at our ftp-site.

The processing of the information in the configuration file is based on the assumption that all information on a line is organized as follows:

With respect to this format, each key is looked up in the optionTable. If found, the value of the option is set to the key's value. Otherwise, if the key is not found, a warning message is written, by catching the exception thrown by the hashtable when it receives an undefined option-name.

Apart from the Ustream object, the function loadResourceFile() also uses a StringTokenizer object, which splits lines from the Ustream file into words. The first word is interpreted as key, while the function range(index) produces the unsplit line beyond word index. The class StringTokenizer is also found in the distributed zip-file.

18.3.2.3: loadCommandLineOptions()

The function loadCommandLineOptions() uses the function getopt() which is available on unix systems to retrieve command-line options (and possibly their values) and to separate them from the remaining command-line arguments. The function getopt() expects (among other arguments) a string of command-line option letters, which are possibly followed by a colon. If a colon is following a command-line option, then information trailing the command-line option character or the next command-line argument is interpreted as the value of the command-line option. E.g., a command-line option character specified as n: may be specified on the command-line as -n20 or -n 20.

The function Hashtable::catKeys() is used to obtain a list of command-line option characters. Next, the options are extracted from the command-line arguments using getopt(). When an option has been found, the cmdLineOption hashtable is used to obtain the name of the option, then the optionTable hashtable is used to obtain a pointer to the option.

Next the option receives a new value, through the virtual function assign(). This function is available for all options, and allows loadCommandLineOptions() to assign a new value to an option irrespective of the actual type of the option.

Here is the code of the function loadCommandLineOptions():

#include "configuration.h"

void  Configuration::loadCommandLineOptions()
{
    String
        list;

    cmdLineOption.catKeys(list);

    register int
        optionChar;
    String
        opt;
    register char
        *cp;                    
    opterr = 0;                         // no error messages from getopt() 
    while                               // while options are found
    (
        (optionChar = getopt(argC, (char *const *)argv, list)) != -1
        &&
        (cp = strchr(list, optionChar))
    )
    {
        opt = " :";

        opt[0] = (char)optionChar;      // create option-string
        if (cp[1] != ':')               // no option value ?
            opt[1] = 0;                 // then remove ':' from opt.

        Option                          // get the configuration option
            *option = (Option *)optionTable[cmdLineOption[&opt]];

        option->assign(optarg);         // assign the value
    }
    argcShift = optind;                 // first non-option index in argv 
}






18.3.3: The class Option

The class Option is designed as an abstract base class, defining the protocol to which all derived classes must adhere. Derived classes representing logical values (Boolean), integer values (Int), real values (Double) and textstrings (Text) will be constructed later on.

The class itself is derived from another abstract base class, Object. Pointers to Objects are stored in, e.g., Hashtables.

The class Option (cf. section 18.3.3.1), has a constructor, expecting an option name and the specification of a command-line parameter, and a virtual destructor to be able to deleting memory allocated by derived class objects through an Option pointer.

Default implementations returning the logical, int, double and textvalues of options are available as well. These implementations are replaced in derived classes by member functions returning the real, rather than the default, value of the derived class' object.

Since the options must be storable in a hashtable, and since the hashtable must be able to compare two different object for equality, abstract members hashCode() and equals() are available, to be implemented in the derived class' objects.

The name and command-line option are obtained via two accessor functions: getName() and getCmdLineOption(), respectively.

To assign a value to an option one more function must be implemented by derived class options: assign(), to assign a value to an option.

The static Option *nextOptionDefinition() member function returns a pointer to an object of a class derived from Option. The returned option is constructed by a function that can be called from an element of the

static Option *(*optionConstructor[])(Mold const &mold)
array of pointers to functions returning pointers to Options. Each of these functions expects a reference to a Mold struct.

An array of these structs must be available as static Mold mold[]. The Mold array allows us to specify as data the ingredients of any option we require in our program. In other words: by defining the elements of an array Option::Mold Option::mold[] all kinds of program-options and their default values. can easily be defined.

For example, in our demonstration program four program options were defined, representing a logical value, an integer value, a real value and a textual string. Note that the following mold[] array is defined as data:

#include "../demo.h"

Option::Mold Option::mold[] =
{
    {Boolean,   "colors",   "c",    "True"},
    {Int,       "trials",   "n:",   "20"},
    {Double,    "epsilon",  "e:",   "0.004"},
    {Text,      "files",    0,      "ls -Fla"},
    {},
};

The last element of the mold[] array is an empty struct, acting as a sentinel. The remaining lines (refer to the struct Mold in the interface of the class Option) contain four elements:

18.3.3.1: The interface of the class Option

Here is the complete interface of the abstract base class Option:

#ifndef _Option_H_
#define _Option_H_

#include "../string/string.h"

class Option: public Object
{
    public:
        Option(char const *name, char const *cmdLineOpt);
        ~Option();

        virtual int         BoolValue()     const; 
        virtual int         IntValue()      const; 
        virtual double      DoubleValue()   const; 
        virtual char const *TextValue()     const; 

        unsigned    hashCode()                      const;
        int         operator==(Object const &other) const;

        String const
            &getName() const,
            &getCmdLineOption() const;

        virtual void assign(char const *string) = 0;

        static Option *nextOptionDefinition();
    protected:                           
        enum Type
        {
            Sentinel,
            Int,
            Double,
            Text,
            Boolean,
        };

    private:
        struct Mold
        {
            Type
                optionType;
            char
                *name,
                *cmdLineOption,
                *defaultValue;
        };

        static Mold 
            mold[];

        static Option *(*optionConstructor[])(Mold const &mold);
                                    
        String
            name,
            cmdLineName;
};

#include <strstream.h>
#include "../booloption/booloption.h"
#include "../intoption/intoption.h"
#include "../doubleoption/doubleoption.h"
#include "../textoption/textoption.h"

#endif  _Option_H_


18.3.3.2: The static member nextOptionDefinition

The static member function nextOptionDefinition() is called repeatedly until it returns 0. The function visits all elements of the mold[] array, calling the static function optionConstructor associated with the option-type of the element of the array mold[] that is visited.

The variable optionConstructor[] is an array, which is initialized as data of the class Option. The elements of the optionConstructor[] array are pointers to Constructor() functions of all the derived classes. These functions construct actual derived class option objects, and expect the ingredients for the construction as a reference to a Mold struct.

The function nextOptionDefinition() is:

#include "option.h"

Option *Option::nextOptionDefinition()
{
    static unsigned
        index = 0;

    if (mold[index].optionType == Sentinel)
        return (0);

    Option
        *option = 
            optionConstructor[mold[index].optionType]
            (mold[index]);

    index++;
    return (option);
}

The array optionConstructor[] is initialized as follows:

#include "option.h"

Option *(*Option::optionConstructor[])(Mold const &mold) =
{
    0,
    IntOption::Constructor,
    DoubleOption::Constructor,
    TextOption::Constructor,
    BoolOption::Constructor,
};
Note that in this initialization reflects the ordering of the Option::Type enum. There is no constructor for the Sentinel enum-value, while the remaining elements contain the addresses for the different derived-class option types.

18.3.4: Derived from Option: The class TextOption

Below (in section 18.3.4.1) the interface of the class TextOption, derived from Option, is given. The class contains implementations of all the pure virtual functions of the class Option, and it mentions the existence of a copy constructor and overloaded assignment operator. However, these functions are (once again) not to be used, and are mentioned here as a safeguard against their being used accidently.

The interesting part of the interface is the function static Option *Constructor(Mold const &mold): it constructs a TextOption object (through TextOption's constructor), using the ingredients it encounters in the Mold it receives as its argument. Note that the prototype of Constructor corresponds to the prototype of the elements of the array Option::optionConstructor[]. As we have seen (in section 18.3.3.2), Option:optionConstructor[Text] has been given the value TextOption::Constructor, thus setting up the connection between an option-type and the constructor for such an option from the ingredients found in an Option::Mold.

The other three classes derived from the class Option are constructed similarly. The reader is referred to their interfaces and implementation in the zip-archive in our ftp-site.

18.3.4.1: The interface of the class TextOption

Here is the interface of the class TextOption, derived from Option:


#ifndef _TextOption_H_
#define _TextOption_H_

#include "../option/option.h"

class TextOption: public Option
{
    public:
        static Option *Constructor(Mold const &mold);
        TextOption(char const *name, char const *cmdLineOpt, 
                   char const *initialValue);
        ~TextOption();

        TextOption(TextOption const &other);                // NI
        TextOption &operator=(TextOption const &other);     // NI

        void assign(char const *str);
        char const *TextValue() const;
        char const *toString() const;
    private:
        char 
            *value;
};

#include "../mem/mem.h"

#endif  _TextOption_H_

18.3.4.2: The implementation of the assign() function

As an example of an implementation of an assign() function, we present the function TextOption::assign(). As defined by the interface of the class Option, this function has one parameter, a char const *str. It needs to perform only two tasks: First, the old value of the TextOption object is deleted, then a new value is assigned. Corresponding assign() functions are available for the other derived option classes.

Here is the implementation of TextOption::assign():

#include "textoption.h"

void TextOption::assign(char const *str)
{
    delete value;
    value = Mem::strdup(str);
}

18.3.5: The class Object

The class Object is an abstract base class. Pointers to Objects are be stored in Hashtables. The class is a very simple one, containing a virtual destructor (doing nothing in particular), and requiring the implementation of three pure virtual functions: Here is the interface of the class Object:
#ifndef _Object_H_
#define _Object_H_

class Object
{
    public:
        virtual ~Object();
        
        virtual int         operator==(Object const &other) const = 0;
        virtual unsigned    hashCode()                  const = 0;
        virtual char const *toString()                  const = 0;
};

#endif  _Object_H_

18.3.6: The class Hashtable

The class Hashtable is used to store and retrieve objects of classes derived from the class Object. The class contains two pointers to vectors of pointers to Objects, containing the keys and values that are stored in the hashtable. Furthermore, the class has data-members holding the actual number of elements that are stored in the hashtable (n), the number of elements of the two vectors of pointers to Objects (capacity), the original number of elements of these vectors (initialCapacity) and the maximum proportion of elements of the vectors that may be occupied (maxLoadFactor).

The Hashtable objects are self-expanding. Once maxLoadFactor threatens to be exceeded, the table is expanded automatically.

The functionality of the hashtable includes members for retrieving values of the objects stored in the table using either the name of a key (as a char const *) or a pointer to an Object; a member to add a new key/value pair to the table, and a utility member catKeys() returning a string containing the catenated names of all keys. This latter function is used by the Option::nextOptionDefinition() to tell getopt() what command-line option characters it can expect.

The interface of the class Hashtable also shows some private member functions, used for expanding the table, and for inserting and retrieving elements from the table. Some of these functions are covered in the following discussion. Functions not needing special attention are available in the zip-archive.

Here is the interface of the class Hashtable:


#ifndef _Hashtable_H_
#define _Hashtable_H_

#include "../string/string.h"

class Object;

class Hashtable
{
    public:
        Hashtable(int initialCapacity, double maxLoadFactor = 0.75);
        ~Hashtable();

        Hashtable(Hashtable const &other);                  // NI
        Hashtable const &operator=(Hashtable const &other); // NI

        Object const *operator[](Object const *key) const; 
        Object const *operator[](char const *key) const; 
        Object const *put(Object *key, Object *value);  // returns value

        void catKeys(String &target);               // catenate the keys
                                                    // as strings
        char const *toString() const;
    private:
        void installVectors(int capacityRequest);
        int lookup(Object const *key) const;    // key must exist
        int mayInsert(Object *key);             // key might not exist

                                            // the key in the table
        int expanded();                     // 1 if table was expanded

        unsigned
            capacity,
            initialCapacity,
            n;
        double
            maxLoadFactor;
        Object
            **keys,
            **values;
};

#include <unistd.h>
#include <stdlib.h>

#include "../option/option.h"

#endif  _Hashtable_H_



18.3.6.1: The Hashtable constructor

The constructor of the hashtable initializes the data-members of the table, and then calls installVectors() to initialize the keys and values vectors. Here is the constructor of the class Hashtable:

#include "hashtable.h"

Hashtable::Hashtable(int iniCap, double maxFactor)
{
    maxLoadFactor = maxFactor;
    n = 0;
    initialCapacity = iniCap;

    capacity = 0;
    keys = 0;
    values = 0;

    installVectors(initialCapacity);
}

The function installVectors() simply creates two vectors of the required number of elements (i.e., capacity), initializing the vectors with null-pointers.

18.3.6.2: The function mayInsert()

The functions mayInsert() returns the index of a key that is stored in the hashtable. The difference with the function lookup() is that the function lookup() requires the key to be available in the hashtable, whereas the function mayInsert() will insert the key when it isn't available yet.

If the function lookup() doesn't find the key in the table, it throws a char const * exeption, containing the name of the key. The exception is thereupon caught by the function Configuration::loadResourceFile(). The function mayInsert(), however, will try to insert a non-existing key into the hashtable.

Before looking for a key, both lookup() and mayInsert() first determine an initial hashcode, using the key's hashCode() function. A simple add-the-hash rehash scheme is used to cope with collisions. The add-the-hash value is at least 1 and at most the current capacity minus one. Using a prime-sized hashtable, this ensures that all elements of the hashtable are visited by repeatedly adding the add-the-hash value to the index value that was last used.

The insertion process itself consists of a perpetual loop, that terminates when the index of the key in the hashtable has been determined.

If an empty element of the key vector is hit, expand() is called, which may enlarge the hashtable. If the table was enlarged, both the hashcode and the add-the-hash value of the actual key are recomputed, and the perpetual loop starts its next cycle. Otherwise, the key is entered at the empty element's position, and its index value is returned.

If the key is found in the vector of keys, then the corresponding index position is returned. Alternatively, a collision may occur, and the index value is incremented by the add-the-hash value, followed by the next cycle of the perpetual loop.

Thus, the lookup() and mayInsert() functions return the index of the provided key. Apart from that, lookup() will throw an exception when the provided key isn't found in the table.

Here is the sourcetext of the function mayInsert():

#include "hashtable.h"

//  addTheHash is set in the range 1 .. capacity - 1, and the initial
//  index is made equal to the addTheHash value. Since addTheHash is non-zero
//  a new index computed by adding the addTheHash value to the index will 
//  always get another value. The zeroth index of the hashtable will only be
//  used as the result of a collision, but that doesn't matter: hashtables
//  aren't filled up completely anyway.

int Hashtable::mayInsert(Object *key)
{
    unsigned
        hashCode = key->hashCode();
    register unsigned
        addTheHash = 1 + hashCode % (capacity - 1),
        index = addTheHash;                 // within the capacity range

    while (1)
    {
        if (!keys[index])                   // empty slot ?
        {
            if (expanded())                 // hashtable was expanded ?
            {
                addTheHash = 1 + hashCode % (capacity - 1);
                index = addTheHash;         // new index after expansion

                continue;                   // restart the checking
            }
            keys[index] = key;              // place the key here
            ++n;                            // n contains #-elements

            return (index);                 // and produce its index
        }
                                        
        if (*keys[index] == *key)       // same object ?
            return (index);                 // return its index
    
        if ((index += addTheHash) >= capacity)   // collision: try next entry
            index -= capacity;
    }
}

18.3.6.3: The function expanded()

The function expanded() first checks the loadfactor of the hashtable: if the actual number of elements divided by the capacity of the table exceeds maxLoadFactor, the current keys and values vectors are saved, and new vectors containing initialCapacity extra elements are installed.

Next, the elements of the old keys vector are visited. If a non-empty element is found, that element and its value are stored in the hashtable using the function put(). This process continues until n elements (the number of non-empty elements in the old vectors) are stored in the enlarged table. Since the function put() owns the objects that its arguments point to (i.e., Object *s rather than Object const *s are used, the objects the elements of the old vectors point to must not be deleted. Therefore, at the end of the function expanded() the old keys and values vectors are simply deleted, disregarding the objects their elements point to.

18.3.7: Auxiliary classes

The classes we've covered so far rely on the specific functionality of other classes. The memory management class Mem is a good example: while standard functions are available for the allocation of memory, these functions reduce to the function malloc(), and not to the operator new. Since the operator new can be protected by the set_new_handler() function, it's a good idea to duplicate the popular standard memory allocating functions based on malloc() by functions using new.

Another example is found in the class Util, containing functions we think are useful, but which we could not place conceptually easy in other classes. For example, the utility class contains a function prime() returning a prime number.

The following utility classes are available:

The Mem and Util classes contain just static member functions, and do not require objects to be used. For the other classes objects must be defined.

The next sections will cover the interfaces of these classes. The implementation of the functions of these classes is found in the zip-archive at our ftp-site.

18.3.7.1: The class Mem

The class Mem contains functions related to the allocation of memory, using the operator new. Using new, it is easy to catch exhausted dynamic memory through the function set_new_handler().

The class contains functions to install a new-handler, to duplicate and concatenate strings, to compare strings, and to reallocate memory. As all these functions are static, there is no need to create a Mem object.

The function realloc() isn't a particularly elegant attempt to make available a function that resembles the standard malloc()-based realloc() function. Actually, in the demonstration program it's used only by the StringTokenizer constructor. However, by making it a member of the latter class, we feel we would mix up memory allocation with string handling.

The Mem::realloc() function does a rather crude job: it should be used only for enlarging the required amount of memory, in which case the extra allocated memory remains completely uninitialized.

The other member functions are implemented in a standard way. Most of them accept null-pointers as arguments as well. Here is the interface of the class Mem:

#ifndef _Mem_H_
#define _Mem_H_

class Mem
{
    public:
        static void installNewHandler();
        static char *strdup(char const *str);   
        static int casecmp(char const *s1, char const *s2);
        static int cmp(char const *s1, char const *s2);
        static char *strndup(char const *str, unsigned len);
        static char *strcat(char const *src1, char const *src2);
        static void *realloc(void *addressOfPointerToOldData,
                            unsigned dataSize, unsigned oldN,
                            unsigned newN);
    private:
        static void memoryExhausted();
};

#include <iostream.h>
#include <new.h>
#include <string.h>

#endif  _Mem_H_








18.3.7.2: The class String

Objects of the class String represent strings: 0-delimited series of ascii-characters. The class is derived from Object, so String objects can be stored in Hashtables.

Apart from the functions required by the class Object, the class String contains all standard members, like a copy constructor and a overloaded assignment operators. Apart from these members, there is a conversion operator, allowing the use of a String object as a char const *, and there are members for enlarging the string by catenating another string to it, and for retrieving a character using the index-operator.

Here is the interface of the class String:

#ifndef _String_H_
#define _String_H_

#include <iostream.h>
#include <stdarg.h>

#include "../object/object.h"

class String: public Object
{
    public:
        String();
        String(char const *arg);
        ~String();            
    
        String(String const &other);
        String &operator=(String const &rvalue);
        String &operator=(char const *rvalue);
    
        int operator==(Object const &other) const;
        unsigned hashCode() const;
        char const *toString() const;
        
        operator char const *() const;
        String &strcat(char const *str2);
        char &operator[](unsigned index);
    private:
        char
            *string;
};  

#include "../mem/mem.h"
#include "../hashtable/hashtable.h"

#endif  _String_H_




18.3.7.3: The class StringTokenizer

The class StringTokenizer is used for breaking up strings into substrings according to a (set of) delimiters. By default, the white-space delimiters are used. The constructor of the class expects an ascii-z string (and optionally a string of delimiter-characters) and will split the string into substrings according to the set of delimiters.

The substrings are retrievable through the overloaded index-operator, returning pointers to String objects, which are then owned by the calling function. Another member function is range(), returning the substring starting at a particular index-position. For example, if StringTokenizer st contains five substrings, st.range(3) will return the substring of the original string starting at st[3].

Here is the interface of the class StringTokenizer:


#ifndef _StringTokenizer_H_
#define _StringTokenizer_H_

#include "../string/string.h"

class StringTokenizer
{
    public:
        StringTokenizer(char const *cp, char const *delimiters = " \t\n");
        ~StringTokenizer();

        StringTokenizer(StringTokenizer const &other);              // NI
        StringTokenizer &operator=(StringTokenizer const &other);   // NI

        String *operator[](unsigned index);
        String *range(unsigned from);       // until the last one

    private:
        struct SubString
        {
            char
                *str;
            unsigned
                length;
        };  

        char
            *str;
        SubString
            *subString;
        unsigned
            n;
};

#endif  _StringTokenizer_H_





18.3.7.4: The class Ustream

The class Ustream processes files as unix-like configuration files. In these files empty lines are ignored, as is information starting at a hash-mark at the beginning of a line or preceded by a white-space character. Furthermore, lines are combined if the last character of a line is a backslash.

The constructor of the class expects one argument: the name of the file to be processed. Having created a Ustream object, the conversion operator operator void *() can be used to determine the successful opening of the file: it returns 0 if the file wasn't opened successfully.

The (non-empty, non-comment section of) lines of the file are returned by the member read() as a char *: the line is owned by the calling function. Calling read() succeeds until a null-pointer is returned.

After a successful read-operation, the member-function lineNr() will return the actual linenumber of the just read line in the original file. In this case empty and comment-lines are counted.

The file is closed when the Ustream object is destroyed.

Here is the interface of the class Ustream:


#ifndef _Ustream_H_
#define _Ustream_H_

#include <fstream.h>

class Ustream
{
    public:
        Ustream(char const *fname);
    
        Ustream(Ustream const &other);                  // NI
        Ustream const &operator=(Ustream const &right); // NI
    
        operator void *();              // direct status-check
    
        char *read();                   // 0 if no more lines
        int lineNr();
        
    private:
        ifstream
            stream;
        int
            line;
};

#include "../mem/mem.h"


#endif  _Ustream_H_

18.3.7.5: The class Util

The class Util contains several utility functions, which did not belong elsewhere. The functions atod() and atoi() convert, respectively, strings to doubles and strings to ints, and they differ from the standard functions atof() and atoi() only by the fact that the Util functions accept null-pointers as well.

The function prime() uses the sieve of Aristosthenes to generate the first prime exceeding the value given as its argument.

The function hashPjw() returns a hashvalue for a string. This algorithm is given in Aho, Sethi, and Ullman's Compilers: Principles, Techniques and Tools, 1986, p. 435 as P. J. Weinberger's algorithm for computing hash-values.

The interface of the class Util is given below:

#ifndef _Util_H_
#define _Util_H_

#include <values.h>
    // uses INTBITS to find the # of bits in a word, hence in an int

class Util
{
    public:
        static double atod(char const *value);      // convert to double
        static int atoi(char const *value);         // convert to int
        static unsigned prime(unsigned lowerBound); // first prime exceeding 
                                                    // lowerBound
        static unsigned hashPjw(char const *key);   // return hashvalue
    private:
        int const 
            bitsPerInt = INTBITS,
            moduloMask = bitsPerInt - 1;
        static int
            shiftBitsPerInt;
};

#include <stdlib.h>
#include <string.h>
#include <math.h>

#endif  _Util_H_

18.4: Using Bison and Flex

The example discussed in this section digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example about this topic assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

However, the scanner and parser generators are also (and maybe even more commonly, nowadays) available as free software. Both bison and flex can be obtained from prep.ai.mit.edu/pub/gnu. Flex will create a C++ class when called as flex++, or when the -+ flag is used. With bison the situation is a bit more complex. Scattered over the Internet several bison++ archives can be found (e.g., in rzbsdi01.uni-trier.de). The information in these archives usually dates back to 1993, irrespective of the version number mentioned with the archive itself. (However, the given ftp-archive also contains dos-executables, for those who are interested....)

Using flex++ and bison++ a class-based scanner and parser can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become a bit cleaner than without using the class interface.

Below two examples are given. In the first example only a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The second example uses both a scanner and a parser to transform standard arithmetic expressions to their postfix notation, commonly encountered in code generated by compilers and in HP-calculators. The second example focuses on the parser.

18.4.1: Using Flex++ to create a scanner

In this example a lexical scanner is used to monitor the production of a file from several parts. This example focuses on the lexical scanner, and on switching files while churning through the parts. The setup is as follows: The input-language knows of an #include statement, which is followed by a string indicating the file which should be included at the location of the #include.

In order to avoid complexities that have nothing to do with the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program should terminate using a proper error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filenamestack indicating where which file was included should be printed.

One minor extra feature is that comment-lines should be recognized: include directives in comment-lines should be ignored, comment being the standard C++ comment-types.

The program is created in the following steps:

18.4.1.1: The flex++ specification file

The organization of the lexical scanner specification file is similar to the one used with flex. However, flex++ creates a class (yyFlexLexer) from which the class Scanner will be derived.

The code associated with the regular expression rules will be located inside the class yyFlexLexer. However, it would be handy to access the member-functions of the derived class within that code. Fortunately, class derivation and inheritance helps us to realize this. In the specification of the class yyFlexLexer(), we notice that the function yylex() is a virtual function. In the FlexLexer.h header file we see virtual int yylex():


    class yyFlexLexer: public FlexLexer 
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );
    
            virtual ~yyFlexLexer();
    
            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );
    
            virtual int yylex();
            virtual void switch_streams( istream* new_in, ostream* new_out );
    
        protected:
            ...
    };
    
Consequently, if yylex() is defined in a derived class, then this function of the derived class will be called from a base class (i.e., yyFlexLexer) pointer. Since the yylex() function of the derived class is called, that function will have access to the members of its class, and to the public and protected members of its base class.

The context in which the generated scanner is placed is (by default) the function yyFlexLexer::yylex(). However, this context can be changed by defining the YY_DECL-macro. This macro, if defined, determines the context in which the generated scanner will be placed. So, in order to make the generated scanner part of the derived class function yylex(), three things must be done:

The definition of the YY_DECL macro and the yyFlexLexer::yylex() function can conveniently be placed in the lexer specification file, as shown below.

Looking at the regular expressions themselves, notice that we'll need rules for the recognition of the comment, for the include directive, and for the remaining characters. This is all fairly standard practice. When an include directive is detected, the derived-class' member function switchSource() is called, which will perform the required file switching. When the end of the file (EOF) is detected, the derived class' member function popSource() is called, which will pop the previous previously pushed file, returning 1. Once the file-stack is empty, the function will return 0, resulting in the call of yyterminate(), which will terminate the scanner.

The lexical scanner specification file has three sections: a C++ preamble, containing code which can be used in the code defining the actions to be performed once a regular expression is matched, a Flex++ symbol area, which is used for the definition of symbols, like a mini scanner, or options, like %option yylineno when the lexical scanner should keep track of the line numbers of the files it is scanning and, finally a rules section, in which the regular expressions and their actions are given. In the current example, the lexer should mainly copy information from the istream *yyin to the ostream *yyout, for which the predefined macro ECHO can be used.

Here is the complete and annotated lexical scanner specification file to be used with flex++:

%{
/* ----------------------------------------------------------------------------
                                 C++ -preamble.
   Include header files, other than those generated by flex++ and bison++.
      E.g., include the interface to the class derived from yyFlexLexer
----------------------------------------------------------------------------*/

                            // the yylex() function that's actually
                            // used
#define YY_DECL int Scanner::yylex()

#include "scanner.h"        // The interface of the derived class

int yyFlexLexer::yylex()    // not called: overruled by
{                           // Scanner::yylex()
    return (0);
}
  
%}

/* ----------------------------------------------------------------------------
                              Flex++ symbol area
                              ~~~~~~~~~~~~~~~~~~
      The symbols mentioned here are used for defining e.g., a miniscanner
---------------------------------------------------------------------------- */
%x comment 
%option yylineno

eolnComment     "//".*
anyChar         .|\n

/* ----------------------------------------------------------------------------
                               Flex rules area:
                               ~~~~~~~~~~~~~~~~
     Regular expressions below here define what the lexer will recognize.
---------------------------------------------------------------------------- */
%%
    /*
        The comment-rules: comment lines are ignored.    
    */
{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /*                
        File switching: #include <filepath>
    */
"#include "[^>]*">"     switchSource();

    /* 
        The default rules: eating all the rest, echoing it to output    
    */ 
{anyChar}               ECHO;

    /*
        The <<EOF>>)rule: pop a pushed file, or terminate the lexer
    */
<<EOF>>                 {
                            if (!popSource())
                                yyterminate();
                        }




Since the derived class is able to access the information stored within the lexical scanner itself (it can even access the information directly, since the data members of yyFlexLexer are protected, and thus accessible to derived classes), very much processing can be done by the derived class' member functions. This results in a very clean setup of the lexer specification file, in which hardly any code is required in the preamble.

18.4.1.2: The derived class: Scanner

The class Scanner is derived from the class yyFlexLexer, generated by flex++. The derived class has access to the data controlled by the lexical scanner. In particular, the derived class has access to the following data members:

Other members are available as well, but they are less often used in our experience. Details can be found in the file FlexLexer.h, which is part of the flex distribution.

The class Scanner has to perform two tasks: It should push file information about the current file to a filestack, and should pop the information pushed last once EOF is detected on a file.

Several member functions are needed for the accomplishment of these tasks. As they are auxiliary to the switchSource() and popSource() functions, they are private members. In practice, these private members are developed once the need for them arises. In the following interface of the Scanner class the final header file is given. Note that, apart from the private member functions, several private data members are used as well. These members are initialized in the constructor Scanner() and are used in the private member functions. They are discussed below, in the context of the member functions using them.

#include <FlexLexer.h>  // provides yyFlexLexer interface
#include <fstream.h>    
#include <stdio.h>
#include <string.h>

class Scanner: public yyFlexLexer
{
    public:             
        Scanner(istream *yyin);

        int yylex();        // overruling yyFlexLexer's yylex()
    private:
        void switchSource();
        int  popSource();
        int scanYYText(); // 1: nextSource contains new name
        void performSwitch();
        void checkCircularity();
        void checkDepth();

        yy_buffer_state
            **state;
        char         
            **fileName,
            *srcPtr,
            *nextSource;

        int
            stackTop;

        static int const
            sizeof_buffer = 16384,
            stackDepth = 10;
};

The switchSource() member function should interpret the information given in yytext: it is interpreted by scanYYText(). If scanYYText() can extract a filename from yytext a switch to another file can be performed. This switch is performed by performSwitch(). If the filename could not be extracted, a message is written to the outputstream. Here is the code of switchSource():


#include "scanner.h"

void Scanner::switchSource()
{   
    if (scanYYText())
        performSwitch();
}

The performSwitch() function and the matching function popSource() handle a simple file switch. In particular, the yylineno variable is not updated when a file switch is performed. If line numbers are to be monitored, the performSwitch() and popSource() functions should respectively push the current value of yylineno on a stack, and thereafter reset yylineno, and (at EOF) pop yylineno from the stack.

The member function scanYYText() performs a simple scan of the information in yytext. If a name is detected following #include < that name is stored in the private data member nextSource, and 1 is returned. Otherwise, the information in yytext is copied to yyout, and 0 is returned. Here is the source for scanYYText():


#include "scanner.h"

int Scanner::scanYYText()
{                               
    delete nextSource;          // build new buffer
    nextSource = new char[yyleng];

    if 
    (
        sscanf(yytext, "#include %[^ \t\n>]", nextSource) != 1
        ||
        !(srcPtr = strchr(nextSource, '<'))
    )
    {
        *yyout << yytext;       // copy #include to yyout
        return (0);             // scan failed
    }
    srcPtr++;
    return (1);
}

The function performSwitch() performs the actual file-switching. The yyFlexLexer class provides a series of member functions that can be used for file switching purposes. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the file that is currently scanned by the lexical scanner. This buffer is pushed on a stack when an #include is encountered, to be replaced with the buffer of the file that is mentioned in the #include directive.

The switching of the file to be scanned is realized in the following steps:

The sources for the member functions performSwitch(), checkDepth(), and checkCircularity() are given next:


#include "scanner.h"

void Scanner::performSwitch()
{   
    ++stackTop;
    checkDepth();
    checkCircularity();
    
    ifstream
        *newStream = new ifstream(srcPtr);
        
    if (!*newStream)
    {
        cerr << "Can't open " << srcPtr << endl;
        exit(1);
    }
    state[stackTop] = yy_current_buffer;
    yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer));
}


#include "scanner.h"

void Scanner::checkDepth()
{
    if (stackTop == stackDepth)
    {
        cerr << "Inclusion level exceeded. Maximum is " << stackDepth << endl;
        exit (1);
    }
}


#include "scanner.h"

void Scanner::checkCircularity()
{   
    delete fileName[stackTop];

    fileName[stackTop] = new char [strlen(srcPtr) + 1];
    strcpy(fileName[stackTop], srcPtr);
    
    int
        index;

    for (index = 0; strcmp(srcPtr, fileName[index]); index++)
        ;
        
    if (index != stackTop)
    {
        cerr << "Circular inclusion of " << srcPtr << endl;
        while (stackTop > index)
        {
            cerr << fileName[stackTop] << " was included in " << 
                    fileName[stackTop - 1] << endl;
            --stackTop;
        }
        exit (1);
    }
}

The member function popSource() is called to pop the previously pushed sourcefile from the stack, to continue its scan just beyond the just processed #include directive. The popSource() function first inspects stackTop: if the variable is at least 0, then it's an index into the yy_buffer_state array, and thus the current buffer is deleted, to be replaced by the state waiting on top of the stack. This is realized by the yyFlexLexer members yy_delete_buffer and yy_switch_to_buffer.

If a previous buffer waited on top of the stack, then 1 is returned, indicating a successful switch to the previously pushed file. If the stack was empty, 0 is returned, and the lexer will terminate.

Here is the source of the function popSource():


#include "scanner.h"

int Scanner::popSource()
{       
    if (stackTop >= 0)
    {
        yy_delete_buffer(yy_current_buffer);
        yy_switch_to_buffer(state[stackTop]);

        stackTop--;
        return (1);
    }
    return (0);
}

These functions complete the implementation of the complete lexical scanner. the lexical scanner itself is stored in the Scanner::yylex() function. The Scanner object itself only has three public member functions: one function to push a sourcefile on a stack when a switch to the next sourcefile is requested, one function to restore the previously pushed source, and of course yylex() itself.

Finally, the constructor will initialize the Scanner object. Note that the interface contains an overloaded assignment operator and a copy constructor. By mentioning these two functions in the interface only, without implementing them, they cannot be used in a program: the linking phase of a program using such functions would fail. In this case this is intended behavior: the Scanner object does its own job, and there simply is no need for the assignment of a Scanner object to another one, or for the duplication of a Scanner object.

The constructor itself is a simple piece of code. Here is its source:


#include "scanner.h"

Scanner::Scanner(istream *yyin)
{
    switch_streams(yyin, yyout);

    state = new yy_buffer_state * [stackDepth];
    memset(state, 0, stackDepth * sizeof(yy_buffer_state *));

    fileName = new char * [stackDepth];
    memset(fileName, 0, stackDepth * sizeof(char *));
    
    nextSource = 0;
    
    stackTop = -1;
}

18.4.1.3: The main() function

The main program is a very simple one. As the program expects a filename to start the scanning process at, initially the number of arguments is checked. If at least one argument was given, then a ifstream object is created. If this object can be created, then a Scanner object is created, receiving the address of the ifstream object as its argument. Then the yylex() member function of the Scanner object is called. This function is inherited from the Scanner's base class yyFlexLexer.

Here is the source-text of the main function:

/*                              lexer.cc

   A C++ main()-frame generated by C++ for lexer.cc

*/

#include "lexer.h"           /* program header file */


int main(int argc, char **argv)
{       
    if (argc == 1)
    {
        cerr << "Filename argument required\n";
        exit (1);
    }

    ifstream
        yyin(argv[1]);

    if (!yyin)
    {
        cerr << "Can't read " << argv[1] << endl;
        exit(1);
    }
     
    Scanner
        scanner(&yyin);

    scanner.yylex();
    return (0);
}

18.4.1.4: Building the scanner-program

The final program is constructed in two steps. These steps are given for a unix system, on which flex++ and the Gnu C++ compiler g++ have been installed:

For the purpose of debugging a lexical scanner the rules that are matched and the tokens that are returned are useful information. When flex++ is called with the -d flag, debugging code will be part of the generated scanner. Apart from that, the debugging code must be activated. Assuming the scanner object is called scanner, the statement
scanner.set_debug(1);
must be given following the creation of the scanner object.

18.4.2: Using both bison++ and flex++

When the input language exceeds a certain level of complexity, a parser is generally needed to control the complexity of the input language. In these cases, a parser generator is used to generate the code that's required to determine the grammatical correctness of the input language. The function of the scanner is to provided chunks of the input, called tokens, for the parser to work with.

Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer. Finally, auxiliary code is provided to fill in the blanks: the actions which are performed by the parser and the lexer are not normally specified with the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex++. In the current section the parser is our main concern. The parser can be generated from a grammar specified for the program bison++. The specification of bison++ is similar to the specifications required for bison, but a class is generated, rather than a single function. In the next sections we'll develop a program converting infix expressions, in which binary operators are written between their operands, to postfix expressions, in which binary operators are written following their operands. A comparable situation holds true for the unary operators - and +: We can ignore the + operator, but the - is converted to a unary minus.

Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.

In the next few sections we'll start developing the grammar in a bison++ specification file. Then, the regular expressions for the scanner are specified according to the requirements of flex++. Finally the program is constructed.

The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained from

ftp.icce.rug.nl:/pub/unix/bison++2.20.tar.gz.

18.4.2.1: The bison++ specification file

The bison specification file used with bison++ is comparable to the specification file used with bison. Differences are related to the class nature of the resulting parser. The calculator will distinguish real numbers from ints, and will support the basic set of arithmetic operators.

The bison++ specification file consists of the following sections:

18.4.2.2: The bison++ token section

The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:

18.4.2.3: The bison++ grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of nonterminals associated with value-types.


lines:
    lines
    line
|
    line
;

line:
    intExpr
    '\n'
    {
        cerr << "int: " << $1 << endl;
    }
|
    doubleExpr
    '\n'
    {
        cerr << "double: " << $1 << endl;
    }
|
    '\n'
    {
        cout << "Good bye\n";
        YYACCEPT;
    }
|
    error
    '\n'
;

intExpr:
    intExpr '*' intExpr
    {
        $$ = $1 * $3;
    }
|
    intExpr '+' intExpr
    {
        $$ = $1 + $3;
    }
|
    '(' intExpr ')'
    {
        $$ = $2;
    }
|
    '-' intExpr         %prec UnaryMinus
    {
        $$ = -$2;
    }
|
    INT
    {
        $$ = atoi(lexer.YYText());
    }
;

doubleExpr:
    doubleExpr '*' doubleExpr
    {
        $$ = $1 * $3;
    }
|
    doubleExpr '+' doubleExpr
    {
        $$ = $1 + $3;
    }
|
    doubleExpr '*' intExpr
    {
        $$ = $1 * $3;
    }
|
    doubleExpr '+' intExpr
    {
        $$ = $1 + $3;
    }
|
    intExpr '*' doubleExpr
    {
        $$ = $1 * $3;
    }
|
    intExpr '+' doubleExpr
    {
        $$ = $1 + $3;
    }
|
    '(' doubleExpr ')'
    {
        $$ = $2;
    }
|
    '-' doubleExpr         %prec UnaryMinus
    {
        $$ = -$2;
    }
|
    DOUBLE
    {
        $$ = atof(lexer.YYText());
    }
;

With these rules a very simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the token section of the parser specification file, which is:

%name  Parser                
%union 
{
    int i;
    double d;
};
%token      INT
            DOUBLE
%type   <i> intExpr
        <d> doubleExpr

%left   '+'
%left   '*'
%right  UnaryMinus

%define MEMBERS                                         \ 
    virtual ~Parser()   {}                              \ 
    private:                                            \ 
        yyFlexLexer lexer;              
%define LEX_BODY {return lexer.yylex();}

%define ERROR_BODY { cerr << "error encountered\n"; }

In the token section we see the use of the %type specifiers, connecting intExpr to the i-field of the semantic-value union, and connecting doubleExpr to the d-field. At first sight it looks a bit complex, since the expression rules must be included for each individual returntype. On the other hand, if the union itself would have been used, we would have had to specify somewhere in the returned semantic values what field to use: less rules, but more complex and error-prone code.

Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it's not used outside of the parser object. The virtual destructor is included as an member to prevent the compiler from complaining about the parser having a non-virtual destructor.

18.4.2.4: The flex++ specification file

The flex-specification file to be used with our little calculator is simple: blanks are skipped, single characters are returned, and numerical values are returned as either Parser::INT or Parser::DOUBLE values. Here is the complete flex++ specification file:


    %{ 
    #include "parser.h"
    %}
    
    %%
    
    [ \t]                       ;
    [0-9]+                      return(Parser::INT);
    
    "."[0-9]*                   |                                        
    [0-9]+("."[0-9]*)?          return(Parser::DOUBLE);
    
    exit                        |
    quit                        return (Parser::DONE);
    
    .|\n                        return (*yytext);
        

18.4.2.5: The generation of the code

The code is generated in the same way as with bison and flex. To order bison++ to generate the files parser.cc and parser.h, the command

bison++ -d -o parser.cc parser
can be given.

Flex++ will thereupon generate code on lexer.cc using the command

flex++ -I -olexer.cc lexer
Note here that flex++ expects no blanks between the -o flag and lexer.cc.

On unix, linking and compiling the generated sources and the source for the main program (listed below) is realized with the following command:

g++ -o calc -Wall *.cc -lfl -s
Note the fact that the libfl.a library is mentioned here. If it's not mentioned unresolved functions like yywrap() emerge.

A source in which the main() function, the lexical scanner and the parser objects are defined is, finally:


    #include "parser.h"
    int main()
    {
        Parser
            parser;
        return (parser.yyparse());
    }