We're always interested in getting feedback. E-mail us if you like this guide, if you think that important material is omitted, if you encounter errors in the code examples or in the documentation, if you find any typos, or generally just if you feel like e-mailing. Send your email to Frank Brokken.Please state the document version you're referring to, as found in the title (in this document: 4.4.2).
This chapter presents a number of concrete examples of programming in
C++. Items from this document such as virtual functions, static
members,
etc. are rediscussed. Examples of container classes are shown.
Another example digs into the peculiarities of using a parser- and scanner-generator with C++. Once the input for a program exceeds a certain level of complexity, it's advantageous to use a scanner- and parser-generator for creating the code which does the actual input recognition. The example describes the usage of these tool in a C++ environment.
int
s, but can also be much more complex, such as maintaining file
system information by the kernel of an operating system.
In this section we take a closer look at the storage of generic objects in
memory (i.e., during the execution of a program). Conforming to the
object-oriented recipe we shall develop two classes: a class Storage
,
which stores objects, and a class Storable
, the prototype of objects
which can be stored.
Storage
is concerned, objects
can be added to the storage and objects can be obtained from the storage. Also
it must be possible to obtain the number of objects in the storage.
As far as the internal data organization of the storage is concerned, we
opt for an approach in which Storage
maintains an array which can be
reallocated, consisting of pointers to the stored objects.
The internal organization of the class Storage
is illustrated in
figure 17.
Storage
.
18.1.1.1: Interface functions of the class Storage
The usage (interface) of the class Storage
is contained in three member
functions. The following list describes these member functions and mentions
the class Storable
, more on this later.
add(Storable const *newobj)
adds an object to the
storage. The function reallocates the array of pointers to accommodate one
more and inserts the address of the object to store.
Storable const *get(int index)
returns a pointer
to the object which is stored at the index'th slot.
int nstored()
returns the number of objects in
the storage.
18.1.1.2: To copy or not to copy?
There are two distinct design alternatives for the function add()
. These
considerations address the choice whether the stored objects (the squares on
the right side of figure 17) should be copies of the
original objects, or the objects themselves.
In other words, should the function add()
of the class Storage
:
These considerations are not trivial. Consider the following example:
Storage store; Storable something; store.add(something); // add to storage // let's assume that Storable::modify() is defined something.modify(); // modify original object, Storable *retrieved = store.get(0); // retrieve from storage // NOW: is "*retrieved" equal to "something" ?!
If we choose to store (addresses of) the objects themselves, then at the end
of the above code fragment, the object pointed to by retrieved
will equal
something
. A manipulation of previously stored objects thereby alters the
contents of the storage.
If we choose to store copies of objects, then obviously *retrieved
will not equal something
but will remain the original, unaltered, object.
This approach has a great merit: objects can be placed into storage as a
`safeguard', to be retrieved later when an original object was altered or even
ceased to exist. In this implementation we therefore choose for this approach.
The fact that copies of objects should be stored presents a small problem. If
we want to keep the class Storage
as universal as possible, then the
making of a copy of a Storable
object cannot occur here. The reason for
this is that the actual type of the objects to store is not known in advance.
A simplistic approach, such as the following:
void Storage::add(Storable const *obj) { Storable *to_store = new Storable(*obj); // now add to_store instead of obj . . }
shall not work. This code attempts to make a copy of obj
by using the
operator new
, which in turn calls the copy constructor of Storable
.
However, if Storable
is only a base class, and the class of the object to
store is a derived class (say, a Person
), how can the copy constructor of
the class Storable
create a copy of a Person
?
The making of a copy therefore must lie with the actual class of the
object to store, i.e., with the derived class. Such a class must have the
functionality to create a duplicate of the object in question and to
return a pointer to this duplicate. If we call this function duplicate()
then the code of the adding function becomes:
void Storage::add(Storable const *obj) { Storable *to_store = obj->duplicate(); // now add to_store instead of obj . . }
The function duplicate()
is called in this example by using a pointer to
the original object (this is the pointer obj
). The class Storable
is
in this example only a base class which defines a protocol, and not the class
of the actual objects which will be stored. Ergo, the function
duplicate()
need not be defined in Storable
, but must be
concretely implemented in derived classes. In other words, duplicate()
is
a pure virtual function.
Storable
.
The following questions are of importance:
Storable
need a default constructor, or
possibly other constructors such as a copy constructor?
The answer is no. Storable
will be a bare prototype, from which
other classes will be derived.
Storable
need a destructor? Should this
destructor be (pure) virtual
?
Yes. The destructor will be called when, e.g., a Storage
object
ceases to exist. It is quite possible that classes which will be derived
from Storable
will have their own destructors: we should therefore
define a virtual
destructor, to ensure that when an object pointed to
by a Storable*
is deleted, the actual destructor of the derived class
is called.
The destructor however should not be pure virtual. It is quite
possible that the classes which will be derived from Storable
will
not need a destructor; in that case, an empty destructor function should
be supplied.
The class definition and its functions are given below:
class Storable { public: virtual ~Storable(); virtual Storable *duplicate() const = 0; }; Storable::~Storable() { }
18.1.2.1: Converting an existing class to a Storable
To show how (existing) classes can be converted to derivation from a
Storable
, consider the below class Person
from section
6.1. This class is re-created here, conforming to Storable
's
protocol (only the relevant or new code is shown):
class Person: public Storable { public: // copy constructor Person(Person const &other); // assignment Person const &operator=(Person const &other); // duplicator function Storable *duplicate() const; . . };When implementing the function
Person::duplicate()
we can use either
the copy constructor or the default constructor with the overloaded assignment
operator. The implementation of duplicate()
is quite simple:
// first version: Storable *Person::duplicate() const { // uses default constructor in new Person Person *dup = new Person; // uses overloaded assignment in *dup = *this *dup = *this; return (dup); } // second version: Storable *Person::duplicate() const { // uses copy constructor in new Person(*this) return (new Person(*this)); }The above conversion from a class
Person
to the needs of a
Storable
supposes that the sources of Person
are at hand and can be
modified. However, even if the definition of a Person
class is not
available, but is e.g., contained in a run-time library, the conversion to the
Storable
format poses no difficulties:
class StorablePerson: public Person, public Storable { public: // duplicator function Storable *duplicate() const; }; Storable *StorablePerson::duplicate() const { return (new StorablePerson(*this)); }
Storage
. The class definition is given
below:
class Storage: public Storable { public: // destructors, constructor ~Storage(); Storage(); Storage(Storage const &other); // overloaded assignment Storage const &operator=(Storage const &other); // functionality to duplicate storages Storable *duplicate() const; // interface void add(Storable *newobj); int nstored() const; Storable *get(int index); private: // copy/destroy primitives void destroy(); void copy(Storage const &other); // private data int n; Storable **storage; };
Concerning the class definition we remark:
add()
, get()
and nstored()
. These functions were previously discussed (see section
18.1.1.1).
Storage
contains a
pointer, which addresses allocated memory.
Storage
itself is derived from Storable
, as can be seen
in the classname definition and in the presence of the function
duplicate()
. This means that Storage
objects can themselves be
placed in a Storage
, thereby creating `super-storages': say, a list
of groups of Person
s.
Storage
defines two private
functions
copy()
and destroy()
. The purpose of these primitive functions
is discussed in section 6.4.1.
The destructor, constructors and the overloaded assignment function are listed below:
// default constructor Storage::Storage() { n = 0; storage = 0; } // copy constructor Storage::Storage(Storage const &other) { copy(other); } // destructor Storage::~Storage() { destroy(); } // overloaded assignment Storage const &Storage::operator=(Storage const &other) { if (this != &other) { destroy(); copy(other); } return (*this); }
The primitive functions copy()
and destroy()
unconditionally copy
another Storage
object, or destroy the contents of the current one. Note
that copy()
calls duplicate()
to duplicate the other's stored
objects:
void Storage::copy(Storage const &other) { n = other.n; storage = new Storable* [n]; for (int i = 0; i < n; i++) storage [i] = other.storage [i]->duplicate(); } void Storage::destroy() { for (register int i = 0; i < n; i++) delete storage [i]; delete storage; }
The function duplicate()
, which is required since Storage
itself
should be a Storable
, uses the copy constructor to duplicate the current
object:
Storable *Storage::duplicate() const { return (new Storage (*this)); }
Finally, here are the interface functions which add objects to the storage,
return them, or determine the number of stored objects (
Note: the function realloc()
that is used in this section should actually
not be used. A better procedure would be to create a C++ variant for the
realloc()
function. A modification is in the pipeline....)
void Storage::add(Storable const *newobj) { // reallocate storage array storage = (Storable **) realloc(storage, (n + 1) * sizeof(Storable *)); // put duplicate of newobj in storage storage [n] = newobj->duplicate(); // increase number of obj in storage n++; } Storable *Storage::get(int index) { // check if index within range if (index < 0 || index >= n) return (0); // return address of stored object return (storage [index]); } int Storage::nstored() const { return (n); }
Storage
and Storable
(see section 18.1)
two separate classes are used: one to represent the tree itself, and one to
represent the objects which are stored in the tree. The classes will be
appropriately named Tree
and Node
.
Node
is an abstract (pure virtual) class, which defines the
protocol for the usage of derived classes with a Tree
. Concerning this
protocol we remark the following:
This comparison must lie with Node
s: a Tree
itself cannot
know how objects should be compared. Part of the procotol which is
required by Node
is therefore:
virtual int compare(Node const &other) const = 0;The comparing function will have to be implemented in each derived class.
Storage
(see
section 18.1), a binary tree will contain copies of objects. The
responsibility to duplicate an object therefore also lies with Node
, as
enforced by a pure virtual function:
virtual Node *clone() const = 0;
virtual void process() = 0;in the class
Node
, the responsibility to process an object is placed
with the derived class.
For these cases we define a virtual function rejected()
, which is a
virtual function called for the Node
that was already stored, receiving
the node requesting to be added as its argument. However, since
it's a virtual function, it can be redefined in a derived class:
virtual void rejected(Node const &twice) { }
Node
is given
below:
class Node { public: virtual ~Node() // destructor {} virtual Node* clone() const = 0; // duplicator virtual void process() = 0; // Node processor // comparing 2 Nodes virtual int compare(Node const &other) const = 0; virtual void rejected(Node const &twice) // called when the node { // was found in the tree } };
Tree
is responsible for the storage of objects which are
derived from a Node
. To implement the recursive tree structure, the class
Tree
has two private
pointers as its data, pointing to subtrees: a
Tree *left
and Tree *right
. The information which is contained in a
node of the tree is represented as a private
field Node *info
.
Tree
objects may be constructed empty and they may be constructed storing
an initial Node
object.
To scan a binary tree, the class Tree
offers three methods: preorder,
inorder and postorder. When scanning in preorder first the left subtree is
scanned, then the leaf itself is processed and finally the right subtree is
scanned. When scanning in inorder first a leaf in a node is processed, then
the left subtree is scanned and finally the right subtree is scanned. When
scanning in postorder first the left and right subtrees are scanned and then
the leaf itself is processed.
The definition of the class Tree
is given below:
#include "node.h" class Tree { public: // destructor, constructors ~Tree(); Tree(); Tree(Node const &node); Tree(Tree const &other); // assignment Tree const &operator=(Tree const &other); // addition of a Node void add(Node const &node); // processing order in the tree void preorder_walk(); void inorder_walk(); void postorder_walk(); private: // primitives void construct(Tree const &other); void destroy(); // called by add(Node const &node) void add(Tree *&branch, Node const &node); // data Tree *left, *right; Node *node; };
There are three constructors defined in the Tree
class. The copy
constructor is presented in the next section, the other two constructors are:
#include "tree.h" // default constructor: initializes to 0 Tree::Tree() : left(0), right(0), node(0) { } // Node constructor: add a Node object Tree::Tree(Node const &node) : left(0), right(0), node(node.clone()) { }
18.2.2.2: The `standard' functions
As can be seen from the class definition, Tree
contains pointer fields.
This means that the class will need a destructor, a copy constructor and an
overloaded assignment function to ensure that no allocation problems occur.
The destructor, the copy constructor and the overloaded assignment function
are implemented with two primitive operations construct()
and destroy()
(presented later):
#include "tree.h" // destructor: destroys the tree Tree::~Tree() { destroy(); } // copy constructor: initializes to contents of other object Tree::Tree(Tree const &other) { construct(other); }
18.2.2.3: Adding an object to the tree
Adding a new object to the tree is a recursive process. When the
function add()
is called to insert an object into the tree, there are
only three possibilities:
node
field of the current node can be a 0-pointer. In that
case, a clone of the Node
object is inserted at the current node.
compare()
, a
pure virtual function whose implementation is required by Node
. Depending
on the order the new object must be inserted in the left
or in the
right
subtree. Adding a node to a subtree is done by an overloaded
(private) add()
function.
rejected()
is called to process the duplicated
Node
.
add()
functions:
#include "tree.h" void Tree::add(Node const &newNode) { if (!node) { node = newNode.clone(); return; } int cmp = node->compare(newNode); if (!cmp) // already stored { node->rejected(newNode); return; } add ( cmp < 0 ? left : right, newNode ); } void Tree::add(Tree *&tree, Node const &newNode) { if (!tree) tree = new Tree(newNode); else tree->add(newNode); }
The class Tree
offers three methods of scanning a binary tree: preorder,
inorder and postorder. The three functions defining these actions are
recursive:
#include "tree.h" void Tree::preorder_walk() { if (left) left->preorder_walk(); if (node) node->process(); if (right) right->preorder_walk(); } void Tree::inorder_walk() { if (node) node->process(); if (left) left->inorder_walk(); if (right) right->inorder_walk(); } void Tree::postorder_walk() { if (left) left->postorder_walk(); if (right) right->postorder_walk(); if (node) node->process(); }
18.2.2.5: The primitive operations copy() and destroy()
The functions copy()
and destroy()
are two private
member
functions which implement primitive operations of the class Tree
: the
copying of the contents of another Tree
or the destroying of the tree.
#include "tree.h" void Tree::destroy() { delete node; if (left) delete left; if (right) delete right; } void Tree::construct(Tree const &other) { node = other.node ? other.node->clone() : 0; left = other.left ? new Tree(*other.left) : 0; right = other.right ? new Tree(*other.right) : 0; }Concerning this implementation we remark the following:
destroy()
is recursive, even though this is not
at once visible. A statement like delete left
will activate the destructor
for the Tree
object which is pointed to by left
; this in turn will
call destroy()
etc..
construct()
is recursive. The code
left = new Tree(*other.left)
activates the copy constructor, which in
turn calls construct()
for the left branch of the tree.
add()
, nodes themselves are
cloned by the function clone()
. This function must be provided by a
concrete implementation of a derived class of Node
.
Tree
and Node
by a program that
counts words in files. Words are defined, rather blandly, as series of
characters, separated by white spaces. The program shows which words are
present in which file, and how many times.
Below is the listing of a class Strnode
. This class is derived from
Node
and implements the virtual functions. Note how this class
implements the counting of words; when a given word occurs more than one time,
Tree
will call the member function rejected()
. This function
simply increases the private counter variable times
.
#include <fstream> #include <iomanip> #include <string> #include "tree.h" class Strnode: public Node { public: Strnode(string const &s) : str(s), times(1) { } Node* clone() const { return (new Strnode(*this)); } int compare (Node const &other) const { return ( str.compare ( static_cast<Strnode const &>(other).str ) ); } void process () { if (times) cout << setw(20) << str.c_str() << ": " << setw(3) << times << endl; } void rejected(Node const &node) { ++times; } private: string str; unsigned times; }; void countfile(istream &inf) { Tree tree; string word; while (inf >> word) { Strnode next(word); tree.add(next); } tree.inorder_walk(); } int main (int argc, char **argv) { int exitstatus = 0; if (argc > 1) { for (int i = 1; i < argc; ++i) { ifstream inf(argv[i]); if (!inf) { cerr << "can't open " << argv[i] << endl; exitstatus++; continue; } countfile(inf); } } else countfile (cin); return (exitstatus); }
In this section we will develop a set of classes starting from the class
Configuration
, whose objects can process a great variety of
options. Actually, we'll start from a small demo program,
in which an object of the class Configuration
is used. From there, the
class Configuration
will be developed, working our way down to the
auxiliary classes that are used with the Configuration
class.
The resulting program will be available as a zip-file containing the sources and (Linux) binary program at our ftp-site. The zip-archive contains all the sources and auxiliary files for creating the program, as well as an icmake build script.
Configuration
object have?
int argc
and char **argv
.
Configuration
object must be able to load a resourcefile.
Our resource file will obey the standard unix form of configuration files:
empty lines are ignored, and information on lines beyond the hashmark (#) is
ignored.
Configuration
object must be able to process command-line
options, which can be either with or without an extra argument.
Option
is, but then, we don't have to if
we decide at this point that pointers to Option
s, rather than the
Option
s themselves are prodcued.
Maybe of similar importance as the functionality the object can perform is what the object can not perform:
Configuration
objects. Therefore there will be no copy constructor.
Now that we've specified the functionality we're ready to take a look at the interface.
18.3.1.1: The interface of the class Configuration
Here is the full interface of the class Configuration
. In the interface,
we recognize the functions we required when specifying the functionality of
the class: the constructor, destructor, and the (not to be implemented) copy
constructor and overloaded assignment operator.
To process the resource file we have loadResourceFile()
, the command-line
options are processed by loadCommandLineOptions()
. Next we see two plain
accessors: programName()
will return the plain program name, while
resourceFile()
will return the name of the resource file. To obtain the
number of command-line arguments that are available when all command-line
options have been processed we have argc()
. The arguments themselves are
obtained by overloaded index operator, using an unsigned
argument. Finally, options can be obtained by name: for this another
overloaded index operator is available, this time using a string
(char const *
) for its argument.
The private
section contains data: variables to access
argc
and argv
, using reference-type variables; variables to store the
program- and resource filenames, and two Hashtable
s (the class
Hashtable
will be covered in section 18.3.6) containing,
respectively, the precompiled options and the command-line options.
Here is the interface of the class Configuration
:
#ifndef _Configuration_H_ #define _Configuration_H_ #include "../hashtable/hashtable.h" class Option; class Configuration { public: Configuration(int &argc, char const **&argv, int initialCap = 20, double maxLoadFactor = 0.75); ~Configuration(); Configuration(Configuration const &other); // NI Configuration &operator=(Configuration const &right); // NI void loadResourceFile(char const *fname); void loadCommandLineOptions(); char const *programName(); // name of the program char const *resourceFile(); // name of used resourcefile unsigned argc() const; // count beyond [0], c.q. options // returns argv[index] | 0 // also beyond [0] c.q. options // option [name] Option const * operator[](char const *name) const; char const *operator[](unsigned index) const; // argument[index] private: int argcShift, &argC; char const **&argv; char *progName; Hashtable optionTable, cmdLineOption; char *resourceFilename; }; #include <string.h> #include "../option/option.h" #include "../string/string.h" #include "../mem/mem.h" #include "../ustream/ustream.h" #include "../stringtokenizer/stringtokenizer.h" #endif _Configuration_H_
18.3.1.2: An example of a program using the class Configuration
Below we present the source of the demonstration program. The program sets up the memoryhandler, to make sure that failing memory allocations will be noticed.
Next, a configuration object is created. This object is passed to an auxiliary
function showing us interesting aspects of the object
(showConfigurationInformation()
). Although this function tells us things
about the Configuration
object, it was not made part of the class, since
it was specifically designed in the context of the demonstration program,
without adding any real functionality to the Configuration
class.
Having displayed the raw information stored in the Configuration
object,
the resource-file is loaded. This might alter the values of the
program-parameters, of which there are four in the demonstration
program. Having loaded the resourcefile, the contents of the Configuration
object are shown again.
Then, the command-line options (if any) are processed, followed by yet another
display of the contents of the Configuration
object.
Here is the source of the demonstration program:
#include "demo.h" int main(int argc, char const **argv) { Mem::installNewHandler(); Configuration config(argc, argv); showConfigurationInformation(config, "After constructing 'config'"); config.loadResourceFile("demo.rc"); showConfigurationInformation(config, "After reading demo.rc"); config.loadCommandLineOptions(); showConfigurationInformation(config, "After processing command-line options"); return (0); }
The constructor of the class Configuration
expects argc
and argv
as reference-type variables. Apart from these two, tho extra parameters are
defined, for which the interface defines default
values: initialCap
defines the initial capacity of the hashtables that are
used by the Configuration
object, and maxLoadFactor
defining the
maximum load percentage of the hashtables. So, with the default parameters the
hashtables would be enlarged once more than 15 elements are stored in them.
Having initialized the reference variables and the hashtables the options are
stored in the hashtables for fast access. The
Option
-class function nextOptionDefinition()
produces a sequence of all options that are defined for the program. Each
option's name and value is stored in the optionTable
hashtable, and each
option's command-line character and name is stored in the cmdLineOption
hashtable. Therefore, the values of options can be retrieved immediately,
given the name of the option, while the option's command-line character can be
used to produce the name of the option, which can then be used in a second
step to obtain the value of the option.
Here is the source of the constructor:
#include "configuration.h" Configuration::Configuration(int &argCount, char const **&argVector, int initialCap, double maxLoadFactor) : argC(argCount), argv(argVector), optionTable(initialCap, maxLoadFactor), cmdLineOption(initialCap, maxLoadFactor) { resourceFilename = Mem::strdup(""); Option *option; while ((option = Option::nextOptionDefinition())) { String *name = new String(option->getName()); optionTable.put(name, option); String const *cmdopt = &(option->getCmdLineOption()); if (strlen(*cmdopt)) cmdLineOption.put(new String(*cmdopt), new String(*name)); } char const *cp = strrchr(argv[0], '/'); progName = Mem::strdup ( !cp ? argv[0] : cp + 1 ); argcShift = 1; }
The function loadResourceFile()
processes a unix-style
resource-files. In these files, empty lines are ignored, as well as
information on a line beyond hash-marks (#
) if these hashmarks are
preceded by the beginning of the line or white space. Long lines may be
stretched out over several lines by adding a continuation character (the
backslash (\
)) at the end of each line that continues on the next line.
To obtain the remaining lines of the configuration file,
loadResourceFile()
creates a Ustream
object. The class Ustream
was
specifically designed for the processing of unix-style
resource-files. As this class doesn't add much to the understanding of the
Configuration
-class its interface and implementation is not discussed in
the annotations. Rather, interface and implementation is found in the
configdemo.zip file at our
ftp-site.
The processing of the information in the configuration file is based on the assumption that all information on a line is organized as follows:
color:
With respect to this format, each key is looked up in the optionTable
. If
found, the value of the option is set to the key's value. Otherwise, if the
key is not found, a warning message is written, by catching the exception
thrown by the hashtable when it receives an undefined option-name.
Apart from the Ustream
object, the function loadResourceFile()
also
uses a StringTokenizer
object, which splits lines from the Ustream
file into words. The first word is interpreted as key
, while the function
range(index)
produces the unsplit line beyond word index
. The class
StringTokenizer
is also found in the distributed zip
-file.
18.3.2.3: loadCommandLineOptions()
The function loadCommandLineOptions()
uses the function getopt()
which is available on unix systems to retrieve command-line options (and
possibly their values) and to separate them from the remaining command-line
arguments. The function getopt()
expects (among other arguments) a string
of command-line option letters, which are possibly followed by a colon. If a
colon is following a command-line option, then information trailing the
command-line option character or the next command-line argument is interpreted
as the value of the command-line option. E.g., a command-line option
character specified as n:
may be specified on the command-line as -n20
or -n 20
.
The function Hashtable::catKeys()
is used to obtain a list of command-line
option characters. Next, the options are extracted from the command-line
arguments using getopt()
. When an option has been found, the
cmdLineOption
hashtable is used to obtain the name of the option, then the
optionTable
hashtable is used to obtain a pointer to the option.
Next the option receives a new value, through the virtual function
assign()
. This function is available for all options, and allows
loadCommandLineOptions()
to assign a new value to an option irrespective
of the actual type of the option.
Here is the code of the function loadCommandLineOptions()
:
#include "configuration.h" void Configuration::loadCommandLineOptions() { String list; cmdLineOption.catKeys(list); register int optionChar; String opt; register char *cp; opterr = 0; // no error messages from getopt() while // while options are found ( (optionChar = getopt(argC, (char *const *)argv, list)) != -1 && (cp = strchr(list, optionChar)) ) { opt = " :"; opt[0] = (char)optionChar; // create option-string if (cp[1] != ':') // no option value ? opt[1] = 0; // then remove ':' from opt. Option // get the configuration option *option = (Option *)optionTable[cmdLineOption[&opt]]; option->assign(optarg); // assign the value } argcShift = optind; // first non-option index in argv }
Option
is designed as an abstract base class, defining the
protocol to which all derived classes must adhere. Derived classes
representing logical values (Boolean
), integer values (Int
), real
values (Double
) and textstrings (Text
) will be constructed later on.
The class itself is derived from another abstract base class,
Object
. Pointers to Object
s are stored in, e.g.,
Hashtable
s.
The class Option
(cf. section 18.3.3.1), has a
constructor, expecting an option name and the specification of
a command-line parameter, and a virtual destructor to be able to deleting
memory allocated by derived class objects through an Option
pointer.
Default implementations returning the logical, int, double and textvalues of options are available as well. These implementations are replaced in derived classes by member functions returning the real, rather than the default, value of the derived class' object.
Since the options must be storable in a hashtable, and since the hashtable
must be able to compare two different object for equality, abstract members
hashCode()
and equals()
are available, to be implemented in the derived
class' objects.
The name and command-line option are obtained via two accessor functions:
getName()
and getCmdLineOption()
, respectively.
To assign a value to an option one more function must be implemented by
derived class options: assign()
, to assign a value to an option.
The static Option *nextOptionDefinition()
member function returns a pointer
to an object of a class derived from Option
. The returned option is
constructed by a function that can be called from an element of the
static Option *(*optionConstructor[])(Mold const &mold)
Option
s. Each of these functions
expects a reference to a Mold
struct.
An array of these structs must be available as static Mold mold[]
. The
Mold
array allows us to specify as data the ingredients of any option
we require in our program. In other words: by defining the elements of an
array Option::Mold Option::mold[]
all kinds of program-options and their
default values. can easily be defined.
For example, in our demonstration program four program options were defined,
representing a logical value, an integer value, a real value and a textual
string. Note that the following mold[]
array is defined as data:
#include "../demo.h" Option::Mold Option::mold[] = { {Boolean, "colors", "c", "True"}, {Int, "trials", "n:", "20"}, {Double, "epsilon", "e:", "0.004"}, {Text, "files", 0, "ls -Fla"}, {}, };
The last element of the mold[]
array
is an empty struct
, acting as a sentinel.
The remaining lines (refer to the struct Mold
in the interface of the
class Option
) contain four elements:
Type
enum are available. Note that this enum
is protected:
it's only used in derived classes.
Configuration
's overloaded index operator.
Boolean
) options string values like
on, off, true, false, 0, 1
in
any casing are all acceptable. Note again that the initial default values are
given as strings.
18.3.3.1: The interface of the class Option
Here is the complete interface of the abstract base class Option
:
#ifndef _Option_H_ #define _Option_H_ #include "../string/string.h" class Option: public Object { public: Option(char const *name, char const *cmdLineOpt); ~Option(); virtual int BoolValue() const; virtual int IntValue() const; virtual double DoubleValue() const; virtual char const *TextValue() const; unsigned hashCode() const; int operator==(Object const &other) const; String const &getName() const, &getCmdLineOption() const; virtual void assign(char const *string) = 0; static Option *nextOptionDefinition(); protected: enum Type { Sentinel, Int, Double, Text, Boolean, }; private: struct Mold { Type optionType; char *name, *cmdLineOption, *defaultValue; }; static Mold mold[]; static Option *(*optionConstructor[])(Mold const &mold); String name, cmdLineName; }; #include <strstream.h> #include "../booloption/booloption.h" #include "../intoption/intoption.h" #include "../doubleoption/doubleoption.h" #include "../textoption/textoption.h" #endif _Option_H_
18.3.3.2: The static member nextOptionDefinition
The static member function nextOptionDefinition()
is called repeatedly
until it returns 0. The function visits all elements of the mold[]
array,
calling the static function optionConstructor
associated with the
option-type of the element of the array mold[]
that is visited.
The variable optionConstructor[]
is an array, which is initialized as data
of the class Option
. The elements of the optionConstructor[]
array are
pointers to Constructor()
functions of all the derived classes. These
functions construct actual derived class option objects, and expect the
ingredients for the construction as a reference to a Mold struct
.
The function nextOptionDefinition()
is:
#include "option.h" Option *Option::nextOptionDefinition() { static unsigned index = 0; if (mold[index].optionType == Sentinel) return (0); Option *option = optionConstructor[mold[index].optionType] (mold[index]); index++; return (option); }
The array optionConstructor[]
is initialized as follows:
#include "option.h" Option *(*Option::optionConstructor[])(Mold const &mold) = { 0, IntOption::Constructor, DoubleOption::Constructor, TextOption::Constructor, BoolOption::Constructor, };Note that in this initialization reflects the ordering of the
Option::Type
enum. There is no constructor for the Sentinel
enum-value, while the remaining elements contain the addresses for the
different derived-class option types.
TextOption
, derived from Option
, is given. The class contains
implementations of all the pure virtual functions of the class Option
, and
it mentions the existence of a copy constructor and overloaded assignment
operator. However, these functions are (once again) not to be used, and are
mentioned here as a safeguard against their being used accidently.
The interesting part of the interface is the function static Option
*Constructor(Mold const &mold)
: it constructs a TextOption
object
(through TextOption
's constructor), using the ingredients it encounters in
the Mold
it receives as its argument. Note that the prototype of
Constructor
corresponds to the prototype of the elements of the array
Option::optionConstructor[]. As we have seen (in
section 18.3.3.2),
Option:optionConstructor[Text]
has been given the value
TextOption::Constructor
, thus setting up the connection between an
option-type and the constructor for such an option from the ingredients found
in an Option::Mold
.
The other three classes derived from the class Option
are constructed
similarly. The reader is referred to their interfaces and implementation in
the zip
-archive in our
ftp-site.
18.3.4.1: The interface of the class TextOption
Here is the interface of the class TextOption
, derived from Option
:
#ifndef _TextOption_H_ #define _TextOption_H_ #include "../option/option.h" class TextOption: public Option { public: static Option *Constructor(Mold const &mold); TextOption(char const *name, char const *cmdLineOpt, char const *initialValue); ~TextOption(); TextOption(TextOption const &other); // NI TextOption &operator=(TextOption const &other); // NI void assign(char const *str); char const *TextValue() const; char const *toString() const; private: char *value; }; #include "../mem/mem.h" #endif _TextOption_H_
18.3.4.2: The implementation of the assign() function
As an example of an implementation of an assign()
function, we present the function TextOption::assign()
. As defined by the interface of the
class Option, this function has one parameter, a
char const *str
. It needs to perform only two tasks: First, the old value
of the TextOption
object is deleted, then a new value is
assigned. Corresponding assign()
functions are available for the other
derived option classes.
Here is the implementation of TextOption::assign()
:
#include "textoption.h" void TextOption::assign(char const *str) { delete value; value = Mem::strdup(str); }
class Object
is an abstract base class. Pointers to Object
s are be
stored in Hashtable
s. The class is a very simple
one, containing a virtual destructor (doing nothing in particular), and
requiring the implementation of three pure virtual functions:
int operator==(Object const &other)
, used to compare two objects
of classes derived from the class Object
,
unsigned hashCode()
, returning a hashcode for the object. This
function is used in combination with a Hashtable
object.
char const *toString()
, returning a printable representation of the
object.
Object
:
#ifndef _Object_H_ #define _Object_H_ class Object { public: virtual ~Object(); virtual int operator==(Object const &other) const = 0; virtual unsigned hashCode() const = 0; virtual char const *toString() const = 0; }; #endif _Object_H_
class Hashtable
is used to store and retrieve objects of classes
derived from the class Object
. The class contains two pointers to vectors
of pointers to Object
s, containing the keys and values that are
stored in the hashtable. Furthermore, the class has data-members holding the
actual number of elements that are stored in the hashtable (n
), the
number of elements of the two vectors of pointers to Object
s
(capacity
), the original number of elements of these vectors
(initialCapacity
) and the maximum proportion of elements of the vectors
that may be occupied (maxLoadFactor
).
The Hashtable
objects are self-expanding. Once maxLoadFactor
threatens
to be exceeded, the table is expanded automatically.
The functionality of the hashtable includes members for retrieving values of
the objects stored in the table using either the name of a key (as a char
const *
) or a pointer to an Object
; a member to add a new key/value
pair to the table, and a utility member catKeys()
returning a string
containing the catenated names of all keys. This latter function is used by
the Option::nextOptionDefinition()
to tell
getopt()
what command-line option characters it can expect.
The interface of the class Hashtable
also shows some private
member functions, used for expanding the table, and for inserting and
retrieving elements from the table. Some of these functions are covered in the
following discussion. Functions not needing special attention are available in
the zip-archive.
Here is the interface of the class Hashtable
:
#ifndef _Hashtable_H_ #define _Hashtable_H_ #include "../string/string.h" class Object; class Hashtable { public: Hashtable(int initialCapacity, double maxLoadFactor = 0.75); ~Hashtable(); Hashtable(Hashtable const &other); // NI Hashtable const &operator=(Hashtable const &other); // NI Object const *operator[](Object const *key) const; Object const *operator[](char const *key) const; Object const *put(Object *key, Object *value); // returns value void catKeys(String &target); // catenate the keys // as strings char const *toString() const; private: void installVectors(int capacityRequest); int lookup(Object const *key) const; // key must exist int mayInsert(Object *key); // key might not exist // the key in the table int expanded(); // 1 if table was expanded unsigned capacity, initialCapacity, n; double maxLoadFactor; Object **keys, **values; }; #include <unistd.h> #include <stdlib.h> #include "../option/option.h" #endif _Hashtable_H_
18.3.6.1: The Hashtable constructor
The constructor of the hashtable initializes the data-members of the table,
and then calls installVectors()
to initialize the keys
and values
vectors. Here is the constructor of the class Hashtable
:
#include "hashtable.h" Hashtable::Hashtable(int iniCap, double maxFactor) { maxLoadFactor = maxFactor; n = 0; initialCapacity = iniCap; capacity = 0; keys = 0; values = 0; installVectors(initialCapacity); }
The function installVectors()
simply creates two vectors of the required
number of elements (i.e., capacity
), initializing the vectors with
null-pointers.
18.3.6.2: The function mayInsert()
The functions mayInsert()
returns the index of a key that is stored in the hashtable. The difference with the function lookup()
is that the function
lookup()
requires the key to be available in the hashtable, whereas the
function mayInsert()
will insert the key when it isn't available yet.
If the function lookup()
doesn't find the key in the table, it throws a
char const *
exeption, containing the name of the key. The exception is
thereupon caught by the function
Configuration::loadResourceFile()
. The
function mayInsert()
, however, will try to insert a non-existing key into
the hashtable.
Before looking for a key, both lookup()
and mayInsert()
first
determine an initial hashcode, using the key's hashCode()
function. A
simple add-the-hash rehash scheme is used to cope with collisions. The
add-the-hash value is at least 1 and at most the current capacity minus
one. Using a prime-sized hashtable, this ensures that all elements of the
hashtable are visited by repeatedly adding the add-the-hash value to the index
value that was last used.
The insertion process itself consists of a perpetual loop, that terminates when the index of the key in the hashtable has been determined.
If an empty element of the key
vector is hit,
expand()
is called, which may enlarge the hashtable.
If the table was enlarged, both the hashcode and the add-the-hash value of the
actual key are recomputed, and the perpetual loop starts its next
cycle. Otherwise, the key is entered at the empty element's position, and its
index value is returned.
If the key is found in the vector of keys, then the corresponding index position is returned. Alternatively, a collision may occur, and the index value is incremented by the add-the-hash value, followed by the next cycle of the perpetual loop.
Thus, the lookup()
and mayInsert()
functions return the index of the
provided key. Apart from that, lookup()
will throw an exception when the
provided key isn't found in the table.
Here is the sourcetext of the function mayInsert()
:
#include "hashtable.h" // addTheHash is set in the range 1 .. capacity - 1, and the initial // index is made equal to the addTheHash value. Since addTheHash is non-zero // a new index computed by adding the addTheHash value to the index will // always get another value. The zeroth index of the hashtable will only be // used as the result of a collision, but that doesn't matter: hashtables // aren't filled up completely anyway. int Hashtable::mayInsert(Object *key) { unsigned hashCode = key->hashCode(); register unsigned addTheHash = 1 + hashCode % (capacity - 1), index = addTheHash; // within the capacity range while (1) { if (!keys[index]) // empty slot ? { if (expanded()) // hashtable was expanded ? { addTheHash = 1 + hashCode % (capacity - 1); index = addTheHash; // new index after expansion continue; // restart the checking } keys[index] = key; // place the key here ++n; // n contains #-elements return (index); // and produce its index } if (*keys[index] == *key) // same object ? return (index); // return its index if ((index += addTheHash) >= capacity) // collision: try next entry index -= capacity; } }
18.3.6.3: The function expanded()
The function expanded()
first checks the loadfactor of the hashtable: if
the actual number of elements divided by the capacity of the table exceeds
maxLoadFactor
, the current keys
and values
vectors are saved, and
new vectors containing initialCapacity
extra elements are installed.
Next, the elements of the old keys
vector are visited. If a non-empty
element is found, that element and its value are stored in the hashtable using
the function put()
. This process continues until n
elements (the
number of non-empty elements in the old vectors) are stored in the enlarged
table. Since the function put()
owns the objects that its arguments
point to (i.e., Object *
s rather than Object const *
s are used, the
objects the elements of the old vectors point to must not be
deleted. Therefore, at the end of the function expanded()
the old keys and
values vectors are simply deleted, disregarding the objects their elements
point to.
Mem
is a good example: while
standard functions are available for the allocation of memory, these functions
reduce to the function malloc()
, and not to the operator
new
. Since the operator new
can be protected by the
set_new_handler()
function, it's a good idea to duplicate the popular
standard memory allocating functions based on malloc()
by functions using
new
.
Another example is found in the class Util
, containing functions we think
are useful, but which we could not place conceptually easy in other
classes. For example, the utility class contains a function prime()
returning a prime number.
The following utility classes are available:
Mem
: this class handles memory allocation through the operator
new
rather than through the function malloc()
.
String
: objects of this class represent strings, and can perform
certain string-related tasks.
StringTokenizer
: objects of this class break up strings into
substrings according to a set of delimiters.
Ustream
: objects of this class handle unix-style configuration
files, in which empty lines and information on lines beyond the hash-mark are
ignored.
Util
: this class contains functions performing tasks which do not
belong conceptually to other classes.
The Mem
and Util
classes contain just static member functions, and do
not require objects to be used. For the other classes objects must be defined.
The next sections will cover the interfaces of these classes. The implementation of the functions of these classes is found in the zip-archive at our ftp-site.
The class Mem
contains functions related to the allocation of memory,
using the operator new
. Using new
, it is easy to catch exhausted
dynamic memory through the function set_new_handler()
.
The class contains functions to install a new
-handler, to duplicate and
concatenate strings, to compare strings, and to reallocate memory. As all these
functions are static, there is no need to create a Mem
object.
The function realloc()
isn't a particularly elegant attempt to make
available a function that resembles the standard malloc()
-based
realloc()
function. Actually, in the demonstration program it's used only
by the StringTokenizer
constructor. However, by making it a member of the
latter class, we feel we would mix up memory allocation with string handling.
The Mem::realloc()
function does a rather crude job: it should be used
only for enlarging the required amount of memory, in which case the extra
allocated memory remains completely uninitialized.
The other member functions are implemented in a standard way. Most of them
accept null-pointers as arguments as well. Here is the interface of the
class Mem
:
#ifndef _Mem_H_ #define _Mem_H_ class Mem { public: static void installNewHandler(); static char *strdup(char const *str); static int casecmp(char const *s1, char const *s2); static int cmp(char const *s1, char const *s2); static char *strndup(char const *str, unsigned len); static char *strcat(char const *src1, char const *src2); static void *realloc(void *addressOfPointerToOldData, unsigned dataSize, unsigned oldN, unsigned newN); private: static void memoryExhausted(); }; #include <iostream.h> #include <new.h> #include <string.h> #endif _Mem_H_
Objects of the class String
represent strings: 0-delimited series of
ascii-characters. The class is derived from Object
, so String
objects
can be stored in Hashtable
s.
Apart from the functions required by the class Object
,
the class String
contains all standard members, like a copy constructor
and a overloaded assignment operators. Apart from these members, there is a
conversion operator, allowing the use of a String
object as a char
const *
, and there are members for enlarging the string by catenating another
string to it, and for retrieving a character using the index-operator.
Here is the interface of the class String
:
#ifndef _String_H_ #define _String_H_ #include <iostream.h> #include <stdarg.h> #include "../object/object.h" class String: public Object { public: String(); String(char const *arg); ~String(); String(String const &other); String &operator=(String const &rvalue); String &operator=(char const *rvalue); int operator==(Object const &other) const; unsigned hashCode() const; char const *toString() const; operator char const *() const; String &strcat(char const *str2); char &operator[](unsigned index); private: char *string; }; #include "../mem/mem.h" #include "../hashtable/hashtable.h" #endif _String_H_
18.3.7.3: The class StringTokenizer
The class StringTokenizer
is used for breaking up strings into substrings
according to a (set of) delimiters. By default, the white-space delimiters are
used. The constructor of the class expects an ascii-z string (and optionally a
string of delimiter-characters) and will split the string into substrings
according to the set of delimiters.
The substrings are retrievable through the overloaded index-operator,
returning pointers to String
objects, which are then owned by the calling
function. Another member function is range()
, returning the substring
starting at a particular index-position. For example, if StringTokenizer
st
contains five substrings, st.range(3)
will return the substring of the
original string starting at st[3]
.
Here is the interface of the class StringTokenizer
:
#ifndef _StringTokenizer_H_ #define _StringTokenizer_H_ #include "../string/string.h" class StringTokenizer { public: StringTokenizer(char const *cp, char const *delimiters = " \t\n"); ~StringTokenizer(); StringTokenizer(StringTokenizer const &other); // NI StringTokenizer &operator=(StringTokenizer const &other); // NI String *operator[](unsigned index); String *range(unsigned from); // until the last one private: struct SubString { char *str; unsigned length; }; char *str; SubString *subString; unsigned n; }; #endif _StringTokenizer_H_
The class Ustream
processes files as unix-like configuration files.
In these files empty lines are ignored, as is information starting at a
hash-mark at the beginning of a line or preceded by a white-space
character. Furthermore, lines are combined if the last character of a line is
a backslash.
The constructor of the class expects one argument: the name of the file to be
processed. Having created a Ustream
object, the conversion operator
operator void *()
can be used to determine the successful opening of the
file: it returns 0
if the file wasn't opened successfully.
The (non-empty, non-comment section of) lines of the file are returned by the
member read()
as a char *
: the line is owned by the calling
function. Calling read()
succeeds until a null-pointer is returned.
After a successful read-operation, the member-function lineNr()
will
return the actual linenumber of the just read line in the original file. In
this case empty and comment-lines are counted.
The file is closed when the Ustream
object is destroyed.
Here is the interface of the class Ustream
:
#ifndef _Ustream_H_ #define _Ustream_H_ #include <fstream.h> class Ustream { public: Ustream(char const *fname); Ustream(Ustream const &other); // NI Ustream const &operator=(Ustream const &right); // NI operator void *(); // direct status-check char *read(); // 0 if no more lines int lineNr(); private: ifstream stream; int line; }; #include "../mem/mem.h" #endif _Ustream_H_
The class Util
contains several utility functions, which did not belong
elsewhere. The functions atod()
and atoi()
convert, respectively,
strings to doubles and strings to ints, and they differ from the standard
functions atof()
and atoi()
only by the fact that the Util
functions accept null-pointers as well.
The function prime()
uses the sieve of Aristosthenes to generate the first
prime exceeding the value given as its argument.
The function hashPjw()
returns a hashvalue for a string. This algorithm is
given in Aho, Sethi, and Ullman's Compilers: Principles, Techniques and
Tools, 1986, p. 435 as P. J. Weinberger's algorithm for computing
hash-values.
The interface of the class Util
is given below:
#ifndef _Util_H_ #define _Util_H_ #include <values.h> // uses INTBITS to find the # of bits in a word, hence in an int class Util { public: static double atod(char const *value); // convert to double static int atoi(char const *value); // convert to int static unsigned prime(unsigned lowerBound); // first prime exceeding // lowerBound static unsigned hashPjw(char const *key); // return hashvalue private: int const bitsPerInt = INTBITS, moduloMask = bitsPerInt - 1; static int shiftBitsPerInt; }; #include <stdlib.h> #include <string.h> #include <math.h> #endif _Util_H_
flex
and the parser generator bison
. Both
bison
and flex
are well documented elsewhere. The original
predecessors of bison
and flex
, called yacc
and lex
are
described in several books, e.g. in O'Reilly's book `lex & yacc'.
However, the scanner and parser generators are also (and maybe even
more commonly, nowadays) available as free software. Both bison
and flex
can be obtained from prep.ai.mit.edu/pub/gnu. Flex
will create a C++
class
when called as flex++
, or when the -+
flag is used. With
bison
the situation is a bit more complex. Scattered over the
Internet several bison++
archives can be found (e.g., in
rzbsdi01.uni-trier.de). The
information in these archives usually dates back to 1993,
irrespective of the version number mentioned with the archive
itself. (However, the given ftp-archive also contains dos-executables,
for those who are interested....)
Using flex++
and bison++
a class
-based scanner and parser can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become a bit cleaner than without using the class
interface.
Below two examples are given. In the first example only a lexical scanner is
used to monitor the production of a file from several parts. This example
focuses on the lexical scanner, and on switching files while churning
through the parts. The second example uses both a scanner and a parser to
transform standard arithmetic expressions to their postfix notation, commonly
encountered in code generated by compilers and in HP
-calculators.
The second example focuses on the parser.
#include
statement, which is followed by a string indicating the
file which should be included at the location of the #include
.
In order to avoid complexities that have nothing to do with the current
example, the format of the #include
statement is restricted to the
form #include <filepath>
.
The file specified between the pointed brackets should be available at
the location indicated by filepath
. If the file is not available,
the program should terminate using a proper error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the standard
output stream cout
. Otherwise, the output is written to the stream whose
name is given as the program's second argument.
The program uses a maximum nesting depth. Once the maximum is exceeded, the program terminates with an appropriate error message. In that case, the filenamestack indicating where which file was included should be printed.
One minor extra feature is that comment-lines should be recognized: include
directives in comment-lines should be ignored, comment being the standard
C++ comment-types.
The program is created in the following steps:
lexer
is constructed, containing the
specifications of the input-language.
lexer
the requirements for the
class Scanner
evolve. The Scanner
class is a wrapper around
the class yyFlexLexer
generated by flex++
. The requirements
results in the specification of the interface for the class
Scanner
.
main()
function is constructed. A Startup
object
is created to inspect the commandline arguments. If successful,
the scanner
's member yylex()
is called to construct the
output file.
18.4.1.1: The flex++ specification file
The organization of the lexical scanner specification file is similar
to the one used with flex
. However, flex++
creates a class
(yyFlexLexer
) from which the class Scanner
will be derived.
The code associated with the regular expression rules will be located inside
the class yyFlexLexer
. However, it would be handy to access the
member-functions of the derived class within that code. Fortunately, class
derivation and inheritance helps us to realize this. In the specification of
the class yyFlexLexer()
, we notice that the function yylex()
is a
virtual function. In the FlexLexer.h
header file we see virtual int
yylex()
:
class yyFlexLexer: public FlexLexer { public: yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 ); virtual ~yyFlexLexer(); void yy_switch_to_buffer( struct yy_buffer_state* new_buffer ); struct yy_buffer_state* yy_create_buffer( istream* s, int size ); void yy_delete_buffer( struct yy_buffer_state* b ); void yyrestart( istream* s ); virtual int yylex(); virtual void switch_streams( istream* new_in, ostream* new_out ); protected: ... };Consequently, if
yylex()
is defined in a derived class, then this
function of the derived class will be called from a base class (i.e.,
yyFlexLexer
) pointer. Since the yylex()
function of the derived class
is called, that function will have access to the members of its class, and to
the public and protected members of its base class.
The context in which the generated scanner is placed is (by default)
the function yyFlexLexer::yylex()
. However, this context can be changed by
defining the YY_DECL
-macro. This macro, if defined, determines the context
in which the generated scanner will be placed. So, in order to make the
generated scanner part of the derived class function yylex()
, three
things must be done:
YY_DECL
must be defined in the lexer specficiation
file. It must define the derived class function yylex()
as the scanner
function. For example:
#define YY_DECL int Scanner::yylex()
yylex()
must be declared in the class definition of
the derived class.
yyFlexLexer::yylex()
is a virtual function, it
must still be defined. It is not called, though, so its definition may be a
simple
int yyFlexLexer::yylex() { return (0); }
The definition of the YY_DECL
macro and the yyFlexLexer::yylex()
function can conveniently be placed in the lexer specification file, as shown
below.
Looking at the regular expressions themselves, notice that we'll need rules
for the recognition of the comment, for the include
directive, and for the
remaining characters. This is all fairly standard practice. When an include
directive is detected, the derived-class' member function switchSource()
is called, which will perform the required file switching. When the end of the
file (EOF
) is detected, the derived class' member function popSource()
is called, which will pop the previous previously pushed file, returning
1. Once the file-stack is empty, the function will return 0, resulting in the
call of yyterminate()
, which will terminate the scanner.
The lexical scanner specification file has three sections: a C++
preamble, containing code which can be used in the code defining
the actions to be performed once a regular expression is matched, a
Flex++ symbol area
, which is used for the definition of symbols,
like a mini scanner, or options
, like %option yylineno
when the
lexical scanner should keep track of the line numbers of the files it
is scanning and, finally a rules section, in which the regular
expressions and their actions are given. In the current example, the
lexer should mainly copy information from the istream *yyin
to the
ostream *yyout
, for which the predefined macro ECHO
can be
used.
Here is the complete and annotated lexical scanner specification file
to be used with flex++
:
%{ /* ---------------------------------------------------------------------------- C++ -preamble. Include header files, other than those generated by flex++ and bison++. E.g., include the interface to the class derived from yyFlexLexer ----------------------------------------------------------------------------*/ // the yylex() function that's actually // used #define YY_DECL int Scanner::yylex() #include "scanner.h" // The interface of the derived class int yyFlexLexer::yylex() // not called: overruled by { // Scanner::yylex() return (0); } %} /* ---------------------------------------------------------------------------- Flex++ symbol area ~~~~~~~~~~~~~~~~~~ The symbols mentioned here are used for defining e.g., a miniscanner ---------------------------------------------------------------------------- */ %x comment %option yylineno eolnComment "//".* anyChar .|\n /* ---------------------------------------------------------------------------- Flex rules area: ~~~~~~~~~~~~~~~~ Regular expressions below here define what the lexer will recognize. ---------------------------------------------------------------------------- */ %% /* The comment-rules: comment lines are ignored. */ {eolnComment} "/*" BEGIN comment; <comment>{anyChar} <comment>"*/" BEGIN INITIAL; /* File switching: #include <filepath> */ "#include "[^>]*">" switchSource(); /* The default rules: eating all the rest, echoing it to output */ {anyChar} ECHO; /* The <<EOF>>)rule: pop a pushed file, or terminate the lexer */ <<EOF>> { if (!popSource()) yyterminate(); }
Since the derived class is able to access the information stored within the
lexical scanner itself (it can even access the information directly, since
the data members of yyFlexLexer
are protected
, and thus accessible to
derived classes), very much processing can be done by the derived class'
member functions. This results in a very clean setup of the lexer
specification file, in which hardly any code is required in the preamble.
18.4.1.2: The derived class: Scanner
The class Scanner
is derived from the class yyFlexLexer
, generated by
flex++
. The derived class has access to the data controlled by the lexical
scanner. In particular, the derived class has access to the following data
members:
char *yytext
: contains the text matched by a regular expression
int yyleng
: the length of the text in yytext
int yylineno
: the current line number (only if %option yylineo
was specified in the lexer specfication file)
FlexLexer.h
, which is part
of the flex
distribution.
The class Scanner
has to perform two tasks: It should push file information
about the current file to a filestack, and should pop the information pushed
last once EOF
is detected on a file.
Several member functions are
needed for the accomplishment of these tasks. As they are auxiliary to the
switchSource()
and popSource()
functions, they are private members.
In practice, these private members are developed once the need for them arises.
In the following interface of the Scanner
class the final header
file is given. Note that, apart from the private member functions, several
private data members are used as well. These members are initialized in the
constructor Scanner()
and are used in the private member functions. They
are discussed below, in the context of the member functions using them.
#include <FlexLexer.h> // provides yyFlexLexer interface #include <fstream.h> #include <stdio.h> #include <string.h> class Scanner: public yyFlexLexer { public: Scanner(istream *yyin); int yylex(); // overruling yyFlexLexer's yylex() private: void switchSource(); int popSource(); int scanYYText(); // 1: nextSource contains new name void performSwitch(); void checkCircularity(); void checkDepth(); yy_buffer_state **state; char **fileName, *srcPtr, *nextSource; int stackTop; static int const sizeof_buffer = 16384, stackDepth = 10; };
The switchSource()
member function should interpret the information given
in yytext
: it is interpreted by scanYYText()
. If scanYYText()
can extract a filename from yytext
a switch to another file can be
performed. This switch is performed by performSwitch()
. If the filename
could not be extracted, a message is written to the outputstream. Here is
the code of switchSource()
:
#include "scanner.h" void Scanner::switchSource() { if (scanYYText()) performSwitch(); }
The performSwitch()
function and the matching function popSource()
handle a simple file switch. In particular, the yylineno
variable is not
updated when a file switch is performed. If line numbers are to be monitored,
the performSwitch()
and popSource()
functions should respectively
push the current value of yylineno
on a stack, and thereafter
reset yylineno
, and (at EOF
) pop yylineno
from the stack.
The member function scanYYText()
performs a simple scan of the information
in yytext
. If a name is detected following #include <
that name is
stored in the private data member nextSource
, and 1 is returned.
Otherwise, the information in yytext
is copied to yyout
, and 0 is
returned. Here is the source for scanYYText()
:
#include "scanner.h" int Scanner::scanYYText() { delete nextSource; // build new buffer nextSource = new char[yyleng]; if ( sscanf(yytext, "#include %[^ \t\n>]", nextSource) != 1 || !(srcPtr = strchr(nextSource, '<')) ) { *yyout << yytext; // copy #include to yyout return (0); // scan failed } srcPtr++; return (1); }
The function performSwitch()
performs the actual file-switching. The
yyFlexLexer
class provides a series of member functions that can be used
for file switching purposes. The file-switching capability of a yyFlexLexer
object is founded on the struct yy_buffer_state
, containing the state of
the scan-buffer of the file that is currently scanned by the lexical scanner.
This buffer is pushed on a stack when an #include
is encountered, to
be replaced with the buffer of the file that is mentioned in the #include
directive.
The switching of the file to be scanned is realized in the following steps:
include
-nesting is inspected.
If the stackDepth
is reached, the stack is full, and the program
aborts with an appropriate message. For this the member function
checkDepth()
is called.
fileName
stack is inspected, to avoid circular
inclusions. If nextSource
is encountered in the fileName
array, the inclusion is refused, and the program terminates with
an appropriate message. The member function checkCircularity()
is called for this task.
ifstream
object is created, assigned to
nextSource
. If this fails, the program terminates with an
appropriate message.
yy_buffer_state
is created for the newly opened
stream, and the lexical scanner is instructed to switch to that
stream using yyFlexLexer
's member function yy_switch_to_buffer
.
performSwitch(), checkDepth()
, and
checkCircularity()
are given next:
#include "scanner.h" void Scanner::performSwitch() { ++stackTop; checkDepth(); checkCircularity(); ifstream *newStream = new ifstream(srcPtr); if (!*newStream) { cerr << "Can't open " << srcPtr << endl; exit(1); } state[stackTop] = yy_current_buffer; yy_switch_to_buffer(yy_create_buffer(newStream, sizeof_buffer)); }
#include "scanner.h" void Scanner::checkDepth() { if (stackTop == stackDepth) { cerr << "Inclusion level exceeded. Maximum is " << stackDepth << endl; exit (1); } }
#include "scanner.h" void Scanner::checkCircularity() { delete fileName[stackTop]; fileName[stackTop] = new char [strlen(srcPtr) + 1]; strcpy(fileName[stackTop], srcPtr); int index; for (index = 0; strcmp(srcPtr, fileName[index]); index++) ; if (index != stackTop) { cerr << "Circular inclusion of " << srcPtr << endl; while (stackTop > index) { cerr << fileName[stackTop] << " was included in " << fileName[stackTop - 1] << endl; --stackTop; } exit (1); } }
The member function popSource()
is called to pop the previously pushed
sourcefile from the stack, to continue its scan just beyond the just processed
#include
directive. The popSource()
function first inspects
stackTop
: if the variable is at least 0, then it's an index into the
yy_buffer_state
array, and thus the current buffer is deleted, to be
replaced by the state waiting on top of the stack. This is realized by
the yyFlexLexer
members yy_delete_buffer
and yy_switch_to_buffer
.
If a previous buffer waited on top of the stack, then 1 is returned, indicating a successful switch to the previously pushed file. If the stack was empty, 0 is returned, and the lexer will terminate.
Here is the source of the function popSource()
:
#include "scanner.h" int Scanner::popSource() { if (stackTop >= 0) { yy_delete_buffer(yy_current_buffer); yy_switch_to_buffer(state[stackTop]); stackTop--; return (1); } return (0); }
These functions complete the implementation of the complete lexical scanner.
the lexical scanner itself is stored in the Scanner::yylex()
function.
The Scanner
object itself only has three public
member functions: one function to push a sourcefile on a stack when a switch to
the next sourcefile is requested, one function to restore the previously
pushed source, and of course yylex()
itself.
Finally, the constructor will initialize the Scanner
object. Note that
the interface contains an overloaded assignment operator and a copy
constructor. By mentioning these two functions in the interface only,
without implementing them, they cannot be used in a program: the linking phase
of a program using such functions would fail. In this case this is intended
behavior: the Scanner
object does its own job, and there simply is no need
for the assignment of a Scanner
object to another one, or for
the duplication of a Scanner
object.
The constructor itself is a simple piece of code. Here is its source:
#include "scanner.h" Scanner::Scanner(istream *yyin) { switch_streams(yyin, yyout); state = new yy_buffer_state * [stackDepth]; memset(state, 0, stackDepth * sizeof(yy_buffer_state *)); fileName = new char * [stackDepth]; memset(fileName, 0, stackDepth * sizeof(char *)); nextSource = 0; stackTop = -1; }
The main program is a very simple one. As the program expects a
filename to start the scanning process at, initially the number of
arguments is checked. If at least one argument was given, then a
ifstream
object is created. If this object can be created, then a
Scanner
object is created, receiving the address of the
ifstream
object as its argument. Then the yylex()
member
function of the Scanner
object is called. This function is
inherited from the Scanner
's base class yyFlexLexer
.
Here is the source-text of the main function:
/* lexer.cc A C++ main()-frame generated by C++ for lexer.cc */ #include "lexer.h" /* program header file */ int main(int argc, char **argv) { if (argc == 1) { cerr << "Filename argument required\n"; exit (1); } ifstream yyin(argv[1]); if (!yyin) { cerr << "Can't read " << argv[1] << endl; exit(1); } Scanner scanner(&yyin); scanner.yylex(); return (0); }
18.4.1.4: Building the scanner-program
The final program is constructed in two steps. These steps are given
for a unix system, on which flex++
and the Gnu
C++
compiler g++
have been installed:
flex++
. For
this the command
flex++ lexer
libfl.a
library. The appropriate command here is
g++ -o scanner *.cc -lfl
flex++
is
called with the -d
flag, debugging code will be part of the generated
scanner. Apart from that, the debugging code must be activated. Assuming the
scanner object is called scanner
, the statement
scanner.set_debug(1);
scanner
object.
Starting point for a program using both a parser and a scanner is the grammar: the grammar is specified first. This results in a set of tokens which can be returned by the lexical scanner (commonly called the lexer. Finally, auxiliary code is provided to fill in the blanks: the actions which are performed by the parser and the lexer are not normally specified with the grammatical rules or lexical regular expressions, but are executed by functions, which are called from within the parser's rules or associated with the lexer's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex++
. In the current section the parser is our main concern. The parser
can be generated from a grammar specified for the program bison++
. The
specification of bison++
is similar to the specifications required for
bison
, but a class is generated, rather than a single function. In the
next sections we'll develop a program converting infix expressions, in
which binary operators are written between their operands, to postfix
expressions, in which binary operators are written following their operands. A
comparable situation holds true for the unary operators -
and +
: We
can ignore the +
operator, but the -
is converted to a unary minus.
Our calculator will recognize a minimal set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in the bison-like grammar specifications, but that's about it: the purpose of this section, after all, is to illustrate a C++ program, using a parser and a lexer, and not to construct a full-fledged calculator.
In the next few sections we'll start developing the grammar in a bison++
specification file. Then, the regular expressions for the scanner are
specified according to the requirements of flex++
. Finally the program is
constructed.
The class-generating bison software (bison++) is not widely available. The version used by us is 2.20. It can be obtained from
ftp.icce.rug.nl:/pub/unix/bison++2.20.tar.gz.
18.4.2.1: The bison++ specification file
The bison specification file used with bison++
is comparable to the
specification file used with bison
. Differences are related to the
class nature of the resulting parser. The calculator will distinguish real
numbers from ints, and will support the basic set of arithmetic operators.
The bison++
specification file consists of the following sections:
bison
. The difference being the
%header{
opening. In this section we'll encounter mainly declarations:
header files are included, and the yyFlexLexer
object is declared.
bison++
has several extra
items that can be declared here. They are important and warrant a section of
their own.
bison
program.
18.4.2.2: The bison++ token section
The token section contains all the tokens that are used in the grammar, as well as the priority rules as used for the mathematical operators. Moreover, several new items can be declared here:
%name ParserName
. The name ParserName
will be the name
of the parser's class. This entry should be the first entry of the
token-section. It is used in cases where multiple grammars are used,
to make sure that the different parser-classes use unique
identifiers. By default the name parse
is used.
%define name content
. The %define
has the same function
as the #define
statement for the C++
preprocessor. It can be
used to define, e.g., a macro. Internally, the defined symbol will be
the concatenation of YY_
, the parser's classname, and the name of
the macro. E.g.,
YY_ParserName_name
%define LEX_BODY inline-code
: here the body of the call to the
lexer is defined. It can be defined as = 0
for an abstract parser-class,
but otherwise it must contain the code (including surrounding curly braces)
representing the call to the lexer. For example, if the lexer object generated
by flex++
is called lexer
, this declaration should be
%define LEX_BODY {return lexer.yylex();}
%define ERROR_BODY inline-code
: similarly, the body of
the code of the call to the error-function is defined here. It can
be defined as = 0
, in which case the parser's class will again
become abstract. Otherwise, it is used to specify the inner
workings of the error function, including surrounding braces. E.g.,
%define ERROR_BODY { cerr << "syntax Error\n"; }
LEX_BODY
and ERROR_BODY
definitions are omitted, then the
compiler is not able to complete the virtual table of the parser class, and
the linking phase will report an error like
undefined reference to `Parser virtual table'
The remaining symbols are optional, and can be (re)defined as needed:
%define DEBUG 1
: if non-0 debugging code will be included
in the parser's source.
%define ERROR_VERBOSE
: if defined, the parser's stack
will be dumped when an error occurs.
%define LVAL yylval
: the default variable name is shown
here: the variable name containing the parser's semantic value is by
default yylval
, but its name may be redefined here.
%define INHERIT :public ClassA, public ClassB
: the
inheritance list for the parser's class. Note that it starts with the
':'
character. The define
should be left out if the parser's
class isn't derived from another class.
%define MEMBERS member-prototypes
: if the parser should
contain extra members, they must be declared here. Note that there is
only one %define MEMBERS
definition allowed. So, if multiple
members are to be declared, they must all be declared at this
point. To prevent very long lines in the specification file, the \ can
be used at the end of a line, to indicate that it continues on the
next line of the source-text. E.g.,
%define MEMBERS void lookup(); void lookdown();The
MEMBERS
section starts in a public
section. If private
members are required too, a private:
directive can be part of the
MEMBERS
section.
%defines
can be used:
%define CONSTRUCTOR_PARAM parameterlist
: this
defines the parameterlist for the parser's constructor. Here the types and
names of the parameters of the parser should be given. The surrounding
parentheses of the parameterlist are not part of the CONSTRUCTOR_PARAM
definition.
%define CONSTRUCTOR_INIT :initializer(s)
: this
defines the base-class and member initializers for the constructor. Note the
initial colon following CONSTRUCTOR_INIT
, which is required. The colon may
be given immediately after the CONSTRUCOR_INIT
statement, or blanks may be
used to separate the symbol from the colon.
%define CONSTRUCTOR_CODE { code }
: this
defines the code of the parser's constructor.
parse parser;
%union
. This starts the definition of the semantical value
union. It replaces the #define YYSTYPE
definition seen with
bison
. An example of a %union
declaration is
%union { int i; double d; };The union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a
string
cannot be a
member of the union. A string *
, however, is a possible union
member. As a side line: the lexical scanner has no need to know about this
union. The scanner can simply pass its scanned text to the parser through its
YYText()
member function. At the appropriate action block a statements like
$$.i = atoi(scanner.YYText());
$$
or the
generic returnvalues $1, $2
, etc, that are associated with components of
rules can be used, rather than $$.i, $3.d
, etc. To associate a
nonterminal or a token with a unionfield, the <fieldname>
specification is
used. E.g., %token <i> INT // token association (deprecated) <d> DOUBLE %type <i> intExpr // non-terminal associationIn this example, note that both the tokens and the nonterminals can be associated with a field of the union. However, as noted earlier, the lexical scanner has no need to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser knows what it's all about, and may convert strings like
"123"
to an integer
value. Consequently, we are discouraging the association of a unionfield and a
token. In the upcoming description of the rules of the grammar this will be
further illustrated.
%union
discussion the %token
and %type
specifications should be noted. They are used for the specficiation of
the tokens (terminal symbols) that can be returned by the lexical
scanner, and for the specification of the returntypes of nonterminals.
Apart from %token
the token-indicators %left, %right
and
%nonassoc
may be used to specify the associativity of
operators. The token(s) mentioned at these indicators are interpreted
as tokens indicating operators, associating in the indicated
direction. The precedence of operators is given by their order: the
first specification has the lowest precedence. To overrule a certain
precedence in a certain context, %prec
can be used. As all this is
standard bison
practice, it isn't further discussed in this
context. The documentation provided with the bison
distribution
should be consulted for further reference.
18.4.2.3: The bison++ grammar rules
The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. A lot of rules, but they illustrate the use of nonterminals associated with value-types.
lines: lines line | line ; line: intExpr '\n' { cerr << "int: " << $1 << endl; } | doubleExpr '\n' { cerr << "double: " << $1 << endl; } | '\n' { cout << "Good bye\n"; YYACCEPT; } | error '\n' ; intExpr: intExpr '*' intExpr { $$ = $1 * $3; } | intExpr '+' intExpr { $$ = $1 + $3; } | '(' intExpr ')' { $$ = $2; } | '-' intExpr %prec UnaryMinus { $$ = -$2; } | INT { $$ = atoi(lexer.YYText()); } ; doubleExpr: doubleExpr '*' doubleExpr { $$ = $1 * $3; } | doubleExpr '+' doubleExpr { $$ = $1 + $3; } | doubleExpr '*' intExpr { $$ = $1 * $3; } | doubleExpr '+' intExpr { $$ = $1 + $3; } | intExpr '*' doubleExpr { $$ = $1 * $3; } | intExpr '+' doubleExpr { $$ = $1 + $3; } | '(' doubleExpr ')' { $$ = $2; } | '-' doubleExpr %prec UnaryMinus { $$ = -$2; } | DOUBLE { $$ = atof(lexer.YYText()); } ;With these rules a very simple calculator is defined in which integer and real values can be negated, added, and multiplied, and in which standard priority rules can be circumvented using parentheses. The rules show the use of typed nonterminal symbols:
doubleExpr
is
linked to real (double) values, intExpr
is linked to integer
values. Precedence and type association is defined in the token
section of the parser specification file, which is:
%name Parser %union { int i; double d; }; %token INT DOUBLE %type <i> intExpr <d> doubleExpr %left '+' %left '*' %right UnaryMinus %define MEMBERS \ virtual ~Parser() {} \ private: \ yyFlexLexer lexer; %define LEX_BODY {return lexer.yylex();} %define ERROR_BODY { cerr << "error encountered\n"; }In the token section we see the use of the
%type
specifiers,
connecting intExpr
to the i
-field of the semantic-value union,
and connecting doubleExpr
to the d
-field. At first sight it
looks a bit complex, since the expression rules must be included for
each individual returntype. On the other hand, if the union itself
would have been used, we would have had to specify somewhere in the
returned semantic values what field to use: less rules, but more
complex and error-prone code.
Also, note that the lexical scanner is included as a member of the parser. There is no need to define the scanner outside of the parser, as it's not used outside of the parser object. The virtual destructor is included as an member to prevent the compiler from complaining about the parser having a non-virtual destructor.
18.4.2.4: The flex++ specification file
The flex-specification file to be used with our little calculator is
simple: blanks are skipped, single characters are returned, and
numerical values are returned as either Parser::INT
or
Parser::DOUBLE
values. Here is the complete flex++
specification file:
%{ #include "parser.h" %} %% [ \t] ; [0-9]+ return(Parser::INT); "."[0-9]* | [0-9]+("."[0-9]*)? return(Parser::DOUBLE); exit | quit return (Parser::DONE); .|\n return (*yytext);
18.4.2.5: The generation of the code
The code is generated in the same way as with bison
and
flex
. To order bison++
to generate the files parser.cc
and parser.h
, the command
bison++ -d -o parser.cc parser
Flex++
will thereupon generate code on lexer.cc
using the command
flex++ -I -olexer.cc lexer
flex++
expects no blanks between the -o
flag
and lexer.cc
.
On unix, linking and compiling the generated sources and the source for the main program (listed below) is realized with the following command:
g++ -o calc -Wall *.cc -lfl -s
libfl.a
library is mentioned here. If it's
not mentioned unresolved functions like yywrap()
emerge.
A source in which the main()
function, the lexical
scanner and the parser objects are defined is, finally:
#include "parser.h" int main() { Parser parser; return (parser.yyparse()); }