Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken. Please state the document version you're referring to, as found in the title (in this document: 8.3.1) and please state chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.
In this chapter concrete examples of C++ programs, classes and templates
will be presented. Topics covered by the C++ Annotations such as virtual
functions, static
members, etc. are illustrated in this chapter. The
examples roughly follow the organization of earlier chapters.
As an additional topic, going beyond mere examples of C++, scanner and parser generators are covered. We show how these tools may be used in C++ programs. These additional examples assume a certain familiarity with the concepts underlying these tools, like grammars, parse-trees and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it's attractive to use scanner- and parser-generators to create the code doing the actual input processing. One of the examples in this chapter describes the usage of these tools in a C++ environment.
std::streambuf
as the starting point for constructing classes
interfacing such file descriptor devices.
Below we'll construct classes that can be used to write to a device given its file descriptor. The devices may be files, but they could also be pipes or sockets. Section 23.1.2 covers reading from such devices; section 23.3.1 reconsiders redirection, discussed earlier in section 6.6.1.
Using the streambuf
class as a base class it is relatively easy to
design classes for output operations. The only member function that must
be overridden is the (virtual) member
int streambuf::overflow(int c)
. This member's responsibility is to
write characters to the device. If fd
is an output file descriptor and if
output should not be buffered then the member overflow()
can simply be
implemented as:
    class UnbufferedFD: public std::streambuf
    {
        public:
            virtual int overflow(int c);
            ...
    };

    int UnbufferedFD::overflow(int c)
    {
        if (c != EOF)
        {
            char ch = c;            // write the character itself, not
                                    // the first byte of an int
            if (write(d_fd, &ch, 1) != 1)
                return EOF;
        }
        return c;
    }

The argument received by
overflow
is either written to the file
descriptor (and returned from overflow
), or EOF
is returned.
This simple function does not use output buffering. For various reasons, using a buffer is usually a good idea (see also the next section).
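To see the unbuffered approach at work, here is a minimal, self-contained sketch. It is not the Annotations' own class: the names `UnbufferedOut` and `roundTrip` are hypothetical, and a pipe is assumed as the device, so that whatever the stream writes can be read back and inspected.

```cpp
#include <cstdio>       // EOF
#include <ostream>
#include <streambuf>
#include <string>
#include <unistd.h>     // pipe, write, read, close

// Hypothetical class: every character is written to the file
// descriptor as soon as overflow receives it.
class UnbufferedOut: public std::streambuf
{
    int d_fd;

    public:
        explicit UnbufferedOut(int fd)
        :
            d_fd(fd)
        {}

    private:
        int overflow(int c) override
        {
            if (c != EOF)
            {
                char ch = static_cast<char>(c);
                if (write(d_fd, &ch, 1) != 1)
                    return EOF;
            }
            return c;
        }
};

// Hypothetical helper: inserts text into an ostream using the
// above streambuf and returns what arrives at the pipe's other end.
// Only meant for short texts fitting in the pipe's capacity.
std::string roundTrip(std::string const &text)
{
    int fds[2];
    if (pipe(fds) != 0)
        return "";

    UnbufferedOut buf(fds[1]);
    std::ostream out(&buf);
    out << text;
    close(fds[1]);              // the reader now sees end-of-data

    std::string result;
    char ch;
    while (read(fds[0], &ch, 1) == 1)
        result += ch;
    close(fds[0]);

    return result;
}
```

As no put area is defined, each inserted character ends up in `overflow`, which costs one `write` system call per character. That overhead is one of the reasons for preferring a buffer.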
When output buffering is used, the overflow
member is a bit more
complex as it is only called when the buffer is full. Once the buffer is full,
we first have to flush the buffer. Flushing the buffer is the
responsibility of the (virtual) function
streambuf::sync
. Since sync
is a virtual function, classes derived from
streambuf
may redefine sync
to flush a buffer streambuf
itself
doesn't know about.
Overriding sync
and using it in overflow
is not all that has to be
done. When the object of the class defining the buffer reaches the end of its
lifetime the buffer may be only partially full. In that situation the buffer
must also be flushed. This is easily done by simply calling sync
from the
class's destructor.
Now that we've considered the consequences of using an output buffer, we're almost ready to design our derived class. Several more features will be added as well, though:
OFdnStreambuf
has the following characteristics:
streambuf
the
<unistd.h>
header file must
have been read by the compiler before its member functions can be compiled.
std::streambuf
.
    class OFdnStreambuf: public std::streambuf
    {
        size_t d_bufsize;
        int d_fd;
        char *d_buffer;

        public:
            OFdnStreambuf();
            OFdnStreambuf(int fd, size_t bufsize = 1);
            virtual ~OFdnStreambuf();
            void open(int fd, size_t bufsize = 1);

        private:
            virtual int sync();
            virtual int overflow(int c);
    };
open
member (see below). Here are the constructors:
    inline OFdnStreambuf::OFdnStreambuf()
    :
        d_bufsize(0),
        d_buffer(0)
    {}

    inline OFdnStreambuf::OFdnStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }
sync
, flushing any characters stored in the
output buffer to the device. In implementations not using a buffer the
destructor can be given a default implementation:
    inline OFdnStreambuf::~OFdnStreambuf()
    {
        if (d_buffer)
        {
            sync();
            delete[] d_buffer;
        }
    }
This implementation does not close the device. It is left as an exercise to the reader to change this class in such a way that the device is optionally closed (or optionally remains open). This approach was adopted by, e.g., the Bobcat library. See also section 23.1.2.2.
open
member initializes the buffer. Using
streambuf::setp
, the begin and end points of the buffer are
defined. This is used by the streambuf
base class to initialize
streambuf::pbase
,
streambuf::pptr
, and
streambuf::epptr
:
    inline void OFdnStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize == 0 ? 1 : bufsize;

        d_buffer = new char[d_bufsize];
        setp(d_buffer, d_buffer + d_bufsize);
    }
sync
flushes the as yet unflushed contents of the
buffer to the device. After the flush the buffer is reinitialized using
setp
. After successfully flushing the buffer sync
returns 0:
    inline int OFdnStreambuf::sync()
    {
        if (pptr() > pbase())
        {
            write(d_fd, d_buffer, pptr() - pbase());
            setp(d_buffer, d_buffer + d_bufsize);
        }
        return 0;
    }
streambuf::overflow
is also
overridden. Since this member is called from the streambuf
base class when
the buffer is full it should first call sync
to flush the buffer to the
device. Next it should write the character c
to the (now empty)
buffer. The character c
should be written using pptr
and
streambuf::pbump
. Entering a character into the buffer should be
implemented using available streambuf
member functions, rather than `by
hand' as doing so might invalidate streambuf
's internal bookkeeping. Here
is overflow
's implementation:
    inline int OFdnStreambuf::overflow(int c)
    {
        sync();
        if (c != EOF)
        {
            *pptr() = c;
            pbump(1);
        }
        return c;
    }
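The sync member shown earlier calls write once and ignores its return value. With pipes and sockets a single write call may transfer fewer bytes than requested, or it may be interrupted by a signal. The following sketch shows one way to handle this. The helper `writeAll` is hypothetical and not part of the Annotations' class; it merely illustrates the kind of check a `real life' implementation might add.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>     // write, ssize_t

// Hypothetical helper: repeats write until all n bytes have been
// written. Returns true on success, false if the data could not be
// written completely.
bool writeAll(int fd, char const *data, std::size_t n)
{
    while (n != 0)
    {
        ssize_t written = write(fd, data, n);

        if (written < 0)
        {
            if (errno == EINTR)     // interrupted by a signal: retry
                continue;
            return false;           // any other error: give up
        }

        if (written == 0)           // no progress: avoid looping forever
            return false;

        data += written;            // skip the part already written
        n -= static_cast<std::size_t>(written);
    }
    return true;
}
```

A sync member using this helper could return 0 when `writeAll` returns true and -1 otherwise, so that the stream's state reflects a failing device.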
OFdnStreambuf
class to copy its standard
input to file descriptor
STDOUT_FILENO
, which is the symbolic name of the
file descriptor used for the standard output:
    #include <string>
    #include <iostream>
    #include <istream>
    #include "fdout.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        OFdnStreambuf fds(STDOUT_FILENO, 500);
        ostream os(&fds);

        switch (argc)
        {
            case 1:
                for (string s; getline(cin, s); )
                    os << s << '\n';
                os << "COPIED cin LINE BY LINE\n";
            break;

            case 2:
                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";
            break;

            case 3:
                os << cin.rdbuf();
                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";
            break;
        }
    }
std::streambuf
, they
should be provided with an input buffer of at least one character. The
one-character input buffer allows for the use of the member functions
istream::putback
or istream::unget
. Strictly speaking it is not
necessary to implement a buffer in classes derived from streambuf
. But
using buffers in these classes is strongly advised. Their implementation is
very simple and straightforward and the applicability of such classes will be
greatly improved. Therefore, in all our classes derived from the class
streambuf
a buffer of at least one character will be defined.
IFdStreambuf
) from streambuf
using a
buffer of one character, at least its member
streambuf::underflow
should be overridden, as this member eventually
receives all requests for input. The member
streambuf::setg
is used to inform the streambuf
base class of the
size and location of the input buffer, so that it is able to set up its input
buffer pointers accordingly. This will ensure that
streambuf::eback
,
streambuf::gptr
, and
streambuf::egptr
return correct values.
The class IFdStreambuf
is designed like this:
streambuf
, the
<unistd.h>
header file must have been read by the compiler before its member functions
can be compiled.
std::streambuf
as well.
protected
data members
so that derived classes (e.g., see section 23.1.2.3) can access them. Here
is the full class interface:
    class IFdStreambuf: public std::streambuf
    {
        protected:
            int d_fd;
            char d_buffer[1];

        public:
            IFdStreambuf(int fd);

        private:
            int underflow();
    };
gptr
's return value equal to
egptr
's return value. This
implies that the buffer is empty so underflow
is immediately called
to fill the buffer:
    inline IFdStreambuf::IFdStreambuf(int fd)
    :
        d_fd(fd)
    {
        setg(d_buffer, d_buffer + 1, d_buffer + 1);
    }
underflow
is overridden. The buffer is refilled by
reading from the file descriptor. If this fails (for whatever reason),
EOF
is returned. More sophisticated implementations could act more
intelligently here, of course. If the buffer could be refilled,
setg
is
called to set up streambuf
's buffer pointers correctly:
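    inline int IFdStreambuf::underflow()
    {
        if (read(d_fd, d_buffer, 1) <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + 1);
                                // unsigned char: a character having
                                // value 0xff must differ from EOF
        return static_cast<unsigned char>(*gptr());
    }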
main
function shows how IFdStreambuf
can be used:
    int main()
    {
        IFdStreambuf fds(STDIN_FILENO);
        istream is(&fds);

        cout << is.rdbuf();
    }
IFdStreambuf
developed in the previous section. To make things a bit more
interesting, in the class
IFdNStreambuf
developed here, the member
streambuf::xsgetn
is also overridden, to optimize reading a
series of characters. Also a default constructor is provided that can be used
in combination with the open
member to construct an istream
object
before the file descriptor becomes available. In that case, once the
descriptor becomes available, the open
member can be used to initiate
the object's buffer. Later, in section 23.3, we'll encounter such a
situation.
To save some space, the success of various calls was not checked. In `real
life' implementations, these checks should of course not be omitted. The
class IFdNStreambuf
has the following characteristics:
streambuf
the
<unistd.h>
header file must
have been read by the compiler before its member functions can be compiled.
std::streambuf
.
IFdStreambuf
(section 23.1.2.1), its data
members are protected. Since the buffer's size is configurable, this size is
kept in a dedicated data member, d_bufsize
:
    class IFdNStreambuf: public std::streambuf
    {
        protected:
            int d_fd;
            size_t d_bufsize;
            char* d_buffer;

        public:
            IFdNStreambuf();
            IFdNStreambuf(int fd, size_t bufsize = 1);
            virtual ~IFdNStreambuf();
            void open(int fd, size_t bufsize = 1);

        private:
            virtual int underflow();
            virtual std::streamsize xsgetn(char *dest, std::streamsize n);
    };
open
. Open
will then
initialize the object so that it can actually be used:
    inline IFdNStreambuf::IFdNStreambuf()
    :
        d_bufsize(0),
        d_buffer(0)
    {}

    inline IFdNStreambuf::IFdNStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }
open
, its destructor will
both delete the object's buffer and use the file descriptor to close the
device:
    IFdNStreambuf::~IFdNStreambuf()
    {
        if (d_bufsize)
        {
            close(d_fd);
            delete[] d_buffer;
        }
    }
Even though the device is closed in the above implementation this may not
always be desirable. In cases where the open file descriptor is already
available the intention may be to use that descriptor repeatedly, each time
using a newly constructed IFdNStreambuf
object. It is left as an exercise
to the reader to change this class in such a way that the device may
optionally be closed. This approach was followed in, e.g., the
Bobcat library.
open
member simply allocates the object's buffer. It is
assumed that the calling program has already opened the device. Once the
buffer has been allocated, the base class member
setg
is used to ensure
that
streambuf::eback
streambuf::gptr
and
streambuf::egptr
return correct values:
    void IFdNStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize;
        d_buffer = new char[d_bufsize];
        setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
    }
underflow
is implemented almost
identically to IFdStreambuf
's (section 23.1.2.1) member. The only
difference is that the current class supports buffers of larger
sizes. Therefore, more characters (up to d_bufsize
) may be read from the
device at once:
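    int IFdNStreambuf::underflow()
    {
                                // unsigned char: a character having
                                // value 0xff must differ from EOF
        if (gptr() < egptr())
            return static_cast<unsigned char>(*gptr());

        int nread = read(d_fd, d_buffer, d_bufsize);

        if (nread <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + nread);
        return static_cast<unsigned char>(*gptr());
    }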
xsgetn
is overridden. In a loop, n
is reduced until
0, at which point the function terminates. Alternatively, the member returns
if underflow
fails to obtain more characters. This member optimizes the
reading of series of characters. Instead of calling
streambuf::sbumpc
n
times, a block of avail
characters is copied
to the destination, using
streambuf::gbump
to consume avail
characters from the buffer using one function call:
    std::streamsize IFdNStreambuf::xsgetn(char *dest, std::streamsize n)
    {
        int nread = 0;

        while (n)
        {
            if (!in_avail())
            {
                if (underflow() == EOF)
                    break;
            }

            int avail = in_avail();

            if (avail > n)
                avail = n;

            memcpy(dest + nread, gptr(), avail);
            gbump(avail);

            nread += avail;
            n -= avail;
        }

        return nread;
    }
xsgetn
is called by
streambuf::sgetn
,
which is a streambuf
member. Here is an example illustrating the use of
this member function with a IFdNStreambuf
object:
    #include <unistd.h>
    #include <iostream>
    #include <istream>
    #include "ifdnbuf.h"
    using namespace std;

    int main()
    {
                                    // internally: 30 char buffer
        IFdNStreambuf fds(STDIN_FILENO, 30);

        char buf[80];               // main() reads blocks of 80
                                    // chars
        while (true)
        {
            size_t n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }
std::streambuf
should override the members
streambuf::seekoff
and
streambuf::seekpos
. The class
IFdSeek
, developed in this section, can be used to read information from
devices supporting seek operations. The class IFdSeek
was derived from
IFdStreambuf
, so it uses a character buffer of just one character. The
facilities to perform seek operations, which are added to our new class
IFdSeek
, ensure that the input buffer is reset when a seek operation is
requested. The class could also be derived from the class
IFdNStreambuf
. In that case the arguments to reset the input buffer
must be adapted so that its second and third parameters point beyond the
available input buffer. Let's have a look at the characteristics of
IFdSeek
:
IFdSeek
is derived from IFdStreambuf
. Like the
latter class, IFdSeek
's member functions use facilities declared in
unistd.h
. So, the header file
<unistd.h>
must have been read by the
compiler before it can compile the class's member functions. To reduce the
amount of typing when specifying types and constants from streambuf
and
std::ios
, several typedef
s are defined by the class. These
typedefs refer to types that are defined in the header file
<ios>
, which
must therefore also be included before the compiler can compile IFdSeek
's
class interface:
    class IFdSeek: public IFdStreambuf
    {
        typedef std::streambuf::pos_type    pos_type;
        typedef std::streambuf::off_type    off_type;
        typedef std::ios::seekdir           seekdir;
        typedef std::ios::openmode          openmode;

        public:
            IFdSeek(int fd);

        private:
            pos_type seekoff(off_type offset, seekdir dir, openmode);
            pos_type seekpos(pos_type offset, openmode mode);
    };
    inline IFdSeek::IFdSeek(int fd)
    :
        IFdStreambuf(fd)
    {}
seekoff
is responsible for performing the actual
seek operations. It calls
lseek
to seek a new position in a device whose
file descriptor is known. If seeking succeeds,
setg
is called to define
an already empty buffer, so that the base class's underflow
member
refills the buffer at the next input request.
    IFdSeek::pos_type IFdSeek::seekoff(off_type off, seekdir dir, openmode)
    {
        pos_type pos =
            lseek
            (
                d_fd, off,
                (dir == std::ios::beg) ? SEEK_SET :
                (dir == std::ios::cur) ? SEEK_CUR :
                                         SEEK_END
            );

        if (pos < 0)
            return -1;

        setg(d_buffer, d_buffer + 1, d_buffer + 1);
        return pos;
    }
seekpos
is overridden as well:
it is actually defined as a call to seekoff
:
    inline IFdSeek::pos_type IFdSeek::seekpos(pos_type off, openmode mode)
    {
        return seekoff(off, std::ios::beg, mode);
    }
IFdSeek
. If
this program is given its own source file using input redirection then
seeking is supported (and with the exception of the first line, every other
line is shown twice):
    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        IFdSeek fds(0);
        istream is(&fds);
        string  s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();

            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();

            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }
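Whether seeking is supported depends on the device behind the file descriptor: a redirected file can be repositioned, while for a pipe lseek fails (setting errno to ESPIPE). The following sketch tests a descriptor before any seek-dependent code is run. The function name `isSeekable` is hypothetical and is not part of IFdSeek.

```cpp
#include <unistd.h>     // lseek, off_t

// Hypothetical helper: true if the device associated with fd
// supports seek operations. Asking for the current offset doesn't
// change the file position, so the test has no side effects.
bool isSeekable(int fd)
{
    return lseek(fd, 0, SEEK_CUR) != static_cast<off_t>(-1);
}
```

A program like the one shown above could call `isSeekable(0)` first, and could fall back to plain sequential reading when its input arrives through a pipe.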
Streambuf
classes and classes derived from
streambuf
should support
at least ungetting the last read character. Special care must be taken
when series of
unget
calls must be supported. In this section the
construction of a class supporting a configurable number of istream::unget
or
istream::putback
calls is discussed.
Support for multiple (say `n
') unget
calls is implemented by
reserving an initial section of the input buffer, which is gradually filled up
to contain the last n
characters read. The class was implemented as
follows:
std::streambuf
. It
defines several data members, allowing the class to perform the bookkeeping
required to maintain an unget-buffer of a configurable size:
    class FdUnget: public std::streambuf
    {
        int d_fd;
        size_t d_bufsize;
        size_t d_reserved;
        char *d_buffer;
        char *d_base;

        public:
            FdUnget(int fd, size_t bufsz, size_t unget);
            virtual ~FdUnget();

        private:
            int underflow();
    };
d_reserved
bytes of the class's input buffer.
d_reserved
. So, a certain number of bytes may be read. Once d_reserved
bytes have been read at most d_reserved
bytes can be ungot.
d_base
, pointing to a location d_reserved
bytes from the
start of d_buffer
. This will always be the point where buffer refills
start.
streambuf
's buffer pointers using setg
. As no characters have been
read yet, all pointers are set to point to d_base
. If unget
is
called at this point, no characters are available, so unget
will
(correctly) fail.
    FdUnget::FdUnget(int fd, size_t bufsz, size_t unget)
    :
        d_fd(fd),
        d_reserved(unget)
    {
        size_t allocate =
                bufsz > d_reserved ?
                    bufsz
                :
                    d_reserved + 1;

        d_buffer = new char[allocate];

        d_base = d_buffer + d_reserved;
        setg(d_base, d_base, d_base);

        d_bufsize = allocate - d_reserved;
    }
    inline FdUnget::~FdUnget()
    {
        delete[] d_buffer;
    }
underflow
is overridden as follows:
underflow
determines the number of characters that
could potentially be ungot. If that number of characters are ungot, the input
buffer is exhausted. So this value may be any value between 0 (the initial
state) or the input buffer's size (when the reserved area has been filled up
completely, and all current characters in the remaining section of the buffer
have also been read);
d_reserved
, but it is set equal to the
actual number of characters that can be ungot if this value is smaller;
d_base
;
d_base
and not from d_buffer
;
streambuf
's read buffer pointers are set up.
Eback
is set to move
locations before d_base
, thus
defining the guaranteed unget-area,
gptr
is set to d_base
, since that's the location of the
first read character after a refill, and
egptr
is set just beyond the location of the last character
read into the buffer.
underflow
's implementation:
    int FdUnget::underflow()
    {
        size_t ungetsize = gptr() - eback();
        size_t move = std::min(ungetsize, d_reserved);

                                    // memmove, as the source and
                                    // destination areas may overlap
        memmove(d_base - move, egptr() - move, move);

        int nread = read(d_fd, d_base, d_bufsize);
        if (nread <= 0)             // none read -> return EOF
            return EOF;

        setg(d_base - move, d_base, d_base + nread);

                                    // unsigned char: 0xff differs from EOF
        return static_cast<unsigned char>(*gptr());
    }
FdUnget
. It reads at most
10 characters from the standard input, stopping at EOF
. A guaranteed
unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just
before reading a character, the program tries to unget at most 6
characters. This is, of course, not possible; but the program will nicely
unget as many characters as possible, considering the actual number of
characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    using namespace std;

    int main()
    {
        FdUnget fds(0, 3, 2);
        istream is(&fds);
        char    c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << '\n';
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:

        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */
istream
objects operator
>>, the
standard extraction operator, is perfectly suited for the task as in most
cases the extracted fields are white-space (or otherwise clearly separated)
from each other. But this does not hold true in all situations. For example,
when a web-form is posted to some processing script or program, the receiving
program may receive the form field's values as url-encoded
characters: letters and digits are sent unaltered, blanks are sent as +
characters, and all other characters start with %
followed by the
character's
ascii-value represented by its two digit hexadecimal value.
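The encoding rule just described can be sketched as a small function. The name `urlEncode` is hypothetical, and the sketch follows the simplified rule given in the text; actual web software usually leaves a few more characters unaltered.

```cpp
#include <cctype>
#include <cstdio>
#include <string>

// Hypothetical helper: letters and digits are copied, blanks become
// '+', all other characters become '%' followed by their two-digit
// hexadecimal value.
std::string urlEncode(std::string const &text)
{
    std::string out;

    for (unsigned char ch: text)
    {
        if (std::isalnum(ch))
            out += static_cast<char>(ch);
        else if (ch == ' ')
            out += '+';
        else
        {
            char hex[4];            // '%', two hex digits, final 0
            std::snprintf(hex, sizeof(hex), "%%%02X",
                          static_cast<unsigned>(ch));
            out += hex;
        }
    }
    return out;
}
```

For example, the backtick has value 0x60 and the single quote has value 0x27, so they are encoded as `%60` and `%27`, respectively.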
When decoding url-encoded information, simple hexadecimal extraction won't
work, since that will extract as many hexadecimal characters as available,
instead of just two. Since the letters a-f
and the digits 0-9
are legal
hexadecimal characters, a text like My name is `Ed'
, url-encoded as
    My+name+is+%60Ed%27
results in the extraction of the hexadecimal values
60ed
and 27
,
instead of 60
and 27
. The name Ed
disappears from view, which is
clearly not what we want.
In this case, having seen the %
, we could extract 2 characters, put
them in an
istringstream
object, and extract the hexadecimal value from
the istringstream
object. A bit cumbersome, but doable. Other approaches
are possible as well.
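The following sketch illustrates both the problem and the istringstream approach just mentioned. It is not the Fistream class developed in this section; the names `greedyHex` and `urlDecode` are hypothetical and only serve this illustration.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// The problem: plain hexadecimal extraction consumes all available
// hexadecimal characters, not just two.
unsigned greedyHex(std::string const &text)
{
    std::istringstream in(text);
    unsigned value = 0;
    in >> std::hex >> value;
    return value;
}

// The work-around: having seen '%', put exactly two characters in an
// istringstream and extract the hexadecimal value from that object.
std::string urlDecode(std::string const &encoded)
{
    std::string out;

    for (std::size_t idx = 0; idx != encoded.size(); ++idx)
    {
        char ch = encoded[idx];

        if (ch == '+')
            out += ' ';
        else if (ch == '%' && idx + 2 < encoded.size())
        {
            std::istringstream field(encoded.substr(idx + 1, 2));
            unsigned value = 0;
            if (field >> std::hex >> value)
                out += static_cast<char>(value);
            idx += 2;               // skip the two hex characters
        }
        else
            out += ch;
    }
    return out;
}
```

Here `greedyHex("60Ed%27")` produces 0x60ed, swallowing the name, whereas `urlDecode` handles the two characters following each `%` in isolation.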
The class
Fistream
for fixed-sized field istream defines
an istream
class supporting both fixed-sized field extractions and
blank-delimited extractions (as well as unformatted read
calls). The
class may be initialized as a
wrapper around an existing istream
, or
it can be initialized using the name of an existing file. The class is derived
from istream
, allowing all extractions and operations supported by
istream
s in general. Fistream
defines the following data members:
d_filebuf
: a filebuffer used when Fistream
reads its information
from a named (existing) file. Since the filebuffer is only needed in
that case, and since it must be allocated dynamically, it is defined
as a unique_ptr<filebuf>
object.
d_streambuf
: a pointer to Fistream
's streambuf
. It points
to d_filebuf
when Fistream
opens a file by name. When an
existing istream
is used to construct an Fistream
, it will
point to the existing istream
's streambuf
.
d_iss
: an istringstream
object used for the fixed field
extractions.
d_width
: a size_t
indicating the width of the field to
extract. If 0, no fixed field extraction is used, but
information is extracted from the istream
base class object
using standard extractions.
Fistream
's class interface:
    class Fistream: public std::istream
    {
        std::unique_ptr<std::filebuf> d_filebuf;
        std::streambuf *d_streambuf;
        std::istringstream d_iss;
        size_t d_width;
As stated, Fistream
objects can be constructed from either a
filename or an existing istream
object. The class interface therefore
declares two constructors:
        Fistream(std::istream &stream);
        Fistream(char const *name,
                 std::ios::openmode mode = std::ios::in);
When an Fistream
object is constructed using an existing istream
object, the Fistream
's istream
part simply uses the stream
's
streambuf
object:
    Fistream::Fistream(istream &stream)
    :
        istream(stream.rdbuf()),
        d_streambuf(rdbuf()),
        d_width(0)
    {}
When an Fistream
object is constructed using a filename, the
istream
base initializer is given a new filebuf
object to be used as
its streambuf
. Since the class's data members are not initialized before
the class's base class has been constructed, d_filebuf
can only be
initialized thereafter. By then, the filebuf
is only available as
rdbuf
, returning a streambuf
. However, as it is actually a
filebuf
, a reinterpret_cast
is used to cast the streambuf
pointer
returned by rdbuf
to a filebuf *
, so d_filebuf
can be
initialized:
    Fistream::Fistream(char const *name, ios::openmode mode)
    :
        istream(new filebuf()),
        d_filebuf(reinterpret_cast<filebuf *>(rdbuf())),
        d_streambuf(d_filebuf.get()),
        d_width(0)
    {
        d_filebuf->open(name, mode);
    }
There is only one additional public member: setField(field const
&)
. This member defines the size of the next field to extract. Its
parameter is a reference to a field
class, a manipulator class
defining the width of the next field.
Since a field &
is mentioned in Fistream
's interface, field
must be declared before Fistream
's interface starts. The class field
itself is simple and declares Fistream
as its friend. It has two data
members: d_width
specifies the width of the next field, and d_newWidth
which is set to true
if d_width
's value should actually be used. If
d_newWidth
is false, Fistream
returns to its standard extraction
mode. The class field
has two constructors: a default
constructor, setting d_newWidth
to false
, and a second constructor
expecting the width of the next field to extract as its value. Here is the
class field
:
    class field
    {
        friend class Fistream;
        size_t d_width;
        bool d_newWidth;

        public:
            field(size_t width);
            field();
    };

    inline field::field(size_t width)
    :
        d_width(width),
        d_newWidth(true)
    {}

    inline field::field()
    :
        d_newWidth(false)
    {}
Since field
declares Fistream
as its friend, setField
may
inspect field
's members directly.
Time to return to setField
. This function expects a reference to a
field
object, initialized in one of three different ways:
field()
: When setField
's argument is a field
object
constructed by its default constructor the next extraction will use
the same fieldwidth as the previous extraction.
field(0)
: When this field
object is used as setField
's
argument, fixed-sized field extraction stops, and the Fistream
will act like any standard istream
object.
field(x)
: When the field
object itself is initialized by a
non-zero size_t value x
, then the next field width will be x
characters wide. The preparation of such a field is left to
setBuffer
, Fistream
's only private member.
setField
's implementation:
    std::istream &Fistream::setField(field const &params)
    {
        if (params.d_newWidth)              // new field size requested
            d_width = params.d_width;       // set new width

        if (!d_width)                       // no width?
            rdbuf(d_streambuf);             // return to the old buffer
        else
            setBuffer();                    // define the extraction buffer

        return *this;
    }
The private member setBuffer
defines a buffer of d_width + 1
characters and uses read
to fill the buffer with d_width
characters. The buffer is terminated by an ASCII-Z
character. This buffer
is used to initialize the d_iss
member. Fistream
's rdbuf
member is
used to extract the d_iss
's data via the Fistream
object itself:
    void Fistream::setBuffer()
    {
        char *buffer = new char[d_width + 1];

        rdbuf(d_streambuf);                         // use istream's buffer to
        buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,
                                                    // terminated by ascii-Z
        d_iss.str(buffer);
        delete[] buffer;                            // allocated by new[]

        rdbuf(d_iss.rdbuf());                       // switch buffers
    }
Although setField
could be used to configure Fistream
to use or
not to use fixed-sized field extraction, using manipulators is probably
preferable. To allow field
objects to be used as manipulators an
overloaded extraction operator was defined. This extraction operator accepts
istream &
and a field const &
objects. Using this extraction
operator, statements like
    fis >> field(2) >> x >> field(0);
are possible (assuming
fis
is a Fistream
object). Here is the
overloaded operator
>>, as well as its declaration:
    istream &std::operator>>(istream &str, field const &params)
    {
        return reinterpret_cast<Fistream *>(&str)->setField(params);
    }
Declaration:
    namespace std
    {
        istream &operator>>(istream &str, FBB::field const &params);
    }
Finally, an example. The following program uses a Fistream
object to
url-decode url-encoded information appearing at its standard input:
    int main()
    {
        Fistream fis(cin);

        fis >> hex;
        while (true)
        {
            size_t x;
            switch (x = fis.get())
            {
                case '\n':
                    cout << '\n';
                break;
                case '+':
                    cout << ' ';
                break;
                case '%':
                    fis >> field(2) >> x >> field(0);
                // FALLING THROUGH
                default:
                    cout << static_cast<char>(x);
                break;
                case EOF:
                return 0;
            }
        }
    }
    /*
        Generated output after:
            echo My+name+is+%60Ed%27 | a.out

        My name is `Ed'
    */
fork
system call is well
known. When a program needs to start a new process,
system
can be used.
System
requires the program to wait for the
child process to
terminate. The more general way to spawn subprocesses is to use fork
.
In this section we investigate how C++ can be used to wrap classes around
a complex system call like fork
. Much of what follows in this section
directly applies to the Unix operating system, and the discussion therefore
focuses on that operating system. Other systems usually provide
comparable facilities. What follows is closely related to the
Template Method Design Pattern
(cf. Gamma et al. (1995)
Design Patterns, Addison-Wesley).
When fork
is called, the current program is duplicated in memory, thus
creating a new process. Following the duplication both processes continue
their execution just below the fork
system call. The two processes may
inspect fork
's return value: the return value in the
original process (called the
parent process) differs from the return
value in the newly created process (called the
child process):
fork
returns the
process ID of the
(child) process that was created by the fork
system call. This is a
positive integer value.
fork
returns 0.
fork
fails, -1 is returned.
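The three return values can be told apart as in the following sketch. It uses the system calls directly rather than the Fork class developed below. The function `runChild` is hypothetical: its child merely exits with a given code, which the parent then retrieves.

```cpp
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit

// Hypothetical helper: starts a child process ending with exit value
// `code'. Returns the child's exit value as seen by the parent, or -1
// if forking or waiting failed.
int runChild(int code)
{
    pid_t pid = fork();

    if (pid < 0)                // fork failed: no child was created
        return -1;

    if (pid == 0)               // 0: this is the child process
        _exit(code);            // end at once, without running the
                                // parent's cleanup code

                                // positive: this is the parent, pid
                                // is the child's process ID
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The parent calls waitpid to prevent the child from lingering as a zombie process. WIFEXITED is checked first since the exit value is only meaningful when the child ended normally.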
A basic Fork
class should hide all bookkeeping details of a system
call like fork
from its users. The class Fork
developed here will do
just that. The class itself only ensures the proper execution of the fork
system call. Normally, fork
is called to start a child process, usually
boiling down to the execution of a separate process. This child process may
expect input at its standard input stream and/or may generate output to its
standard output and/or standard error streams. Fork
does not know all
this, and does not have to know what the child process will do. Fork
objects should be able to start their child processes.
Fork
's constructor cannot know what actions its child
process should perform. Similarly, it cannot know what actions the parent
process should perform. For this kind of situation, the
template method design pattern
was developed. According to Gamma et al., the template method design
pattern
``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''
This design pattern allows us to define an
abstract base class
already providing the essential steps related to the fork
system call,
deferring the implementation of other parts of the fork
system call to
subclasses.
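Stripped of all process handling, the pattern may be sketched as follows. The classes Greeter and WorldGreeter are hypothetical and merely illustrate the division of labor: the base class fixes the order of the steps, offers a default for an optional step, and forces derived classes to implement the mandatory step.

```cpp
#include <string>

// Hypothetical abstract base class: greet() is the template method.
class Greeter
{
    public:
        virtual ~Greeter() = default;

        std::string greet() const           // fixed skeleton of the algorithm
        {
            return opening() + ", " + name() + "!";
        }

    private:
        virtual std::string opening() const // optional step: has a default
        {
            return "Hello";
        }
        virtual std::string name() const = 0;   // mandatory step
};

// A derived class only fills in the step it is required to provide.
class WorldGreeter: public Greeter
{
    std::string name() const override
    {
        return "world";
    }
};
```

Fork uses the same layout: its redirection members play the role of the optional step, its process members the role of the mandatory steps.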
The Fork
abstract base class has the following characteristics:
d_pid
. In the parent process this data
member contains the child's
process id and in the child process it has
the value 0. Its public interface declares only two members:
fork
member function, responsible for the actual forking
(i.e., it will create the (new) child process);
virtual
destructor ~Fork
.
Fork
's interface:
    class Fork
    {
        int d_pid;

        public:
            virtual ~Fork() = default;
            void fork();

        protected:
            int pid() const;
            int waitForChild();         // returns the status

        private:
            virtual void childRedirections();
            virtual void parentRedirections();

            virtual void childProcess() = 0;    // pure virtual members
            virtual void parentProcess() = 0;
    };
protected
section and can thus only be used by derived classes. They
are:
pid()
: The member function pid
allows derived classes to
access the system fork
's return value:
    inline int Fork::pid() const
    {
        return d_pid;
    }
waitForChild()
: The member int waitForChild
can be called by
parent processes to wait for the completion of their child processes (as
discussed below). This member is declared in the class interface. Its
implementation is:
    #include "fork.ih"

    int Fork::waitForChild()
    {
        int status;

        waitpid(d_pid, &status, 0);

        return WEXITSTATUS(status);
    }
This simple implementation returns the child's exit status to the parent. The called system function
waitpid
blocks until the
child terminates.
fork
system calls are used,
parent processes
and
child processes
must always be distinguished. The
main distinction between these processes is that d_pid
becomes
the child's process-id in the parent process, while d_pid
becomes
0 in the child process itself. Since these two processes must always be
distinguished (and present), their implementation by classes derived from
Fork
is enforced by Fork
's interface: the members childProcess
,
defining the child process' actions and parentProcess
, defining the
parent process' actions were defined as pure virtual functions.
childRedirections()
: this member should be implemented if any
standard stream (cin, cout
or cerr)
must be redirected in the
child process (cf. section 23.3.1);
parentRedirections()
: this member should be implemented if any
standard stream (cin, cout
or cerr)
must be redirected in the
parent process.
    inline void Fork::childRedirections()
    {}

    inline void Fork::parentRedirections()
    {}
fork
calls the system function fork
(Caution: since the system function fork
is called by a member
function having the same name, the ::
scope resolution operator must be
used to prevent a recursive call of the member function itself).
::fork
's return value determines whether parentProcess
or childProcess
is called. Maybe redirection is
necessary. Fork::fork
's implementation calls childRedirections
just before calling childProcess
, and parentRedirections
just
before calling parentProcess
:
    #include "fork.ih"

    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";

        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();
            exit(1);                    // we shouldn't come here:
        }                               // childProcess() should exit

        parentRedirections();
        parentProcess();
    }
In
fork.cc
the class's
internal header file fork.ih
is
included. This header file takes care of the inclusion of the necessary system
header files, as well as the inclusion of fork.h
itself. Its
implementation is:
    #include "fork.h"
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>
Child processes should not return: once they have completed their tasks,
they should terminate. This happens automatically when the child process
performs a call to a member of the
exec...
family, but if the child
itself remains active, then it must make sure that it terminates properly. A
child process normally uses
exit
to terminate itself, but note that
exit
prevents the activation of destructors of objects
defined at the same or more superficial nesting levels than the level at
which exit
is called. Destructors of globally defined objects are
activated when exit
is used. When using exit
to terminate
childProcess
, it should either itself call a support member function
defining all nested objects it needs, or it should define all its objects in a
compound statement (e.g., using a try
block) calling exit
beyond
the compound statement.
Parent processes should normally wait for their children to complete. Terminating child processes inform their parents that they are about to terminate by sending a signal that should be caught by their parents. If child processes terminate and their parent processes do not catch those signals then such child processes remain visible as so-called zombie processes.
If parent processes must wait for their children to complete, they may
call the member waitForChild
. This member returns the exit status of a
child process to its parent.
There exists a situation where the child process continues to
live, but the parent dies. This is a fairly natural event: parents tend to
die before their children do. In our context (i.e. C++), this is called a
daemon program. In a daemon the parent process dies and the child program
continues to run as a child of the basic
init
process. Again, when the
child eventually dies a signal is sent to its `step-parent'
init
. This
does not create a zombie as init
catches the termination signals of all
its (step-) children. The construction of a daemon process is very simple,
given the availability of the class Fork
(cf. section 23.3.2).
ios::rdbuf
member function. By assigning the
streambuf
of a stream to another stream, both stream objects access the
same streambuf
, thus implementing redirection at the level of the
programming language itself.
This may be fine within the context of a C++ program, but once we
leave that context the redirection terminates. The operating system does not
know about streambuf
objects. This situation is encountered, e.g., when a
program uses a
system
call to start a subprogram. The example program at
the end of this section uses C++ redirection to redirect the information
inserted into
cout
to a file, and then calls
    system("echo hello world")
to echo a well-known line of text. Since
echo
writes its information
to the standard output, this would be the program's redirected file if the
operating system recognized C++'s redirection.
But redirection doesn't happen. Instead, hello world
still appears at
the program's standard output and the redirected file is left untouched. To
write hello world
to the redirected file redirection must be realized at
the operating system level. Some operating systems (e.g.,
Unix and
friends) provide system calls like
dup
and
dup2
to accomplish
this. Examples of the use of these system calls are given in section
23.3.3.
Here is the example of the failing redirection at the system level
following C++ redirection using streambuf
redirection:
    #include <iostream>
    #include <fstream>
    #include <cstdlib>

    using namespace std;

    int main()
    {
        ofstream of("outfile");
        streambuf *buf = cout.rdbuf(of.rdbuf());
        cout << "To the of stream\n";
        system("echo hello world");
        cout << "To the of stream\n";
        cout.rdbuf(buf);
    }
    /*
        Generated output: on the file `outfile'

        To the of stream
        To the of stream

        On standard output:

        hello world
    */
fork
is to start a
child process. The parent process terminates immediately after spawning the
child process. If this happens, the child process continues to run as a child
process of
init
, the always running first process on
Unix systems. Such
a process is often called a
daemon, running as a
background process.
Although the next example can easily be constructed as a plain C
program, it was included in the C++ Annotations because it is so closely
related to the current discussion of the Fork
class. I thought about
adding a daemon
member to that class, but eventually decided against it
because the construction of a daemon program is very simple and requires no
features other than those currently offered by the class Fork
. Here is an
example illustrating the construction of such a daemon program. Its child
process doesn't do
exit
but throw 0
which is caught by the catch
clause of the child's main
function. Doing this ensures that any objects
defined by the child process are properly destroyed:
    #include <iostream>
    #include <unistd.h>
    #include "fork.h"

    class Daemon: public Fork
    {
        virtual void parentProcess()        // the parent does nothing.
        {}

        virtual void childProcess()         // actions by the child
        {
            sleep(3);
                                            // just a message...
            std::cout << "Hello from the child process\n";
            throw 0;                        // The child process ends
        }
    };

    int main()
    try
    {
        Daemon().fork();
    }
    catch(...)
    {}

    /*
        Generated output:
        The next command prompt, then after 3 seconds:
        Hello from the child process
    */
pipe
system call. When two processes want to communicate
using such file descriptors, the following happens:
pipe
system call. One of the file descriptors is used for writing, the
other file descriptor is used for reading.
fork
function is called),
duplicating the file descriptors. Now we have four file descriptors as
the child process and the parent process both have their own copies of the two
file descriptors created by pipe
.
Pipe
class
developed here. Let's have a look at its characteristics (before using
functions like pipe
and dup
the compiler must have read the
<unistd.h>
header file):
pipe
system call expects a pointer to two int
values,
which will represent, subsequent to the pipe
call, the file descriptor
used for reading and the file descriptor used for writing, respectively. To
avoid confusion, the class Pipe
defines an enum
having values
associating the indices of the array of 2-int
s with symbolic
constants. The two file descriptors themselves are stored in a data member
d_fd
. Here is the initial section of the class's interface:
    class Pipe
    {
        enum RW { READ, WRITE };
        int d_fd[2];
pipe
to create a set of associated file descriptors used for
accessing both ends of a pipe:
    Pipe::Pipe()
    {
        if (pipe(d_fd))
            throw "Pipe::Pipe(): pipe() failed";
    }
readOnly
and readFrom
are used to configure the
pipe's reading end. The latter function is used when using redirection. It is
provided with an alternate file descriptor to be used for reading from the
pipe. Usually this alternate file descriptor is
STDIN_FILENO
, allowing
cin
to extract information from the pipe. The former function is merely
used to configure the reading end of the pipe. It closes the matching writing
end and returns a file descriptor that can be used to read from the pipe:
    int Pipe::readOnly()
    {
        close(d_fd[WRITE]);
        return d_fd[READ];
    }

    void Pipe::readFrom(int fd)
    {
        readOnly();

        redirect(d_fd[READ], fd);
        close(d_fd[READ]);
    }
writeOnly
and two writtenBy
members are available to
configure the writing end of a pipe. The former function is only used to
configure the writing end of the pipe. It closes the reading end, and
returns a file descriptor that can be used for writing to the pipe:
    int Pipe::writeOnly()
    {
        close(d_fd[READ]);
        return d_fd[WRITE];
    }

    void Pipe::writtenBy(int fd)
    {
        writtenBy(&fd, 1);
    }

    void Pipe::writtenBy(int const *fd, size_t n)
    {
        writeOnly();

        for (size_t idx = 0; idx < n; idx++)
            redirect(d_fd[WRITE], fd[idx]);

        close(d_fd[WRITE]);
    }
For the latter member two overloaded versions are available:
writtenBy(int fileDescriptor)
is used to configure single
redirection, so that a specific file descriptor (usually
STDOUT_FILENO
or
STDERR_FILENO
) can be used to write to the pipe;
writtenBy(int const *fileDescriptor, size_t n = 2)
may be used
to configure multiple redirection, providing an array argument containing
file descriptors. Information written to any of these file descriptors is
actually written to the pipe.
redirect
, used to set up
redirection through the
dup2
system call. This function expects two file
descriptors. The first file descriptor represents a file descriptor that can
be used to access the device's information; the second file descriptor is an
alternate file descriptor that may also be used to access the device's
information. Here is redirect
's implementation:
    void Pipe::redirect(int d_fd, int alternateFd)
    {
        if (dup2(d_fd, alternateFd) < 0)
            throw "Pipe: redirection failed";
    }
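The bookkeeping that Pipe hides can be made visible in a sketch using the system calls directly. The function viaPipe is hypothetical and doesn't use redirection: the child writes a text to the pipe's writing end, the parent reads it from the reading end, and each process closes the end it does not use.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: the child writes `text` into a pipe, the parent
// returns everything it reads from that pipe.
std::string viaPipe(std::string const &text)
{
    int fd[2];                          // fd[0]: reading end, fd[1]: writing end
    if (pipe(fd) != 0)
        return "";

    pid_t pid = fork();                 // both processes now own both ends
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return "";
    }

    if (pid == 0)                       // child: only writes
    {
        close(fd[0]);
        ssize_t n = write(fd[1], text.data(), text.size());
        _exit(n == static_cast<ssize_t>(text.size()) ? 0 : 1);
    }

    close(fd[1]);                       // parent: only reads. Closing its copy
                                        // of the writing end lets read see EOF
    std::string result;
    char buffer[256];
    ssize_t n;
    while ((n = read(fd[0], buffer, sizeof(buffer))) > 0)
        result.append(buffer, n);

    close(fd[0]);
    waitpid(pid, nullptr, 0);           // prevent a zombie
    return result;
}
```

If the parent forgot to close fd[1], its read loop would never encounter end-of-file, as a writing end of the pipe would remain open.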
Pipe
objects, we'll use Fork
and Pipe
in various example programs.
ParentSlurp
, derived from Fork
, starts a child process
executing a stand-alone program (like /bin/ls
). The (standard) output of
the execed program is not shown on the screen but is read by the parent
process.
For demonstration purposes the parent process will write the lines it
receives to its standard output stream, prepending line numbers to the
lines. It is attractive to redirect the parent's standard input stream to
allow the parent to read the output from the child process using its
std::cin
input stream. Therefore, the only pipe in the program is used
as an input pipe for the parent, and an output pipe for the child.
The class ParentSlurp
has the following characteristics:
Fork
. Before starting ParentSlurp
's class
interface, the compiler must have read fork.h
and pipe.h
. The class
only uses one data member, a Pipe
object d_pipe
.
Pipe
's constructor already defines a pipe, and d_pipe
is automatically initialized by ParentSlurp
's default constructor, which
is implicitly provided. All additional members are only there for
ParentSlurp
's own benefit so they can be defined in the class's (implicit)
private
section. Here is the class's interface:
    class ParentSlurp: public Fork
    {
        Pipe d_pipe;

        virtual void childRedirections();
        virtual void parentRedirections();
        virtual void childProcess();
        virtual void parentProcess();
    };
childRedirections
member configures the writing end of the
pipe. So, all information written to the child's standard output stream will
end up in the pipe. The big advantage of this is that no additional streams
are needed to write to a file descriptor:
    inline void ParentSlurp::childRedirections()
    {
        d_pipe.writtenBy(STDOUT_FILENO);
    }
parentRedirections
member, configures the reading end of
the pipe. It does so by connecting the reading end of the pipe to the parent's
standard input file descriptor (STDIN_FILENO
). This allows the parent to
perform extractions from cin
, not requiring any additional streams for
reading.
    inline void ParentSlurp::parentRedirections()
    {
        d_pipe.readFrom(STDIN_FILENO);
    }
childProcess
member only needs to concentrate on its own
actions. As it only needs to execute a program (writing information to its
standard output), the member can consist of one single statement:
    inline void ParentSlurp::childProcess()
    {
                                    // the argument list of execl must end
                                    // in a null pointer of type char *
        execl("/bin/ls", "/bin/ls", static_cast<char *>(0));
    }
parentProcess
member simply `slurps' the information
appearing at its standard input. Doing so, it actually reads the child's
output. It copies the received lines to its standard output stream prefixing
line numbers to them:
    void ParentSlurp::parentProcess()
    {
        std::string line;
        size_t nr = 1;

        while (getline(std::cin, line))
            std::cout << nr++ << ": " << line << '\n';

        waitForChild();
    }
ParentSlurp
object, and
calls its fork()
member. Its output consists of a numbered list of files
in the directory where the program is started. Note that the program also
needs the fork.o, pipe.o
and waitforchild.o
object files (see
earlier sources):
    int main()
    {
        ParentSlurp().fork();
    }
    /*
        Generated Output (example only, actually obtained output may differ):

        1: a.out
        2: bitand.h
        3: bitfunctional
        4: bitnot.h
        5: daemon.cc
        6: fdinseek.cc
        7: fdinseek.h
        ...
    */
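The same flow can be sketched without the classes Fork and Pipe. The function slurpEcho is hypothetical, and it differs from ParentSlurp in two respects: the child executes /bin/echo (assumed to be available) rather than /bin/ls, so that its output is predictable, and the parent reads the pipe's file descriptor directly instead of redirecting its standard input.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: runs `/bin/echo word` in a child process whose
// standard output is redirected into a pipe, and returns the lines that
// the parent reads from that pipe.
std::vector<std::string> slurpEcho(char const *word)
{
    std::vector<std::string> lines;

    int fd[2];
    if (pipe(fd) != 0)
        return lines;

    pid_t pid = fork();
    if (pid < 0)
    {
        close(fd[0]);
        close(fd[1]);
        return lines;
    }

    if (pid == 0)                           // child
    {
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);         // stdout now writes to the pipe
        close(fd[1]);
        execl("/bin/echo", "/bin/echo", word, static_cast<char *>(0));
        _exit(127);                         // only reached if execl failed
    }

    close(fd[1]);                           // parent: reading end only

    std::string current;
    char ch;
    while (read(fd[0], &ch, 1) == 1)        // split the child's output in lines
    {
        if (ch != '\n')
            current += ch;
        else
        {
            lines.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        lines.push_back(current);

    close(fd[0]);
    waitpid(pid, nullptr, 0);               // prevent a zombie
    return lines;
}
```

Redirecting STDIN_FILENO in the parent, as ParentSlurp does, avoids the explicit read loop: std::getline and std::cin can then be used instead.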
start
: this starts a new child process. The parent returns the child's
ID (a number) to the user. The ID is thereupon used to identify a
particular child process;
<nr> text
sends ``text
'' to the child process having ID
<nr>
;
stop <nr>
terminates the child process having ID <nr>
;
exit
terminates the parent as well as all its child processes.
A problem with programs like our monitor is that they allow
asynchronous input from multiple sources. Input may appear at the
standard input as well as at the input-sides of pipes. Also, multiple output
channels are used. To handle situations like these, the
select
system
call was developed.
select
system call was developed to handle asynchronous
I/O multiplexing.
The select
system call is used to handle, e.g., input appearing
simultaneously at a set of file descriptors.
The select
function is rather complex, and its full discussion is
beyond the C++ Annotations' scope. By encapsulating select
in a class
Selector
, hiding its details and offering an intuitively attractive
interface, its use is simplified. The Selector
class has these
features:
Selector
's members are very small,
most members can be implemented inline. The class requires quite a few data
members. Most of these data members belong to types that require some system
headers to be included first:
    #include <limits.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>
fd_set
is a
type designed to be used by select
and variables of this type contain the
set of file descriptors on which select
may sense some
activity. Furthermore, select
allows us to fire an
asynchronous alarm. To set the alarm time, the class Selector
defines a
timeval
data member. Other members are used for internal
bookkeeping purposes. Here is the class Selector
's interface:
    class Selector
    {
        fd_set d_read;
        fd_set d_write;
        fd_set d_except;
        fd_set d_ret_read;
        fd_set d_ret_write;
        fd_set d_ret_except;
        timeval d_alarm;
        int d_max;
        int d_ret;
        int d_readidx;
        int d_writeidx;
        int d_exceptidx;

        public:
            Selector();

            int exceptFd();
            int nReady();
            int readFd();
            int wait();
            int writeFd();
            void addExceptFd(int fd);
            void addReadFd(int fd);
            void addWriteFd(int fd);
            void noAlarm();
            void rmExceptFd(int fd);
            void rmReadFd(int fd);
            void rmWriteFd(int fd);
            void setAlarm(int sec, int usec = 0);

        private:
            int checkSet(int *index, fd_set &set);
            void addFd(fd_set *set, int fd);
    };
Selector()
: the (default) constructor. It
clears the read, write, and exception fd_set
variables, and switches off the
alarm. Except for d_max
, the remaining data members do not require
specific initializations:
    Selector::Selector()
    {
        FD_ZERO(&d_read);
        FD_ZERO(&d_write);
        FD_ZERO(&d_except);
        noAlarm();
        d_max = 0;
    }
int wait()
: this member blocks until the alarm times
out or until activity is sensed at any of the file descriptors monitored by
the Selector
object. It throws an exception when the select
system
call itself fails:
    int Selector::wait()
    {
        timeval t = d_alarm;

        d_ret_read = d_read;
        d_ret_write = d_write;
        d_ret_except = d_except;

        d_readidx = 0;
        d_writeidx = 0;
        d_exceptidx = 0;

        d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except,
                       t.tv_sec == -1 && t.tv_usec == -1 ? 0 : &t);

        if (d_ret < 0)
            throw "Selector::wait()/select() failed";

        return d_ret;
    }
int nReady()
: this member function's return value is only
defined when wait
has returned. In that case it returns 0 for an
alarm-timeout, -1 if select
failed, and otherwise the number of file
descriptors on which activity was sensed:
    inline int Selector::nReady()
    {
        return d_ret;
    }
int readFd()
: this member function's return
value is also only defined after wait
has returned. Its return value is
-1 if no (more) input file descriptors are available. Otherwise the next file
descriptor available for reading is returned:
    inline int Selector::readFd()
    {
        return checkSet(&d_readidx, d_ret_read);
    }
int writeFd()
: operating analogously to readFd
, it
returns the next file descriptor to which output can be written. It uses
d_writeidx
and d_ret_write and is implemented analogously to
and is implemented analogously to
readFd
;
int exceptFd()
: operating analogously to readFd
, it
returns the next exception file descriptor on which activity was sensed. It
uses d_exceptidx
and d_ret_except
and is implemented analogously to
readFd
;
void setAlarm(int sec, int usec = 0)
: this member
activates Selector
's alarm facility. At least the number of seconds to wait
for the alarm to go off must be specified. It simply assigns values to
d_alarm
's fields. At the next Selector::wait
call, the alarm will fire
(i.e., wait
returns with return value 0) once the configured
alarm-interval has passed:
    inline void Selector::setAlarm(int sec, int usec)
    {
        d_alarm.tv_sec = sec;
        d_alarm.tv_usec = usec;
    }
void noAlarm()
: this member switches off the alarm by
setting both fields of the alarm interval to -1, which wait interprets as `no alarm':
    inline void Selector::noAlarm()
    {
        setAlarm(-1, -1);
    }
void addReadFd(int fd)
: this member adds a
file descriptor to the set of input file descriptors monitored by the
Selector
object. The member function wait
will return once input is
available at the indicated file descriptor:
    inline void Selector::addReadFd(int fd)
    {
        addFd(&d_read, fd);
    }
void addWriteFd(int fd)
: this member adds a file
descriptor to the set of output file descriptors monitored by the Selector
object. The member function wait
will return once output can be written to
the indicated file descriptor. Using d_write
, it is implemented
analogously to addReadFd
;
void addExceptFd(int fd)
: this member adds a file
descriptor to the set of exception file descriptors to be monitored by the
Selector
object. The member function wait
will return once activity
is sensed at the indicated file descriptor. Using d_except
, it is
implemented analogously to addReadFd
;
void rmReadFd(int fd)
: this member removes a file
descriptor from the set of input file descriptors monitored by the
Selector
object:
    inline void Selector::rmReadFd(int fd)
    {
        FD_CLR(fd, &d_read);
    }
void rmWriteFd(int fd)
: this member removes a file
descriptor from the set of output file descriptors monitored by the
Selector
object. Using d_write
, it is implemented analogously to
rmReadFd
;
void rmExceptFd(int fd)
: this member removes a file
descriptor from the set of exception file descriptors to be monitored by the
Selector
object. Using d_except
, it is implemented analogously to
rmReadFd
;
private
section:
addFd
adds a file descriptor to a fd_set
:
    void Selector::addFd(fd_set *set, int fd)
    {
        FD_SET(fd, set);
        if (fd >= d_max)
            d_max = fd + 1;
    }
checkSet
tests whether a file descriptor (*index
)
is found in a fd_set
:
    int Selector::checkSet(int *index, fd_set &set)
    {
        int &idx = *index;

        while (idx < d_max && !FD_ISSET(idx, &set))
            ++idx;

        return idx == d_max ? -1 : idx++;
    }
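To show what Selector wraps, here is a sketch calling select directly for one file descriptor. The function readableWithin is hypothetical; it combines, for a single descriptor, what addReadFd, setAlarm, wait and readFd do for a whole set.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: returns true if `fd` becomes readable within the
// given interval, false on a timeout or when select itself fails.
bool readableWithin(int fd, int sec, int usec = 0)
{
    fd_set readSet;
    FD_ZERO(&readSet);                  // start with an empty set
    FD_SET(fd, &readSet);               // monitor fd for input

    timeval timeout;
    timeout.tv_sec = sec;
    timeout.tv_usec = usec;

    // select's first argument is the highest monitored descriptor plus
    // one, which is the value Selector keeps in d_max
    int ret = select(fd + 1, &readSet, nullptr, nullptr, &timeout);

    return ret > 0 && FD_ISSET(fd, &readSet);
}
```

As select modifies the sets it receives, Selector::wait passes copies (d_ret_read and friends) and keeps the configured sets for the next call.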
monitor
program uses a Monitor
object doing most of the
work. The class Monitor
's public interface only offers a default
constructor and one member, run
, to perform its tasks. All other member
functions are located in the class's private
section.
Monitor
defines the private
enum Commands
, symbolically
listing the various commands its input language supports, as well as several
data members. Among the data members are a Selector
object and a map
using the file descriptors for reading from the children as its keys and shared pointers to Child
objects (see
section 23.3.5.3) as its values. Furthermore, Monitor
has a static array
member s_handler[]
, storing pointers to member functions handling user
commands.
A destructor should be implemented as well, but its implementation is left
as an exercise to the reader. Here is Monitor
's interface, including the
interface of the nested class Find
that is used to create a function
object:
    class Monitor
    {
        enum Commands
        {
            UNKNOWN,
            START,
            EXIT,
            STOP,
            TEXT,
            sizeofCommands
        };

        typedef std::map<int, std::shared_ptr<Child>> MapIntChild;

        friend class Find;
        class Find
        {
            int d_nr;
            public:
                Find(int nr);
                bool operator()(MapIntChild::value_type &vt) const;
        };

        Selector d_selector;
        int d_nr;
        MapIntChild d_child;

        static void (Monitor::*s_handler[])(int, std::string const &);
        static int s_initialize;

        public:
            enum Done
            {};

            Monitor();
            void run();

        private:
            static void killChild(MapIntChild::value_type it);
            static int initialize();

            Commands next(int *value, std::string *line);
            void processInput();
            void processChild(int fd);

            void createNewChild(int, std::string const &);
            void exiting(int = 0, std::string const &msg = std::string());
            void sendChild(int value, std::string const &line);
            void stopChild(int value, std::string const &);
            void unknown(int, std::string const &);
    };
Since there's only one non-class type data member, the class's constructor
could be implemented inline. The array
s_handler
, storing pointers to functions needs to be initialized as
well. This can be accomplished in several ways:
Commands
enumeration only contains a fairly limited set
of commands, compile-time initialization could be considered:
    void (Monitor::*Monitor::s_handler[])(int, string const &) =
    {
        &Monitor::unknown,              // order follows enum Commands'
        &Monitor::createNewChild,       // elements
        &Monitor::exiting,
        &Monitor::stopChild,
        &Monitor::sendChild,
    };
The advantage of this is that it's simple, not requiring any run-time
effort. The disadvantage is of course relatively complex maintenance. If for
some reason Commands
is modified, s_handler
must be modified as
well. In cases like these, compile-time initialization often is
asking for trouble. There is a simple alternative though.
Monitor
's interface we see a static data member
s_initialize
and a static member function initialize
. The static
member function handles the initialization of the s_handler
array. It
explicitly assigns the array's elements and any modification in ordering of
the enum Commands
's values is automatically accounted for by recompiling
initialize
:
    void (Monitor::*Monitor::s_handler[sizeofCommands])(int, string const &);

    int Monitor::initialize()
    {
        s_handler[UNKNOWN] = &Monitor::unknown;
        s_handler[START] = &Monitor::createNewChild;
        s_handler[EXIT] = &Monitor::exiting;
        s_handler[STOP] = &Monitor::stopChild;
        s_handler[TEXT] = &Monitor::sendChild;
        return 0;
    }
The member initialize
is a static member and so it can be
called to initialize s_initialize
, a static int
variable. The
initialization is enforced by placing the initialization statement in the
source file of a function that is known to be executed. It could be main
,
but if we're Monitor
's maintainers and only have control over the library
containing Monitor
's code then that's not an option. In those cases the
source file containing the destructor is a very good candidate. If a class
has only one constructor and it's not defined inline then the
constructor's source file is a good candidate as well. In Monitor
's
current implementation the initialization statement is put in run
's source
file, reasoning that s_handler
is only needed when run
is used.
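The technique can be tried out in isolation. The class Calculator below is hypothetical and much smaller than Monitor, but it uses the same ingredients: an enum ending in a size value, a static array of pointers to member functions indexed by that enum, and a static int whose initialization calls initialize.

```cpp
// Hypothetical class showing a run-time initialized table of pointers
// to member functions, indexed by an enum.
class Calculator
{
    public:
        enum Operation
        {
            ADD,
            SUBTRACT,
            sizeofOperations            // number of operations
        };

        int apply(Operation op, int lhs, int rhs) const
        {
            return (this->*s_handler[op])(lhs, rhs);
        }

    private:
        int add(int lhs, int rhs) const
        {
            return lhs + rhs;
        }
        int subtract(int lhs, int rhs) const
        {
            return lhs - rhs;
        }

        static int initialize();

        static int (Calculator::*s_handler[sizeofOperations])(int, int) const;
        static int s_initialize;
};

int (Calculator::*Calculator::s_handler[Calculator::sizeofOperations])
                                                        (int, int) const;

int Calculator::initialize()
{
    s_handler[ADD] = &Calculator::add;  // independent of the enum's ordering
    s_handler[SUBTRACT] = &Calculator::subtract;
    return 0;
}

// Defining s_initialize forces initialize to be called before main starts.
int Calculator::s_initialize = Calculator::initialize();
```

With this setup Calculator().apply(Calculator::ADD, 2, 3) selects add through the table. Reordering the enum's values requires no changes to initialize.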
Monitor
's constructor is a very simple function and may be implemented
inline:
    inline Monitor::Monitor()
    :
        d_nr(0)
    {}
The core of Monitor
's activities is performed by run
. It
performs the following tasks:
Monitor
object only monitors its standard
input. The set of input file descriptors to which d_selector
listens
is initialized to STDIN_FILENO
.
d_selector
's wait
function is called.
If input on cin
is available, it is processed by processInput
.
Otherwise, the input has arrived from a child process. Information sent by
children is processed by processChild
.
Monitor
caught the termination signals. As noted by Ben Simons (ben at
mrxfx dot com
) this is inappropriate. Instead, the process spawning child
processes has that responsibility (so, the parent process is responsible for
its child processes; a child process is in turn responsible for its own child
processes). Thanks, Ben).
run
's source file also defines and initializes
s_initialize
to ensure the proper initialization of the s_handler
array.
run
's implementation and s_initialize
's definition:
    #include "monitor.ih"

    int Monitor::s_initialize = Monitor::initialize();

    void Monitor::run()
    {
        d_selector.addReadFd(STDIN_FILENO);

        while (true)
        {
            cout << "? " << flush;
            try
            {
                d_selector.wait();

                int fd;
                while ((fd = d_selector.readFd()) != -1)
                {
                    if (fd == STDIN_FILENO)
                        processInput();
                    else
                        processChild(fd);
                }
                cout << "NEXT ...\n";
            }
            catch (char const *msg)
            {
                exiting(1, msg);
            }
        }
    }
The member function processInput
reads the commands entered by the
user using the program's standard input stream. The member itself is rather
simple. It calls next
to obtain the next command entered by the user, and
then calls the corresponding function using the matching element of the
s_handler[]
array. Here are the members processInput
and next
:
    void Monitor::processInput()
    {
        string line;
        int value;
        Commands cmd = next(&value, &line);
        (this->*s_handler[cmd])(value, line);
    }

    Monitor::Commands Monitor::next(int *value, string *line)
    {
        if (!getline(cin, *line))
            exiting(1, "Monitor::next(): reading cin failed");

        if (*line == "start")
            return START;

        if (*line == "exit" || *line == "quit")
        {
            *value = 0;
            return EXIT;
        }

        if (line->find("stop") == 0)
        {
            istringstream istr(line->substr(4));
            istr >> *value;
            return !istr ? UNKNOWN : STOP;
        }

        istringstream istr(line->c_str());
        istr >> *value;
        if (istr)
        {
            getline(istr, *line);
            return TEXT;
        }

        return UNKNOWN;
    }
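The classification performed by next can be studied separately from the monitor. In the following sketch the function classify and the enum Command are hypothetical: the line is passed as an argument instead of being read from cin, and the remaining text is returned through a separate parameter.

```cpp
#include <sstream>
#include <string>

enum class Command { Unknown, Start, Exit, Stop, Text };

// Hypothetical stand-alone variant of the classification step. `value`
// receives a child number, `text` the text following that number.
Command classify(std::string const &line, int *value, std::string *text)
{
    if (line == "start")
        return Command::Start;

    if (line == "exit" || line == "quit")
    {
        *value = 0;
        return Command::Exit;
    }

    if (line.find("stop") == 0)             // stop <nr>
    {
        std::istringstream in(line.substr(4));
        return in >> *value ? Command::Stop : Command::Unknown;
    }

    std::istringstream in(line);            // <nr> text
    if (in >> *value)
    {
        std::getline(in, *text);            // the rest of the line
        return Command::Text;
    }

    return Command::Unknown;
}
```

Note that the text following the number keeps its leading blank: for the line 2 hello the extracted text consists of a blank followed by hello.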
All other input sensed by d_selector
is created by child
processes. Because d_selector
's readFd
member returns the corresponding
input file descriptor, this descriptor can be passed to
processChild
. Using an
IFdStreambuf
(see section 23.1.2.1), its
information is read from an input stream. The communication protocol used here
is rather basic. For every line of input sent to a child, the child replies by
sending back exactly one line of text. This line is then read by
processChild
:
    void Monitor::processChild(int fd)
    {
        IFdStreambuf ifdbuf(fd);
        istream istr(&ifdbuf);

        string line;
        getline(istr, line);
        cout << d_child[fd]->pid() << ": " << line << '\n';
    }
The construction d_child[fd]->pid()
used in the above source deserves
some special attention. Monitor
defines the data member map<int,
shared_ptr<Child>> d_child
. This map uses the file descriptor for reading from the child as
its key, and a (shared) pointer to the Child
object as its value. A shared
pointer is used here, rather than a Child
object, since we want to use the
facilities offered by the map, but don't want to copy a Child
object time
and again.
Now that run
's implementation has been covered, we'll concentrate on
the various commands users might enter:
start
command is issued, a new child process is started.
A new element is added to d_child
by the member createNewChild
. Next,
the Child
object should start its activities, but the Monitor
object
cannot wait for the child process to complete its activities, as there is no
well-defined endpoint in the near future, and the user will probably want to
enter more commands. Therefore, the Child
process must run as a
daemon. So the forked process terminates immediately, but its own child
process will continue to run (in the background). Consequently,
createNewChild
calls the child's fork
member. Although it is the
child's fork
function that is called, it is still the monitor program
wherein that fork
function is called. So, the monitor program is
duplicated by fork
. Execution then continues:
Child
's parentProcess
in its parent process;
Child
's childProcess
in its child process
Child
's parentProcess
is an empty function, returning
immediately, the Child
's parent process effectively continues immediately
below createNewChild
's cp->fork()
statement. As the child process
never returns (see section 23.3.5.3), the code below cp->fork()
is never
executed by the Child
's child process. This is exactly as it should be.
In the parent process, createNewChild
's remaining code simply
adds the file descriptor that's available for reading information from the
child to the set of input file descriptors monitored by d_selector
, and
uses d_child
to establish the association between that
file descriptor and the Child
object's address:
    void Monitor::createNewChild(int, string const &)
    {
        Child *cp = new Child(++d_nr);

        cp->fork();

        int fd = cp->readFd();

        d_selector.addReadFd(fd);
        d_child[fd].reset(cp);

        cerr << "Child " << d_nr << " started\n";
    }
stop <nr>
and <nr> text
commands. The former command terminates child process
<nr>
, by calling stopChild
. This function locates the child process
having that order number using an anonymous object of the class Find
,
nested inside Monitor
. The class Find
simply compares the
provided nr
with the children's order number returned by their nr
members:
inline Monitor::Find::Find(int nr) : d_nr(nr) {} inline bool Monitor::Find::operator()(MapIntChild::value_type &vt) const { return d_nr == vt.second->nr(); }
If the child process having order number nr
was found, its file
descriptor is removed from d_selector
's set of input file
descriptors. Then the child process itself is terminated by the static member
killChild
. The member killChild
is declared as a static member
function, as it is used as function argument of the for_each
generic
algorithm by exiting
(see below). Here is killChild
's
implementation:
    void Monitor::killChild(MapIntChild::value_type it)
    {
        if (kill(it.second->pid(), SIGTERM))
            cerr << "Couldn't kill process " << it.second->pid() << '\n';

        // reap defunct child process
        int status = 0;
        while (waitpid(it.second->pid(), &status, WNOHANG) > -1)
            ;
    }
Having terminated the specified child process, the corresponding Child
object is destroyed and its pointer is removed from d_child
:
void Monitor::stopChild(int nr, string const &) { auto it = find_if(d_child.begin(), d_child.end(), Find(nr)); if (it == d_child.end()) cerr << "No child number " << nr << '\n'; else { d_selector.rmReadFd(it->second->readFd()); d_child.erase(it); } }
<nr> text
sends text
to child process
nr
using the member function sendChild
. This function, too, will
use a Find
object to locate the process having order number nr
, and
then simply inserts the text into the writing end of a pipe connected to
the indicated child process:
void Monitor::sendChild(int nr, string const &line) { auto it = find_if(d_child.begin(), d_child.end(), Find(nr)); if (it == d_child.end()) cerr << "No child number " << nr << '\n'; else { OFdnStreambuf ofdn(it->second->writeFd()); ostream out(&ofdn); out << line << '\n'; } }
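The lookup performed by stopChild and sendChild could also be written with a lambda expression instead of a separate function object class. The following is only a minimal sketch under stated assumptions: ChildStub, ChildMap and findChild are hypothetical names standing in for Child, MapIntChild and the find_if calls shown above, and std::shared_ptr is merely assumed as the map's mapped type.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <memory>

// Hypothetical stand-in for the Child class: only the order number
// matters for the lookup.
struct ChildStub
{
    int d_nr;

    explicit ChildStub(int nr) : d_nr(nr) {}
    int nr() const { return d_nr; }
};

// Assumption: the map associates a file descriptor with a smart pointer.
typedef std::map<int, std::shared_ptr<ChildStub>> ChildMap;

// Returns an iterator to the element whose child has order number nr,
// or children.end() if there is no such child.
ChildMap::iterator findChild(ChildMap &children, int nr)
{
    return std::find_if(children.begin(), children.end(),
        [nr](ChildMap::value_type const &element)
        {
            return element.second->nr() == nr;
        });
}
```

The lambda captures nr by value, which is the role played by Find's d_nr data member.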
exit
or quit
the member exiting
is
called. It terminates all child processes using the
for_each
generic
algorithm (see section 19.1.17) to visit all elements of
d_child
. Then the program itself ends:
void Monitor::exiting(int value, string const &msg) { for_each(d_child.begin(), d_child.end(), killChild); if (msg.length()) cerr << msg << '\n'; throw value; }
main
function is simple and needs no further comment:
int main() try { Monitor().run(); } catch (int exitValue) { return exitValue; }
Monitor
object starts a child process, it creates an object
of the class Child
. The Child
class is derived from the class
Fork
, allowing it to operate as a
daemon (as discussed in the
previous section). Since Child
is a daemon class, we know that its parentProcess member
must be defined as an empty function. Its childProcess
member
has a non-empty implementation. Here are the characteristics of the class
Child
:
Child
class has two Pipe
data members, to handle
communications between its own child- and parent processes. As these pipes are
used by the Child
's child process, their names refer to the child
process. The child process reads from d_in
, and writes to d_out
. Here
is the interface of the class Child
:
class Child: public Fork { Pipe d_in; Pipe d_out; int d_parentReadFd; int d_parentWriteFd; int d_nr; public: Child(int nr); virtual ~Child(); int readFd() const; int writeFd() const; int pid() const; int nr() const; private: virtual void childRedirections(); virtual void parentRedirections(); virtual void childProcess(); virtual void parentProcess(); };
Child
's constructor simply stores its argument, a
child-process order number, in its own d_nr
data member:
inline Child::Child(int nr) : d_nr(nr) {}
Child
's child process obtains commands from its standard
input stream and writes its output to its standard output stream. Since the
actual communication channels are pipes, redirections must be used. The
childRedirections
member looks like this:
void Child::childRedirections() { d_in.readFrom(STDIN_FILENO); d_out.writtenBy(STDOUT_FILENO); }
d_in
and
reads from d_out
. Here is parentRedirections
:
void Child::parentRedirections() { d_parentReadFd = d_out.readOnly(); d_parentWriteFd = d_in.writeOnly(); }
Child
object exists until it is destroyed by the
Monitor
's stopChild
member. By allowing its creator, the Monitor
object, to access the parent-side ends of the pipes, the Monitor
object
can communicate with the Child
's child process via those pipe-ends. The
members readFd
and writeFd
allow the Monitor
object to access
these pipe-ends:
inline int Child::readFd() const { return d_parentReadFd; } inline int Child::writeFd() const { return d_parentWriteFd; }
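The redirections described above can be illustrated with raw POSIX calls. This is a hedged sketch, not the implementation used by Pipe or Fork: echoViaChild is a hypothetical helper, and the child merely echoes its standard input to its standard output. It is meant for short messages only, since the parent writes everything before it starts reading.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: sends `line` to a child process through one pipe
// and returns what the child echoes back through a second pipe.
// Returns an empty string on failure.
std::string echoViaChild(std::string const &line)
{
    int toChild[2];                 // parent writes [1], child reads [0]
    int toParent[2];                // child writes [1], parent reads [0]

    if (pipe(toChild) != 0)
        return "";
    if (pipe(toParent) != 0)
    {
        close(toChild[0]);
        close(toChild[1]);
        return "";
    }

    pid_t pid = fork();
    if (pid < 0)
    {
        close(toChild[0]);
        close(toChild[1]);
        close(toParent[0]);
        close(toParent[1]);
        return "";
    }

    if (pid == 0)                   // child: compare childRedirections
    {
        dup2(toChild[0], STDIN_FILENO);
        dup2(toParent[1], STDOUT_FILENO);
        close(toChild[0]);
        close(toChild[1]);
        close(toParent[0]);
        close(toParent[1]);

        char ch;
        while (read(STDIN_FILENO, &ch, 1) == 1)     // echo until end-of-file
            if (write(STDOUT_FILENO, &ch, 1) != 1)
                _exit(1);
        _exit(0);
    }

    // parent: compare parentRedirections; keep only the parent-side ends
    close(toChild[0]);
    close(toParent[1]);

    ssize_t written = write(toChild[1], line.data(), line.size());
    close(toChild[1]);              // the child's read() now sees end-of-file

    std::string reply;
    char buffer[256];
    ssize_t count;
    while ((count = read(toParent[0], buffer, sizeof(buffer))) > 0)
        reply.append(buffer, static_cast<std::string::size_type>(count));
    close(toParent[0]);

    int status = 0;
    waitpid(pid, &status, 0);       // reap the child

    return written == static_cast<ssize_t>(line.size()) ? reply : "";
}
```

Each process closes the pipe ends it does not use; otherwise the reading side would never see end-of-file.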
Child
object's child process performs two tasks:
childProcess
defines a local
Selector
object, adding STDIN_FILENO
to its set of monitored input
file descriptors.
Then, in an endless loop, childProcess
waits for selector.wait()
to return. When the alarm goes off it sends a message to its standard output
(hence, into the writing pipe). Otherwise, it will echo the messages appearing
at its standard input to its standard output. Here is the childProcess
member:
    void Child::childProcess()
    {
        Selector selector;
        size_t message = 0;

        selector.addReadFd(STDIN_FILENO);
        selector.setAlarm(5);

        while (true)
        {
            try
            {
                if (!selector.wait())               // timeout
                    cout << "Child " << d_nr << ": standing by\n";
                else
                {
                    string line;
                    getline(cin, line);
                    cout << "Child " << d_nr << ":" << ++message << ": " <<
                                                            line << '\n';
                }
            }
            catch (...)
            {
                cout << "Child " << d_nr << ":" << ++message << ": " <<
                                        "select() failed" << '\n';
            }
        }
        exit(0);
    }
Monitor
object to obtain
the Child
's process ID and its order number:
inline int Child::pid() const { return Fork::pid(); } inline int Child::nr() const { return d_nr; }
Child
process terminates when the user enters a stop
command. When an existing child process number was entered, the corresponding
Child
object is removed from Monitor
's d_child
map. As a result,
its destructor is called. Child
's destructor calls kill
to terminate
its child, and then waits for the child to terminate. Once its child has
terminated, the destructor has completed its work and returns, thus completing
the erasure from d_child
. The current implementation fails if the child
process doesn't react to the SIGTERM
signal. In this demonstration program
this does not happen. In `real life' more elaborate killing-procedures may be
required (e.g., using SIGKILL
in addition to SIGTERM
). As discussed in
section 9.11 it is important to ensure the proper
destruction. Here is the Child
's destructor:
    Child::~Child()
    {
        if (pid())
        {
            cout << "Killing process " << pid() << "\n";
            kill(pid(), SIGTERM);
            int status;
            wait(&status);
        }
    }
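The more elaborate killing procedure mentioned above might look as follows. This is a sketch under assumptions, not part of the Child class: terminateChild and its graceSeconds parameter are hypothetical, and the polling interval of 0.1 second is an arbitrary choice.

```cpp
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: sends SIGTERM, polls for at most graceSeconds
// seconds, and falls back to SIGKILL if the child is still running.
// Returns true once the child has been reaped.
bool terminateChild(pid_t pid, unsigned graceSeconds)
{
    if (kill(pid, SIGTERM) != 0)
        return false;                       // no such process, or not allowed

    int status = 0;
    for (unsigned tenths = 0; tenths != graceSeconds * 10; ++tenths)
    {
        pid_t ret = waitpid(pid, &status, WNOHANG);
        if (ret == pid)
            return true;                    // child ended and was reaped
        if (ret == -1)
            return false;                   // not a child of this process
        usleep(100000);                     // ret == 0: still running
    }

    kill(pid, SIGKILL);                     // cannot be caught or ignored
    return waitpid(pid, &status, 0) == pid; // blocking wait reaps the child
}
```

Unlike wait, waitpid is given the process ID, so only the intended child is reaped when several children exist.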
Some operators appear to be missing: there appear to be no predefined
function objects corresponding to
bitwise operations. However, their
construction is, given the available predefined function objects, not
difficult. The following examples show a
class template implementing a
function object calling the
bitwise and (
operator&
), and a class template
implementing a function object calling the
unary not
(
operator~
). It is left to the reader to construct similar function
objects for other operators.
Here is the implementation of a function object calling the
bitwise
operator&
:
#include <functional> template <typename _Tp> struct bit_and: public std::binary_function<_Tp, _Tp, _Tp> { _Tp operator()(_Tp const &__x, _Tp const &__y) const { return __x & __y; } };
Here is the implementation of a function object calling operator~()
:
#include <functional> template <typename _Tp> struct bit_not: public std::unary_function<_Tp, _Tp> { _Tp operator()(_Tp const &__x) const { return ~__x; } };
These and other missing predefined function objects are also implemented
in the file
bitfunctional
, which is found in the cplusplus.yo.zip
archive. These classes are derived from existing class templates (e.g.,
std::binary_function
and
std::unary_function
). These base classes
define several types which
are expected (used) by various generic algorithms as defined in the STL
(cf. chapter 19), thus following the advice offered in, e.g., the
C++ header file
bits/stl_function.h
:
     *  The standard functors are derived from structs named unary_function
     *  and binary_function.  These two classes contain nothing but typedefs,
     *  to aid in generic (template) programming.  If you write your own
     *  functors, you might consider doing the same.
Here is an example using bit_and
, removing all odd numbers from a
vector of int
values:
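    #include <iostream>
    #include <algorithm>
    #include <functional>
    #include <iterator>
    #include <vector>
    #include "bitand.h"
    using namespace std;

    int main()
    {
        vector<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(),
            // ::bit_and selects the template defined in bitand.h: compilers
            // also offering std::bit_and would otherwise report an ambiguity
            remove_if(vi.begin(), vi.end(), bind2nd(::bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << '\n';
    }
    /*
        Generated output:

        0 2 4 6 8
    */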
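As bind2nd was deprecated in C++11 and removed in C++17, the same filtering may also be written with a lambda expression. The following sketch assumes a C++11 compiler; removeOdd is a hypothetical helper name. It also erases the removed elements, which the example above does not do.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns a copy of values from which all odd numbers were removed.
// The lambda takes over the role of bind2nd(bit_and<int>(), 1).
std::vector<int> removeOdd(std::vector<int> values)
{
    values.erase(
        std::remove_if(values.begin(), values.end(),
            [](int value)
            {
                return (value & 1) != 0;    // lowest bit set: odd
            }),
        values.end());

    return values;
}
```

Note that C++11's functional header itself offers std::bit_and, std::bit_or and std::bit_xor, so self-defined variants are mainly of interest with older compilers.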
atoi
,
atol
, and other functions that can be used to convert ASCII-Z
strings
to numeric values. In C++, these functions are still available, but a more
type safe way to convert text to other types uses objects of the class
std::istringstream
.
Using the class istringstream
instead of the C standard conversion
functions may have the advantage of type-safety, but it also appears to be a
rather cumbersome alternative. After all, we first have to construct and
initialize a std::istringstream
object before we're able to extract a
value of some type from it. This requires us to use a variable. Then, in cases
where the extracted value is only needed to initialize some
function-parameter, one might wonder whether the added variable and the
istringstream
construction can somehow be avoided.
In this section we'll develop a class (
A2x
) avoiding all the
disadvantages of the standard C library functions, without requiring the
cumbersome definitions of istringstream
objects over and over
again. The class is called A2x
, meaning `
ascii to anything'.
A2x
objects can be used to extract values of any type extractable from
std::istream
objects. Since A2x
represents the object-variant of the
C functions, it is not only type-safe but also extensible. So its use
is greatly preferred over using the standard C functions. Here are its
characteristics:
A2x
is derived from std::istringstream
, and so all members of
the class istringstream
are available for A2x
objects.
Extractions of values of variables can always
effortlessly be performed. Here's the class's interface:
class A2x: public std::istringstream { public: A2x() = default; A2x(char const *txt); A2x(std::string const &str); template <typename Type> operator Type(); template <typename Type> Type to(); A2x &operator=(char const *txt); A2x &operator=(std::string const &str); A2x &operator=(A2x const &other); };
A2x
has a default constructor and constructors expecting a char const * or a
std::string
argument. The latter two constructors may be used to initialize
A2x
objects with text to be converted (e.g., a line of text obtained from
reading a configuration file):
inline A2x::A2x(char const *txt) // initialize from text : std::istringstream(txt) {} inline A2x::A2x(std::string const &str) : std::istringstream(str.c_str()) {}
A2x
's real strength comes from its operator Type()
conversion
member template. As it is a member template, it will automatically adapt
itself to the type of the variable that should be given a value, obtained by
converting the text stored inside the A2x
object to the variable's
type. When the extraction fails, A2x
's inherited good
member returns
false
.
    A2x.operator int<int>();
    // or just:
    A2x.operator int();

As neither syntax looks attractive, the member template to
is provided too, allowing constructions like:

    A2x.to<int>();

Here is its implementation:

    template <typename Type>
    inline Type A2x::to()
    {
        Type t;

        return (*this >> t) ? t : Type();
    }
to
member makes it easy to implement operator Type()
:
template <typename Type> inline A2x::operator Type() { return to<Type>(); }
A2x
object is available, it may be reinitialized using
operator=
:
    #include "a2x.h"

    A2x &A2x::operator=(char const *txt)
    {
        clear();        // very important!!! If a conversion failed, the object
                        // remains useless until executing this statement
        str(txt);
        return *this;
    }
A2x
being used:
    int x = A2x("12");              // initialize int x from a string "12"
    A2x a2x("12.50");               // explicitly create an A2x object

    double d;
    d = a2x;                        // assign a variable using an A2x object
    cout << d << '\n';

    a2x = "err";
    d = a2x;                        // d is 0: the conversion failed,
    cout << d << '\n';              // and a2x.good() == false

    a2x = " a";                     // reassign a2x to new text
    char c = a2x;                   // c now 'a': internally operator>>() is used
    cout << c << '\n';              // so initial blanks are skipped.

    int expectsInt(int x);          // initialize a parameter using an
    expectsInt(A2x("1200"));        // anonymous A2x object

    d = A2x("12.45").to<int>();     // d is 12, not 12.45
    cout << d << '\n';
A complementary class (
X2a
), converting values to text, can be
constructed as well. Its construction is left as an exercise to the reader.
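For readers who want to compare notes after trying the exercise, here is one possible sketch of such a class. It is only an illustration under assumptions: the interface shown (an explicit constructor template and a conversion operator to std::string) is a guess at a reasonable design, not the interface of an existing X2a class.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical sketch: anything insertable into a std::ostream can be
// converted to text.
class X2a: public std::ostringstream
{
    public:
        X2a() = default;

        template <typename Type>
        explicit X2a(Type const &value)     // insert the value to convert
        {
            *this << value;
        }

        operator std::string() const        // retrieve the text
        {
            return str();
        }
};
```

A statement like std::string text = X2a(12); then initializes text from the inserted value. As X2a is derived from std::ostringstream, additional values and manipulators can still be inserted before the text is retrieved.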
operator[]
is that it can't distinguish between its
use as an lvalue and as an rvalue. It is a familiar misconception to
think that since
    Type const &operator[](size_t index) const
is used as rvalue (since the object isn't modified),
    Type &operator[](size_t index)
is used as lvalue (since the returned value can be modified). In fact, the compiler distinguishes between the two operators only by the
const
-status of the object for which operator[]
is called. With
const
objects the former operator is called, with non-const
objects
the latter is always used. It is always used, irrespective of it being used as
lvalue or rvalue.
Being able to distinguish between lvalues and rvalues can be very
useful. Consider the situation where a class supporting operator[]
stores
data of a type that is very hard to copy. With data like that reference
counting (e.g., using shared_ptr
s) is probably used to prevent needless
copying.
As long as operator[]
is used as rvalue there's no need to copy the data,
but the information must be copied if it is used as lvalue.
The Proxy Design Pattern (cf. Gamma et al. (1995)) can be used to distinguish between lvalues and rvalues. With the Proxy Design Pattern an object of another class (the Proxy class) is used to act as a stand-in for the `real thing'. The proxy class offers functionality that cannot be offered by the data themselves, like distinguishing between its use as lvalue or rvalue. A proxy class can be used in many situations where access to the real data cannot or should not be directly provided. In this regard iterator types are examples of proxy classes as they create a layer between the real data and the software using the data. Proxy classes could also dereference pointers in a class storing its data by pointers.
In this section we concentrate on the distinction between using operator[]
as lvalue and rvalue. Let's assume we have a class Lines
storing lines
from a file. Its constructor expects a stream from which the
lines are read and it offers a non-const operator[]
that can be used as
lvalue or rvalue (the const
version of operator[]
is omitted as it
offers no problem because it is always used as rvalue):
    class Lines
    {
        std::vector<std::string> d_line;

        public:
            Lines(std::istream &in);
            std::string &operator[](size_t idx);
    };
To distinguish between lvalues and rvalues we must find distinguishing
characteristics of lvalues and rvalues that we can exploit. Such
distinguishing characteristics are operator=
(which is always used as
lvalue) and the conversion operator (which is always used as rvalue). Rather
than having operator[]
return a string &
we can let it return a
Proxy
object that is able to distinguish between its use as lvalue
and rvalue.
The class Proxy
thus needs operator=(string const &other)
(acting as
lvalue) and operator std::string const &() const
(acting as rvalue). Do we
need more operators? The std::string
class also offers operator+=
, so
we should probably implement that operator as well. Plain characters can also
be assigned to string
objects (even using their numeric values). As
string
objects cannot be constructed from plain characters
promotion cannot be used with operator=(string const &other)
if the
right-hand side argument is a character. Implementing operator=(char
value)
could therefore also be considered. These additional operators are
left out of the current implementation but `real life' proxy classes should
consider implementing these additional operators as well. Another subtlety is
that Proxy
's operator std::string const &() const
will not be used
when using ostream
's insertion operator or istream
's extraction
operator as these operators are implemented as templates not recognizing our
Proxy
class type. So when stream insertion and extraction is required (it
probably is) then Proxy
must be given its own overloaded insertion and
extraction operator. Here is an implementation of the overloaded insertion
operator inserting the object for which Proxy
is a stand-in:
inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy) { return out << static_cast<std::string const &>(proxy); }
There's no need for any code (except Lines
) to create or copy Proxy
objects. Proxy
's constructor should therefore be made private, and
Proxy
can declare Lines
to be its friend. In fact, Proxy
is
intimately related to Lines
and can be defined as a nested class. In the
revised Lines
class operator[]
no longer returns a string
but
instead a Proxy
is returned. Here is the revised Lines
class,
including its nested Proxy
class:
    class Lines
    {
        std::vector<std::string> d_line;

        public:
            class Proxy;
            Proxy operator[](size_t idx);

            class Proxy
            {
                friend Proxy Lines::operator[](size_t idx);
                std::string &d_str;
                Proxy(std::string &str);

                public:
                    std::string &operator=(std::string const &rhs);
                    operator std::string const &() const;
            };

            Lines(std::istream &in);
    };
Proxy
's members are very lightweight and can usually be implemented
inline:
inline Lines::Proxy::Proxy(std::string &str) : d_str(str) {} inline std::string &Lines::Proxy::operator=(std::string const &rhs) { return d_str = rhs; } inline Lines::Proxy::operator std::string const &() const { return d_str; }
The member Lines::operator[]
can also be implemented inline: it merely
returns a Proxy
object initialized with the idx-th string.
Now that the class Proxy
has been developed it can be used in a
program. Here is an example using the Proxy
object as lvalue or rvalue. On
the surface Lines
objects won't behave differently from Lines
objects
using the original implementation, but adding an identifying cout
statement to Proxy
's members will show that operator[]
will behave
differently when used as lvalue or as rvalue:
    int main()
    {
        ifstream in("lines.cc");
        Lines lines(in);

        string s = lines[0];            // rvalue use
        lines[0] = s;                   // lvalue use
        cout << lines[0] << '\n';       // rvalue use
        lines[0] = "hello world";       // lvalue use
        cout << lines[0] << '\n';       // rvalue use
    }
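Instead of adding cout statements, the different behavior can also be made visible by counting. The sketch below is a simplified, self-contained variant of the classes shown above, written under assumptions: the counters s_reads and s_writes are hypothetical additions, the constructor receives a vector rather than a stream, and Proxy simply declares Lines as its friend.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Lines
{
    std::vector<std::string> d_line;

    public:
        static std::size_t s_reads;         // hypothetical: counts rvalue use
        static std::size_t s_writes;        // hypothetical: counts lvalue use

        class Proxy
        {
            friend class Lines;
            std::string &d_str;

            explicit Proxy(std::string &str) : d_str(str) {}

            public:
                std::string &operator=(std::string const &rhs)
                {
                    ++s_writes;             // only reached by assignments
                    return d_str = rhs;
                }
                operator std::string const &() const
                {
                    ++s_reads;              // only reached by conversions
                    return d_str;
                }
        };

        explicit Lines(std::vector<std::string> lines)
        :
            d_line(std::move(lines))
        {}

        Proxy operator[](std::size_t idx)
        {
            return Proxy(d_line[idx]);
        }
};

std::size_t Lines::s_reads = 0;
std::size_t Lines::s_writes = 0;
```

After one initialization from lines[idx] and one assignment to lines[idx], both counters are 1, although the same operator[] was called in both cases.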
An object of this nested iterator class handles the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.
A drawback of this is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling pointer-dereferencing?
In this section a variant of the earlier (nested class) approach is discussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elements point but also the container's iterator type itself. Once again, we will concentrate on developing a RandomIterator as it is the most complex iterator type.
Our class is named RandomPtrIterator
, indicating that it is a random
iterator operating on pointer values. The class template defines three
template type parameters:
Class
). Like before, RandomPtrIterator
's
constructor is private. Therefore friend
declarations are needed to
allow client classes to construct RandomPtrIterators
. However, a
friend class Class
cannot be used as template parameter types cannot be
used in friend class ...
declarations. But this is a minor problem as not
every member of the client class needs to construct iterators. In fact, only
Class
's begin
and end
members must construct
iterators. Using the template's first parameter, friend declarations can be
specified for the client's begin
and end
members.
BaseIterator
);
Type
).
RandomPtrIterator
has one private data member, a
BaseIterator
. Here is the class interface and the constructor's
implementation:
    #include <iterator>

    template <typename Class, typename BaseIterator, typename Type>
    class RandomPtrIterator:
          public std::iterator<std::random_access_iterator_tag, Type>
    {
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::begin();
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::end();

        BaseIterator d_current;

        RandomPtrIterator(BaseIterator const &current);

        public:
            bool operator!=(RandomPtrIterator const &other) const;
            int operator-(RandomPtrIterator const &rhs) const;
            RandomPtrIterator const operator+(int step) const;
            Type &operator*() const;
            bool operator<(RandomPtrIterator const &other) const;
            RandomPtrIterator &operator--();
            RandomPtrIterator const operator--(int);
            RandomPtrIterator &operator++();
            RandomPtrIterator const operator++(int);
            bool operator==(RandomPtrIterator const &other) const;
            RandomPtrIterator const operator-(int step) const;
            RandomPtrIterator &operator-=(int step);
            RandomPtrIterator &operator+=(int step);
            Type *operator->() const;
    };

    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type>::RandomPtrIterator(
                                    BaseIterator const &current)
    :
        d_current(current)
    {}
Looking at its friend
declarations, we see that the members begin
and end
of a class Class
, returning a RandomPtrIterator
object for
the types Class, BaseIterator
and Type
are granted access to
RandomPtrIterator
's private constructor. That is exactly what we
want. Begin
and end
are declared as bound friends.
All RandomPtrIterator
's remaining members are public. Since
RandomPtrIterator
is just a generalization of the nested class
iterator
developed in section 21.12.1, re-implementing the required
member functions is easy and only requires us to change iterator
into
RandomPtrIterator
and to change std::string
into Type
. For
example, operator<
, defined in the class iterator
as
inline bool StringPtr::iterator::operator<(iterator const &other) const { return **d_current < **other.d_current; }
is now implemented as:
template <typename Class, typename BaseIterator, typename Type> bool RandomPtrIterator<Class, BaseIterator, Type>::operator<( RandomPtrIterator const &other) const { return **d_current < **other.d_current; }
Some additional examples: operator*
, defined in the class
iterator
as
inline std::string &StringPtr::iterator::operator*() const { return **d_current; }
is now implemented as:
template <typename Class, typename BaseIterator, typename Type> Type &RandomPtrIterator<Class, BaseIterator, Type>::operator*() const { return **d_current; }
The pre- and postfix increment operators are now implemented as:
template <typename Class, typename BaseIterator, typename Type> RandomPtrIterator<Class, BaseIterator, Type> &RandomPtrIterator<Class, BaseIterator, Type>::operator++() { ++d_current; return *this; } template <typename Class, typename BaseIterator, typename Type> RandomPtrIterator<Class, BaseIterator, Type> const RandomPtrIterator<Class, BaseIterator, Type>::operator++(int) { return RandomPtrIterator(d_current++); }
Remaining members can be implemented accordingly, their actual
implementations are left as exercises to the reader (or can be obtained from
the cplusplus.yo.zip
archive, of course).
Re-implementing the class StringPtr
developed in section 21.12.1
is not difficult either. Apart from including the header file defining the
class template RandomPtrIterator
, it only requires a single modification.
Its iterator
typedef must now be associated with a
RandomPtrIterator
. Here is the full class interface and the class's inline
member definitions:
#ifndef INCLUDED_STRINGPTR_H_ #define INCLUDED_STRINGPTR_H_ #include <vector> #include <string> #include "iterator.h" class StringPtr: public std::vector<std::string *> { public: typedef RandomPtrIterator < StringPtr, std::vector<std::string *>::iterator, std::string > iterator; typedef std::reverse_iterator<iterator> reverse_iterator; iterator begin(); iterator end(); reverse_iterator rbegin(); reverse_iterator rend(); }; inline StringPtr::iterator StringPtr::begin() { return iterator(this->std::vector<std::string *>::begin() ); } inline StringPtr::iterator StringPtr::end() { return iterator(this->std::vector<std::string *>::end()); } inline StringPtr::reverse_iterator StringPtr::rbegin() { return reverse_iterator(end()); } inline StringPtr::reverse_iterator StringPtr::rend() { return reverse_iterator(begin()); } #endif
Including StringPtr
's modified header file into the program given in
section 21.12.2 results in a program behaving identically to its
earlier version. In this case StringPtr::begin
and StringPtr::end
return iterator objects constructed from a template definition.
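To show the idea in isolation, here is a compact, self-contained sketch of a pointer-dereferencing iterator. It is not the RandomPtrIterator developed above: PtrIterator and sortPointees are hypothetical names, the constructor is public (so no friend declarations are needed), the iterator types are defined by plain typedefs, and its relational operators compare positions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// BaseIterator visits pointers to Type; operator* dereferences twice,
// so generic algorithms operate on the Type values themselves.
template <typename BaseIterator, typename Type>
class PtrIterator
{
    BaseIterator d_current;

    public:
        typedef std::random_access_iterator_tag iterator_category;
        typedef Type                            value_type;
        typedef std::ptrdiff_t                  difference_type;
        typedef Type                           *pointer;
        typedef Type                           &reference;

        PtrIterator() : d_current() {}
        explicit PtrIterator(BaseIterator current) : d_current(current) {}

        Type &operator*() const  { return **d_current; }
        Type *operator->() const { return *d_current; }
        Type &operator[](difference_type n) const { return *d_current[n]; }

        PtrIterator &operator++()   { ++d_current; return *this; }
        PtrIterator operator++(int) { return PtrIterator(d_current++); }
        PtrIterator &operator--()   { --d_current; return *this; }
        PtrIterator operator--(int) { return PtrIterator(d_current--); }

        PtrIterator &operator+=(difference_type n)
        { d_current += n; return *this; }
        PtrIterator &operator-=(difference_type n)
        { d_current -= n; return *this; }
        PtrIterator operator+(difference_type n) const
        { return PtrIterator(d_current + n); }
        PtrIterator operator-(difference_type n) const
        { return PtrIterator(d_current - n); }
        difference_type operator-(PtrIterator const &rhs) const
        { return d_current - rhs.d_current; }

        // all comparisons compare positions, not the values pointed at
        bool operator==(PtrIterator const &rhs) const
        { return d_current == rhs.d_current; }
        bool operator!=(PtrIterator const &rhs) const
        { return d_current != rhs.d_current; }
        bool operator<(PtrIterator const &rhs) const
        { return d_current < rhs.d_current; }
        bool operator>(PtrIterator const &rhs) const
        { return d_current > rhs.d_current; }
        bool operator<=(PtrIterator const &rhs) const
        { return d_current <= rhs.d_current; }
        bool operator>=(PtrIterator const &rhs) const
        { return d_current >= rhs.d_current; }
};

// Sorts the strings pointed at; the pointers themselves stay in place.
void sortPointees(std::vector<std::string *> &pointers)
{
    typedef PtrIterator<std::vector<std::string *>::iterator,
                        std::string> Iterator;

    std::sort(Iterator(pointers.begin()), Iterator(pointers.end()));
}
```

Since std::sort only sees std::string values through this iterator, it rearranges the strings' contents while the vector's pointer values remain unchanged.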
The current example assumes that the reader knows how to use the
scanner generator
flex
and the
parser generator
bison
. Both
bison
and flex
are well documented elsewhere. The original
predecessors of bison
and flex
, called
yacc
and
lex
are
described in several books, e.g. in
O'Reilly's book `lex & yacc'.
Scanner- and parser generators are also available as free software. Both
bison
and flex
are usually part of software distributions or they can
be obtained from
ftp://prep.ai.mit.edu/pub/non-gnu. Flex
creates a C++
class
when
%option c++
is specified.
For parser generators the program
bison
is available. In the early 90's
Alain Coetmeur (coetmeur@icdc.fr) created a
C++ variant (
bison++
) creating a parser class. Although the
bison++
program produces code that can be used in C++ programs it also
shows many characteristics that are more suggestive of a C context than a
C++ context. In January 2005 I rewrote parts of Alain's bison++
program, resulting in the original version of the program
bisonc++. Then,
in May 2005 a complete rewrite of the bisonc++
parser generator was
completed (version number 0.98). Current versions of bisonc++
can be
downloaded from
http://bisoncpp.sourceforge.net/, where it is available as source
archive and as binary (i386) Debian package
(including bisonc++
's documentation).
Bisonc++
creates a cleaner parser class than bison++
. In particular,
it derives the parser class from a base-class, containing the parser's token-
and type-definitions as well as all member functions which should not be
(re)defined by the programmer. As a result of this approach, the generated
parser class is very small, declaring only members that are actually defined
by the programmer (as well as some other members, generated by bisonc++
itself, implementing the parser's
parse()
member). One member that is
not implemented by default is lex
, producing the next lexical
token. When the directive %scanner
(see section 23.8.2.1) is used,
bisonc++
produces a standard implementation for this member; otherwise it
must be implemented by the programmer.
This section of the C++ Annotations focuses on bisonc++
as our
parser generator.
Using flex
and bisonc++
class
-based scanners and parsers can be
generated. The advantage of this approach is that the interface to the scanner
and the parser tends to become cleaner than without using the class
interface. Furthermore, classes allow us to get rid of most if not all global
variables, making it easy to use multiple parsers in one program.
Below two example programs are developed. The first example only uses
flex
. The generated scanner monitors the production of a file from
several parts. That example focuses on the lexical scanner and on switching
files while churning through the information. The second example uses both
flex
and bisonc++
to generate a scanner and a parser transforming
standard arithmetic expressions to their postfix notations, commonly used in
code generated by compilers and in HP-calculators. In the second example
the emphasis is mainly on bisonc++
and on composing a scanner object
inside a generated parser.
#include
directive, followed by a text
string specifying the file (path) which should be included at the location of
the #include
.
In order to avoid complexities irrelevant to the current example, the format
of the #include
statement is restricted to the form #include
<filepath>
. The file specified between the angle brackets should be
available at the location indicated by filepath
. If the file is not
available, the program terminates after issuing an error message.
The program is started with one or two filename arguments. If the program is
started with just one filename argument, the output is written to the
standard output stream cout
. Otherwise, the output is written to
the stream whose name is given as the program's second argument.
The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the filename stack indicating where which file was included is printed.
An additional feature of the program is that (standard C++) comment-lines are ignored. Include-directives in comment-lines are also ignored.
The program is created in five major steps:
lexer
is constructed, containing the
input-language specifications.
lexer
the requirements for the
class Scanner
evolve. The Scanner
class is a wrapper class around the
class
yyFlexLexer
generated by
flex
. The requirements result in the
interface of the class Scanner
.
main
is constructed. A Scanner
object is created
inspecting the command-line arguments. If successful, the scanner's member
yylex
is called to produce the program's output.
yyFlexLexer
. However, we of course want to use the derived
class's members in this code. This causes a small problem. How does a
base-class member know about members of classes derived from it?
Inheritance helps us to overcome this problem. In the specification of
the class yyFlexLexer
, we notice that the function
yylex
is a
virtual function. The header file FlexLexer.h
declares the
virtual
member int yylex
:
    class yyFlexLexer: public FlexLexer
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );

            virtual ~yyFlexLexer();

            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );

            virtual int yylex();
            virtual void switch_streams( istream* new_in, ostream* new_out );
    };

As this function is virtual it can be overridden by a derived class. In that case the overriding function will be called from its base class (i.e.,
yyFlexLexer
) code. Since the derived class's
yylex()
is called, it will now have access to the members of the derived
class, and also to the public and protected members of its base class.
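The mechanism can be demonstrated without flex. In the following sketch LexerBase and DerivedScanner are hypothetical stand-ins for yyFlexLexer and Scanner; run represents base class code calling the virtual member yylex.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for yyFlexLexer
class LexerBase
{
    public:
        virtual ~LexerBase() {}

        int run()                   // base class code, unaware of
        {                           // any derived class
            return yylex();
        }
        virtual int yylex()         // default implementation
        {
            return 0;
        }
};

// Hypothetical stand-in for Scanner
class DerivedScanner: public LexerBase
{
    std::string d_text;

    public:
        explicit DerivedScanner(std::string const &text)
        :
            d_text(text)
        {}
        int yylex() override        // has access to d_text
        {
            return static_cast<int>(d_text.length());
        }
};
```

When run is called for a DerivedScanner object, even via a LexerBase reference, DerivedScanner::yylex is executed and so the derived class's data members are available to it.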
By default, the context in which the generated scanner is placed is the
function
yyFlexLexer::yylex
. This context changes if we use a
derived class, e.g., Scanner
. To derive Scanner
from yyFlexLexer
,
generated by flex
, do as follows:
yylex
must be declared in the derived class
Scanner
.
flex
about the derived
class's name.
Looking at the regular expressions themselves, notice that we need rules
to recognize comment,
#include
directives, and all remaining characters.
This is all fairly standard practice. When an #include
directive is
sensed, the directive is parsed by the scanner. This too is common
practice. Here is what our lexical scanner will do:
Scanner::Error
value (invalidInclude
) if this
fails;
nextSource
;
#include
directive has been processed, pushSource
is
called to perform the switch to another file;
EOF
) is reached, the derived class's
member function popSource
is called, popping the previously
pushed file and returning
true
;
popSource
returns
false
,
resulting in calling yyterminate
, terminating the scanner.
The lexical scanner specification file is organized similarly to the one used for flex in C contexts. However, in C++ contexts, flex may create a class (yyFlexLexer) from which another class (e.g., Scanner) can be derived. Flex's specification file itself has three sections:
scanner.h
, in turn including
FlexLexer.h
, which is part of the flex
distribution. FlexLexer.h
has a peculiar setup, due to which it should not be read twice by the code
generated by flex
. So, we now have the following situation:
scanner.ih
. The class Scanner
is declared in
scanner.h
, which is read by scanner.ih
. Therefore Scanner
's
members are known and can be called from the code associated with the regular
expressions defined in the lexer specification file.
scanner.h
, defining class Scanner
, the header
file FlexLexer.h
, declaring Scanner
's base class, must have been
read by the compiler before the class Scanner
itself is defined.
flex
already includes FlexLexer.h
, and
as mentioned, FlexLexer.h
may not be read twice. Unfortunately, flex
also inserts the specification file's preamble into the code it generates.
scanner.ih
, and so
scanner.h
, and so FlexLexer.h
, we now do include FlexLexer.h
twice in code generated by flex
. This must be prevented.
FlexLexer.h
can be prevented:
scanner.ih includes scanner.h; scanner.h itself is modified such that it includes FlexLexer.h, unless the C preprocessor variable _SKIP_YYFLEXLEXER_ is defined.
In flex's specification file _SKIP_YYFLEXLEXER_ is defined just prior to including scanner.ih.
flex now re-includes FlexLexer.h. At the same time the compilation of Scanner's members proceeds independently of the lexer specification file's preamble, so here FlexLexer.h is properly included too. Here is the specification file's preamble:

    %{
    #define _SKIP_YYFLEXLEXER_
    #include "scanner.ih"
    %}
flex
symbol area, used to define symbols, like a
mini scanner, or
options. The following options are suggested:
%option 8bit
: allowing the generated lexical scanner to
read 8-bit characters (rather than 7-bit, which is the default).
%option c++
: this results in flex
generating C++
code.
%option debug
: includes debugging
code into the code generated by
flex
. Calling the member function
set_debug(true)
activates this debugging code at run-time. When activated,
information about which rules are matched is written to the standard error
stream. To suppress the execution of debug code the member function
set_debug(false)
may be called.
%option noyywrap
: when the scanner reaches the end of file,
it will (by default) call a function yywrap
which may perform the switch
to another file. Calling this function is suppressed when %option noyywrap
is specified. Since there exist alternatives which render this function
superfluous (see below), it is suggested to specify this option as well.
%option outfile
="yylex.cc"
: this defines yylex.cc
as
the name of the generated C++ source file.
%option warn
: this option is strongly suggested by the
flex
documentation, so it's mentioned here as well. See flex
'
documentation for details.
%option yyclass
="Scanner"
: this defines Scanner
as
the name of the class derived from yyFlexLexer
.
%option yylineno
: this option causes the lexical scanner to
keep track of the line numbers of the files it is scanning. When processing
nested files the variable yylineno
is not automatically reset to the last
line number of a file when yylex
returns to a partially processed file. In
those cases, yylineno
must explicitly be reset to a former
value. If specified, the current line number is returned by the public member
lineno
, returning an int
.
    %option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap yylineno
    %option debug

    %x comment
    %x include

    eolnComment     "//".*
    anyChar         .|\n
istream *yyin to the ostream *yyout. For this the predefined macro ECHO can be used. Here is the used rules section:

    %%
        /*
            The comment-rules: comment lines are ignored.
        */
    {eolnComment}
    "/*"                    BEGIN comment;
    <comment>{anyChar}
    <comment>"*/"           BEGIN INITIAL;

        /*
            File switching: #include <filepath>
        */
    #include[ \t]+"<"       BEGIN include;
    <include>[^ \t>]+       d_nextSource = yytext;
    <include>">"[ \t]*\n    {
                                BEGIN INITIAL;
                                pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
                            }
    <include>{anyChar}      throw invalidInclude;

        /*
            The default rules: eating all the rest, echoing it to output
        */
    {anyChar}               ECHO;

        /*
            The <<EOF>> rule: pop a pushed file, or terminate the lexer
        */
    <<EOF>>                 {
                                if (!popSource(YY_CURRENT_BUFFER))
                                    yyterminate();
                            }
    %%
yyFlexLexer's data members are protected, and thus accessible to derived classes), most processing can be left to the derived class's member functions. This results in a very clean setup of the lexer specification file, requiring no or hardly any code in the preamble.
class Scanner
, derived as usual from the class
yyFlexLexer
, is
generated by
flex. The derived class has access to data
controlled by the lexical scanner. Specifically, it has access to the
following members:
char *yytext
, containing the text
matched by a
regular expression. Clients may access this information using
the scanner's
YYText
member;
int yyleng
, the length of the
text in yytext
. Clients may access this value using the scanner's
YYLeng
member;
int yylineno
: the current line number. This
variable is only maintained if
%option yylineno
is specified. Clients
may access this value using the scanner's
lineno
member.
FlexLexer.h
.
Objects of the class Scanner
perform two tasks:
EOF
is detected in a file.
Scanner
:
FlexLexer.h
, its class
opening, and its
private data. At the top of the class interface the private struct
FileInfo
is defined. FileInfo
is used to store the names and pointers
to open files. The struct has two constructors. One accepts a
filename, the other also expecting a bool
argument indicating that the
file is already open and should not be handled by FileInfo
. This latter
constructor is used only once. As the initial stream is an already open file
there is no need to open it again and so Scanner
's constructor will use
this constructor to store the name of the initial file only. Scanner
's
public section starts off by defining the enum Error
defining various
symbolic constants for errors that may be detected:
    #if ! defined(_SKIP_YYFLEXLEXER_)
    #include <FlexLexer.h>
    #endif

    class Scanner: public yyFlexLexer
    {
        struct FileInfo
        {
            std::string     d_name;
            std::ifstream  *d_in;

            FileInfo(std::string name)
            :
                d_name(name),
                d_in(new std::ifstream(name.c_str()))
            {}
            FileInfo(std::string name, bool)
            :
                d_name(name),
                d_in(0)
            {}
    //        inline bool operator==(FileInfo const &rhs) const
    //        {
    //            return d_name == rhs.d_name;
    //        }
        };

        friend bool operator==(FileInfo const &fi, std::string const &name);

        std::stack<yy_buffer_state *>   d_state;
        std::vector<FileInfo>           d_fileInfo;
        std::string                     d_nextSource;

        static size_t const             s_maxDepth = 10;

        public:
            enum Error
            {
                invalidInclude,
                circularInclusion,
                nestingTooDeep,
                cantRead,
            };
Scanner
's constructor. It activates the initial input
(and output) file and pushes the name of the initial input file on the file
stack, using the second FileInfo
constructor. Here is its implementation:
    #include "scanner.ih"

    Scanner::Scanner(istream *yyin, string const &initialName)
    {
        switch_streams(yyin, yyout);
        d_fileInfo.push_back(FileInfo(initialName, false));
    }
#include
directive, a
switch to another file is performed by pushSource
. If the filename could
not be extracted, the scanner throws an invalidInclude
exception. The
pushSource
member and the matching function popSource
handle file
switching. Switching to another file proceeds like this:
include
-nesting is inspected.
If s_maxDepth
is reached, the stack is considered full, and the scanner
throws a nestingTooDeep
exception.
throwOnCircularInclusion
is called to avoid circular
inclusions when switching to new files. This function throws an exception if a
filename is included twice using a simple literal name check. Here is its
implementation:
    #include "scanner.ih"

    inline bool operator==(Scanner::FileInfo const &fi, string const &name)
    {
        return fi.d_name == name;
    }

    void Scanner::throwOnCircularInclusion()
    {
        vector<FileInfo>::iterator
            it = find(d_fileInfo.begin(), d_fileInfo.end(), d_nextSource);

        if (it != d_fileInfo.end())
            throw circularInclusion;
    }
FileInfo
vector, at the
same time creating a new ifstream
object. If this fails, the scanner
throws a cantRead
exception.
yy_buffer_state
is created for the newly
opened stream, and the lexical scanner is instructed to switch to that stream
using yyFlexLexer
's member function
yy_switch_to_buffer
.
pushSource
's implementation:
    #include "scanner.ih"

    void Scanner::pushSource(yy_buffer_state *current, size_t size)
    {
        if (d_state.size() == s_maxDepth)
            throw nestingTooDeep;

        throwOnCircularInclusion();
        d_fileInfo.push_back(FileInfo(d_nextSource));

        ifstream *newStream = d_fileInfo.back().d_in;

        if (!*newStream)
            throw cantRead;

        d_state.push(current);
        yy_switch_to_buffer(yy_create_buffer(newStream, size));
    }
yyFlexLexer
provides a series of member functions that
can be used to switch files. The file-switching capability of a
yyFlexLexer
object is founded on the struct yy_buffer_state
,
containing the state of the
scan-buffer of the currently read file. This
buffer is pushed on the d_state
stack when an #include
is
encountered. Then yy_buffer_state
's contents are replaced by the buffer
created for the file to be processed next. In the flex
specification file the function pushSource
is called as
pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
YY_CURRENT_BUFFER
and
YY_BUF_SIZE
are macros that are only
available in the rules section of the lexer specification file, so they must
be passed as arguments to pushSource
. It is not possible
to use these macros in the Scanner
class's member functions directly.
yylineno
is not updated when a file switch is
performed. If line numbers are to be monitored, then the current value of
yylineno
should be pushed on a stack and yylineno
should be reset by
pushSource
. Correspondingly, popSource
should reinstate a former value
of yylineno
by popping a previously pushed value from the
stack. Scanner
's current implementation maintains a stack of
yy_buffer_state
pointers. Changing that into a stack of
pair<yy_buffer_state *, size_t>
elements allows us to save (and restore)
line numbers as well. This modification is left as an
exercise to the
reader.
popSource
is called to pop the previously
pushed buffer off the stack. This allows the scanner to continue its scanning
process just beyond the just completed #include
directive. The member
popSource
first inspects the size of the d_state
stack. If it is
empty, false
is returned and the function terminates. If it isn't empty,
then the current buffer is deleted and it is replaced by the state waiting on
top of the stack. The file switch is performed by the yyFlexLexer
members
yy_delete_buffer
and yy_switch_to_buffer
. The yy_delete_buffer
function does not close the ifstream
and does not delete the
memory allocated for this stream by pushSource
. Therefore delete
is called for the ifstream
pointer stored at the back of d_fileInfo
to
take care of both. Following this the last FileInfo
entry is removed from
d_fileInfo
. Finally the function returns true
:
    #include "scanner.ih"

    bool Scanner::popSource(yy_buffer_state *current)
    {
        if (d_state.empty())
            return false;

        yy_delete_buffer(current);
        yy_switch_to_buffer(d_state.top());
        d_state.pop();

        delete d_fileInfo.back().d_in;  // closes the stream as well
        d_fileInfo.pop_back();

        return true;
    }
stackTrace
dumps the names of
the currently pushed files to the standard error stream. It may be called by
exception catchers. Here is its implementation:
    #include "scanner.ih"

    void Scanner::stackTrace()
    {
        for (size_t idx = 0; idx < d_fileInfo.size() - 1; ++idx)
            cerr << idx << ": " << d_fileInfo[idx].d_name << " included " <<
                    d_fileInfo[idx + 1].d_name << '\n';
    }
lastFile
returns the name of the currently processed file. It
may be implemented inline:
    inline std::string const &Scanner::lastFile()
    {
        return d_fileInfo.back().d_name;
    }
Scanner::yylex
.
Therefore, int yylex
must be declared by the class Scanner
, as it
overrides FlexLexer
's virtual member yylex
.
Scanner
is very simple. It expects a filename
indicating where to start the scanning process. It first checks the number of
arguments. If at least one argument was given, then an ifstream
object is created. If this object can be created, then a Scanner
object is
constructed, receiving the address of the ifstream
object and the name of
the initial input file as its arguments. Then the Scanner
object's
yylex
member is called. The scanner object throws Scanner::Error
exceptions if it fails to perform its tasks properly. These exceptions are
caught near main
's end. Here is the program's source:
    #include "lexer.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            return 1;
        }

        ifstream yyin(argv[1]);
        if (!yyin)
        {
            cerr << "Can't read " << argv[1] << '\n';
            return 1;
        }

        Scanner scanner(&yyin, argv[1]);
        try
        {
            return scanner.yylex();
        }
        catch (Scanner::Error err)
        {
            char const *msg[] =
            {
                "Include specification",
                "Circular Include",
                "Nesting",
                "Read",
            };
            cerr << msg[err] << " error in " << scanner.lastFile() <<
                    ", line " << scanner.lineno() << '\n';
            scanner.stackTrace();
            return 1;
        }
    }
flex
and the
Gnu C++ compiler
g++
have
been installed:
flex
. For
this the following command can be given:
flex lexer
yywrap()
function is used, the
libfl.a
library should be
linked against the final program. Normally, that's not required, and the
program can be constructed as, e.g.:
g++ -o lexer *.cc
%option debug
was
specified, debugging code is included in the generated scanner. To obtain
debugging info, this code must also be activated. Assuming the scanner object
is called scanner
, then the statement
    scanner.set_debug(true);

will produce debugging info written to the standard error stream.
The starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens that can be returned by the lexical scanner (called the scanner below).
Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the scanner are not normally specified literally in the grammar rules or lexical regular expressions, but should be implemented in member functions that are called from the parser's rules or that are associated with the scanner's regular expressions.
In the previous section we've seen an example of a C++ class generated by
flex
. In the current section we concentrate on the parser. The parser can
be generated from a grammar specification file, processed by the program
bisonc++
. The grammar specification file required by bisonc++
is
similar to the file processed by
bison
(or by bison
's successor (and
bisonc++
's predecessor)
bison++
, written in the early nineties by the
Frenchman
Alain Coetmeur).
In this section a program is developed converting
infix expressions, where binary operators are written between their
operands, to postfix expressions, where operators are written behind their
operands. Also, the unary operator -
is converted from its prefix notation
to a postfix form. The unary +
operator is ignored as it requires no
further actions. In essence our little calculator is a micro compiler,
transforming numeric expressions into assembly-like instructions.
Our calculator will recognize a very basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.
In the coming sections we'll develop the grammar specification for
bisonc++
. Then, the regular expressions for the scanner are
specified. Following that, the final program is constructed.
bisonc++
is comparable to
the specification file required by
bison
. Differences are related to the
class nature of the resulting parser. Our calculator distinguishes real
numbers from integers, and supports a basic set of arithmetic operators.
Bisonc++
should be used as follows:
bisonc++
this is no
different, and bisonc++
grammar definitions are for all practical
purposes identical to bison
's grammar definitions.
bisonc++
can generate files defining the parser class and
the implementation of the member function parse
.
parse
) must be
separately implemented. Of course, they should also be
declared in the parser class's header. At the very least the member
lex
must be implemented. This member is called by parse
to
obtain the next available token. However, bisonc++
offers a
facility providing a standard implementation of the function
lex
. The member function
error(char const *msg)
is given a simple default implementation that may be modified by the
programmer. The member function error
is called when parse
detects (syntactic) errors.
    int main()
    {
        Parser parser;
        return parser.parse();
    }
The bisonc++
specification file has two
sections:
bisonc++
also supports several new declarations. These new declarations are
important and are discussed below.
bison
,
albeit that some members that were available in bison
and
bison++
are obsolete in bisonc++
, while other members can be
used in a wider context. For example, ACCEPT and ABORT can be
called from any member called from the parser's action blocks to
terminate the parsing process.
bison
will note that there is no
header section anymore. Header sections are used by bison to provide
for the necessary declarations allowing the compiler to compile the C
function generated by bison
. In C++ declarations are part of or
already used by class definitions. Therefore, a parser generator generating a
C++ class and some of its member functions does not require a header
section anymore.
bisonc++
are discussed here. The
reader is referred to bisonc++
's man-page for a full description.
header
base.h
.
header
header
as the pathname to the file pre-included in the
parser's base-class header. This declaration is useful in
situations where the base class header file refers to types which
might not yet be known. E.g., with
%union
a std::string *
field might be used. Since the class std::string
might not yet
be known to the compiler once it processes the base class header
file we need a way to inform the compiler about these classes and
types. The suggested procedure is to use a pre-include header file
declaring the required types. By default header
will be
surrounded by double quotes (using, e.g., #include "header"
).
When the argument is surrounded by angle brackets #include
<header>
will be included. In the latter case, quotes might be
required to escape interpretation by the shell (e.g., using -H
'<header>'
).
header
.h
parser-class-name
%name
declaration previously used by bison++
. It
defines the name of the C++ class that will be
generated. Contrary to bison++'s %name
declaration,
%class-name
may appear anywhere in the first section of the
grammar specification file. It may be defined only once. If no
%class-name
is specified the default class name Parser
will
be used.
parse
and its support functions with debugging code,
showing the actual parsing process on the standard output
stream. When included, the debugging output is active by default,
but its activity may be controlled using the setDebug(bool
on-off)
member. Note that no #ifdef DEBUG
macros are used
anymore. By rerunning bisonc++
without the --debug
option
an equivalent parser is generated not containing the debugging
code.
header
header
.ih
. The implementation header should
contain all directives and declarations only used by the
implementations of the parser's member functions. It is the only
header file that is included by the source file containing
parse
's implementation. It is suggested that user defined
implementations of other class members use the same convention,
thus concentrating all directives and declarations that are
required for the compilation of other source files belonging to the
parser class in one header file.
source
parse
. Defaults to parse.cc
.
header
header
as the pathname to the file pre-included in the
parser's class header. This file should define a class
Scanner
, offering a member int yylex()
producing the next
token from the input stream to be analyzed by the parser generated
by bisonc++
. When this option is used the parser's member
int lex()
will be predefined as (assuming the parser class
name is Parser
):
    inline int Parser::lex()
    {
        return d_scanner.yylex();
    }

and an object
Scanner d_scanner
will be composed into the
parser. The d_scanner
object will be constructed using its
default constructor. If another constructor is required, the
parser class may be provided with an appropriate (overloaded)
parser constructor after having constructed the default parser
class header file using bisonc++
. By default header
will
be surrounded by double quotes (using, e.g., #include
"header"
). When the argument is surrounded by angle brackets
#include <header>
will be included.
typename
typename
should be the name of an unstructured type (e.g.,
size_t
). By default it is int
. See YYSTYPE
in
bison
. It should not be used if a %union
specification is
used. Within the parser class, this type may be used as
STYPE
.
union-definition
bison
declaration. As with bison
this generates a union for the parser's semantic type. The union
type is named STYPE
. If no %union
is declared, a simple
stack-type may be defined using the %stype
declaration. If no
%stype
declaration is used, the default stacktype (int
) is
used.
%union
declaration is:
    %union
    {
        int i;
        double d;
    };

In pre-C++0x code a union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a
string
cannot be a member of the
union. A string *
, however, is a possible union member. In time C++0x
will offer unrestricted unions (cf. section 7.11) allowing class
type objects to become fields in union definitions, but these unrestricted
unions have not yet been implemented by compilers.
As an aside: the scanner does not have to know about such a union. It
can simply pass its scanned text to the parser through its
YYText
member
function. For example using a statement like
    $$.i = A2x(scanner.YYText());

matched text may be converted to a value of an appropriate type.
Tokens and non-terminals can be associated with union fields. This is strongly advised, as it prevents type mismatches, since the compiler will be able to check for type correctness. At the same time, the bison specific variables $$, $1, $2, etc. may be used, rather than the full field specification (like $$.i). A non-terminal or a token may be associated with a union field using the <fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)
           <d> DOUBLE
    %type  <i> intExpr      // non-terminal association

In the example developed here, both the tokens and the non-terminals can be associated with a field of the union. However, as noted before, the scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser, knowing what the input is all about, may then convert strings like "123" to an integer value. Consequently, the association of a union field and a token is discouraged. When describing the rules of the grammar this will be illustrated further.
In the %union
discussion the
%token
and
%type
specifications
should be noted. They are used to specify the tokens (terminal symbols) that
can be returned by the scanner, and to specify the return types of
non-terminals. Apart from %token
the token declarators
%left
,
%right
, and
%nonassoc
can be used to specify the associativity of
operators. The tokens mentioned at these indicators are interpreted as tokens
indicating operators, associating in the indicated direction. The precedence
of operators is defined by their order: the first specification has the lowest
priority. To overrule a certain precedence in a certain context
%prec
can
be used. As all this is standard bisonc++
practice, it isn't further
elaborated here. The documentation provided with bisonc++
's distribution
should be consulted for further reference.
Here is the specification of the calculator's declaration section:
    %filenames parser
    %scanner ../scanner/scanner.h
    %lines

    %union
    {
        int i;
        double d;
    };

    %token  INT DOUBLE

    %type   <i> intExpr
    %type   <d> doubleExpr

    %left   '+'
    %left   '*'
    %right  UnaryMinus
In the declaration section %type specifiers are used, associating the intExpr rule's value (see the next section) to the i-field of the semantic-value union, and associating doubleExpr's value to the d-field. At first sight this may look complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself had been used, we would still have had to specify somewhere in the returned semantic values what field to use: fewer rules, but more complex and error-prone code.
bisonc++
. In particular, note
that no action block requires more than a single line of code. This keeps the
grammar simple, and therefore enhances its readability and
understandability. Even the rule defining the parser's proper termination (the
empty line in the line
rule) uses a single member function called
done
. The implementation of that function is simple, but it is worth while
noting that it calls Parser::ACCEPT, showing that ACCEPT can be called
indirectly from a production rule's action block. Here are the grammar's
production rules:
    lines:
        lines line
    |
        line
    ;

    line:
        intExpr '\n'                        { display($1); }
    |
        doubleExpr '\n'                     { display($1); }
    |
        '\n'                                { done(); }
    |
        error '\n'                          { reset(); }
    ;

    intExpr:
        intExpr '*' intExpr                 { $$ = exec('*', $1, $3); }
    |
        intExpr '+' intExpr                 { $$ = exec('+', $1, $3); }
    |
        '(' intExpr ')'                     { $$ = $2; }
    |
        '-' intExpr         %prec UnaryMinus { $$ = neg($2); }
    |
        INT                                 { $$ = convert<int>(); }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr           { $$ = exec('*', $1, $3); }
    |
        doubleExpr '*' intExpr              { $$ = exec('*', $1, d($3)); }
    |
        intExpr '*' doubleExpr              { $$ = exec('*', d($1), $3); }
    |
        doubleExpr '+' doubleExpr           { $$ = exec('+', $1, $3); }
    |
        doubleExpr '+' intExpr              { $$ = exec('+', $1, d($3)); }
    |
        intExpr '+' doubleExpr              { $$ = exec('+', d($1), $3); }
    |
        '(' doubleExpr ')'                  { $$ = $2; }
    |
        '-' doubleExpr      %prec UnaryMinus { $$ = neg($2); }
    |
        DOUBLE                              { $$ = convert<double>(); }
    ;
This grammar is used to implement a simple calculator in which integer and real values can be negated, added, and multiplied and in which standard priority rules can be overruled by parentheses. The grammar shows the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association are defined in the parser's declaration section.
Bisonc++
generates multiple files, among which the file
defining the parser's class. Functions called from the production rule's
action blocks are usually member functions of the parser. These member
functions must be declared and defined. Once bisonc++
has generated the
header file defining the parser's class it will not automatically rewrite that
file, allowing the programmer to add new members to the parser class once they
are required. Here is the parser.h
file as used in our little calculator:
    #ifndef Parser_h_included
    #define Parser_h_included

    #include <iostream>
    #include <sstream>
    #include <bobcat/a2x>

    #include "parserbase.h"
    #include "../scanner/scanner.h"

    #undef Parser
    class Parser: public ParserBase
    {
        std::ostringstream d_rpn;
        // $insert scannerobject
        Scanner d_scanner;

        public:
            int parse();

        private:
            template <typename Type>
            Type exec(char c, Type left, Type right);

            template <typename Type>
            Type neg(Type op);

            template <typename Type>
            Type convert();

            void display(int x);
            void display(double x);
            void done() const;
            void reset();

            void error(char const *msg);
            int lex();
            void print();

            static double d(int i);

        // support functions for parse():
            void executeAction(int d_ruleNr);
            void errorRecovery();
            int lookup(bool recovery);
            void nextToken();
    };

    inline double Parser::d(int i)
    {
        return i;
    }

    template <typename Type>
    Type Parser::exec(char c, Type left, Type right)
    {
        d_rpn << " " << c << " ";
        return c == '*' ? left * right : left + right;
    }

    template <typename Type>
    Type Parser::neg(Type op)
    {
        d_rpn << " n ";
        return -op;
    }

    template <typename Type>
    Type Parser::convert()
    {
        Type ret = FBB::A2x(d_scanner.YYText());
        d_rpn << " " << ret << " ";
        return ret;
    }

    inline void Parser::error(char const *msg)
    {
        std::cerr << msg << '\n';
    }

    inline int Parser::lex()
    {
        return d_scanner.yylex();
    }

    inline void Parser::print()
    {}

    #endif
Parser::INT
or Parser::DOUBLE
tokens. Here is the complete
specification file:
    %{
    #define _SKIP_YYFLEXLEXER_
    #include "scanner.ih"

    #include "../parser/parserbase.h"
    %}
    %option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap
    %option debug

    %%

    [ \t]                       ;
    [0-9]+                      return Parser::INT;

    "."[0-9]*                   |
    [0-9]+("."[0-9]*)?          return Parser::DOUBLE;

    .|\n                        return *yytext;

    %%
bison
and flex
. The files
parse.cc
and parser.h
are generated by the command:
    bisonc++ -V grammar

The option -V produces the file parser.output showing information about the internal structure of the provided grammar (among which its states). It is useful for debugging purposes and can be omitted if no debugging is required.
Bisonc++
may detect conflicts
(
shift-reduce conflicts and/or
reduce-reduce conflicts) in the
provided grammar. These conflicts may be resolved explicitly using
disambiguating rules or they are `resolved' by default. By default, a
shift-reduce conflict is resolved by shifting (i.e., the next token is
consumed). By default a reduce-reduce conflict is resolved by using the first
of two competing production rules. Bisonc++
's
conflict resolution procedures are identical to bison
's procedures.
Once a parser class and a parsing member function have been constructed flex may be used to create a lexical scanner using, e.g., the command

    flex -I lexer

On Unix systems a command like

    g++ -o calc -Wall *.cc -s

can be used to compile and link the source of the main program and the sources produced by the scanner and parser generators.
Finally, here is a source file in which the main function and the parser object are defined. The parser features the lexical scanner as one of its data members:

    #include "parser/parser.h"

    using namespace std;

    int main()
    {
        Parser parser;

        cout << "Enter (nested) expressions containing ints, doubles, *, + and "
                "unary -\n"
                "operators. Enter an empty line to stop.\n";

        return parser.parse();
    }
Bisonc++
can be downloaded from
http://bisoncpp.sourceforge.net/. It requires the bobcat
library, which can be downloaded from
http://bobcat.sourceforge.net/.
One may wonder why a union
is still used by Bisonc++ as C++ offers
inherently superior constructs to combine multiple types into one type. The
C++ way to combine types into one type is by defining a polymorphic base
class and a series of derived classes implementing the alternative data
types. Bisonc++ supports the union
approach (and the unrestricted unions
with C++0x) for various (e.g., backward compatibility)
reasons.
Bison and
bison++ both support the %union
directive.
An alternative to using a union is using a polymorphic base class. Such a
class is developed below (the class Base). As it is a polymorphic base class
it has the following characteristics:
- It has a pure virtual ownClone member implementing a so-called virtual
  constructor (cf. the virtual constructor design pattern, Gamma et
  al. (1995));
- To obtain a copy of an object its clone member must be called. It calls
  ownClone and forms a layer between the derived class implementations of
  ownClone and the user-software. Right now it only calls ownClone, but by not
  defining it as an inline function clone can easily be extended once that is
  required;
- It offers an insert member and an overloaded operator<< to allow derived
  objects to be inserted into ostream objects.
    #ifndef INCLUDED_BASE_
    #define INCLUDED_BASE_

    #include <iosfwd>

    class Base
    {
        friend std::ostream &operator<<(std::ostream &out, Base const &obj);

        public:
            virtual ~Base() = default;
            Base *clone() const;

        private:
            virtual Base *ownClone() const = 0;
            virtual std::ostream &insert(std::ostream &os) const = 0;
    };

    inline std::ostream &operator<<(std::ostream &out, Base const &obj)
    {
        return obj.insert(out);
    }

    #endif
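The source does not show the implementation of Base::clone. The following sketch assumes that, for now, it merely forwards to ownClone, and it illustrates how a copy is obtained through a Base reference. The class Number and the function copyAndShow are hypothetical names, introduced for this illustration only.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Reduced version of the Base interface shown above.
class Base
{
    friend std::ostream &operator<<(std::ostream &out, Base const &obj);

    public:
        virtual ~Base() = default;
        Base *clone() const;            // public, non-virtual entry point

    private:
        virtual Base *ownClone() const = 0;
        virtual std::ostream &insert(std::ostream &out) const = 0;
};

// Assumed implementation: clone only forwards to ownClone.
Base *Base::clone() const
{
    return ownClone();
}

std::ostream &operator<<(std::ostream &out, Base const &obj)
{
    return obj.insert(out);
}

// Hypothetical derived class, used for illustration only.
class Number: public Base
{
    int d_value;

    public:
        explicit Number(int value)
        :
            d_value(value)
        {}

    private:
        Base *ownClone() const override
        {
            return new Number(*this);   // the 'virtual constructor'
        }
        std::ostream &insert(std::ostream &out) const override
        {
            return out << d_value;
        }
};

// Copies an object through a Base reference and returns the copy's text.
std::string copyAndShow(Base const &original)
{
    std::unique_ptr<Base> copy(original.clone());
    assert(copy.get() != &original);    // a distinct object was created

    std::ostringstream out;
    out << *copy;                       // uses the derived class's insert
    return out.str();
}
```

Calling copyAndShow(Number(12)) copies the Number object without the caller knowing its dynamic type. Keeping clone non-virtual leaves room to add checks or bookkeeping later without touching the derived classes.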
Instead of using fields of a classical union we are now using classes that
are derived from the class Base. For example:
- Objects of the class Int contain int values. Here is its interface (and
  implementation):
    #ifndef INCLUDED_INT_
    #define INCLUDED_INT_

    #include <ostream>
    #include <bobcat/a2x>

    #include "../base/base.h"

    class Int: public Base
    {
        int d_value;

        public:
            Int(char const *text);
            Int(int v);
            int value() const;          // directly access the value

        private:
            virtual Base *ownClone() const;
            virtual std::ostream &insert(std::ostream &os) const;
    };

    inline Int::Int(char const *txt)
    :
        d_value(FBB::A2x(txt))
    {}

    inline Int::Int(int v)
    :
        d_value(v)
    {}

    inline Base *Int::ownClone() const
    {
        return new Int(*this);
    }

    inline int Int::value() const
    {
        return d_value;
    }

    inline std::ostream &Int::insert(std::ostream &out) const
    {
        return out << d_value;
    }

    #endif
- Objects of the class Text contain text. These objects can be used, e.g., to
  store the names of identifiers recognized by a lexical scanner. Here is the
  interface of the class Text:
    #ifndef INCLUDED_TEXT_
    #define INCLUDED_TEXT_

    #include <string>
    #include <ostream>

    #include "../base/base.h"

    class Text: public Base
    {
        std::string d_text;

        public:
            Text(char const *id);
            std::string const &id() const;  // directly access the name.

        private:
            virtual Base *ownClone() const;
            virtual std::ostream &insert(std::ostream &os) const;
    };

    inline Text::Text(char const *id)
    :
        d_text(id)
    {}

    inline Base *Text::ownClone() const
    {
        return new Text(*this);
    }

    inline std::string const &Text::id() const
    {
        return d_text;
    }

    inline std::ostream &Text::insert(std::ostream &out) const
    {
        return out << d_text;
    }

    #endif
The class Base can't immediately be used as the parser's semantic value type
for various reasons:
- A Base class object cannot contain a derived class's data members, so plain
  Base class objects cannot be used to store the parser's semantic values.
- A Base class reference cannot be used as a semantic value either, as
  containers cannot store references.
- The semantic value could be a pointer to a Base class object. Although a
  pointer would offer programmers the benefits of the polymorphic nature of
  the Base class, it would also require them to keep track of all memory used
  by Base objects, thus countering many of the benefits of using a polymorphic
  base class.
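The first reason can be made concrete with a small sketch of object slicing. The classes Plain and Extended are hypothetical and, unlike Base, Plain is not abstract, so that a by-value copy is possible at all. Storing the derived object by value keeps only its base part, whereas a pointer preserves the dynamic type.

```cpp
#include <string>

// Hypothetical, non-abstract base class (for illustration only).
struct Plain
{
    virtual ~Plain() = default;
    virtual std::string name() const
    {
        return "Plain";
    }
};

// Hypothetical derived class adding a data member of its own.
struct Extended: Plain
{
    int d_extra = 5;                    // lost when sliced
    std::string name() const override
    {
        return "Extended";
    }
};

// Stores a derived object by value in a base object: the copy is sliced.
std::string storedByValue()
{
    Extended ext;
    Plain copy = ext;                   // only the Plain part is copied
    return copy.name();
}

// Refers to the derived object through a base pointer: nothing is lost.
std::string storedByPointer()
{
    Extended ext;
    Plain const *ptr = &ext;
    return ptr->name();
}
```

Here storedByValue returns "Plain" while storedByPointer returns "Extended", which is why a pointer (with its memory bookkeeping) is the only workable starting point.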
To solve the above problems, a wrapper class Semantic around a Base pointer
is used. To simplify memory bookkeeping Semantic itself is defined as a class
derived from std::shared_ptr (cf. section 18.4). This allows us to benefit
from default implementations of the copy constructor, the overloaded
assignment operator, and the destructor. Semantic itself offers an overloaded
insertion operator allowing us to insert the object that is controlled by the
Semantic object and derived from Base into an ostream. Here is Semantic's
interface:
    #ifndef INCLUDED_SEMANTIC_
    #define INCLUDED_SEMANTIC_

    #include <memory>
    #include <ostream>

    #include "../base/base.h"

    class Semantic: public std::shared_ptr<Base>
    {
        friend std::ostream &operator<<(std::ostream &out, Semantic const &obj);

        public:
            Semantic(Base *bp = 0);     // Semantic owns the bp
            ~Semantic() = default;
    };

    inline Semantic::Semantic(Base *bp)
    :
        std::shared_ptr<Base>(bp)
    {}

    inline std::ostream &operator<<(std::ostream &out, Semantic const &obj)
    {
        if (obj)
            return out << *obj;

        return out << "<UNDEFINED>";
    }

    #endif
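To see the wrapper at work without the parser, the following self-contained sketch repeats reduced versions of Base and Semantic and adds a hypothetical derived class Word. It illustrates that copies of a Semantic share one controlled object, and that a Semantic controlling no object is inserted as <UNDEFINED>. The function describe is an assumption made for this illustration.

```cpp
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Reduced Base: only the insertion facility is needed here.
class Base
{
    friend std::ostream &operator<<(std::ostream &out, Base const &obj);

    public:
        virtual ~Base() = default;

    private:
        virtual std::ostream &insert(std::ostream &out) const = 0;
};

std::ostream &operator<<(std::ostream &out, Base const &obj)
{
    return obj.insert(out);
}

// Hypothetical derived class holding a piece of text.
class Word: public Base
{
    std::string d_text;

    public:
        explicit Word(std::string const &text)
        :
            d_text(text)
        {}

    private:
        std::ostream &insert(std::ostream &out) const override
        {
            return out << d_text;
        }
};

// Reduced Semantic: a shared_ptr<Base> owning the pointer it receives.
class Semantic: public std::shared_ptr<Base>
{
    public:
        Semantic(Base *bp = nullptr)
        :
            std::shared_ptr<Base>(bp)
        {}
};

std::ostream &operator<<(std::ostream &out, Semantic const &obj)
{
    if (obj)
        return out << *obj;

    return out << "<UNDEFINED>";
}

// Inserts a value, its copy, an empty value and the sharing count.
std::string describe()
{
    Semantic first(new Word("alpha"));
    Semantic second(first);             // default copy: shares the Word
    Semantic empty;                     // controls no object

    std::ostringstream out;
    out << first << ' ' << second << ' ' << empty << ' '
        << first.use_count();
    return out.str();
}
```

With these definitions describe returns "alpha alpha <UNDEFINED> 2": both copies refer to the same Word, and no explicit delete is needed anywhere.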
The semantic value type specified with the %stype directive is of course
Semantic. A simple grammar is defined for this illustrative example. The
grammar expects input according to the following rule:
    rule:
        IDENTIFIER '(' IDENTIFIER ')' ';'
    |
        IDENTIFIER '=' INT ';'
    ;
The rule's actions simply echo the received identifiers and int values to
cout. Here is an example of a decorated production rule, where due to
Semantic's overloaded insertion operator the insertion of the object
controlled by Semantic is automatically performed:
    IDENTIFIER '=' INT ';'
    {
        cout << $1 << " " << $3 << '\n';
    }
Bisonc++'s parser stores all semantic values on its semantic values stack
(irrespective of the number of tokens that were defined in a particular
production rule). At any time all semantic values associated with
previously recognized tokens are available in an action block. Once the
semantic value stack is reduced, the Semantic class's destructor takes care
of the proper destruction of the objects controlled by its shared_ptr base
class.
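The effect of reducing the stack can be imitated without Bisonc++. The sketch below is not Bisonc++'s actual implementation: it assumes a std::vector as the stack, uses a plain std::shared_ptr as a stand-in for Semantic, and introduces the hypothetical type Tracked to count the objects that are still alive.

```cpp
#include <memory>
#include <vector>

// Hypothetical payload counting how many objects are currently alive.
struct Tracked
{
    static int s_alive;

    Tracked()
    {
        ++s_alive;
    }
    Tracked(Tracked const &)
    {
        ++s_alive;
    }
    ~Tracked()
    {
        --s_alive;
    }
};

int Tracked::s_alive = 0;

using Value = std::shared_ptr<Tracked>;     // stand-in for Semantic

struct Counts
{
    int beforeReduction;
    int afterReduction;
};

// Pushes three values, as for a rule having three elements, then removes
// them again, imitating the reduction of the semantic value stack.
Counts reduceThreeValues()
{
    std::vector<Value> stack;               // assumed stack representation

    for (int idx = 0; idx != 3; ++idx)
        stack.push_back(std::make_shared<Tracked>());

    int before = Tracked::s_alive;

    // the reduction: the destructors of the removed elements run here
    stack.erase(stack.end() - 3, stack.end());

    return Counts{ before, Tracked::s_alive };
}
```

Three objects are alive before the reduction and none afterwards, although the code contains no delete expression: the shared_ptr destructors release the controlled objects.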
The scanner must of course be allowed to access the parser's data member
representing the most recent semantic value. This is the parser's data member
d_val__, whose address or reference can be passed to the scanner when it is
constructed. E.g., with a scanner expecting an STYPE__ & the parser's
constructor could simply be implemented as:
    inline Parser::Parser()
    :
        d_scanner(d_val__)
    {}
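In the example below the scanner instead receives the address of the parser's
d_val__ data member. Therefore the Scanner class defines a Semantic *d_semval
member, initialized to the address of the parser's Semantic d_val__ data
member that is made available to the Scanner's constructor (with this
pointer-based variant the parser passes &d_val__ rather than d_val__):
    inline Scanner::Scanner(Parser::STYPE__ *semval)    // or: Semantic *semval
    :
        d_semval(semval)
    {}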
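The cooperation between both classes can be sketched without flex or Bisonc++. The classes MiniScanner and MiniParser below are hypothetical stand-ins, and a std::shared_ptr<std::string> takes the place of Semantic. The sketch assumes the pointer-based variant: the parser passes the address of its own semantic value to the scanner, which later resets it.

```cpp
#include <memory>
#include <string>

using Semantic = std::shared_ptr<std::string>;  // stand-in for Semantic

// Hypothetical scanner: stores where the parser's semantic value lives.
class MiniScanner
{
    Semantic *d_semval;

    public:
        explicit MiniScanner(Semantic *semval)
        :
            d_semval(semval)
        {}

        // Pretends that 'text' was matched and stores it as the
        // semantic value, as a scanner action would do.
        void match(std::string const &text)
        {
            d_semval->reset(new std::string(text));
        }
};

// Hypothetical parser: owns the semantic value and the scanner.
class MiniParser
{
    Semantic d_val;                     // the most recent semantic value
    MiniScanner d_scanner;

    public:
        MiniParser()
        :
            d_scanner(&d_val)           // pass the address of d_val
        {}

        // Copying would leave the scanner pointing at another object.
        MiniParser(MiniParser const &other) = delete;
        MiniParser &operator=(MiniParser const &other) = delete;

        // Lets the scanner 'match' text and returns what it stored.
        std::string next(std::string const &text)
        {
            d_scanner.match(text);
            return *d_val;
        }
};
```

After MiniParser parser; a call like parser.next("42") returns the text that the scanner stored through its pointer. Copying is disabled in this sketch because a copied scanner would still refer to the original parser's value.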
The scanner (generated by flex) recognizes input patterns, returns Parser
tokens (e.g., Parser::INT), and stores a semantic value when applicable.
E.g., when recognizing a Parser::INT the rule's action is:
    {
        d_semval->reset(new Int(yytext));
        return Parser::INT;
    }
IDENTIFIER's semantic value is obtained analogously:
    [a-zA-Z_][a-zA-Z0-9_]*  {
                                d_semval->reset(new Text(yytext));
                                return Parser::IDENTIFIER;
                            }