Chapter 23: Concrete Examples

Don't hesitate to send in feedback: send an e-mail if you like the C++ Annotations; if you think that important material was omitted; if you find errors or typos in the text or the code examples; or if you just feel like e-mailing. Send your e-mail to Frank B. Brokken.
Please state the document version you're referring to, as found in the title (in this document: 8.3.1) and please state chapter and paragraph name or number you're referring to.
All received mail is processed conscientiously, and received suggestions for improvements will usually have been processed by the time a new version of the Annotations is released. Except for the incidental case I will normally not acknowledge the receipt of suggestions for improvements. Please don't interpret this as me not appreciating your efforts.

In this chapter concrete examples of C++ programs, classes and templates will be presented. Topics covered by the C++ Annotations such as virtual functions, static members, etc. are illustrated in this chapter. The examples roughly follow the organization of earlier chapters.

As an additional topic, not just providing examples of C++ the subjects of scanner and parser generators are covered. We show how these tools may be used in C++ programs. These additional examples assume a certain familiarity with the concepts underlying these tools, like grammars, parse-trees and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it's attractive to use scanner- and parser-generators to create the code doing the actual input processing. One of the examples in this chapter describes the usage of these tools in a C++ environment.

23.1: Using file descriptors with `streambuf' classes

23.1.1: Classes for output operations

Reading and writing from and to file descriptors are not part of the C++ standard. But on most operating systems file descriptors are available and can be considered a device. It seems natural to use the class std::streambuf as the starting point for constructing classes interfacing such file descriptor devices.

Below we'll construct classes that can be used to write to a device given its file descriptor. The devices may be files, but they could also be pipes or sockets. Section 23.1.2 covers reading from such devices; section 23.3.1 reconsiders redirection, discussed earlier in section 6.6.1.

Using the streambuf class as a base class it is relatively easy to design classes for output operations. The only member function that must be overridden is the (virtual) member int steambuf::overflow(int c). This member's responsibility is to write characters to the device. If fd is an output file descriptor and if output should not be buffered then the member overflow() can simply be implemented as:

    class UnbufferedFD: public std::streambuf
    {
        public:
            virtual int overflow(int c);
            ...
    };

    int UnbufferedFD::overflow(int c)
    {
        if (c != EOF)
        {
            if (write(d_fd, &c, 1) != 1)
                return EOF;
        }
        return c;
    }

The argument received by overflow is either written to the file descriptor (and returned from overflow), or EOF is returned.

This simple function does not use output buffering. For various reasons, using a buffer is usually a good idea (see also the next section).

When output buffering is used, the overflow member is a bit more complex as it is only called when the buffer is full. Once the buffer is full, we first have to flush the buffer. Flushing the buffer is the responsibility of the (virtual) function streambuf::sync is available. Since sync is a virtual function, classes derived from streambuf may redefine sync to flush a buffer streambuf itself doesn't know about.

Overriding sync and using it in overflow is not all that has to be done. When the object of the class defining the buffer reaches the end of its lifetime the buffer may be only partially full. In that situation the buffer must also be flushed. This is easily done by simply calling sync from the class's destructor.

Now that we've considered the consequences of using an output buffer, we're almost ready to design our derived class. Several more features will be added as well, though:

First, we should allow the user of the class to specify the size of the output buffer.
Second, it should be possible to construct an object of our class before the file descriptor is actually known. Later, in section 23.3 we'll encounter a situation where this feature will be used.

To save some space in the C++ Annotations, the successful completion of the functions designed here is not checked in the example code. In `real life' implementations these checks should of course not be omitted. Our class OFdnStreambuf has the following characteristics:

Its member functions use low-level functions operating on file descriptors. So apart from streambuf the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.
The class is derived from std::streambuf.

It defines three data members. These data members keep track of, respectively, the size of the buffer, the file descriptor, and the buffer itself. Here is the full class interface

    class OFdnStreambuf: public std::streambuf
    {
        size_t d_bufsize;
        int     d_fd;
        char    *d_buffer;

        public:
            OFdnStreambuf();
            OFdnStreambuf(int fd, size_t bufsize = 1);
            virtual ~OFdnStreambuf();
            void open(int fd, size_t bufsize = 1);
        private:
            virtual int sync();
            virtual int overflow(int c);
    };

Its default constructor merely initializes the buffer to 0. Slightly more interesting is its constructor expecting a file descriptor and a buffer size. This constructor passes its arguments on to the class's open member (see below). Here are the constructors:
```
    inline OFdnStreambuf::OFdnStreambuf()
    :
        d_bufsize(0),
        d_buffer(0)
    {}
    inline OFdnStreambuf::OFdnStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }
```
The destructor calls sync, flushing any characters stored in the output buffer to the device. In implementations not using a buffer the destructor can be given a default implementation:
```
    inline OFdnStreambuf::~OFdnStreambuf()
    {
        if (d_buffer)
        {
            sync();
            delete[] d_buffer;
        }
    }
```
This implementation does not close the device. It is left as an exercise to the reader to change this class in such a way that the device is optionally closed (or optionally remains open). This approach was adopted by, e.g., the Bobcat library. See also section 23.1.2.2.

The open member initializes the buffer. Using streambuf::setp, the begin and end points of the buffer are defined. This is used by the streambuf base class to initialize streambuf::pbase, streambuf::pptr, and streambuf::epptr:

    inline void OFdnStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize == 0 ? 1 : bufsize;

        d_buffer = new char[d_bufsize];
        setp(d_buffer, d_buffer + d_bufsize);
    }

The member sync flushes the as yet unflushed contents of the buffer to the device. After the flush the buffer is reinitialized using setp. After successfully flushing the buffer sync returns 0:

    inline int OFdnStreambuf::sync()
    {
        if (pptr() > pbase())
        {
            write(d_fd, d_buffer, pptr() - pbase());
            setp(d_buffer, d_buffer + d_bufsize);
        }
        return 0;
    }

The member streambuf::overflow is also overridden. Since this member is called from the streambuf base class when the buffer is full it should first call sync to flush the buffer to the device. Next it should write the character c to the (now empty) buffer. The character c should be wrtten using pptr and streambuf::pbump. Entering a character into the buffer should be implemented using available streambuf member functions, rather than `by hand' as doing so might invalidate streambuf's internal bookkeeping. Here is overflow's implementation:
```
    inline int OFdnStreambuf::overflow(int c)
    {
        sync();
        if (c != EOF)
        {
            *pptr() = c;
            pbump(1);
        }
        return c;
    }
```

The next program uses the OFfdStreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output:

    #include <string>
    #include <iostream>
    #include <istream>
    #include "fdout.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        OFdnStreambuf   fds(STDOUT_FILENO, 500);
        ostream         os(&fds);

        switch (argc)
        {
            case 1:
                for (string  s; getline(cin, s); )
                    os << s << '\n';
                os << "COPIED cin LINE BY LINE\n";
            break;

            case 2:
                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;
                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";
            break;

            case 3:
                os << cin.rdbuf();
                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";
            break;
        }
    }

23.1.2: Classes for input operations

When classes for input operation are derived from std::streambuf, they should be provided with an input buffer of at least one character. The one-character input buffer allows for the use of the member functions istream::putback or istream::ungetc. Strictly speaking it is not necessary to implement a buffer in classes derived from streambuf. But using buffers in these classes is strongly advised. Their implementation is very simple and straightforward and the applicability of such classes will be greatly improved. Therefore, in all our classes derived from the class streambuf a buffer of at least one character will be defined.

23.1.2.1: Using a one-character buffer

When deriving a class (e.g., IFdStreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow should be overridden, as this member eventually receives all requests for input. The member streambuf::setg is used to inform the streambuf base class of the size and location of the input buffer, so that it is able to set up its input buffer pointers accordingly. This will ensure that streambuf::eback, streambuf::gptr, and streambuf::egptr return correct values.

The class IFdStreambuf is designed like this:

Its member functions use low-level functions operating on file descriptors. Therefore, in addition to streambuf, the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.
Like most classes designed for input operations, this class is derived from std::streambuf as well.
The class defines two data members, one of them a fixed-sized one character buffer. The data members are defined as protected data members so that derived classes (e.g., see section 23.1.2.3) can access them. Here is the full class interface:
```
    class IFdStreambuf: public std::streambuf
    {
        protected:
            int     d_fd;
            char    d_buffer[1];
        public:
            IFdStreambuf(int fd);
        private:
            int underflow();
    };
```
The constructor initializes the buffer. However, the initialization sets gptr's return value equal to egptr's return value. This implies that the buffer is empty so underflow is immediately called to fill the buffer:
```
    inline IFdStreambuf::IFdStreambuf(int fd)
    :
        d_fd(fd)
    {
        setg(d_buffer, d_buffer + 1, d_buffer + 1);
    }
```
Finally underflow is overridden. The buffer is refilled by reading from the file descriptor. If this fails (for whatever reason), EOF is returned. More sophisticated implementations could act more intelligently here, of course. If the buffer could be refilled, setg is called to set up streambuf's buffer pointers correctly:
```
    inline int IFdStreambuf::underflow()
    {
        if (read(d_fd, d_buffer, 1) <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + 1);
        return *gptr();
    }
```

The following main function shows how IFdStreambuf can be used:

    int main()
    {
        IFdStreambuf fds(STDIN_FILENO);
        istream      is(&fds);

        cout << is.rdbuf();
    }

23.1.2.2: Using an n-character buffer

How complex would things get if we would decide to use a buffer of substantial size? Not that complex. The following class allows us to specify the size of a buffer, but apart from that it is basically the same class as IFdStreambuf developed in the previous section. To make things a bit more interesting, in the class IFdNStreambuf developed here, the member streambuf::xsgetn is also overridden, to optimize reading a series of characters. Also a default constructor is provided that can be used in combination with the open member to construct an istream object before the file descriptor becomes available. In that case, once the descriptor becomes available, the open member can be used to initiate the object's buffer. Later, in section 23.3, we'll encounter such a situation.

To save some space, the success of various calls was not checked. In `real life' implementations, these checks should of course not be omitted. The class IFdNStreambuf has the following characteristics:

Its member functions use low-level functions operating on file descriptors. So apart from streambuf the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.
As usual, it is derived from std::streambuf.

Like the class IFdStreambuf (section 23.1.2.1), its data members are protected. Since the buffer's size is configurable, this size is kept in a dedicated data member, d_bufsize:

    class IFdNStreambuf: public std::streambuf
    {
        protected:
            int         d_fd;
            size_t      d_bufsize;
            char*       d_buffer;
        public:
            IFdNStreambuf();
            IFdNStreambuf(int fd, size_t bufsize = 1);
            virtual ~IFdNStreambuf();
            void open(int fd, size_t bufsize = 1);
        private:
            virtual int underflow();
            virtual std::streamsize xsgetn(char *dest, std::streamsize n);
    };

The default constructor does not allocate a buffer. It can be used to construct an object before the file descriptor becomes known. A second constructor simply passes its arguments to open. Open will then initialize the object so that it can actually be used:
```
    inline IFdNStreambuf::IFdNStreambuf()
    :
        d_bufsize(0),
        d_buffer(0)
    {}
    inline IFdNStreambuf::IFdNStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }
```
Once the object has been initialized by open, its destructor will both delete the object's buffer and use the file descriptor to close the device:
```
    IFdNStreambuf::~IFdNStreambuf()
    {
        if (d_bufsize)
        {
            close(d_fd);
            delete[] d_buffer;
        }
    }
```
Even though the device is closed in the above implementation this may not always be desirable. In cases where the open file descriptor is already available the intention may be to use that descriptor repeatedly, each time using a newly constructed IFdNStreambuf object. It is left as an exercise to the reader to change this class in such a way that the device may optionally be closed. This approach was followed in, e.g., the Bobcat library.
The open member simply allocates the object's buffer. It is assumed that the calling program has already opened the device. Once the buffer has been allocated, the base class member setg is used to ensure that streambuf::eback streambuf::gptr and streambuf::egptr return correct values:
```
    void IFdNStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize;
        d_buffer = new char[d_bufsize];
        setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
    }
```

The overridden member underflow is implemented almost identically to IFdStreambuf's (section 23.1.2.1) member. The only difference is that the current class supports buffers of larger sizes. Therefore, more characters (up to d_bufsize) may be read from the device at once:

    int IFdNStreambuf::underflow()
    {
        if (gptr() < egptr())
            return *gptr();

        int nread = read(d_fd, d_buffer, d_bufsize);

        if (nread <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + nread);
        return *gptr();
    }

Finally xsgetn is overridden. In a loop, n is reduced until 0, at which point the function terminates. Alternatively, the member returns if underflow fails to obtain more characters. This member optimizes the reading of series of characters. Instead of calling streambuf::sbumpc n times, a block of avail characters is copied to the destination, using streambuf::gbump to consume avail characters from the buffer using one function call:

    std::streamsize IFdNStreambuf::xsgetn(char *dest, std::streamsize n)
    {
        int nread = 0;

        while (n)
        {
            if (!in_avail())
            {
                if (underflow() == EOF)
                    break;
            }

            int avail = in_avail();

            if (avail > n)
                avail = n;

            memcpy(dest + nread, gptr(), avail);
            gbump(avail);

            nread += avail;
            n -= avail;
        }

        return nread;
    }

The member function xsgetn is called by streambuf::sgetn, which is a streambuf member. Here is an example illustrating the use of this member function with a IFdNStreambuf object:

    #include <unistd.h>
    #include <iostream>
    #include <istream>
    #include "ifdnbuf.h"
    using namespace std;

    int main()
    {
                                    // internally: 30 char buffer
        IFdNStreambuf fds(STDIN_FILENO, 30);

        char buf[80];               // main() reads blocks of 80
                                    // chars
        while (true)
        {
            size_t n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }

23.1.2.3: Seeking positions in `streambuf' objects

When devices support seek operations, classes derived from std::streambuf should override the members streambuf::seekoff and streambuf::seekpos. The class IFdSeek, developed in this section, can be used to read information from devices supporting seek operations. The class IFdSeek was derived from IFdStreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class IFdSeek, ensure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class IFdNStreambuf. In that which case the arguments to reset the input buffer must be adapted so that its second and third parameters point beyond the available input buffer. Let's have a look at the characteristics of IFdSeek:

As mentioned, IFdSeek is derived from IFdStreambuf. Like the latter class, IFdSeek's member functions use facilities declared in unistd.h. So, the header file <unistd.h> must have been read by the compiler before it can compile the class's members functions. To reduce the amount of typing when specifying types and constants from streambuf and std::ios, several typedefs are defined by the class. These typedefs refer to types that are defined in the header file <ios>, which must therefore also be included before the compiler can compile IFdSeek's class interface:
```
    class IFdSeek: public IFdStreambuf
    {
        typedef std::streambuf::pos_type        pos_type;
        typedef std::streambuf::off_type        off_type;
        typedef std::ios::seekdir               seekdir;
        typedef std::ios::openmode              openmode;

        public:
            IFdSeek(int fd);
        private:
            pos_type seekoff(off_type offset, seekdir dir, openmode);
            pos_type seekpos(pos_type offset, openmode mode);
    };
```
The class has a very basic interface. Its (only) constructor expects the device's file descriptor. It has no special tasks to perform and just cals its base class constructor:
```
    inline IFdSeek::IFdSeek(int fd)
    :
        IFdStreambuf(fd)
    {}
```

The member seek_off is responsible for performing the actual seek operations. It calls lseek to seek a new position in a device whose file descriptor is known. If seeking succeeds, setg is called to define an already empty buffer, so that the base class's underflow member refills the buffer at the next input request.

    IFdSeek::pos_type IFdSeek::seekoff(off_type off, seekdir dir, openmode)
    {
        pos_type pos =
            lseek
            (
                d_fd, off,
                (dir ==  std::ios::beg) ? SEEK_SET :
                (dir ==  std::ios::cur) ? SEEK_CUR :
                                          SEEK_END
            );

        if (pos < 0)
            return -1;

        setg(d_buffer, d_buffer + 1, d_buffer + 1);
        return pos;
    }

Finally, the companion function seekpos is overridden as well: it is actually defined as a call to seekoff:

    inline IFdSeek::pos_type IFdSeek::seekpos(pos_type off, openmode mode)
    {
        return seekoff(off, std::ios::beg, mode);
    }

Here is an example of a program using the class IFdSeek. If this program is given its own source file using input redirection then seeking is supported (and with the exception of the first line, every other line is shown twice):

    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        IFdSeek fds(0);
        istream is(&fds);
        string  s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();

            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();

            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }

23.1.2.4: Multiple `unget' calls in `streambuf' objects

Streambuf classes and classes derived from streambuf should support at least ungetting the last read character. Special care must be taken when series of unget calls must be supported. In this section the construction of a class supporting a configurable number of istream::unget or istream::putback calls is discussed.

Support for multiple (say `n') unget calls is implemented by reserving an initial section of the input buffer, which is gradually filled up to contain the last n characters read. The class was implemented as follows:

Once again, the class is derived from std::streambuf. It defines several data members, allowing the class to perform the bookkeeping required to maintain an unget-buffer of a configurable size:

    class FdUnget: public std::streambuf
    {
        int     d_fd;
        size_t  d_bufsize;
        size_t  d_reserved;
        char   *d_buffer;
        char   *d_base;
        public:
            FdUnget(int fd, size_t bufsz, size_t unget);
            virtual ~FdUnget();
        private:
            int underflow();
    };

The class's constructor expects a file descriptor, a buffer size and the number of characters that can be ungot or pushed back as its arguments. This number determines the size of a reserved area, defined as the first d_reserved bytes of the class's input buffer.
- The input buffer will always be at least one byte larger than d_reserved. So, a certain number of bytes may be read. Once d_reserved bytes have been read at most d_reserved bytes can be ungot.
- Next, the starting point for reading operations is configured. It is called d_base, pointing to a location d_reserved bytes from the start of d_buffer. This will always be the point where buffer refills start.
- Now that the buffer has been constructed, we're ready to define streambuf's buffer pointers using setg. As no characters have been read yet, all pointers are set to point to d_base. If unget is called at this point, no characters are available, so unget will (correctly) fail.
- Eventually, the refill buffer's size is determined as the number of allocated bytes minus the size of the reserved area.
Here is the class's constructor:
```
    FdUnget::FdUnget(int fd, size_t bufsz, size_t unget)
    :
        d_fd(fd),
        d_reserved(unget)
    {
        size_t allocate =
                bufsz > d_reserved ?
                    bufsz
                :
                    d_reserved + 1;

        d_buffer = new char[allocate];

        d_base = d_buffer + d_reserved;
        setg(d_base, d_base, d_base);

        d_bufsize = allocate - d_reserved;
    }
```
The class's destructor simply returns the memory allocated for the buffer to the common pool:
```
    inline FdUnget::~FdUnget()
    {
        delete[] d_buffer;
    }
```
Finally, underflow is overridden as follows:
- First underflow determines the number of characters that could potentially be ungot. If that number of characters are ungot, the input buffer is exhausted. So this value may be any value between 0 (the initial state) or the input buffer's size (when the reserved area has been filled up completely, and all current characters in the remaining section of the buffer have also been read);
- Next the number of bytes to move into the reserved area is computed. This number is at most d_reserved, but it is set equal to the actual number of characters that can be ungot if this value is smaller;
- Now that the number of characters to move into the reserved area is known, this number of characters is moved from the input buffer's end to the area immediately before d_base;
- Then the buffer is refilled. This all is standard, but notice that reading starts from d_base and not from d_buffer;
- Finally, streambuf's read buffer pointers are set up. Eback is set to move locations before d_base, thus defining the guaranteed unget-area, gptr is set to d_base, since that's the location of the first read character after a refill, and is set just beyond the location of the last character read into the buffer.
Here is underflow's implementation:
```
    int FdUnget::underflow()
    {
        size_t ungetsize = gptr() - eback();
        size_t move = std::min(ungetsize, d_reserved);

        memcpy(d_base - move, egptr() - move, move);

        int nread = read(d_fd, d_base, d_bufsize);
        if (nread <= 0)       // none read -> return EOF
            return EOF;

        setg(d_base - move, d_base, d_base + nread);

        return *gptr();
    }
```

The example program illustrates the class FdUnget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget at most 6 characters. This is, of course, not possible; but the program will nicely unget as many characters as possible, considering the actual number of characters read:

    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    using namespace std;

    int main()
    {
        FdUnget fds(0, 3, 2);
        istream is(&fds);
        char    c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << '\n';
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
                unget failed at attempt 1
                rereading: ''
        Next character: a
        after reading 1 characters:
                unget failed at attempt 2
                rereading: 'a'
        Next character: b
        after reading 2 characters:
                unget failed at attempt 3
                rereading: 'ab'
        Next character: c
        after reading 3 characters:
                unget failed at attempt 4
                rereading: 'abc'
        Next character: d
        after reading 4 characters:
                unget failed at attempt 4
                rereading: 'bcd'
        Next character: e
        after reading 5 characters:
                unget failed at attempt 4
                rereading: 'cde'
        Next character:

        after reading 6 characters:
                unget failed at attempt 4
                rereading: 'de
        '
         reached
    */

23.2: Fixed-sized field extraction from istream objects

Usually when extracting information from istream objects operator>>, the standard extraction operator, is perfectly suited for the task as in most cases the extracted fields are white-space (or otherwise clearly separated) from each other. But this does not hold true in all situations. For example, when a web-form is posted to some processing script or program, the receiving program may receive the form field's values as url-encoded characters: letters and digits are sent unaltered, blanks are sent as + characters, and all other characters start with % followed by the character's ascii-value represented by its two digit hexadecimal value.

When decoding url-encoded information, simple hexadecimal extraction won't work, since that will extract as many hexadecimal characters as available, instead of just two. Since the letters a-f` and 0-9 are legal hexadecimal characters, a text like My name is `Ed', url-encoded as

    My+name+is+%60Ed%27

results in the extraction of the hexadecimal values 60ed and 27, instead of 60 and 27. The name Ed disappears from view, which is clearly not what we want.

In this case, having seen the %, we could extract 2 characters, put them in an istringstream object, and extract the hexadecimal value from the istringstream object. A bit cumbersome, but doable. Other approaches are possible as well.

The class Fistream for fixed-sized field istream defines an istream class supporting both fixed-sized field extractions and blank-delimited extractions (as well as unformatted read calls). The class may be initialized as a wrapper around an existing istream, or it can be initialized using the name of an existing file. The class is derived from istream, allowing all extractions and operations supported by istreams in general. Fistream defines the following data members:

d_filebuf: a filebuffer used when Fistream reads its information from a named (existing) file. Since the filebuffer is only needed in that case, and since it must be allocated dynamically, it is defined as a unique_ptr<filebuf> object.
d_streambuf: a pointer to Fistream's streambuf. It points to d_filebuf when Fistream opens a file by name. When an existing istream is used to construct an Fistream, it will point to the existing istream's streambuf.
d_iss: an istringstream object used for the fixed field extractions.
d_width: a size_t indicating the width of the field to extract. If 0 no fixed field extractions is used, but information is extracted from the istream base class object using standard extractions.

Here is the initial section of Fistream's class interface:

    class Fistream: public std::istream
    {
        std::unique_ptr<std::filebuf> d_filebuf;
        std::streambuf *d_streambuf;
        std::istringstream d_iss;
        size_t d_width;

As stated, Fistream objects can be constructed from either a filename or an existing istream object. The class interface therefore declares two constructors:

            Fistream(std::istream &stream);
            Fistream(char const *name,
                std::ios::openmode mode = std::ios::in);

When an Fistream object is constructed using an existing istream object, the Fistream's istream part simply uses the stream's streambuf object:

Fistream::Fistream(istream &stream)
:
    istream(stream.rdbuf()),
    d_streambuf(rdbuf()),
    d_width(0)
{}

When an fstream object is constructed using a filename, the istream base initializer is given a new filebuf object to be used as its streambuf. Since the class's data members are not initialized before the class's base class has been constructed, d_filebuf can only be initialized thereafter. By then, the filebuf is only available as rdbuf, returning a streambuf. However, as it is actually a filebuf, a reinterpret_cast is used to cast the streambuf pointer returned by rdbuf to a filebuf *, so d_filebuf can be initialized:

Fistream::Fistream(char const *name, ios::openmode mode)
:
    istream(new filebuf()),
    d_filebuf(reinterpret_cast<filebuf *>(rdbuf())),
    d_streambuf(d_filebuf.get()),
    d_width(0)
{
    d_filebuf->open(name, mode);
}

There is only one additional public member: setField(field const &). This member defines the size of the next field to extract. Its parameter is a reference to a field class, a manipulator class defining the width of the next field.

Since a field & is mentioned in Fistream's interface, field must be declared before Fistream's interface starts. The class field itself is simple and declares Fistream as its friend. It has two data members: d_width specifies the width of the next field, and d_newWidth which is set to true if d_width's value should actually be used. If d_newWidth is false, Fistream returns to its standard extraction mode. The class field has two constructors: a default constructor, setting d_newWidth to false, and a second constructor expecting the width of the next field to extract as its value. Here is the class field:

    class field
    {
        friend class Fistream;
        size_t d_width;
        bool     d_newWidth;

        public:
            field(size_t width);
            field();
    };

    inline field::field(size_t width)
    :
        d_width(width),
        d_newWidth(true)
    {}

    inline field::field()
    :
        d_newWidth(false)
    {}

Since field declares Fistream as its friend, setField may inspect field's members directly.

Time to return to setField. This function expects a reference to a field object, initialized in one of three different ways:

field(): When setField's argument is a field object constructed by its default constructor the next extraction will use the same fieldwidth as the previous extraction.
field(0): When this field object is used as setField's argument, fixed-sized field extraction stops, and the Fistream will act like any standard istream object.
field(x): When the field object itself is initialized by a non-zero size_t value x, then the next field width will be x characters wide. The preparation of such a field is left to setBuffer, Fistream's only private member.

Here is setField's implementation:

std::istream &Fistream::setField(field const &params)
{
    if (params.d_newWidth)                  // new field size requested
        d_width = params.d_width;           // set new width

    if (!d_width)                           // no width?
        rdbuf(d_streambuf);                 // return to the old buffer
    else
        setBuffer();                        // define the extraction buffer

    return *this;
}

The private member setBuffer defines a buffer of d_width + 1 characters and uses read to fill the buffer with d_width characters. The buffer is terminated by an ASCII-Z character. This buffer is used to initialize the d_str member. Fistream's rdbuf member is used to extract the d_str's data via the Fistream object itself:

void Fistream::setBuffer()
{
    char *buffer = new char[d_width + 1];

    rdbuf(d_streambuf);                     // use istream's buffer to
    buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,
                                                // terminated by ascii-Z
    d_iss.str(buffer);
    delete buffer;

    rdbuf(d_iss.rdbuf());                   // switch buffers
}

Although setField could be used to configure Fistream to use or not to use fixed-sized field extraction, using manipulators is probably preferable. To allow field objects to be used as manipulators an overloaded extraction operator was defined. This extraction operator accepts istream & and a field const & objects. Using this extraction operator, statements like

fis >> field(2) >> x >> field(0);

are possible (assuming fis is a Fistream object). Here is the overloaded operator>>, as well as its declaration:

istream &std::operator>>(istream &str, field const &params)
{
    return reinterpret_cast<Fistream *>(&str)->setField(params);
}

Declaration:

namespace std
{
    istream &operator>>(istream &str, FBB::field const &params);
}

Finally, an example. The following program uses a Fistream object to url-decode url-encoded information appearing at its standard input:

    int main()
    {
        Fistream fis(cin);

        fis >> hex;
        while (true)
        {
            size_t x;
            switch (x = fis.get())
            {
                case '\n':
                    cout << '\n';
                break;
                case '+':
                    cout << ' ';
                break;
                case '%':
                    fis >> field(2) >> x >> field(0);
                // FALLING THROUGH
                default:
                    cout << static_cast<char>(x);
                break;
                case EOF:
                return 0;
            }
        }
    }
    /*
        Generated output after:
            echo My+name+is+%60Ed%27 | a.out

        My name is `Ed'
    */

23.3: The `fork' system call

From the C programming language the fork system call is well known. When a program needs to start a new process, system can be used System requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to use fork.

In this section we investigate how C++ can be used to wrap classes around a complex system call like fork. Much of what follows in this section directly applies to the Unix operating system, and the discussion therefore focuses on that operating system. Other systems usually provide comparable facilities. What follows is closely related to the Template Design Pattern (cf. Gamma et al. (1995) Design Patterns, Addison-Wesley)

When fork is called, the current program is duplicated in memory, thus creating a new process. Following the duplication both processes continue their execution just below the fork system call. The two processes may inspect fork's return value: the return value in the original process (called the parent process) differs from the return value in the newly created process (called the child process):

In the parent process fork returns the process ID of the (child) process that was created by the fork system call. This is a positive integer value.
In the child process fork returns 0.
If fork fails, -1 is returned.

A basic Fork class should hide all bookkeeping details of a system call like fork from its users. The class Fork developed here will do just that. The class itself only ensures the proper execution of the fork system call. Normally, fork is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process will do. Fork objects should be able to start their child processes.

Fork's constructor cannot know what actions its child process should perform. Similarly, it cannot know what actions the parent process should perform. For these kind of situations, the template method design pattern was developed. According to Gamma c.s., the template method design pattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define an abstract base class already providing the essential steps related to the fork system call, deferring the implementation of other parts of the fork system call to subclasses.

The Fork abstract base class has the following characteristics:

It defines a data member d_pid. In the parent process this data member contains the child's process id and in the child process it has the value 0. Its public interface declares only two members:

a fork member function, responsible for the actual forking (i.e., it will create the (new) child process);
an (default) virtual destructor ~Fork.

Here is Fork's interface:

    class Fork
    {
        int d_pid;

        public:
            virtual ~Fork() = default;
            void fork();

        protected:
            int pid() const;
            int waitForChild();                 // returns the status

        private:
            virtual void childRedirections();
            virtual void parentRedirections();

            virtual void childProcess() = 0;    // pure virtual members
            virtual void parentProcess() = 0;
    };

All non-virtual member functions are declared in the class's protected section and can thus only be used by derived classes. They are:
- pid(): The member function pid allows derived classes to access the system fork's return value:
```
    inline int Fork::pid() const
    {
        return d_pid;
    }
```
- waitForChild(): The member int waitForChild can be called by parent processes to wait for the completion of their child processes (as discussed below). This member is declared in the class interface. Its implementation is:
```
    #include "fork.ih"

    int Fork::waitForChild()
    {
        int status;

        waitpid(d_pid, &status, 0);

        return WEXITSTATUS(status);
    }
```
  This simple implementation returns the child's exit status to the parent. The called system function waitpid blocks until the child terminates.
When fork system calls are used, parent processes and child processes must always be distinguished. The main distinction between these processes is that d_pid becomes the child's process-id in the parent process, while d_pid becomes 0 in the child process itself. Since these two processes must always be distinguished (and present), their implementation by classes derived from Fork is enforced by Fork's interface: the members childProcess, defining the child process' actions and parentProcess, defining the parent process' actions were defined as pure virtual functions.
communication between parent- and child processes may use standard streams or other facilities, like pipes (cf. section 23.3.3). To facilitate this inter-process communication, derived classes may implement:
- childRedirections(): this member should be implemented if any standard stream (cin, cout) or cerr must be redirected in the child process (cf. section 23.3.1);
- parentRedirections(): this member should be implemented if any standard stream (cin, cout) or cerr must be redirected in the parent process.
Redirection of the standard streams is necessary if parent- and child processes must communicate with each other via the standard streams. Here are their default definitions provided by the class's interface:
```
    inline void Fork::childRedirections()
    {}
    inline void Fork::parentRedirections()
    {}
```

The member function fork calls the system function fork (Caution: since the system function fork is called by a member function having the same name, the :: scope resolution operator must be used to prevent a recursive call of the member function itself). ::fork's return value determines whether parentProcess or childProcess is called. Maybe redirection is necessary. Fork::fork's implementation calls childRedirections just before calling childProcess, and parentRedirections just before calling parentProcess:

    #include "fork.ih"

    void Fork::fork()
    {
        if ((d_pid = ::fork()) < 0)
            throw "Fork::fork() failed";

        if (d_pid == 0)                 // childprocess has pid == 0
        {
            childRedirections();
            childProcess();
            exit(1);                    // we shouldn't come here:
        }                               // childProcess() should exit

        parentRedirections();
        parentProcess();
    }

In fork.cc the class's internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its implementation is:

    #include "fork.h"
    #include <cstdlib>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks, they should terminate. This happens automatically when the child process performs a call to a member of the exec... family, but if the child itself remains active, then it must make sure that it terminates properly. A child process normally uses exit to terminate itself, but note that exit prevents the activation of destructors of objects defined at the same or more superficial nesting levels than the level at which exit is called. Destructors of globally defined objects are activated when exit is used. When using exit to terminate childProcess, it should either itself call a support member function defining all nested objects it needs, or it should define all its objects in a compound statement (e.g., using a throw block) calling exit beyond the compound statement.

Parent processes should normally wait for their children to complete. Terminating child processes inform their parents that they are about to terminate by sending a signal that should be caught by their parents. If child processes terminate and their parent processes do not catch those signals then such child processes remain visible as so-called zombie processes.

If parent processes must wait for their children to complete, they may call the member waitForChild. This member returns the exit status of a child process to its parent.

There exists a situation where the child process continues to live, but the parent dies. This is a fairly natural event: parents tend to die before their children do. In our context (i.e. C++), this is called a daemon program. In a daemon the parent process dies and the child program continues to run as a child of the basic init process. Again, when the child eventually dies a signal is sent to its `step-parent' init. This does not create a zombie as init catches the termination signals of all its (step-) children. The construction of a daemon process is very simple, given the availability of the class Fork (cf. section 23.3.2).

23.3.1: Redirection revisited

Earlier, in section 6.6.1 streams were redirected using the ios::rdbuf member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus implementing redirection at the level of the programming language itself.

This may be fine within the context of a C++ program, but once we leave that context the redirection terminates. The operating system does not know about streambuf objects. This situation is encountered, e.g., when a program uses a system call to start a subprogram. The example program at the end of this section uses C++ redirection to redirect the information inserted into cout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Since echo writes its information to the standard output, this would be the program's redirected file if the operating system would recognize C++'s redirection.

But redirection doesn't happen. Instead, hello world still appears at the program's standard output and the redirected file is left untouched. To write hello world to the redirected file redirection must be realized at the operating system level. Some operating systems (e.g., Unix and friends) provide system calls like dup and dup2 to accomplish this. Examples of the use of these system calls are given in section 23.3.3.

Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:

    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;

    int main()
    {
        ofstream of("outfile");

        streambuf *buf = cout.rdbuf(of.rdbuf());
        cout << "To the of stream\n";
        system("echo hello world");
        cout << "To the of stream\n";
        cout.rdbuf(buf);
    }
    /*
        Generated output: on the file `outfile'

        To the of stream
        To the of stream

        On standard output:

        hello world
    */

23.3.2: The `Daemon' program

Applications exist in which the only purpose of fork is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the next example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon member to that class, but eventually decided against it because the construction of a daemon program is very simple and requires no features other than those currently offered by the class Fork. Here is an example illustrating the construction of such a daemon program. Its child process doesn't do exit but throw 0 which is caught by the catch clause of the child's main function. Doing this ensures that any objects defined by the child process are properly destroyed:

    #include <iostream>
    #include <unistd.h>
    #include "fork.h"

    class Daemon: public Fork
    {
        virtual void parentProcess()        // the parent does nothing.
        {}
        virtual void childProcess()         // actions by the child
        {
            sleep(3);
                                            // just a message...
            std::cout << "Hello from the child process\n";
            throw 0;                        // The child process ends
        }
    };

    int main()
    try
    {
        Daemon().fork();
    }
    catch(...)
    {}

    /*
        Generated output:
    The next command prompt, then after 3 seconds:
    Hello from the child process
    */

23.3.3: The class `Pipe'

Redirection at the system level requires the use of file descriptors, created by the pipe system call. When two processes want to communicate using such file descriptors, the following happens:

The process constructs two associated file descriptors using the pipe system call. One of the file descriptors is used for writing, the other file descriptor is used for reading.
Forking takes place (i.e., the system fork function is called), duplicating the file descriptors. Now we have four file descriptors as the child process and the parent process both have their own copies of the two file descriptors created by pipe.
One process (say, the parent process) uses the filedescriptors for reading. It should close its filedescriptor intended for writing.
The other process (say, the child process) uses the filedescriptors for writing. It should therefore close its filedescriptor intended for reading.
All information written by the child process to the file descriptor intended for writing, can now be read by the parent process from the corresponding file descriptor intended for reading, thus establishing a communication channel between the child- and the parent process.

Though basically simple, errors may easily creep in. Functions of file descriptors available to the two processes (child- or parent-) may easily get mixed up. To prevent bookkeeping errors, the bookkeeping may be properly set up once, to be hidden therafter inside a class like the Pipe class developed here. Let's have a look at its characteristics (before using functions like pipe and dup the compiler must have read the <unistd.h> header file):

The pipe system call expects a pointer to two int values, which will represent, subsequent to the pipe call, the file descriptor used for reading and the file descriptor used for writing, respectively. To avoid confusion, the class Pipe defines an enum having values associating the indices of the array of 2-ints with symbolic constants. The two file descriptors themselves are stored in a data member d_fd. Here is the initial section of the class's interface:
```
    class Pipe
    {
        enum    RW { READ, WRITE };
        int     d_fd[2];
```
The class only needs a default constructor. This constructor calls pipe to create a set of associated file descriptors used for accessing both ends of a pipe:
```
    Pipe::Pipe()
    {
        if (pipe(d_fd))
            throw "Pipe::Pipe(): pipe() failed";
    }
```
The members readOnly and readFrom are used to configure the pipe's reading end. The latter function is used when using redirection. It is provided with an alternate file descriptor to be used for reading from the pipe. Usually this alternate file descriptor is STDIN_FILENO, allowing cin to extract information from the pipe. The former function is merely used to configure the reading end of the pipe. It closes the matching writing end and returns a file descriptor that can be used to read from the pipe:
```
    int Pipe::readOnly()
    {
        close(d_fd[WRITE]);
        return d_fd[READ];
    }
    void Pipe::readFrom(int fd)
    {
        readOnly();

        redirect(d_fd[READ], fd);
        close(d_fd[READ]);
    }
```
writeOnly and two writtenBy members are available to configure the writing end of a pipe. The former function is only used to configure the writing end of the pipe. It closes the reading end, and returns a file descriptor that can be used for writing to the pipe:
```
    int Pipe::writeOnly()
    {
        close(d_fd[READ]);
        return d_fd[WRITE];
    }
    void Pipe::writtenBy(int fd)
    {
        writtenBy(&fd, 1);
    }
    void Pipe::writtenBy(int const *fd, size_t n)
    {
        writeOnly();

        for (size_t idx = 0; idx < n; idx++)
            redirect(d_fd[WRITE], fd[idx]);

        close(d_fd[WRITE]);
    }
```
For the latter member two overloaded versions are available:
- writtenBy(int fileDescriptor) is used to configure single redirection, so that a specific file descriptor (usually STDOUT_FILENO or STDERR_FILENO) can be used to write to the pipe;
- (writtenBy(int *fileDescriptor, size_t n = 2)) may be used to configure multiple redirection, providing an array argument containing file descriptors. Information written to any of these file descriptors is actually written to the pipe.
The class has one private data member, redirect, used to set up redirection through the dup2 system call. This function expects two file descriptors. The first file descriptor represents a file descriptor that can be used to access the device's information; the second file descriptor is an alternate file descriptor that may also be used to access the device's information. Here is redirect's implementation:
```
    void Pipe::redirect(int d_fd, int alternateFd)
    {
        if (dup2(d_fd, alternateFd) < 0)
            throw "Pipe: redirection failed";
    }
```

Now that redirection can be configured easily using one or more Pipe objects, we'll use Fork and Pipe in various example programs.

23.3.4: The class `ParentSlurp'

The class ParentSlurp, derived from Fork, starts a child process executing a stand-along program (like /bin/ls). The (standard) output of the execed program is not shown on the screen but is read by the parent process.

For demonstration purposes the parent process will write the lines it receives to its standard output stream, prepending linenumbers to the lines. It is attractive to redirect the parent's standard input stream to allow the parent to read the output from the child process using its std::cin input stream. Therefore, the only pipe in the program is used as an input pipe for the parent, and an output pipe for the child.

The class ParentSlurp has the following characteristics:

It is derived from Fork. Before starting ParentSlurp's class interface, the compiler must have read fork.h and pipe.h. The class only uses one data member, a Pipe object d_pipe.
As Pipe's constructor already defines a pipe, and as d_pipe is automatically initialized by ParentSlurp's default constructor, which is implicitly provided. All additional members are only there for ParentSlurp's own benefit so they can be defined in the class's (implicit) private section. Here is the class's interface:
```
    class ParentSlurp: public Fork
    {
        Pipe    d_pipe;

        virtual void childRedirections();
        virtual void parentRedirections();
        virtual void childProcess();
        virtual void parentProcess();
    };
```
The childRedirections member configures the writing end of the pipe. So, all information written to the child's standard output stream will end up in the pipe. The big advantage of this is that no additional streams are needed to write to a file descriptor:
```
    inline void ParentSlurp::childRedirections()
    {
        d_pipe.writtenBy(STDOUT_FILENO);
    }
```
The parentRedirections member, configures the reading end of the pipe. It does so by connecting the reading end of the pipe to the parent's standard input file descriptor (STDIN_FILENO). This allows the parent to perform extractions from cin, not requiring any additional streams for reading.
```
    inline void ParentSlurp::parentRedirections()
    {
        d_pipe.readFrom(STDIN_FILENO);
    }
```
The childProcess member only needs to concentrate on its own actions. As it only needs to execute a program (writing information to its standard output), the member can consist of one single statement:
```
    inline void ParentSlurp::childProcess()
    {
        execl("/bin/ls", "/bin/ls", 0);
    }
```

The parentProcess member simply `slurps' the information appearing at its standard input. Doing so, it actually reads the child's output. It copies the received lines to its standard output stream prefixing line numbers to them:

    void ParentSlurp::parentProcess()
    {
        std::string     line;
        size_t    nr = 1;

        while (getline(std::cin, line))
            std::cout << nr++ << ": " << line << '\n';

        waitForChild();
    }

The following program simply constructs a ParentSlurp object, and calls its fork() member. Its output consists of a numbered list of files in the directory where the program is started. Note that the program also needs the fork.o, pipe.o and waitforchild.o object files (see earlier sources):

    int main()
    {
        ParentSlurp().fork();
    }
    /*
        Generated Output (example only, actually obtained output may differ):

        1: a.out
        2: bitand.h
        3: bitfunctional
        4: bitnot.h
        5: daemon.cc
        6: fdinseek.cc
        7: fdinseek.h
        ...
    */

23.3.5: Communicating with multiple children

The next step up the ladder is the construction of a child-process monitor. Here, the parent process is responsible for all its child processes, but it also must read their standard output. The user enters information at the standard input of the parent process. A simple command language is used for this:

start: this start a new child process. The parent returns the child's ID (a number) to the user. The ID is thereupon be used to identify a particular child process
<nr> text sends ``text'' to the child process having ID <nr>;
stop <nr> terminates the child process having ID <nr>;
exit terminates the parent as well as all its child processes.

The child process that hasn't received text for some time will complain by sending a message to the parent-process. Those messages are simply transmitted to the user by copying them to the standard output stream.

A problem with programs like our monitor is that they programs allow asynchronous input from multiple sources. Input may appear at the standard input as well as at the input-sides of pipes. Also, multiple output channels are used. To handle situations like these, the select system call was developed.

23.3.5.1: The class `Select'

The select system call was developed to handle asynchronous I/O multiplexing. The select system call is used to handle, e.g., input appearing simultaneously at a set of file descriptors.

The select function is rather complex, and its full discussion is beyond the C++ Annotations' scope. By encapsulating select in a class Selector, hiding its details and offering an intuitively attractive interface, its use is simplified. The Selector class has these features:

Efficiency. As most of Select's members are very small, most members can be implemented inline. The class requires quite a few data members. Most of these data members belong to types that require some system headers to be included first:
```
    #include <limits.h>
    #include <unistd.h>
    #include <sys/time.h>
    #include <sys/types.h>
```

The class interface can now be defined. The data type fd_set is a type designed to be used by select and variables of this type contain the set of file descriptors on which select may sense some activity. Furthermore, select allows us to fire an asynchronous alarm. To set the alarm time, the class Selector defines a timeval data member. Other members are used for internal bookkeeping purposes. Here is the class Selector's interface:

    class Selector
    {
        fd_set          d_read;
        fd_set          d_write;
        fd_set          d_except;
        fd_set          d_ret_read;
        fd_set          d_ret_write;
        fd_set          d_ret_except;
        timeval         d_alarm;
        int             d_max;
        int             d_ret;
        int             d_readidx;
        int             d_writeidx;
        int             d_exceptidx;

        public:
            Selector();

            int exceptFd();
            int nReady();
            int readFd();
            int wait();
            int writeFd();
            void addExceptFd(int fd);
            void addReadFd(int fd);
            void addWriteFd(int fd);
            void noAlarm();
            void rmExceptFd(int fd);
            void rmReadFd(int fd);
            void rmWriteFd(int fd);
            void setAlarm(int sec, int usec = 0);

        private:
            int checkSet(int *index, fd_set &set);
            void addFd(fd_set *set, int fd);
    };

Selector's member functions serve the following tasks:

Selector(): the (default) constructor. It clears the read, write, and execute fd_set variables, and switches off the alarm. Except for d_max, the remaining data members do not require specific initializations:
```
    Selector::Selector()
    {
        FD_ZERO(&d_read);
        FD_ZERO(&d_write);
        FD_ZERO(&d_except);
        noAlarm();
        d_max = 0;
    }
```

int wait(): this member blocks until the alarm times out or until activity is sensed at any of the file descriptors monitored by the Selector object. It throws an exception when the select system call itself fails:

    int Selector::wait()
    {
        timeval t = d_alarm;

        d_ret_read = d_read;
        d_ret_write = d_write;
        d_ret_except = d_except;

        d_readidx = 0;
        d_writeidx = 0;
        d_exceptidx = 0;

        d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except,
                       t.tv_sec == -1 && t.tv_usec == -1 ? 0 : &t);

        if (d_ret < 0)
            throw "Selector::wait()/select() failed";

        return d_ret;
    }

int nReady: this member function's return value is only defined when wait has returned. In that case it returns 0 for an alarm-timeout, -1 if select failed, and otherwise the number of file descriptors on which activity was sensed:
```
    inline int Selector::nReady()
    {
        return d_ret;
    }
```
int readFd(): this member function's return value is also only defined after wait has returned. Its return value is -1 if no (more) input file descriptors are available. Otherwise the next file descriptor available for reading is returned:
```
    inline int Selector::readFd()
    {
        return checkSet(&d_readidx, d_ret_read);
    }
```
int writeFd(): operating analogously to readFd, it returns the next file descriptor to which output is written. It uses d_writeidx and d_ret_read and is implemented analogously to readFd;
int exceptFd(): operating analogously to readFd, it returns the next exception file descriptor on which activity was sensed. It uses d_except_idx and d_ret_except and is implemented analogously to readFd;
void setAlarm(int sec, int usec = 0): this member activates Select's alarm facility. At least the number of seconds to wait for the alarm to go off must be specified. It simply assigns values to d_alarm's fields. At the next Select::wait call, the alarm will fire (i.e., wait returns with return value 0) once the configured alarm-interval has passed:
```
    inline void Selector::setAlarm(int sec, int usec)
    {
        d_alarm.tv_sec  = sec;
        d_alarm.tv_usec = usec;
    }
```
void noAlarm(): this member switches off the alarm, by simply setting the alarm interval to a very long period:
```
    inline void Selector::noAlarm()
    {
        setAlarm(-1, -1);
    }
```
void addReadFd(int fd): this member adds a file descriptor to the set of input file descriptors monitored by the Selector object. The member function wait will return once input is available at the indicated file descriptor:
```
    inline void Selector::addReadFd(int fd)
    {
        addFd(&d_read, fd);
    }
```
void addWriteFd(int fd): this member adds a file descriptor to the set of output file descriptors monitored by the Selector object. The member function wait will return once output is available at the indicated file descriptor. Using d_write, it is implemented analogously to addReadFd;
void addExceptFd(int fd): this member adds a file descriptor to the set of exception file descriptors to be monitored by the Selector object. The member function wait will return once activity is sensed at the indicated file descriptor. Using d_except, it is implemented analogously to addReadFd;
void rmReadFd(int fd): this member removes a file descriptor from the set of input file descriptors monitored by the Selector object:
```
    inline void Selector::rmReadFd(int fd)
    {
        FD_CLR(fd, &d_read);
    }
```
void rmWriteFd(int fd): this member removes a file descriptor from the set of output file descriptors monitored by the Selector object. Using d_write, it is implemented analogously to rmReadFd;
void rmExceptFd(int fd): this member removes a file descriptor from the set of exception file descriptors to be monitored by the Selector object. Using d_except, it is implemented analogously to rmReadFd;

The class's remaining (two) members are support members, and should not be used by non-member functions. Therefore, they are declared in the class's private section:

The member addFd adds a file descriptor to a fd_set:

    void Selector::addFd(fd_set *set, int fd)
    {
        FD_SET(fd, set);
        if (fd >= d_max)
            d_max = fd + 1;
    }

The member checkSet tests whether a file descriptor (*index) is found in a fd_set:

    int Selector::checkSet(int *index, fd_set &set)
    {
        int &idx = *index;

        while (idx < d_max && !FD_ISSET(idx, &set))
            ++idx;

        return idx == d_max ? -1 : idx++;
    }

23.3.5.2: The class `Monitor'

The monitor program uses a Monitor object doing most of the work. The class Monitor's public interface only offers a default constructor and one member, run, to perform its tasks. All other member functions are located in the class's private section.

Monitor defines the private enum Commands, symbolically listing the various commands its input language supports, as well as several data members. Among the data members are a Selector object and a map using child order numbers as its keys and pointer to Child objects (see section 23.3.5.3) as its values. Furthermore, Monitor has a static array member s_handler[], storing pointers to member functions handling user commands.

A destructor should be implemented as well, but its implementation is left as an exercise to the reader. Here is Monitor's interface, including the interface of the nested class Find that is used to create a function object:

    class Monitor
    {
        enum Commands
        {
            UNKNOWN,
            START,
            EXIT,
            STOP,
            TEXT,
            sizeofCommands
        };

        typedef std::map<int, std::shared_ptr<Child>> MapIntChild;

        friend class Find;
        class Find
        {
            int     d_nr;
            public:
                Find(int nr);
                bool operator()(MapIntChild::value_type &vt) const;
        };

        Selector    d_selector;
        int         d_nr;
        MapIntChild d_child;

        static void (Monitor::*s_handler[])(int, std::string const &);
        static int s_initialize;

        public:
            enum Done
            {};

            Monitor();
            void run();

        private:
            static void killChild(MapIntChild::value_type it);
            static int initialize();

            Commands    next(int *value, std::string *line);
            void    processInput();
            void    processChild(int fd);

            void    createNewChild(int, std::string const &);
            void    exiting(int = 0, std::string const &msg = std::string());
            void    sendChild(int value, std::string const &line);
            void    stopChild(int value, std::string const &);
            void    unknown(int, std::string const &);
    };

Since there's only one non-class type data member, the class's constructor could be implemented inline. The array s_handler, storing pointers to functions needs to be initialized as well. This can be accomplished in several ways:

Since the Command enumeration only contains a fairly limited set of commands, compile-time initialization could be considered:
```
    void (Monitor::*Monitor::s_handler[])(int, string const &)  =
    {
        &Monitor::unknown,              // order follows enum Command's
        &Monitor::createNewChild,       // elements
        &Monitor::exiting,
        &Monitor::stopChild,
        &Monitor::sendChild,
    };
```
The advantage of this is that it's simple, not requiring any run-time effort. The disadvantage is of course relatively complex maintenance. If for some reason Commands is modified, s_handler must be modified as well. In cases like these, compile-time initialization often is asking for trouble. There is a simple alternative though.
Looking at Monitor's interface we see a static data member s_initialize and a static member function initialize. The static member function handles the initialization of the s_handler array. It explicitly assigns the array's elements and any modification in ordering of the enum Command's values is automatically accounted for by recompiling initialize:
```
    void (Monitor::*Monitor::s_handler[sizeofCommands])(int, string const &);

    int Monitor::initialize()
    {
        s_handler[UNKNOWN] =    &Monitor::unknown;
        s_handler[START] =      &Monitor::createNewChild;
        s_handler[EXIT] =       &Monitor::exiting;
        s_handler[STOP] =       &Monitor::stopChild;
        s_handler[TEXT] =       &Monitor::sendChild;
        return 0;
    }
```
The member initialize is a static member and so it can be called to initialize s_initialize, a static int variable. The initialization is enforced by placing the initialization statement in the source file of a function that is known to be executed. It could be main, but if we're Monitor's maintainers and only have control over the library containing Monitor's code then that's not an option. In those cases the source file containing the destructor is a very good candidate. If a class has only one constructor and it's not defined inline then the constructor's source file is a good candidate as well. In Monitor's current implementation the initialization statement is put in run's source file, reasoning that s_handler is only needed when run is used.

Monitor's constructor is a very simple function and may be implemented inline:

    inline Monitor::Monitor()
    :
        d_nr(0)
    {}

The core of Monitor's activities are performed by run. It performs the following tasks:

Initially, the Monitor object only monitors its standard input. The set of input file descriptors to which d_selector listens is initialized to STDIN_FILENO.
Then, in a loop d_selector's wait function is called. If input on cin is available, it is processed by processInput. Otherwise, the input has arrived from a child process. Information sent by children is processed by processChild.
To prevent zombies, the child processes must catch their children's termination signals. This is discussed below (in an earlier version Monitor caught the termination signals. As noted by Ben Simons (ben at mrxfx dot com) this is inappropriate. Instead, the process spawning child processes has that responsibility (so, the parent process is responsible for its child processes; a child process is in turn responsible for its own child processes). Thanks, Ben).
As stated, run's source file also defines and initializes s_initialize to ensure the proper initialization of the s_handler array.

Here is run's implementation and s_initialize's definition:

    #include "monitor.ih"

    int Monitor::s_initialize = Monitor::initialize();

    void Monitor::run()
    {
        d_selector.addReadFd(STDIN_FILENO);

        while (true)
        {
            cout << "? " << flush;
            try
            {
                d_selector.wait();

                int fd;
                while ((fd = d_selector.readFd()) != -1)
                {
                    if (fd == STDIN_FILENO)
                        processInput();
                    else
                        processChild(fd);
                }
                cout << "NEXT ...\n";
            }
            catch (char const *msg)
            {
                exiting(1, msg);
            }
        }
    }

The member function processInput reads the commands entered by the user using the program's standard input stream. The member itself is rather simple. It calls next to obtain the next command entered by the user, and then calls the corresponding function using the matching element of the s_handler[] array. Here are the members processInput and next:

    void Monitor::processInput()
    {
        string line;
        int    value;
        Commands cmd = next(&value, &line);
        (this->*s_handler[cmd])(value, line);
    }

    Monitor::Commands Monitor::next(int *value, string *line)
    {
        if (!getline(cin, *line))
            exiting(1, "Command::next(): reading cin failed");

        if (*line == "start")
            return START;

        if (*line == "exit" || *line == "quit")
        {
            *value = 0;
            return EXIT;
        }

        if (line->find("stop") == 0)
        {
            istringstream istr(line->substr(4));
            istr >> *value;
            return !istr ? UNKNOWN : STOP;
        }

        istringstream istr(line->c_str());
        istr >> *value;
        if (istr)
        {
            getline(istr, *line);
            return TEXT;
        }

        return UNKNOWN;
    }

All other input sensed by d_select is created by child processes. Because d_select's readFd member returns the corresponding input file descriptor, this descriptor can be passed to processChild. Using a IFdStreambuf (see section 23.1.2.1), its information is read from an input stream. The communication protocol used here is rather basic. For every line of input sent to a child, the child replies by sending back exactly one line of text. This line is then read by processChild:

    void Monitor::processChild(int fd)
    {
        IFdStreambuf ifdbuf(fd);
        istream istr(&ifdbuf);
        string line;

        getline(istr, line);
        cout << d_child[fd]->pid() << ": " << line << '\n';
    }

The construction d_child[fd]->pid() used in the above source deserves some special attention. Monitor defines the data member map<int, shared_ptr<Child>> d_child. This map contains the child's order number as its key, and a (shared) pointer to the Child object as its value. A shared pointer is used here, rather than a Child object, since we want to use the facilities offered by the map, but don't want to copy a Child object time and again.

Now that run's implementation has been covered, we'll concentrate on the various commands users might enter:

When the start command is issued, a new child process is started. A new element is added to d_child by the member createNewChild. Next, the Child object should start its activities, but the Monitor object can not wait for the child process to complete its activities, as there is no well-defined endpoint in the near future, and the user will probably want to enter more commands. Therefore, the Child process must run as a daemon. So the forked process terminates immediately, but its own child process will continue to run (in the background). Consequently, createNewChild calls the child's fork member. Although it is the child's fork function that is called, it is still the monitor program wherein that fork function is called. So, the monitor program is duplicated by fork. Execution then continues:
- At the Child's parentProcess in its parent process;
- At the Child's childProcess in its child process
As the Child's parentProcess is an empty function, returning immediately, the Child's parent process effectively continues immediately below createNewChild's cp->fork() statement. As the child process never returns (see section 23.3.5.3), the code below cp->fork() is never executed by the Child's child process. This is exactly as it should be.
In the parent process, createNewChild's remaining code simply adds the file descriptor that's available for reading information from the child to the set of input file descriptors monitored by d_select, and uses d_child to establish the association between that file descriptor and the Child object's address:
```
    void Monitor::createNewChild(int, string const &)
    {
        Child *cp = new Child(++d_nr);

        cp->fork();

        int fd = cp->readFd();

        d_selector.addReadFd(fd);
        d_child[fd].reset(cp);

        cerr << "Child " << d_nr << " started\n";
    }
```
Direct communication with the child is required for the stop <nr> and <nr> text commands. The former command terminates child process <nr>, by calling stopChild. This function locates the child process having the order number using an anonymous object of the class Find, nested inside Monitor. The class Find simply compares the provided nr with the children's order number returned by their nr members:
```
    inline Monitor::Find::Find(int nr)
    :
        d_nr(nr)
    {}
    inline bool Monitor::Find::operator()(MapIntChild::value_type &vt) const
    {
        return d_nr == vt.second->nr();
    }
```
If the child process having order number nr was found, its file descriptor is removed from d_selector's set of input file descriptors. Then the child process itself is terminated by the static member killChild. The member killChild is declared as a static member function, as it is used as function argument of the for_each generic algorithm by erase (see below). Here is killChild's implementation:
```
    void Monitor::killChild(MapIntChild::value_type it)
    {
        if (kill(it.second->pid(), SIGTERM))
            cerr << "Couldn't kill process " << it.second->pid() << '\n';

        // reap defunct child process
        int status = 0;
        while( waitpid( it.second->pid(), &status, WNOHANG) > -1)
            ;
    }
```
Having terminated the specified child process, the corresponding Child object is destroyed and its pointer is removed from d_child:
```
    void Monitor::stopChild(int nr, string const &)
    {
        auto it = find_if(d_child.begin(), d_child.end(), Find(nr));

        if (it == d_child.end())
            cerr << "No child number " << nr << '\n';
        else
        {
            d_selector.rmReadFd(it->second->readFd());
            d_child.erase(it);
        }
    }
```

The command <nr> text> sends text to child process nr using the member function sendChild. This function too, will use a Find object to locate the process having order number nr, and then simply inserts the text into the writing end of a pipe connected to the indicated child process:

    void Monitor::sendChild(int nr, string const &line)
    {
        auto it = find_if(d_child.begin(), d_child.end(), Find(nr));

        if (it == d_child.end())
            cerr << "No child number " << nr << '\n';
        else
        {
            OFdnStreambuf ofdn(it->second->writeFd());
            ostream out(&ofdn);

            out << line << '\n';
        }
    }

When users enter exit or quit the member exiting is called. It terminates all child processes using the for_each generic algorithm (see section 19.1.17) to visit all elements of d_child. Then the program itself ends:

    void Monitor::exiting(int value, string const &msg)
    {
        for_each(d_child.begin(), d_child.end(), killChild);
        if (msg.length())
            cerr << msg << '\n';
        throw value;
    }

The program's main function is simple and needs no further comment:

    int main()
    try
    {
        Monitor().run();
    }
    catch (int exitValue)
    {
        return exitValue;
    }

23.3.5.3: The class `Child'

When the Monitor object starts a child process, it creates an object of the class Child. The Child class is derived from the class Fork, allowing it to operate as a daemon (as discussed in the previous section). Since Child is a daemon class, we know that its parent process must be defined as an empty function. Its childProcess member has a non-empty implementation. Here are the characteristics of the class Child:

The Child class has two Pipe data members, to handle communications between its own child- and parent processes. As these pipes are used by the Child's child process, their names refer to the child process. The child process reads from d_in, and writes to d_out. Here is the interface of the class Child:

    class Child: public Fork
    {
        Pipe                d_in;
        Pipe                d_out;

        int         d_parentReadFd;
        int         d_parentWriteFd;
        int         d_nr;

        public:
            Child(int nr);
            virtual ~Child();
            int readFd() const;
            int writeFd() const;
            int pid() const;
            int nr() const;
        private:
            virtual void childRedirections();
            virtual void parentRedirections();
            virtual void childProcess();
            virtual void parentProcess();
    };

The Child's constructor simply stores its argument, a child-process order number, in its own d_nr data member:
```
    inline Child::Child(int nr)
    :
        d_nr(nr)
    {}
```
The Child's child process obtains commands from its standard input stream and writes its output to its standard output stream. Since the actual communication channels are pipes, redirections must be used. The childRedirections member looks like this:
```
    void Child::childRedirections()
    {
        d_in.readFrom(STDIN_FILENO);
        d_out.writtenBy(STDOUT_FILENO);
    }
```
Although the parent process performs no actions, it must configure some redirections. Realizing that the names of the pipes indicate their functions in the child process. So the parent writes to d_in and reads from d_out. Here is parentRedirections:
```
    void Child::parentRedirections()
    {
        d_parentReadFd = d_out.readOnly();
        d_parentWriteFd = d_in.writeOnly();
    }
```
The Child object exists until it is destroyed by the Monitor's stopChild member. By allowing its creator, the Monitor object, to access the parent-side ends of the pipes, the Monitor object can communicate with the Child's child process via those pipe-ends. The members readFd and writeFd allow the Monitor object to access these pipe-ends:
```
    inline int Child::readFd() const
    {
        return d_parentReadFd;
    }
    inline int Child::writeFd() const
    {
        return d_parentWriteFd;
    }
```

The Child object's child process performs two tasks:

It must reply to information appearing at its standard input stream;
If no information has appeared within a certain time frame (the implementations uses an interval of five seconds), then a message is written to its standard output stream.

To implement this behavior, childProcess defines a local Selector object, adding STDIN_FILENO to its set of monitored input file descriptors.

Then, in an endless loop, childProcess waits for selector.wait() to return. When the alarm goes off it sends a message to its standard output (hence, into the writing pipe). Otherwise, it will echo the messages appearing at its standard input to its standard output. Here is the childProcess member:

    void Child::childProcess()
    {
        Selector    selector;
        size_t    message = 0;

        selector.addReadFd(STDIN_FILENO);
        selector.setAlarm(5);

        while (true)
        {
            try
            {
                if (!selector.wait())       // timeout
                    cout << "Child " << d_nr << ": standing by\n";
                else
                {
                    string  line;
                    getline(cin, line);
                    cout << "Child " << d_nr << ":" << ++message << ": " <<
                                                        line << '\n';
                }
            }
            catch (...)
            {
                    cout << "Child " << d_nr << ":" << ++message << ": " <<
                                "select() failed" << '\n';
            }
        }
        exit(0);
    }

Two accessors are defined allowing the Monitor object to obtain the Child's process ID and its order number:

    inline int Child::pid() const
    {
        return Fork::pid();
    }
    inline int Child::nr() const
    {
        return d_nr;
    }

A Child process terminates when the user enters a stop command. When an existing child process number was entered, the corresponding Child object is removed from Monitor's d_child map. As a result, its destructor is called. Child's destructor calls kill to terminate its child, and then waits for the child to terminate. Once its child has terminated, the destructor has completed its work and returns, thus completing the erasure from d_child. The current implementation fails if the child process doesn't react to the SIGTERM signal. In this demonstration program this does not happen. In `real life' more elaborate killing-procedures may be required (e.g., using SIGKILL in addition to SIGTERM). As discussed in section 9.11 it is important to ensure the proper destruction. Here is the Child's destructor:
```
    Child::~Child()
    {
        if (pid())
        {
            cout << "Killing process " << pid() << "\n";
            kill(pid(), SIGTERM);
            int status;
            wait(&status);
        }
    }
```

23.4: Function objects performing bitwise operations

In section 18.1 several predefined function objects were introduced. Predefined function objects performing arithmetic operations, relational operations, and logical operations exist, corresponding to a multitude of binary- and unary operators.

Some operators appear to be missing: there appear to be no predefined function objects corresponding to bitwise operations. However, their construction is, given the available predefined function objects, not difficult. The following examples show a class template implementing a function object calling the bitwise and ( operator&), and a template class implementing a function object calling the unary not ( operator~). It is left to the reader to construct similar function objects for other operators.

Here is the implementation of a function object calling the bitwise operator&:

    #include <functional>

    template <typename _Tp>
    struct bit_and: public std::binary_function<_Tp, _Tp, _Tp>
    {
        _Tp operator()(_Tp const &__x, _Tp const &__y) const
        {
            return __x & __y;
        }
    };

Here is the implementation of a function object calling operator~():

    #include <functional>

    template <typename _Tp>
    struct bit_not: public std::unary_function<_Tp, _Tp>
    {
        _Tp operator()(_Tp const &__x) const
        {
            return ~__x;
        }
    };

These and other missing predefined function objects are also implemented in the file bitfunctional, which is found in the cplusplus.yo.zip archive. These classes are derived from existing class templates (e.g., std::binary_function and std::unary_function). These base classes define several types which are expected (used) by various generic algorithms as defined in the STL (cf. chapter 19), thus following the advice offered in, e.g., the C++ header file bits/stl_function.h:

   *  The standard functors are derived from structs named unary_function
   *  and binary_function.  These two classes contain nothing but typedefs,
   *  to aid in generic (template) programming.  If you write your own
   *  functors, you might consider doing the same.

Here is an example using bit_and, removing all odd numbers from a vector of int values:

    #include <iostream>
    #include <algorithm>
    #include <vector>
    #include "bitand.h"
    using namespace std;

    int main()
    {
        vector<int> vi;

        for (int idx = 0; idx < 10; ++idx)
            vi.push_back(idx);

        copy
        (
            vi.begin(),
            remove_if(vi.begin(), vi.end(), bind2nd(bit_and<int>(), 1)),
            ostream_iterator<int>(cout, " ")
        );
        cout << '\n';
    }
    /*
        Generated output:

        0 2 4 6 8
    */

23.5: A text to anything converter

The standard C library offers conversion functions like atoi, atol, and other functions that can be used to convert ASCII-Z strings to numeric values. In C++, these functions are still available, but a more type safe way to convert text to other types uses objects of the class std::istringsteam.

Using the class istringstream instead of the C standard conversion functions may have the advantage of type-safety, but it also appears to be a rather cumbersome alternative. After all, we first have to construct and initialize a std::istringstream object before we're able to extract a value of some type from it. This requires us to use a variable. Then, in cases where the extracted value is only needed to initialize some function-parameter, one might wonder whether the added variable and the istringstream construction can somehow be avoided.

In this section we'll develop a class ( A2x) preventing all the disadvantages of the standard C library functions, without requiring the cumbersome definitions of istringstream objects over and over again. The class is called A2x, meaning ` ascii to anything'.

A2x objects can be used to extract values of any type extractable from std::istream objects. Since A2x represents the object-variant of the C functions, it is not only type-safe but also extensible. So its use is greatly preferred over using the standard C functions. Here are its characteristics:

A2x is derived from std::istringstream, and so all members of the class istringstream are available for A2x objects. Extractions of values of variables can always effortlessly be performed. Here's the class's interface:

    class A2x: public std::istringstream
    {
        public:
            A2x() = default;
            A2x(char const *txt);
            A2x(std::string const &str);

            template <typename Type>
            operator Type();

            template <typename Type>
            Type to();

            A2x &operator=(char const *txt);

            A2x &operator=(std::string const &str);
            A2x &operator=(A2x const &other);
    };

A2x has a default constructor and a constructor expecting a std::string argument. The latter constructor may be used to initialize A2x objects with text to be converted (e.g., a line of text obtained from reading a configuration file):
```
    inline A2x::A2x(char const *txt)        // initialize from text
    :
        std::istringstream(txt)
    {}

    inline A2x::A2x(std::string const &str)
    :
        std::istringstream(str.c_str())
    {}
```
A2x's real strength comes from its operator Type() conversion member template. As it is a member template, it will automatically adapt itself to the type of the variable that should be given a value, obtained by converting the text stored inside the A2x object to the variable's type. When the extraction fails, A2x's inherited good member returns false.
Occasionally the compiler may not be able to determine which type to convert to. In that case, an explicit template type could be used:
```
            A2x.operator int<int>();
            // or just:
            A2x.operator int();
```
As neither syntax looks attractive, the member template to is provided too, allowing constructions like:
```
    A2x.to<int>();
```
Here is its implementation:
```
    template <typename Type>
    inline Type A2x::to()
    {
        Type t;

        return (*this >> t) ? t : Type();
    }
```

The to member makes it easy to implement operator Type():

    template <typename Type>
    inline A2x::operator Type()
    {
        return to<Type>();
    }

Once an A2x object is available, it may be reinitialized using operator=:

#include "a2x.h"

A2x &A2x::operator=(char const *txt)
{
    clear();        // very important!!! If a conversion failed, the object
                    // remains useless until executing this statement
    str(txt);
    return *this;
}

Here are some examples showing A2x being used:

    int x = A2x("12");          // initialize int x from a string "12"
    A2x a2x("12.50");           // explicitly create an A2x object

    double d;
    d = a2x;                    // assign a variable using an A2x object
    cout << d << '\n';

    a2x = "err";
    d = a2x;                    // d is 0: the conversion failed,
    cout << d << '\n';          // and a2x.good() == false

    a2x = " a";                 // reassign a2x to new text
    char c = a2x;               // c now 'a': internally operator>>() is used
    cout << c << '\n';          // so initial blanks are skipped.

    int expectsInt(int x);      // initialize a parameter using an
    expectsInt(A2x("1200"));    // anonymous A2x object

    d = A2x("12.45").to<int>(); // d is 12, not 12.45
    cout << d << '\n';

A complementary class ( X2a), converting values to text, can be constructed as well. Its construction is left as an exercise to the reader.

23.6: Distinguishing lvalues from rvalues with operator[]()

A problem with operator[] is that it can't distinguish between its use as an lvalue and as an rvalue. It is a familiar misconception to think that since

    Type const &operator[](size_t index) const

is used as rvalue (since the object isn't modified) that that

    Type &operator[](size_t index)

is used as lvalue (since the returned value can be modified). In fact, the compiler distinguishes between the two operators only by the const-status of the object for which operator[] is called. With const objects the former operator is called, with non-const objects the latter is always used. It is always used, irrespective of it being used as lvalue or rvalue.

Being able to distinguish between lvalues and rvalues can be very useful. Consider the situation where a class supporting operator[] stores data of a type that is very hard to copy. With data like that reference counting (e.g., using shared_ptrs) is probably used to prevent needless copying.

As long as operator[] is used as rvalue there's no need to copy the data, but the information must be copied if it is used as lvalue.

The Proxy Design Pattern (cf. Gamma et al. (1995)) can be used to distinguish between lvalues and rvalues. With the Proxy Design Pattern an object of another class (the Proxy class) is used to acts as a stand in for the `real thing'. The proxy class offers functionality that cannot be offered by the data themselves, like distinguishing between its use as lvalue or rvalue. A proxy class can be used in many situations where access to the real date cannot or should not be directly provided. In this regard iterator types are examples of proxy classes as they create a layer between the real data and the software using the data. Proxy classes could also dereference pointers in a class storing its data by pointers.

In this section we concentrate on the distinction between using operator[] as lvalue and rvalue. Let's assume we have a class Lines storing lines from a file. It's constructor expects the name of a stream from which the lines are read and it offers a non-const operator[] that can be used as lvalue or rvalue (the const version of operator[] is omitted as it offers no problem because it is always used as rvalue):

    class Lines
    {
        std::vector<std::string> d_line;

        public:
            Lines(std::istream &in);
            std::std::string &operator[](size_t idx);
    };

To distinguish between lvalues and rvalues we must find distinguishing characteristics of lvalues and rvalues that we can exploit. Such distinguishing characteristics are operator= (which is always used as lvalue) and the conversion operator (which is always used as rvalue). Rather than having operator[] return a string & we can let it return a Proxy object that is able to distinguish between its use as lvalue and rvalue.

The class Proxy thus needs operator=(string const &other) (acting as lvalue) and operator std::string const &() const (acting as rvalue). Do we need more operators? The std::string class also offers operator+=, so we should probably implement that operator as well. Plain characters can also be assigned to string objects (even using their numeric values). As string objects cannot be constructed from plain characters promotion cannot be used with operator=(string const &other) if the right-hand side argument is a character. Implementing operator=(char value) could therefore also be considered. These additional operators are left out of the current implementation but `real life' proxy classes should consider implementing these additional operators as well. Another subtlety is that Proxy's operator std::string const &() const will not be used when using ostream's insertion operator or istream's extraction operator as these operators are implemented as templates not recognizing our Proxy class type. So when stream insertion and extraction is required (it probably is) then Proxy must be given its own overloaded insertion and extraction operator. Here is an implementation of the overloaded insertion operator inserting the object for which Proxy is a stand-in:

inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy)
{
    return out << static_cast<std::string const &>(proxy);
}

There's no need for any code (except Lines) to create or copy Proxy objects. Proxy's constructor should therefore be made private, and Proxy can declare Lines to be its friend. In fact, Proxy is intimately related to Lines and can be defined as a nested class. In the revised Lines class operator[] no longer returns a string but instead a Proxy is returned. Here is the revised Lines class, including its nested Proxy class:

    class Lines
    {
        std::vector<std::string> d_line;

        public:
            class Proxy;
            Proxy operator[](size_t idx);
            class Proxy
            {
                friend Proxy Lines::operator[](size_t idx);
                std::string &d_str;
                Proxy(std::string &str);
                public:
                    std::string &operator=(std::string const &rhs);
                    operator std::string const &() const;
            };
            Lines(std::istream &in);
    };

Proxy's members are very lightweight and can usually be implemented inline:

    inline Lines::Proxy::Proxy(std::string &str)
    :
        d_str(str)
    {}
    inline std::string &Lines::Proxy::operator=(std::string const &rhs)
    {
        return d_str = rhs;
    }
    inline Lines::Proxy::operator std::string const &() const
    {
        return d_str;
    }

The member Lines::operator[] can also be implemented inline: it merely returns a Proxy object initialized with the idx+sup(th) string.

Now that the class Proxy has been developed it can be used in a program. Here is an example using the Proxy object as lvalue or rvalue. On the surface Lines objects won't behave differently from Lines objects using the original implementation, but adding an identifying cout statement to Proxy's members will show that operator[] will behave differently when used as lvalue or as rvalue:

    int main()
    {
        ifstream in("lines.cc");
        Lines lines(in);

        string s = lines[0];        // rvalue use
        lines[0] = s;               // lvalue use
        cout << lines[0] << '\n';   // rvalue use
        lines[0] = "hello world";   // lvalue use
        cout << lines[0] << '\n';   // rvalue use
    }

23.7: Implementing a `reverse_iterator'

In section 21.12.1 the construction of iterators and reverse iteraters was discussed. In that section the iterator was constructed as an inner class in a class derived from a vector of pointers to strings.

An object of this nested iterator class handles the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.

A drawback of this is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling pointer-dereferencing?

In this section a variant of the earlier (nested class) approach is discussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elements point but also the container's iterator type itself. Once again, we will concentrate on developing a RandomIterator as it is the most complex iterator type.

Our class is named RandomPtrIterator, indicating that it is a random iterator operating on pointer values. The class template defines three template type parameters:

The first parameter specifies the derived class type (Class). Like before, RandomPtrIterator's constructor is private. Therefore friend declarations are needed to allow client classes to construct RandomPtrIterators. However, a friend class Class cannot be used as template parameter types cannot be used in friend class ... declarations. But this is a minor problem as not every member of the client class needs to construct iterators. In fact, only Class's begin and end members must construct iterators. Using the template's first parameter, friend declarations can be specified for the client's begin and end members.
The second template parameter parameterizes the container's iterator type (BaseIterator);
The third template parameter indicates the data type to which the pointers point (Type).

RandomPtrIterator has one private data member, a BaseIterator. Here is the class interface and the constructor's implementation:

    #include <iterator>

    template <typename Class, typename BaseIterator, typename Type>
    class RandomPtrIterator:
          public std::iterator<std::random_access_iterator_tag, Type>
    {
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::begin();
        friend RandomPtrIterator<Class, BaseIterator, Type> Class::end();

        BaseIterator d_current;

        RandomPtrIterator(BaseIterator const &current);

        public:
            bool operator!=(RandomPtrIterator const &other) const;
            int operator-(RandomPtrIterator const &rhs) const;
            RandomPtrIterator const operator+(int step) const;
            Type &operator*() const;
            bool operator<(RandomPtrIterator const &other) const;
            RandomPtrIterator &operator--();
            RandomPtrIterator const operator--(int);
            RandomPtrIterator &operator++();
            RandomPtrIterator const operator++(int);
            bool operator==(RandomPtrIterator const &other) const;
            RandomPtrIterator const operator-(int step) const;
            RandomPtrIterator &operator-=(int step);
            RandomPtrIterator &operator+=(int step);
            Type *operator->() const;
    };

    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type>::RandomPtrIterator(
                                    BaseIterator const &current)
    :
        d_current(current)
    {}

Looking at its friend declarations, we see that the members begin and end of a class Class, returning a RandomPtrIterator object for the types Class, BaseIterator and Type are granted access to RandomPtrIterator's private constructor. That is exactly what we want. Begin and end are declared as bound friends.

All RandomPtrIterator's remaining members are public. Since RandomPtrIterator is just a generalization of the nested class iterator developed in section 21.12.1, re-implementing the required member functions is easy and only requires us to change iterator into RandomPtrIterator and to change std::string into Type. For example, operator<, defined in the class iterator as

inline bool StringPtr::iterator::operator<(iterator const &other) const
{
    return **d_current < **other.d_current;
}

is now implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    bool RandomPtrIterator<Class, BaseIterator, Type>::operator<(
                                    RandomPtrIterator const &other) const
    {
        return **d_current < **other.d_current;
    }

Some additional examples: operator*, defined in the class iterator as

inline std::string &StringPtr::iterator::operator*() const
{
    return **d_current;
}

is now implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    Type &RandomPtrIterator<Class, BaseIterator, Type>::operator*() const
    {
        return **d_current;
    }

The pre- and postfix increment operators are now implemented as:

    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type>
    &RandomPtrIterator<Class, BaseIterator, Type>::operator++()
    {
        ++d_current;
        return *this;
    }
    template <typename Class, typename BaseIterator, typename Type>
    RandomPtrIterator<Class, BaseIterator, Type> const
    RandomPtrIterator<Class, BaseIterator, Type>::operator++(int)
    {
        return RandomPtrIterator(d_current++);
    }

Remaining members can be implemented accordingly, their actual implementations are left as exercises to the reader (or can be obtained from the cplusplus.yo.zip archive, of course).

Re-implementing the class StringPtr developed in section 21.12.1 is not difficult either. Apart from including the header file defining the class template RandomPtrIterator, it only requires a single modification. Its iterator typedef must now be associated with a RandomPtrIterator. Here is the full class interface and the class's inline member definitions:

    #ifndef INCLUDED_STRINGPTR_H_
    #define INCLUDED_STRINGPTR_H_

    #include <vector>
    #include <string>
    #include "iterator.h"

    class StringPtr: public std::vector<std::string *>
    {
        public:
            typedef RandomPtrIterator
                    <
                        StringPtr,
                        std::vector<std::string *>::iterator,
                        std::string
                    >
                        iterator;

            typedef std::reverse_iterator<iterator> reverse_iterator;

            iterator begin();
            iterator end();
            reverse_iterator rbegin();
            reverse_iterator rend();
    };

    inline StringPtr::iterator StringPtr::begin()
    {
        return iterator(this->std::vector<std::string *>::begin() );
    }
    inline StringPtr::iterator StringPtr::end()
    {
        return iterator(this->std::vector<std::string *>::end());
    }
    inline StringPtr::reverse_iterator StringPtr::rbegin()
    {
        return reverse_iterator(end());
    }
    inline StringPtr::reverse_iterator StringPtr::rend()
    {
        return reverse_iterator(begin());
    }
    #endif

Including StringPtr's modified header file into the program given in section 21.12.2 results in a program behaving identically to its earlier version. In this case StringPtr::begin and StringPtr::end return iterator objects constructed from a template definition.

23.8: Using `bisonc++' and `flex'

The example discussed below digs into the peculiarities of using parser- and scanner generators generating C++ sources. Once the input for a program exceeds a certain level of complexity, it becomes attractive to use scanner- and parser-generators generating the code which does the actual input recognition.

The current example assumes that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex are described in several books, e.g. in O'Reilly's book `lex & yacc'.

Scanner- and parser generators are also available as free software. Both bison and flex are usually part of software distributions or they can be obtained from ftp://prep.ai.mit.edu/pub/non-gnu. Flex creates a C++ class when %option c++ is specified.

For parser generators the program bison is available. In the early 90's Alain Coetmeur (coetmeur@icdc.fr) created a C++ variant ( bison++) creating a parser class. Although the bison++ program produces code that can be used in C++ programs it also shows many characteristics that are more suggestive of a C context than a C++ context. In January 2005 I rewrote parts of Alain's bison++ program, resulting in the original version of the program bisonc++. Then, in May 2005 a complete rewrite of the bisonc++ parser generator was completed (version number 0.98). Current versions of bisonc++ can be downloaded from http://bisoncpp.sourceforge.net/, where it is available as source archive and as binary (i386) Debian package (including bisonc++'s documentation).

Bisonc++ creates a cleaner parser class than bison++. In particular, it derives the parser class from a base-class, containing the parser's token- and type-definitions as well as all member functions which should not be (re)defined by the programmer. As a result of this approach, the generated parser class is very small, declaring only members that are actually defined by the programmer (as well as some other members, generated by bisonc++ itself, implementing the parser's parse() member). One member that is not implemented by default is lex, producing the next lexical token. When the directive %scanner (see section 23.8.2.1) is used, bisonc++ produces a standard implementation for this member; otherwise it must be implemented by the programmer.

This section of the C++ Annotations focuses on bisonc++ as our parser generator.

Using flex and bisonc++ class-based scanners and parsers can be generated. The advantage of this approach is that the interface to the scanner and the parser tends to become cleaner than without using the class interface. Furthermore, classes allow us to get rid of most if not all global variables, making it easy to use multiple parsers in one program.

Below two examples programs are developed. The first example only uses flex. The generated scanner monitors the production of a file from several parts. That example focuses on the lexical scanner and on switching files while churning through the information. The second example uses both flex and bisonc++ to generate a scanner and a parser transforming standard arithmetic expressions to their postfix notations, commonly used in code generated by compilers and in HP-calculators. In the second example the emphasis is mainly on bisonc++ and on composing a scanner object inside a generated parser.

23.8.1: Using `flex' to create a scanner

The lexical scanner developed in this section is used to monitor the production of a file from several subfiles. The setup is as follows: the input-language knows of an #include directive, followed by a text string specifying the file (path) which should be included at the location of the #include.

In order to avoid complexities irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the pointed brackets should be available at the location indicated by filepath. If the file is not available, the program terminates after issuing an error message.

The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.

The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the filename stack indicating where which file was included is printed.

An additional feature of the program is that (standard C++) comment-lines are ignored. Include-directives in comment-lines are also ignored.

The program is created in five major steps:

First, the file lexer is constructed, containing the input-language specifications.
From the specifications in lexer the requirements for the class Scanner evolve. The Scanner class is a wrapper class around the class yyFlexLexer generated by flex. The requirements result in the interface of the class Scanner.
Next, main is constructed. A Scanner object is created inspecting the command-line arguments. If successful, the scanner's member yylex is called to produce the program's output.
Now that the global setup of the program has been specified, the member functions of the various classes are implemented.
Finally, the program is compiled and linked.

23.8.1.1: The derived class `Scanner'

The code associated with the regular expression rules is located inside the class yyFlexLexer. However, we of course want to use the derived class's members in this code. This causes a small problem. How does a base-class member know about members of classes derived from it?

Inheritance helps us to overcome this problem. In the specification of the class yyFlexLexer, we notice that the function yylex is a virtual function. The header file FlexLexer.h declares the virtual member int yylex:

    class yyFlexLexer: public FlexLexer
    {
        public:
            yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 );

            virtual ~yyFlexLexer();

            void yy_switch_to_buffer( struct yy_buffer_state* new_buffer );
            struct yy_buffer_state* yy_create_buffer( istream* s, int size );
            void yy_delete_buffer( struct yy_buffer_state* b );
            void yyrestart( istream* s );

            virtual int yylex();

            virtual void switch_streams( istream* new_in, ostream* new_out );
    };

As this function is virtual it can be overridden by a derived class. In that case the overridden function will be called from its base class (i.e., yyFlexLexer) code. Since the derived class's yylex() is called, it will now have access to the members of the derived class, and also to the public and protected members of its base class.

By default, the context in which the generated scanner is placed is the function yyFlexLexer::yylex. This context changes if we use a derived class, e.g., Scanner. To derive Scanner from yyFlexLexer, generated by flex, do as follows:

The function yylex must be declared in the derived class Scanner.
Options (see below) are used to inform flex about the derived class's name.

Looking at the regular expressions themselves, notice that we need rules to recognize comment, #include directives, and all remaining characters. This is all fairly standard practice. When an #include directive is sensed, the directive is parsed by the scanner. This too is common practice. Here is what our lexical scanner will do:

As usual, preprocessor directives are not analyzed by a parser, but by the lexical scanner;
The scanner uses a mini scanner to extract the filename from the directive, throwing a Scanner::Error value (invalidInclude) if this fails;
If the filename could be extracted, it is stored in nextSource;
When the #include directive has been processed, pushSource is called to perform the switch to another file;
When the end of the file (EOF) is reached, the derived class's member function popSource is called, popping the previously pushed file and returning true;
Once the file-stack is empty, popSource returns false, resulting in calling yyterminate, terminating the scanner.

The lexical scanner specification file is organized similarly as the one used for flex in C contexts. However, in C++ contexts, flex may create a class (yyFlexLexer) from which another class (e.g., Scanner) can be derived. Flex's specification file itself has three sections:

The lexer specification file's first section is a C++ preamble, containing code which can be used in the code defining the actions that are performed once a regular expression has been matched. In the current setup, where each class has its own internal header file, the internal header file includes the file scanner.h, in turn including FlexLexer.h, which is part of the flex distribution. FlexLexer.h has a peculiar setup, due to which it should not be read twice by the code generated by flex. So, we now have the following situation:
- First we look at the lexer specification file. It contains a preamble including scanner.ih. The class Scanner is declared in scanner.h, which is read by scanner.ih. Therefore Scanner's members are known and can be called from the code associated with the regular expressions defined in the lexer specification file.
- However, in scanner.h, defining class Scanner, the header file FlexLexer.h, declaring Scanner's base class, must have been read by the compiler before the class Scanner itself is defined.
- Code generated by flex already includes FlexLexer.h, and as mentioned, FlexLexer.h may not be read twice. Unfortunately, flex also inserts the specification file's preamble into the code it generates.
- Since this preamble includes scanner.ih, and so scanner.h, and so FlexLexer.h, we now do include FlexLexer.h twice in code generated by flex. This must be prevented.
This is how multiple inclusions of FlexLexer.h can be prevented:
- Although scanner.ih includes scanner.h, scanner.h itself is modified such that it includes FlexLexer.h, unless the C preprocesser variable SKIP_FLEXLEXER_ is defined.
- In flex' specification file SKIP_FLEXLEXER_ is defined just prior to including scanner.ih.
Using this scheme, code generated by flex now re-includes FlexLexer.h. At the same time the compilation of Scanner's members proceeds independently of the lexer specification file's preamble, so here FlexLexer.h is properly included too. Here is the specification files' preamble:
```
%{
    #define _SKIP_YYFLEXLEXER_
    #include "scanner.ih"
%}
```
The specification file's second section is a flex symbol area, used to define symbols, like a mini scanner, or options. The following options are suggested:
- %option 8bit: allowing the generated lexical scanner to read 8-bit characters (rather than 7-bit, which is the default).
- %option c++: this results in flex generating C++ code.
- %option debug: includes debugging code into the code generated by flex. Calling the member function set_debug(true) activates this debugging code at run-time. When activated, information about which rules are matched is written to the standard error stream. To suppress the execution of debug code the member function set_debug(false) may be called.
- %option noyywrap: when the scanner reaches the end of file, it will (by default) call a function yywrap which may perform the switch to another file. Calling this function is suppressed when %option noyywrap is specified. Since there exist alternatives which render this function superfluous (see below), it is suggested to specify this option as well.
- %option outfile="yylex.cc": this defines yylex.cc as the name of the generated C++ source file.
- %option warn: this option is strongly suggested by the flex documentation, so it's mentioned here as well. See flex' documentation for details.
- %option yyclass="Scanner": this defines Scanner as the name of the class derived from yyFlexLexer.
- %option yylineno: this option causes the lexical scanner to keep track of the line numbers of the files it is scanning. When processing nested files the variable yylineno is not automatically reset to the last line number of a file when yylex returns to a partially processed file. In those cases, yylineno must explicitly be reset to a former value. If specified, the current line number is returned by the public member lineno, returning an int.
Here is the specification files' symbol area:
```
%option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap yylineno
%option debug

%x      comment
%x      include

eolnComment     "//".*
anyChar         .|\n
```

The specification file's third section is a rules section, in which the regular expressions and their associated actions are defined. In the example developed here, the lexer should copy information from the istream *yyin to the ostream *yyout. For this the predefined macro ECHO can be used. Here is the used symbol area:

%%
    /*
        The comment-rules: comment lines are ignored.
    */
{eolnComment}
"/*"                    BEGIN comment;
<comment>{anyChar}
<comment>"*/"           BEGIN INITIAL;

    /*
        File switching: #include <filepath>
    */
#include[ \t]+"<"       BEGIN include;
<include>[^ \t>]+       d_nextSource = yytext;
<include>">"[ \t]*\n    {
                            BEGIN INITIAL;
                            pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
                        }
<include>{anyChar}      throw invalidInclude;

    /*
        The default rules: eating all the rest, echoing it to output
    */
{anyChar}               ECHO;

    /*
        The <<EOF>> rule: pop a pushed file, or terminate the lexer
    */
<<EOF>>                 {
                            if (!popSource(YY_CURRENT_BUFFER))
                                yyterminate();
                        }
%%

Since the derived class's members may now access the information stored in the lexical scanner itself (it can even access the information directly, since yyFlexLexer's data members of are protected, and thus accessible to derived classes), most processing can be left to the derived class's member functions. This results in a very clean setup of the lexer specification file, requiring no or hardly any code in the preamble.

23.8.1.2: Implementing `Scanner'

The class Scanner, derived as usual from the class yyFlexLexer, is generated by flex. The derived class has access to data controlled by the lexical scanner. Specifically, it has access to the following members:

char *yytext, containing the text matched by a regular expression. Clients may access this information using the scanner's YYText member;
int yyleng, the length of the text in yytext. Clients may access this value using the scanner's YYLeng member;
int yylineno: the current line number. This variable is only maintained if %option yylineno is specified. Clients may access this value using the scanner's lineno member.

Other members are available as well, but are used less often. Details can be found in FlexLexer.h.

Objects of the class Scanner perform two tasks:

They push file information about the current file to a file stack;
They pop the last-pushed information from the stack once EOF is detected in a file.

Several member functions are used to accomplish these tasks. As they are auxiliary to the scanner, they are private members. In practice, develop these private members only when the need for them arises. Note that, apart from the private member functions, several private data members are defined as well. Let's have a closer look at the implementation of the class Scanner:

First, we have a look at the class's initial section, showing the conditional inclusion of FlexLexer.h, its class opening, and its private data. At the top of the class interface the private struct FileInfo is defined. FileInfo is used to store the names and pointers to open files. The struct has two constructors. One accepts a filename, the other also expecting a bool argument indicating that the file is already open and should not be handled by FileInfo. This former constructor is used only once. As the initial stream is an already open file there is no need to open it again and so Scanner's constructor will use this constructor to store the name of the initial file only. Scanner's public section starts off by defining the enum Error defining various symbolic constants for errors that may be detected:

#if ! defined(_SKIP_YYFLEXLEXER_)
#include <FlexLexer.h>
#endif

class Scanner: public yyFlexLexer
{
    struct FileInfo
    {
        std::string d_name;
        std::ifstream    *d_in;

        FileInfo(std::string name)
        :
            d_name(name),
            d_in(new std::ifstream(name.c_str()))
        {}
        FileInfo(std::string name, bool)
        :
            d_name(name),
            d_in(0)
        {}
//        inline bool operator==(FileInfo const &rhs) const
//        {
//            return d_name == rhs.d_name;
//        }
    };

    friend bool operator==(FileInfo const &fi, std::string const &name);

    std::stack<yy_buffer_state *>   d_state;
    std::vector<FileInfo>           d_fileInfo;
    std::string                     d_nextSource;

    static size_t const           s_maxDepth = 10;

    public:
        enum Error
        {
            invalidInclude,
            circularInclusion,
            nestingTooDeep,
            cantRead,
        };

As they are objects, the class's data members are initialized automatically by Scanner's constructor. It activates the initial input (and output) file and pushes the name of the initial input file on the file stack, using the second FileInfo constructor. Here is its implementation:
```
    #include "scanner.ih"

    Scanner::Scanner(istream *yyin, string const &initialName)
    {
        switch_streams(yyin, yyout);
        d_fileInfo.push_back(FileInfo(initialName, false));
    }
```
The scanning process proceeds as follows: once the scanner extracts a filename from an #include directive, a switch to another file is performed by pushSource. If the filename could not be extracted, the scanner throws an invalidInclude exception. The pushSource member and the matching function popSource handle file switching. Switching to another file proceeds like this:
- First, the current depth of the include-nesting is inspected. If s_maxDepth is reached, the stack is considered full, and the scanner throws a nestingTooDeep exception.
- Next, throwOnCircularInclusion is called to avoid circular inclusions when switching to new files. This function throws an exception if a filename is included twice using a simple literal name check. Here is its implementation:
```
    #include "scanner.ih"

    inline bool operator==(Scanner::FileInfo const &fi, string const &name)
    {
        return fi.d_name == name;
    }

    void Scanner::throwOnCircularInclusion()
    {
        vector<FileInfo>::iterator
            it = find(d_fileInfo.begin(), d_fileInfo.end(), d_nextSource);

        if (it != d_fileInfo.end())
            throw circularInclusion;
    }
```
- Then the new filename is added to the FileInfo vector, at the same time creating a new ifstream object. If this fails, the scanner throws a cantRead exception.
- Finally, a new yy_buffer_state is created for the newly opened stream, and the lexical scanner is instructed to switch to that stream using yyFlexLexer's member function yy_switch_to_buffer.
Here is pushSource's implementation:
```
    #include "scanner.ih"

    void Scanner::pushSource(yy_buffer_state *current, size_t size)
    {
        if (d_state.size() == s_maxDepth)
            throw nestingTooDeep;

        throwOnCircularInclusion();
        d_fileInfo.push_back(FileInfo(d_nextSource));

        ifstream *newStream = d_fileInfo.back().d_in;

        if (!*newStream)
            throw cantRead;

        d_state.push(current);
        yy_switch_to_buffer(yy_create_buffer(newStream, size));
    }
```
The class yyFlexLexer provides a series of member functions that can be used to switch files. The file-switching capability of a yyFlexLexer object is founded on the struct yy_buffer_state, containing the state of the scan-buffer of the currently read file. This buffer is pushed on the d_state stack when an #include is encountered. Then yy_buffer_state's contents are replaced by the buffer created for the file to be processed next. In the flex specification file the function pushSource is called as
```
    pushSource(YY_CURRENT_BUFFER, YY_BUF_SIZE);
```
YY_CURRENT_BUFFER and YY_BUF_SIZE are macros that are only available in the rules section of the lexer specification file, so they must be passed as arguments to pushSource. It is not possible to use these macros in the Scanner class's member functions directly.
Note that yylineno is not updated when a file switch is performed. If line numbers are to be monitored, then the current value of yylineno should be pushed on a stack and yylineno should be reset by pushSource. Correspondingly, popSource should reinstate a former value of yylineno by popping a previously pushed value from the stack. Scanner's current implementation maintains a stack of yy_buffer_state pointers. Changing that into a stack of pair<yy_buffer_state *, size_t> elements allows us to save (and restore) line numbers as well. This modification is left as an exercise to the reader.
The member function popSource is called to pop the previously pushed buffer off the stack. This allows the scanner to continue its scanning process just beyond the just completed #include directive. The member popSource first inspects the size of the d_state stack. If it is empty, false is returned and the function terminates. If it isn't empty, then the current buffer is deleted and it is replaced by the state waiting on top of the stack. The file switch is performed by the yyFlexLexer members yy_delete_buffer and yy_switch_to_buffer. The yy_delete_buffer function does not close the ifstream and does not delete the memory allocated for this stream by pushSource. Therefore delete is called for the ifstream pointer stored at the back of d_fileInfo to take care of both. Following this the last FileInfo entry is removed from d_fileInfo. Finally the function returns true:
```
    #include "scanner.ih"

    bool Scanner::popSource(yy_buffer_state *current)
    {
        if (d_state.empty())
            return false;

        yy_delete_buffer(current);
        yy_switch_to_buffer(d_state.top());
        d_state.pop();

        delete d_fileInfo.back().d_in;      // closes the stream as well
        d_fileInfo.pop_back();

        return true;
    }
```

Two service members are offered: stackTrace dumps the names of the currently pushed files to the standard error stream. It may be called by exception catchers. Here is its implementation:

    #include "scanner.ih"

    void Scanner::stackTrace()
    {
        for (size_t idx = 0; idx < d_fileInfo.size() - 1; ++idx)
            cerr << idx << ": " << d_fileInfo[idx].d_name  << " included " <<
                                    d_fileInfo[idx + 1].d_name  << '\n';
    }

lastFile returns the name of the currently processed file. It may be implemented inline:

inline std::string const &Scanner::lastFile()
{
    return d_fileInfo.back().d_name;
}

The lexical scanner itself is defined in Scanner::yylex. Therefore, int yylex must be declared by the class Scanner, as it overrides FlexLexer's virtual member yylex.

23.8.1.3: Using a `Scanner' object

The program using our Scanner is very simple. It expects a filename indicating where to start the scanning process. It first checks the number of arguments. If at least one argument was given, then an ifstream object is created. If this object can be created, then a Scanner object is constructed, receiving the address of the ifstream object and the name of the initial input file as its arguments. Then the Scanner object's yylex member is called. The scanner object throws Scanner::Error exceptions if it fails to perform its tasks properly. These exceptions are caught near main's end. Here is the program's source:

    #include "lexer.h"
    using namespace std;

    int main(int argc, char **argv)
    try
    {
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            return 1;
        }

        ifstream yyin(argv[1]);
        if (!yyin)
        {
            cerr << "Can't read " << argv[1] << '\n';
            return 1;
        }

        Scanner scanner(&yyin, argv[1]);
        try
        {
            return scanner.yylex();
        }
        catch (Scanner::Error err)
        {
            char const *msg[] =
            {
                "Include specification",
                "Circular Include",
                "Nesting",
                "Read",
            };
            cerr << msg[err] << " error in " << scanner.lastFile() <<
                                ", line " << scanner.lineno() << '\n';
            scanner.stackTrace();
            return 1;
        }
    }

23.8.1.4: Building the program

The final program is constructed in two steps. These steps are given for a Unix system, on which flex and the Gnu C++ compiler g++ have been installed:

First, the lexical scanner's source is created using flex. For this the following command can be given:
```
    flex lexer
```
Next, all sources are compiled and linked. In situations where the default yywrap() function is used, the libfl.a library should be linked against the final program. Normally, that's not required, and the program can be constructed as, e.g.:
```
    g++ -o lexer *.cc
```

For the purpose of debugging a lexical scanner, the matched rules and the returned tokens provide useful information. When the %option debug was specified, debugging code is included in the generated scanner. To obtain debugging info, this code must also be activated. Assuming the scanner object is called scanner, then the statement

    scanner.set_debug(true);

will produce debugging info written to the standard error stream.

23.8.2: Using `bisonc++' and `flex'

Once an input language exceeds a certain level of complexity, a parser is often used to control the complexity of the language. In this case, a parser generator can be used to generate the code verifying the input's grammatical correctness. The lexical scanner (preferably composed into the parser) provides chunks of the input, called tokens. The parser then processes the series of tokens generated by the lexical scanner.

Starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens that can be returned by the lexical scanner (called the scanner below).

Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the scanner are not normally specified literally in the grammar rules or lexical regular expressions, but should be implemented in member functions, called from the parser's rules or which are associated with the scanner's regular expressions.

In the previous section we've seen an example of a C++ class generated by flex. In the current section we concentrate on the parser. The parser can be generated from a grammar specification file, processed by the program bisonc++. The grammar specification file required by bisonc++ is similar to the file processed by bison (or by bison's successor (and bisonc++'s predecessor) bison++, written in the early nineties by the Frenchman Alain Coetmeur).

In this section a program is developed converting infix expressions, where binary operators are written between their operands, to postfix expressions, where operators are written behind their operands. Also, the unary operator - is converted from its prefix notation to a postfix form. The unary + operator is ignored as it requires no further actions. In essence our little calculator is a micro compiler, transforming numeric expressions into assembly-like instructions.

Our calculator will recognize a very basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.

In the coming sections we'll develop the grammar specification for bisonc++. Then, the regular expressions for the scanner are specified. Following that, the final program is constructed.

23.8.2.1: The `bisonc++' specification file

The grammar specification file required by bisonc++ is comparable to the specification file required by bison. Differences are related to the class nature of the resulting parser. Our calculator distinguishes real numbers from integers, and supports a basic set of arithmetic operators.

Bisonc++ should be used as follows:

As usual, a grammar is defined. With bisonc++ this is no different, and bisonc++ grammar definitions are for all practical purposes identical to bison's grammar definitions.
Having specified the grammar and (usually) some declarations bisonc++ can generate files defining the parser class and the implementation of the member function parse.
All class members (except those that are required for the proper functioning of the member parse) must be separately implemented. Of course, they should also be declared in the parser class's header. At the very least the member lex must be implemented. This member is called by parse to obtain the next available token. However, bisonc++ offers a facility providing a standard implementation of the function lex. The member function error(char const *msg) is given a simple default implementation that may be modified by the programmer. The member function error is called when parse detects (syntactic) errors.

The parser can now be used in a program. A very simple example would be:

    int main()
    {
        Parser parser;
        return parser.parse();
    }

The bisonc++ specification file has two sections:

The declaration section. In this section bison's tokens, and the priority rules for the operators are declared. However, bisonc++ also supports several new declarations. These new declarations are important and are discussed below.
The rules section. The grammatical rules define the grammar. This section is identical to the one required by bison, albeit that some members that were available in bison and bison++ are obsolete in bisonc++, while other members can be used in a wider context. For example, ACCEPT and ABORT can be called from any member called from the parser's action blocks to terminate the parsing process.

Readers familiar with bison will note that there is no header section anymore. Header sections are used by bison to provide for the necessary declarations allowing the compiler to compile the C function generated by bison. In C++ declarations are part of or already used by class definitions. Therefore, a parser generator generating a C++ class and some of its member functions does not require a header section anymore.

The declaration section

The declaration section contains several sets of declarations, among which definitions of all the tokens used in the grammar and the priorities and associativities of the mathematical operators. Moreover, several new and important specifications can be used here as well. Those relevant to our current example and only available in bisonc++ are discussed here. The reader is referred to bisonc++'s man-page for a full description.

%baseclass-header header
Defines the pathname of the file to contain (or containing) the parser's base class. Defaults to the name of the parser class plus the suffix base.h.
%baseclass-preinclude header
Use header as the pathname to the file pre-included in the parser's base-class header. This declaration is useful in situations where the base class header file refers to types which might not yet be known. E.g., with %union a std::string * field might be used. Since the class std::string might not yet be known to the compiler once it processes the base class header file we need a way to inform the compiler about these classes and types. The suggested procedure is to use a pre-include header file declaring the required types. By default header will be surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by angle brackets #include <header> will be included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using -H '<header>').
%class-header header
Defines the pathname of the file to contain (or containing) the parser class. Defaults to the name of the parser class plus the suffix .h
%class-name parser-class-name
Declares the class name of this parser. This declaration replaces the %name declaration previously used by bison++. It defines the name of the C++ class that will be generated. Contrary to bison++'s %name declaration, %class-name may appear anywhere in the first section of the grammar specification file. It may be defined only once. If no %class-name is specified the default class name Parser will be used.
%debug
Provides parse and its support functions with debugging code, showing the actual parsing process on the standard output stream. When included, the debugging output is active by default, but its activity may be controlled using the setDebug(bool on-off) member. Note that no #ifdef DEBUG macros are used anymore. By rerunning bisonc++ without the --debug option an equivalent parser is generated not containing the debugging code.
%filenames header
Defines the generic name of all generated files, unless overridden by specific names. By default the generated files use the class-name as the generic file name.
%implementation-header header
Defines the pathname of the file to contain (or containing) the implementation header. Defaults to the name of the generated parser class plus the suffix .ih. The implementation header should contain all directives and declarations only used by the implementations of the parser's member functions. It is the only header file that is included by the source file containing parse's implementation. It is suggested that user defined implementations of other class members use the same convention, thus concentrating all directives and declarations that are required for the compilation of other source files belonging to the parser class in one header file.
%parsefun-source source
Defines the pathname of the file containing the parser member parse. Defaults to parse.cc.
%scanner header
Use header as the pathname to the file pre-included in the parser's class header. This file should define a class Scanner, offering a member int yylex() producing the next token from the input stream to be analyzed by the parser generated by bisonc++. When this option is used the parser's member int lex() will be predefined as (assuming the parser class name is Parser):
```
    inline int Parser::lex()
    {
        return d_scanner.yylex();
    }
```
and an object Scanner d_scanner will be composed into the parser. The d_scanner object will be constructed using its default constructor. If another constructor is required, the parser class may be provided with an appropriate (overloaded) parser constructor after having constructed the default parser class header file using bisonc++. By default header will be surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by angle brackets #include <header> will be included.
%stype typename
The type of the semantic value of tokens. The specification typename should be the name of an unstructured type (e.g., size_t). By default it is int. See YYSTYPE in bison. It should not be used if a %union specification is used. Within the parser class, this type may be used as STYPE.
%union union-definition
Acts identically to the bison declaration. As with bison this generates a union for the parser's semantic type. The union type is named STYPE. If no %union is declared, a simple stack-type may be defined using the %stype declaration. If no %stype declaration is used, the default stacktype (int) is used.

An example of a %union declaration is:

    %union
    {
        int     i;
        double  d;
    };

In pre-C++0x code a union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a string cannot be a member of the union. A string *, however, is a possible union member. In time C++0x will offer unrestricted unions (cf. section 7.11) allowing class type objects to become fields in union definitions, but these unrestricted unions have not yet been implemented by compilers.

As an aside: the scanner does not have to know about such a union. It can simply pass its scanned text to the parser through its YYText member function. For example using a statement like

    $$.i = A2x(scanner.YYText());

matched text may be converted to a value of an appropriate type.

Tokens and non-terminals can be associated with union fields. This is strongly advised, as it prevents type mismatches, since the compiler will be able to check for type correctness. At the same time, the bison specific variabels $$, $1, $2, etc. may be used, rather than the full field specification (like $$.i). A non-terminal or a token may be associated with a union field using the <fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)
           <d> DOUBLE
    %type  <i> intExpr      // non-terminal association

In the example developed here, both the tokens and the non-terminals can be associated with a field of the union. However, as noted before, the scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser, knowing what the input is all about, may then convert strings like "123" to an integer value. Consequently, the association of a union field and a token is discouraged. When describing the rules of the grammar this will be illustrated further.

In the %union discussion the %token and %type specifications should be noted. They are used to specify the tokens (terminal symbols) that can be returned by the scanner, and to specify the return types of non-terminals. Apart from %token the token declarators %left, %right, and %nonassoc can be used to specify the associativity of operators. The tokens mentioned at these indicators are interpreted as tokens indicating operators, associating in the indicated direction. The precedence of operators is defined by their order: the first specification has the lowest priority. To overrule a certain precedence in a certain context %prec can be used. As all this is standard bisonc++ practice, it isn't further elaborated here. The documentation provided with bisonc++'s distribution should be consulted for further reference.

Here is the specification of the calculator's declaration section:

%filenames parser
%scanner ../scanner/scanner.h
%lines

%union {
    int i;
    double d;
};

%token  INT
        DOUBLE

%type   <i> intExpr
%type   <d> doubleExpr

%left   '+'
%left   '*'
%right  UnaryMinus

In the declaration section %type specifiers are used, associating the intExpr rule's value (see the next section) to the i-field of the semantic-value union, and associating doubleExpr's value to the d-field. At first sight this may look complex, since the expression rules must be included for each individual return type. On the other hand, if the union itself would have been used, we would still have had to specify somewhere in the returned semantic values what field to use: less rules, but more complex and error-prone code.

The grammar rules

The rules and actions of the grammar are specified as usual. The grammar for our little calculator is given below. There are quite a few rules, but they illustrate various features offered by bisonc++. In particular, note that no action block requires more than a single line of code. This keeps the grammar simple, and therefore enhances its readability and understandability. Even the rule defining the parser's proper termination (the empty line in the line rule) uses a single member function called done. The implementation of that function is simple, but it is worth while noting that it calls Parser::ACCEPT, showing that ACCEPT can be called indirectly from a production rule's action block. Here are the grammar's production rules:

    lines:
        lines
        line
    |
        line
    ;

    line:
        intExpr
        '\n'
        {
            display($1);
        }
    |
        doubleExpr
        '\n'
        {
            display($1);
        }
    |
        '\n'
        {
            done();
        }
    |
        error
        '\n'
        {
            reset();
        }
    ;

    intExpr:
        intExpr '*' intExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        intExpr '+' intExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        '(' intExpr ')'
        {

            $$ = $2;
        }
    |
        '-' intExpr         %prec UnaryMinus
        {
            $$ = neg($2);
        }
    |
        INT
        {
            $$ = convert<int>();
        }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr
        {
            $$ = exec('*', $1, $3);
        }
    |
        doubleExpr '*' intExpr
        {
            $$ = exec('*', $1, d($3));
        }
    |
        intExpr '*' doubleExpr
        {
            $$ = exec('*', d($1), $3);
        }
    |
        doubleExpr '+' doubleExpr
        {
            $$ = exec('+', $1, $3);
        }
    |
        doubleExpr '+' intExpr
        {
            $$ = exec('+', $1, d($3));
        }
    |
        intExpr '+' doubleExpr
        {
            $$ = exec('+', d($1), $3);
        }
    |
        '(' doubleExpr ')'
        {
            $$ = $2;
        }
    |
        '-' doubleExpr         %prec UnaryMinus
        {

            $$ = neg($2);
        }
    |
        DOUBLE
        {
            $$ = convert<double>();
        }
    ;

This grammar is used to implement a simple calculator in which integer and real values can be negated, added, and multiplied and in which standard priority rules can be overruled by parentheses. The grammar shows the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association is defined in the parser's definition section.

The Parser's header file

Several functions called from the grammar are defined as template functions. Bisonc++ generates multiple files, among which the file defining the parser's class. Functions called from the production rule's action blocks are usually member functions of the parser. These member functions must be declared and defined. Once bisonc++ has generated the header file defining the parser's class it will not automatically rewrite that file, allowing the programmer to add new members to the parser class once they are required. Here is the parser.h file as used in our little calculator:

#ifndef Parser_h_included
#define Parser_h_included

#include <iostream>
#include <sstream>
#include <bobcat/a2x>

#include "parserbase.h"
#include "../scanner/scanner.h"


#undef Parser
class Parser: public ParserBase
{
    std::ostringstream d_rpn;
    // $insert scannerobject
    Scanner d_scanner;

    public:
        int parse();

    private:
        template <typename Type>
            Type exec(char c, Type left, Type right);
        template <typename Type>
            Type neg(Type op);
        template <typename Type>
            Type convert();

        void display(int x);
        void display(double x);
        void done() const;
        void reset();
        void error(char const *msg);
        int lex();
        void print();

        static double d(int i);

    // support functions for parse():

        void executeAction(int d_ruleNr);
        void errorRecovery();
        int lookup(bool recovery);
        void nextToken();
};


inline double Parser::d(int i)
{
    return i;
}

template <typename Type>
Type Parser::exec(char c, Type left, Type right)
{
    d_rpn << " " << c << " ";
    return c == '*' ? left * right : left + right;
}

template <typename Type>
Type Parser::neg(Type op)
{
    d_rpn << " n ";
    return -op;
}

template <typename Type>
Type Parser::convert()
{
    Type ret = FBB::A2x(d_scanner.YYText());
    d_rpn << " " << ret << " ";
    return ret;
}

inline void Parser::error(char const *msg)
{
    std::cerr << msg << '\n';
}

inline int Parser::lex()
{
    return d_scanner.yylex();
}

inline void Parser::print()
{}

#endif

23.8.2.2: The `flex' specification file

The flex-specification file used by the calculator is simple: blanks are skipped, single characters are returned, and numeric values are returned as either Parser::INT or Parser::DOUBLE tokens. Here is the complete specification file:

%{
    #define _SKIP_YYFLEXLEXER_
    #include "scanner.ih"

    #include "../parser/parserbase.h"
%}

%option yyclass="Scanner" outfile="yylex.cc" c++ 8bit warn noyywrap
%option debug

%%

[ \t]                       ;
[0-9]+                      return Parser::INT;

"."[0-9]*                   |
[0-9]+("."[0-9]*)?          return Parser::DOUBLE;

.|\n                        return *yytext;

%%

23.8.2.3: Generating code

The program's code is generated in the same way as with programs using bison and flex. The files parser.cc and parser.h are generated by the command:

    bisonc++ -V grammar

The option -V produces the file parser.output showing information about the internal structure of the provided grammar (among which its states). It is useful for debugging purposes and can be omitted if no debugging is required.

Bisonc++ may detect conflicts ( shift-reduce conflicts and/or reduce-reduce conflicts) in the provided grammar. These conflicts may be resolved explicitly using disambiguating rules or they are `resolved' by default. By default, a shift-reduce conflict is resolved by shifting (i.e., the next token is consumed). By default a reduce-reduce conflict is resolved by using the first of two competing production rules. Bisonc++'s conflict resolution procedures are identical to bison's procedures.

Once a parser class and a parsing member function has been constructed flex may be used to create a lexical scanner using, e.g., the command

    flex -I lexer

On Unix systems a command like

    g++ -o calc -Wall *.cc -s

can be used to compile and link the source of the main program and the sources produced by the scanner and parser generators.

Finally, here is a source file in which the main function and the parser object is defined. The parser features the lexical scanner as one of its data members:

#include "parser/parser.h"

using namespace std;

int main()
{
    Parser parser;

    cout << "Enter (nested) expressions containing ints, doubles, *, + and "
            "unary -\n"
            "operators. Enter an empty line to stop.\n";

    return parser.parse();
}

Bisonc++ can be downloaded from http://bisoncpp.sourceforge.net/. It requires the bobcat library, which can be downloaded from http://bobcat.sourceforge.net/.

23.8.3: Using polymorphic semantic values with Bisonc++

Below the way Bisonc++ may use a polymorphic semantic value is described. The described method is a direct result of a suggestion initially brought forward by Dallas A. Clement in September 2007.

One may wonder why a union is still used by Bisonc++ as C++ offers inherently superior constructs to combine multiple types into one type. The C++ way to combine types into one type is by defining a polymorphic base class and a series of derived classes implementing the alternative data types. Bisonc++ supports the union approach (and the unrestricted unions with C++0x) for various (e.g., backward compatibility) reasons. Bison and bison++ both support the %union directive.

An alternative to using a union is using a polymorphic base class. Such a class is developed below (the class Base). As it is a polymorphic base class it has the following characteristics:

Its destructor is virtual (and has a default implementation);
Objects of the derived classes may be obtained from a pure virtual ownClone member implementing a so-called virtual constructor (cf. the virtual constructor design pattern, Gamma et al. (1995)); ;
To obtain a clone of the current (derived class) object, the interface function clone must be called. It calls ownClone and forms a layer between the derived class implementations of ownClone and the user-software. Right now it only calls ownClone, but by not defining it as an inline function clone can easily be extended once that is required;
Several convenient utility members are defined: a pure virtual insert member and an overloaded operator<< to allow derived objects to be inserted into ostream objects.

Here is Base's interface:

    #ifndef INCLUDED_BASE_
    #define INCLUDED_BASE_

    #include <iosfwd>

    class Base
    {
        friend std::ostream &operator<<(std::ostream &out, Base const &obj);

        public:
            virtual ~Base() = default;
            Base *clone() const;
        private:
            virtual Base *ownClone() const = 0;
            virtual std::ostream &insert(std::ostream &os) const = 0;
    };

    inline std::ostream &operator<<(std::ostream &out, Base const &obj)
    {
        return obj.insert(out);
    }

    #endif

Instead of using fields of a classical union we are now using classes that are derived from the class Base. For example:

Objects of the class Int contain int values. Here is its interface (and implementation):

#ifndef INCLUDED_INT_
#define INCLUDED_INT_

#include <ostream>

#include <bobcat/a2x>

#include "../base/base.h"

class Int: public Base
{
    int d_value;

    public:
        Int(char const *text);
        Int(int v);
        int value() const;                // directly access the value
    private:
        virtual Base *ownClone() const;
        virtual std::ostream &insert(std::ostream &os) const;
};

inline Int::Int(char const *txt)
:
    d_value(FBB::A2x(txt))
{}

inline Int::Int(int v)
:
    d_value(v)
{}

inline Base *Int::ownClone() const
{
    return new Int(*this);
}

inline int Int::value() const
{
    return d_value;
}

inline std::ostream &Int::insert(std::ostream &out) const
{
    return out << d_value;
}

#endif

Objects of the class Text contain text. These objects can be used, e.g., to store the names of identifiers recognized by a lexical scanner. Here is the interface of the class Text:

#ifndef INCLUDED_TEXT_
#define INCLUDED_TEXT_

#include <string>
#include <ostream>

#include "../base/base.h"

class Text: public Base
{
    std::string d_text;

    public:
        Text(char const *id);
        std::string const &id() const;          // directly access the name.
    private:
        virtual Base *ownClone() const;
        virtual std::ostream &insert(std::ostream &os) const;
};

inline Text::Text(char const *id)
:
    d_text(id)
{}

inline Base *Text::ownClone() const
{
    return new Text(*this);
}

inline std::string const &Text::id() const
{
    return d_text;
}

inline std::ostream &Text::insert(std::ostream &out) const
{
    return out << d_text;
}

#endif

The polymorphic Base can't immediately be used as the parser's semantic value type for various reasons:

A Base class object cannot contain derived class's data members, so plain Base class objects cannot be used to store the parser's semantic values.
It's not possible to define a Base class reference as a semantic value either as containers cannot store references.
Finally, the semantic value should not be a pointer to a Base class object. Although a pointer would offer programmers the benefits of the polymorphic nature of the Base class, it would also require them to keep track of all memory used by Base objects, thus countering many of the benefits of using a polymorphic base class.

To solve the above problems, a wrapper class Semantic around a Base pointer is used. To simplify memory bookkeeping Semantic itself is defined as a class derived from std::shared_ptr (cf. section 18.4). This allows us to benefit from default implementations of the copy constructor, the overloaded assignment operator, and the destructor. Semantic itself offers an overloaded insertion operator allowing us to insert the object that is controlled by the Semantic object and derived from Base into an ostream. Here is Semantic's interface:

#ifndef INCLUDED_SEMANTIC_
#define INCLUDED_SEMANTIC_

#include <memory>
#include <ostream>

#include "../base/base.h"

class Semantic: public std::shared_ptr<Base>
{
    friend std::ostream &operator<<(std::ostream &out, Semantic const &obj);

    public:
        Semantic(Base *bp = 0);             // Semantic owns the bp
        ~Semantic() = default;
};

inline Semantic::Semantic(Base *bp)
:
    std::shared_ptr<Base>(bp)
{}

inline std::ostream &operator<<(std::ostream &out, Semantic const &obj)
{
    if (obj)
        return out << *obj;

    return out << "<UNDEFINED>";
}

#endif

23.8.3.1: The parser using a polymorphic semantic value type

In Bisonc++'s grammar specification %stype is of course Semantic. A simple grammar is defined for this illustrative example. The grammar expects input according to the following rule:

    rule:
        IDENTIFIER '(' IDENTIFIER ')' ';'
    |
        IDENTIFIER '=' INT ';'
    ;

The rule's actions simply echo the received identifiers and int values to cout. Here is an example of a decorated production rule, where due to Semantic's overloaded insertion operator the insertion of the object controlled by Semantic is automatically performed:

    IDENTIFIER '=' INT ';'
    {
          cout << $1 << " " << $3 << '\n';
    }

Bisonc++'s parser stores all semantic values on its semantic values stack (irrespective of the number of tokens that were defined in a particular production rule). At any time all semantic values associated with previously recognized tokens are available in an action block. Once the semantic value stack is reduced, the Semantic class's destructor takes care of the proper destruction of the objects controlled by its shared_ptr base class.

The scanner must of course be allowed to access the parser's data member representing the most recent semantic value. This data member is available as the parser's data member d_val__, whose address or reference can be passed to the scanner when it is constructed. E.g., with a scanner expecting an STYPE__ & the parser's constructor could simply be implemented as:

    inline Parser::Parser()
    :
        d_scanner(d_val__)
    {}

23.8.3.2: The scanner using a polymorphic semantic value type

The scanner for our polymorphic parser is simple and only needs to recognize numbers, identifiers and some simple characters, returned as character tokens. The scanner must also have access to the parser's d_val__ data member. Therefore the Scanner class defines a Semantic *d_semval member, initialized to the parser's Semantic d_val__ data member that is made available to the Scanner's constructor:

    inline Scanner::Scanner(Parser::STYPE__ *semval)
    :                       // or: Semantic *semval
        d_semval(semval)
    {}

The scanner (generated by flex) recognizes input patterns, returns Parser tokens (e.g., Parser::INT), and returns a semantic value when applicable. E.g., when recognizing a Parser::INT the rule is:

    {
        d_semval->reset(new Int(yytext));
        return Parser::INT;
    }

IDENTIFIER's semantic value is obtained analogously:

[a-zA-Z_][a-zA-Z0-9_]*  {
                d_semval->reset(new Text(yytext));
                return Parser::IDENTIFIER;
            }