Acquisition of a modest Beowulf computing cluster to advance computationally intensive research in biology, chemistry, mathematics, physics, and computer science at Earlham College

A proposal for the National Science Foundation's Major Research Instrumentation Program, NSF 01-7

Submitted by Earlham College, Richmond, IN.




Cover Sheet

NSF 01-7 Major Research Instrumentation program solicitation

Directorate for Computer and Information Science and Engineering



Project Summary

The PI requests funding to design, purchase, build, and manage a modest 16-node, 32-CPU Beowulf cluster to be used for research and research training in biology, chemistry, mathematics, physics, and computer science at Earlham College. The cluster's architecture, built from commodity rack-mounted server units and network components, would be easily adaptable to support a mix of computationally intensive research projects.

The primary goal of this project is to provide the resources necessary for Earlham's science faculty to extend computational methods into a broader range of research projects. The capabilities afforded by this new Beowulf cluster will be used by faculty and students to extend ongoing research projects in computer science and physics. These resources would also support the pursuit of new work in biology, chemistry, mathematics, physics, and computer science.

The new Beowulf cluster will complement an existing three-year-old prototype cluster built by Earlham students and faculty. It will provide a production-quality environment in which mature projects receive much larger and more predictable resource allotments and better turnaround times. This would also enable our existing cluster to serve as a development environment for projects working with early releases of software packages and/or exploring new computational methods.

The primary work in computer science is in relational database management systems and open source software. Specifically, the PI is investigating the use of Beowulf clusters as a basis for a coarse-grained parallel query RDBMS with high availability. One new project in CS would focus on system and network architecture and management for Beowulf clusters in small, mixed science research environments such as those found in liberal arts colleges. A second new initiative involves logic programming for large-scale grammar development.

Current research in physics using our existing cluster focuses on modeling nuclear contributions to heavy-ion scattering. This grant would provide the necessary resources to start a new project: parallel data and analysis techniques for gamma-ray spectroscopy.

Mathematics proposes to explore parallel techniques for use in computational geometry. This would build on previous work done at Oak Ridge National Laboratory.

Biology has an active program researching serine protease inhibitors (serpins) present in insect hemolymph. They propose adding a computational technique, protein modeling, to their work.

Chemistry has three areas with specific needs for cluster resources: the use of computational methods to predict the secondary and tertiary structure of proteins; molecular modeling to predict reaction outcomes and model transition states for reactions; and Monte Carlo calculations to help model the chemical kinetics in complex systems.

All of the research activities described above involve Earlham's undergraduate science majors in meaningful ways. Training for students in the tools and techniques of research is an integral part of our program.

The impact of this project will be broader than the research and research training activities it directly supports. The existing development cluster will provide an environment for faculty and students learning computational techniques and engineering new projects to work without disturbing more stable projects. Publishing the design, implementation, and management research will provide a useful pattern for similar institutions to implement similar clusters.



Project Description

Results from Prior NSF Support

Not applicable; the PI has not received prior support from the NSF.

Research Activities

Research projects for the Beowulf cluster will come from biology, chemistry, mathematics, physics, and computer science. Initially the new cluster will be utilized by research projects that are currently up and running on our existing prototype cluster. Once these are established, the plan is to quickly move to work with the colleagues listed below who have identified particular computational techniques which they would like to apply in their current work.

All of the projects listed below have both research and research training aspects to them. Almost all of the research undertaken by Earlham faculty involves our students in meaningful ways.

The Supplementary Documents section contains letters of support from each of the major users listed below: Jim Rogers, Amy Mulnix, Mike Deibel, Mic Jackson, and Lew Riley. These letters more fully describe the research to be supported in these areas by the new Beowulf cluster. The PI's work is described in more detail following this overview.

Charles Peck, Principal Investigator

Lew Riley, Major User

Mic Jackson, Major User

Mike Deibel, Major User

Amy Mulnix, Major User

Jim Rogers, Major User

The principal investigator's research is centered around using open source software, tools, and techniques to architect and implement an RDBMS which is highly available and utilizes coarse-grained parallel query execution. The new cluster would provide an ideal environment for this work. Currently the project is using the existing prototype cluster.

Currently there are no parallel, highly available, ANSI-compliant relational database management systems (RDBMS) available in the open source software market. This represents a significant barrier to the adoption of an open source strategy by many consumers of RDBMS technology. This is particularly true for very large database (VLDB) environments such as those commonly found in the e-commerce and high performance computing application domains.

The primary objective of our work will be to design, construct, and evaluate the performance of a parallel, highly available, ANSI compliant RDBMS using open source tools and techniques and commodity hardware.
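As a rough illustration of what coarse-grained parallel query execution means, the sketch below splits one aggregate query across table partitions and merges the partial results. It is not the PI's design: the partition data and the function names are hypothetical, and threads in a single process merely stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the rows of one table
# partition that would live on a separate cluster node.
PARTITIONS = [
    [12.0, 7.5, 3.0],
    [4.5, 9.0],
    [20.0, 1.0, 2.0, 1.0],
]


def partial_aggregate(rows):
    """Work done independently on one partition: local SUM and COUNT."""
    return sum(rows), len(rows)


def parallel_average(partitions):
    """Scatter an AVG query to every partition, then merge the partials."""
    if not partitions:
        return None
    # Threads are an assumption made for the sketch; on a real cluster each
    # partial aggregate would run on the node that holds the partition.
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(partial_aggregate, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None
```

The point of the sketch is that each node returns only a small partial result (a sum and a count), so the merge step stays cheap no matter how large the partitions are.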

As our work progresses it is likely that we will encounter other smaller topics that will bear on both our process and product. We expect to organize all of our objectives and results into three broad categories.

The second new research project to be undertaken by the PI during the course of this grant will focus on architectures, software interfaces, and operational tools to support a mixed workload of computationally intense research activities. By using easily reconfigurable networking components, we can study the effects of topology and resource allocation on efficiency in a mixed workload environment. The variety of research activity we propose will provide a rich environment for this work. We have done preliminary work in this area with our existing prototype cluster, specifically in the area of network topology and routing protocols.

During the course of this research we will carefully monitor and record system loads and responses as research projects are added to the cluster. Most of this work will be accomplished using SNMP to monitor system and network variables and MRTG to collate and graph the results. We currently use these same tools to analyze the performance of our existing cluster. Three primary issues are at the heart of this work:

All of our work is based on open source software and techniques. This speaks directly to many of the needs outlined by the President's Information Technology Advisory Committee in their September 2000 "Report to the President on Open Source Software for High End Computing". In this report the panel stresses the importance of developing open source technologies for high end computing.

Computational methods are increasingly important across the natural sciences. The ready availability of instruments and people to support this type of activity at smaller institutions is a key component of the research training and graduate school preparation afforded by such colleges. Our published research in this area will serve as a template for similar schools to implement similar clusters.
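The SNMP and MRTG monitoring described in this section rests on a simple calculation: sample an interface's octet counter twice and convert the difference into a fraction of link capacity. The sketch below shows that arithmetic under stated assumptions. It assumes a 32-bit counter such as ifInOctets, at most one wraparound between samples, and hypothetical function names. It does not query any device.

```python
COUNTER32_MODULUS = 2 ** 32  # SNMP Counter32 values wrap around at 2^32


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Difference between two counter samples, allowing for one wraparound."""
    return (current - previous) % modulus


def link_utilization(prev_octets, curr_octets, interval_seconds,
                     link_bits_per_second):
    """Fraction of link capacity used between two octet-counter samples."""
    if interval_seconds <= 0 or link_bits_per_second <= 0:
        raise ValueError("interval and link speed must be positive")
    bits = counter_delta(prev_octets, curr_octets) * 8  # octets -> bits
    return bits / (interval_seconds * link_bits_per_second)
```

For example, two hypothetical samples taken 300 seconds apart on a 100 Mb/s link would be compared with link_utilization(previous, current, 300, 100_000_000). On a busy fast link a 32-bit counter can wrap more than once per interval, and then this calculation undercounts.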


Description of the Research Instrumentation and Needs

New Equipment

This proposal is for a modest Beowulf computational cluster equipped with 16 dual Pentium III/933 MHz rack-mounted servers and manageable networking components. Using switches with gigabit backplane capacity, high speed interconnects, and support for virtual LANs gives us a tremendous amount of flexibility; we can configure the environment in a variety of topologies to support a mix of concurrent research activity.

The particular server, network, and infrastructure hardware components we are using in our design are listed below.

The cluster will be assembled into the three equipment racks and permanently affixed in a dedicated room in the computer science area. The College will supply the space with adequate power, ventilation, furnishings, and security. The PI and a group of students will install and configure the servers.

The overall architecture of the new Beowulf cluster will follow from the commonly held design principles for Beowulf systems:

Existing Equipment

Our current prototype computational cluster, the Athena cluster, is composed of 16 Pentium I/266 MHz desktop enclosures, each with 32 MB of memory and four 100 Mb/s Ethernet interfaces. The network topology is a static hypercube utilizing the Zebra routing daemon and Open Shortest Path First (OSPF) routing. The Athena cluster was built from hardware donated by a mathematics alumnus; the design and assembly were carried out by computer science majors and faculty in the context of a class on parallel programming and in successive research projects. The limited number of cycles available and the relatively static nature of the network topology are the biggest shortcomings of our existing cluster.
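Sixteen nodes with four network interfaces each is consistent with a four-dimensional hypercube, in which every node links to the four nodes whose binary ID differs from its own in exactly one bit. The sketch below illustrates that wiring rule. The 0 to 15 node numbering and the function names are assumptions made for illustration, not a description of how the Athena cluster is actually addressed.

```python
DIMENSIONS = 4  # four interfaces per node, so 2**4 = 16 nodes


def neighbors(node, dimensions=DIMENSIONS):
    """Nodes whose binary ID differs from `node` in exactly one bit."""
    return [node ^ (1 << bit) for bit in range(dimensions)]


def hop_count(source, destination):
    """Shortest-path length in a hypercube: Hamming distance of the IDs."""
    return bin(source ^ destination).count("1")
```

Under this numbering no two nodes are more than four hops apart, which is the length of the shortest path a routing protocol could find between opposite corners such as nodes 0 and 15.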

This prototype cluster supports ongoing work in computer science and physics as described in the Research Activities section of this proposal. This cluster has served us well and would continue to be a valuable resource as a development environment. Additional cycles and a more flexible architecture would enable us to expand our existing work in physics and computer science while extending computational research techniques into mathematics, biology, and chemistry.

The computer science department supports a network of 20 Linux workstations for faculty and student use and a FreeBSD-based file and network services machine. All of the system, network, and database administration tasks for these machines are carried out by students working under the direction of the PI and other faculty members.

Supported Software

Much of the software we are currently using would also be used on the new Beowulf cluster. The basic set of tools in use now includes:

Tools that we have started to experiment with, and plan to evaluate for the new Beowulf cluster, are:

Software configuration and maintenance would be accomplished using rsync with a set of local distributions of the supported software packages installed on the management console. System monitoring and backup functions would also be performed using the management console's resources.
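One way the rsync-based maintenance described above could be organized is for the management console to push a single local distribution directory to every node. The sketch below only builds and prints the commands and executes nothing. The host names and both directory paths are hypothetical, since the proposal does not specify them.

```python
import shlex

# Hypothetical names: the proposal gives no host names or directory paths.
NODES = [f"node{n:02d}" for n in range(1, 17)]
SOURCE_DIR = "/export/cluster-dist/"
DEST_DIR = "/usr/local/cluster/"


def rsync_command(node, source=SOURCE_DIR, dest=DEST_DIR, dry_run=True):
    """Build the argument list for pushing the distribution to one node."""
    # --archive preserves permissions and timestamps; --delete removes files
    # on the node that are no longer present in the local distribution.
    command = ["rsync", "--archive", "--delete"]
    if dry_run:
        command.append("--dry-run")  # report changes without applying them
    command += [source, f"{node}:{dest}"]
    return command


def plan(nodes=NODES):
    """Return one printable command line per node."""
    return [shlex.join(rsync_command(node)) for node in nodes]


if __name__ == "__main__":
    for line in plan():
        print(line)
```

Defaulting to a dry run is a deliberate choice in this sketch: because --delete removes files on the target, previewing the changes first is the safer habit.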


Impact of Infrastructure Projects

Beowulf clusters, a concept first developed by researchers at NASA in early 1994, are increasingly utilized in a wide variety of research settings. The computational approaches they enable are now an important part of basic research in all of the natural sciences.

Earlham has a long history of exceptional strength in science education. According to the 1998 Baccalaureate Origins Report, which ranks institutions according to the ratio of Ph.D.s granted to bachelor's degrees awarded, we ranked 21st in the Science and Engineering category among all institutions of higher learning. Earlham ranked 12th among other small undergraduate colleges for overall Science and Engineering, 5th in the Geosciences, and 6th in the Life Sciences.

Over time the new science faculty that Earlham needs to attract are more likely to be familiar with, and in some cases dependent on, computational resources for their research. This new Beowulf cluster would help form the basis of the community necessary to support their work.

Why do this project at Earlham? We acknowledge that this project is a big step for Earlham. There are a number of reasons why we feel the natural science division at Earlham is particularly well-suited to pursue this project at this time.

The computer science department designed and built a small Beowulf cluster in 1998 from donated hardware and we've been running small research projects on it since then. Over time the quantity and depth of our work has grown to the point where we are ready for a significant boost in computational power and flexibility. These ongoing research projects can directly build on the additional computational resources provided by this new Beowulf cluster. Similarly the new research efforts we describe can leverage both the additional capacity of the new cluster and the presence of a development cluster, which is the prototype cluster we currently support.

Earlham has a tradition of student-faculty research in the natural sciences. We are also a small college where faculty members from different disciplines can and do collaborate in their research. These two factors make it possible for us to support the cross-disciplinary effort that computational research methods often require.

Currently the College is working on a number of fronts to raise the amount of funding available for science research. In 1999 we secured a $1,000,000 grant from the Lilly Foundation for significant building renovations and equipment in the science complex. This grant, plus $2,200,000 of the College's funds, will combine to provide improved classroom and laboratory space for all the departments in the natural science division. These renovations will be completed this summer. This grant was built on the strength of our programs and activities in the sciences which attract Indiana high school students to Earlham and provide links to opportunities for our graduates to stay in the state.

The College recently seeded an endowed fund specifically for the support of summer science research with a gift of $166,000 from a chemistry alumnus. This same person donated an additional $166,000 for science scholarships and $208,000 to the science building renovation project described above. Earlham is committed to building this summer science research fund as one of the highest priorities during the next Capital Campaign.

We recently submitted a proposal for a $652,000 grant from the W.M. Keck Foundation to support the development of a multidisciplinary summer research community at Earlham with an emphasis on computational techniques and hardware interfacing. The purpose of the proposed project is to prepare students for the increasingly interdisciplinary and technically sophisticated world of science research. In each of the first two years, 12 students will receive strong research experience and develop a good understanding of computer data acquisition and analysis.

For many years Earlham has utilized upper-level students working with full-time staff to perform almost all of the system, network, and database administration duties on campus. The facilities they are responsible for include all of the workstations and servers in computer science, the existing prototype cluster, and the production database environment for the administration of the College. This gives us a pool of qualified students to draw on for different aspects of this project.

Earlham is committed to pursuing an open source strategy wherever practical. Open source software, tools, and techniques form the basis of much of the curriculum in computer science; open source supports general computing in mathematics and physics as well. Some of the College's core servers, including file, web, e-mail, and some database servers, run open source solutions. Many people in academia and research worldwide contribute to and depend on open source software.


Project and Management Plans

Initially, a particular research project will take a fair amount of technical and operational support on the part of the PI and student staff. As projects become more stable, and the researchers are more in control of their computational experiments on the Beowulf cluster, the PI and student staff will migrate their focus to the next research project/discipline on the list.

Year One

Year Two

Year Three

Ongoing projects will require less human time and energy to support than new ones. Our schedule permits us to phase in activities on the cluster as we concurrently develop and refine our load balancing and resource utilization knowledge and skills. Job scheduling will be provided as necessary, although we don't anticipate the need to ration resources during the life of this grant. No user fees will be charged.

Given our relatively modest research goals we do not anticipate exhausting the computational resources provided by the new Beowulf cluster. If we do begin to reach capacity we will quickly move to secure funding for additional nodes. The flexible architecture and commodity nature of the components, combined with close monitoring, should give us the ability to react quickly to resource demands.

The existing prototype cluster will continue to be utilized for student-initiated projects and development activities for all projects. The new Beowulf cluster will be used for projects that particularly need the stability and power it affords. The two clusters will be managed by the PI and a group of students.



Biographical Sketches

Charles F. Peck, Principal Investigator

Date of birth: October 18, 1962

Citizenship: US

Professional Preparation

Appointments

Collaborators

Date of birth:

Citizenship: US

Professional Preparation

Appointments

Publications

Synergistic Activities

Collaborators

Lew Riley, Major User

Date of birth: November 1, 1970

Citizenship: US

Professional Preparation

Appointments

Publications

Synergistic Activities

Collaborators

Mic Jackson, Major User

Mike Deibel, Major User

Amy Mulnix, Major User

Jim Rogers, Major User



Budget and Funding

1. Beowulf computational cluster and associated support hardware.

A quote from VA Linux, a leading provider of server hardware and supporter of open source software, for the rack-mounted servers is included in the Supplementary Documents section of the proposal. The racks and UPSs would be supplied by a local dealer; their quote is also in the Supplementary Documents section. The remainder of the equipment is commodity in nature and can be obtained from a variety of sources at competitive pricing.

2. Administrator services - system, network, and database administration tasks as performed by undergraduate computer science student employees under the direction of the PI.

3. Technical support and operation - the PI would provide technical support and operational support directly to faculty and student researchers across the natural sciences. This work would be specifically directed towards computational research utilizing the Beowulf cluster.

Three Year Budget Summary:



Current and Pending Support



Facilities, Equipment, and Other Resources

This section is not required for Major Research Instrumentation Proposals.



Letters

In the Supplementary Documents section are letters from Earlham's Provost and Academic Dean committing to cost sharing, letters of support from the major users of the Beowulf cluster, a quote from VA Linux for the principal hardware components, and a quote from a local vendor for the racks and UPSs.



List of Suggested Reviewers

Joel C. Adams
Professor of Computer Science, Calvin College
PI for a successful MRI grant funding a similar scale Beowulf cluster
adams@calvin.edu



Supplementary Documents

  • Quote from VA Linux
  • Quote from Eastgate

  • Len, institutional cost sharing commitment

  • Lew Riley
  • Mic Jackson
  • Mike Deibel
  • Amy Mulnix
  • Jim Rogers
