A proposal for the National Science Foundation's Major Research Instrumentation Program, NSF 01-7
Submitted by Earlham College, Richmond, IN.
NSF 01-7 Major Research Instrumentation program solicitation
Directorate for Computer and Information Science and Engineering
The PI requests funding to design, purchase, build, and manage a modest 16-node, 32-CPU Beowulf cluster to be used for research and research training in biology, chemistry, mathematics, physics, and computer science at Earlham College. Because it is built from commodity rack-mounted server units and network components, the cluster's architecture would be easy to adapt to a mix of computationally intensive research projects.
The primary goal of this project is to provide the resources Earlham's science faculty need to extend computational methods into a broader range of research projects. Faculty and students will use the capabilities of this new Beowulf cluster to extend ongoing research projects in computer science and physics. These resources would also support new work in biology, chemistry, mathematics, physics, and computer science.
The new Beowulf cluster will complement an existing, three-year-old prototype cluster built by Earlham students and faculty. It will provide a production-quality environment in which mature projects receive much larger and more predictable resource allotments and better turnaround times. This would also free our existing cluster to serve as a development environment for projects working with early releases of software packages and for projects exploring new computational methods.
The primary work in computer science is in relational database management systems and open source software. Specifically, the PI is investigating the use of Beowulf clusters as the basis for a highly available RDBMS with coarse-grained parallel query execution. One new project in computer science would focus on system and network architecture and management for Beowulf clusters in small, mixed science research environments such as those found in liberal arts colleges. A second new initiative involves logic programming for large-scale grammar development.
Current research in physics using our existing cluster focuses on modeling nuclear contributions to heavy-ion scattering. This grant would provide the resources necessary to start a new project: parallel data analysis techniques for gamma-ray spectroscopy.
Mathematics proposes to explore parallel techniques for use in computational geometry. This would build on previous work done at Oak Ridge National Laboratory.
Biology has an active program researching serine protease inhibitors (serpins) present in insect hemolymph. The biologists propose adding a computational technique, protein modeling, to this work.
Chemistry has three areas with specific needs for cluster resources: the use of computational methods to predict the secondary and tertiary structure of proteins; molecular modeling to predict reaction outcomes and model transition states for reactions; and Monte Carlo calculations to help model the chemical kinetics in complex systems.
All of the research activities described above involve Earlham's undergraduate science majors in meaningful ways. Training for students in the tools and techniques of research is an integral part of our program.
The impact of this project will be broader than the research and research training activities it directly supports. The existing prototype cluster will become a development environment. There, faculty and students can learn computational techniques and engineer new projects without disturbing more stable work on the production cluster. Publishing our design, implementation, and management research will provide a useful model for similar institutions that want to build comparable clusters.
Not applicable; the PI has not received prior support from the NSF.
Research projects for the Beowulf cluster will come from biology, chemistry,
mathematics, physics, and computer science. Initially the new cluster will be
utilized by research projects that are currently up and running on our existing
prototype cluster. Once these are established, the plan is to quickly move to work
with the colleagues listed below who have identified particular computational
techniques which they would like to apply in their current work.
All of the projects listed below have both research and research training
aspects to them. Almost all of the research undertaken by Earlham faculty
involves our students in meaningful ways.
The Supplementary Documents section contains letters of support from each of
the major users listed below: Jim Rogers, Amy Mulnix, Mike Deibel, Mic Jackson, and Lew Riley.
These letters more fully describe the research to be supported in these areas
by the new Beowulf cluster. The PI's work is described in more detail following
this overview.
Charles Peck, Principal Investigator
Lew Riley, Major User
Mic Jackson, Major User
Mike Deibel, Major User
Amy Mulnix, Major User
Jim Rogers, Major User
The principal investigator's research is centered on using open source
software, tools, and techniques to architect and implement an RDBMS that is
highly available and uses coarse-grained parallel query execution. The
new cluster would provide an ideal environment for this work. The project
currently runs on the existing prototype cluster.
Currently there are no parallel, highly available, ANSI-compliant relational
database management systems (RDBMS) available in the open source software market.
This represents a significant barrier to the adoption of an open source strategy
by many consumers of RDBMS technology. This is particularly true for very large
database (VLDB) environments such as those commonly found in the e-commerce and
high performance computing application domains.
The primary objective of our work will be to design, construct, and evaluate
the performance of a parallel, highly available, ANSI-compliant RDBMS using open
source tools and techniques and commodity hardware.
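The sketch below makes "coarse-grained parallel query execution" concrete for one common case, a grouped aggregate. Rows are spread across nodes, and each node computes partial SUM and COUNT values for its own rows. A coordinator then merges the partials and only afterwards computes AVG.

This is an illustrative Python sketch under stated assumptions, not the PI's design. The sample table, the round-robin partitioning, and all function names are hypothetical, and threads stand in for cluster nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def round_robin_partition(rows, n_nodes):
    """Spread rows across nodes without regard to the grouping key."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_aggregate(partition):
    """Per-node step: partial SUM and COUNT for each group key."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in partition:
        partial[key][0] += value
        partial[key][1] += 1
    return dict(partial)


def merge_partials(partials):
    """Coordinator step: combine partial results, then finish AVG."""
    totals = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            totals[key][0] += s
            totals[key][1] += c
    return {key: (s, c, s / c) for key, (s, c) in totals.items()}


def parallel_group_avg(rows, n_nodes=4):
    """Roughly: SELECT key, SUM(v), COUNT(*), AVG(v) FROM t GROUP BY key."""
    partitions = round_robin_partition(rows, n_nodes)
    # Threads stand in for cluster nodes; each scans only its own partition.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    return merge_partials(partials)


if __name__ == "__main__":
    # Hypothetical (region, amount) rows standing in for a large table.
    table = [("east", 10.0), ("west", 4.0), ("east", 2.0),
             ("north", 7.0), ("west", 6.0), ("east", 3.0)]
    for key, (total, count, avg) in sorted(parallel_group_avg(table, 3).items()):
        print(key, total, count, avg)
```

With three nodes and the sample table, the script prints east 15.0 3 5.0, north 7.0 1 7.0, and west 10.0 2 5.0. The key design choice is that each node reports (sum, count) rather than its own average. Averaging per-node averages would give the wrong answer whenever partitions hold different numbers of rows for a group.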
As our work progresses, we are likely to encounter other, smaller topics
that bear on both our process and our product. We expect to organize all of our
objectives and results into three broad categories.
The PI's second research project, a new one to be undertaken during the course
of this grant, will focus on architectures, software interfaces, and operational
tools to support a mixed workload of computationally intensive research activities.
By using easily reconfigurable networking components, we can study the effects of
topology and resource allocation on efficiency in a mixed-workload environment.
The variety of research activity we propose will provide a rich environment
for this work. We have done preliminary work in this area with our existing
prototype cluster, specifically in the area of network topology and routing
protocols.
During the course of this research we will carefully monitor and record system
loads and responses as research projects are added to the cluster. Most of this
work will be accomplished using SNMP to monitor system and network variables and
MRTG to collate and graph the results. We currently use these same tools to
analyze the performance of our existing cluster. Three primary issues are at
the heart of this work:
All of our work is based on open source software and techniques. This speaks
directly to many of the needs outlined by the President's Information Technology
Advisory Committee in their September 2000 "Report to the President on Open Source
Software for High End Computing". In this report the panel stresses the importance of
developing open source technologies for high end computing.
Computational methods are increasingly important across the natural sciences.
The ready availability of instruments and people to support this type of activity
at smaller institutions is a key component of the research training and graduate
school preparation afforded by such colleges. Our published research in this
area will serve as a template for similar schools to implement similar clusters.
This proposal is for a modest Beowulf computational cluster built from 16
dual-processor Pentium III/933 MHz rack-mounted servers and manageable networking
components. Switches with gigabit backplane capacity, high-speed interconnects, and
support for virtual LANs give us considerable flexibility: we can
configure the environment in a variety of topologies to support a mix of
concurrent research activity.
The particular server, network, and infrastructure hardware components we
are using in our design are listed below.
The cluster will be assembled into three equipment racks and permanently
installed in a dedicated room in the computer science area. The College will
supply the space with adequate power, ventilation, furnishings, and security.
The PI and a group of students will install and configure the servers.
The overall architecture of the new Beowulf cluster will follow from the
commonly held design principles for Beowulf systems:
Our current prototype computational cluster, the Athena cluster, is composed of
16 Pentium I/266 MHz desktop enclosures, each with 32 MB of memory and four 100 Mb/s
Ethernet interfaces. The network topology is a static hypercube using the
Zebra routing daemon and Open Shortest Path First (OSPF) routing. The Athena cluster was
built from hardware donated by a mathematics alumnus. The design and assembly were
carried out by computer science majors and faculty in a course on
parallel programming and in successive research projects. The limited number of
available cycles and the relatively static network topology are
the biggest shortcomings of our existing cluster.
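Each Athena node has four Ethernet interfaces, and there are 16 = 2^4 nodes, so the static hypercube is four-dimensional. Under a hypothetical numbering of nodes 0 through 15, node i links to the four nodes whose labels differ from i in exactly one bit.

The sketch below builds that topology and computes unit-cost shortest paths by breadth-first search. This is what OSPF's link-state computation settles on when every link has the same cost. The hop count between two nodes is then the number of differing bits, so no route is longer than four hops.

```python
from collections import deque

DIM = 4                  # four Ethernet interfaces per node
N_NODES = 1 << DIM       # 16 nodes


def neighbors(node, dim=DIM):
    """Interface d links node to the node whose label differs in bit d."""
    return [node ^ (1 << d) for d in range(dim)]


def bfs_hops(src, dim=DIM):
    """Hop counts from src to every node, i.e. unit-cost shortest paths
    (the routes OSPF converges to when all links have equal cost)."""
    n = 1 << dim
    dist = [None] * n
    dist[src] = 0
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in neighbors(u, dim):
            if dist[v] is None:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def hamming(a, b):
    """Number of label bits in which two nodes differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    d0 = bfs_hops(0)
    print(d0)        # hop counts from node 0
    print(max(d0))   # network diameter
```

Running it prints the hop counts from node 0, [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4], and a diameter of 4. Because each link is a fixed cable between two specific interfaces, changing this topology presumably means recabling nodes.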
This prototype cluster supports ongoing work in computer science
and physics as described in the Research Activities section of this proposal.
This cluster has served us well and would continue to be a valuable resource
as a development environment. Additional cycles and a more flexible architecture
would enable us to expand our existing work in physics and computer science while
extending computational research techniques into mathematics, biology, and
chemistry.
The computer science department supports a network of 20 Linux
workstations for faculty and student use and a FreeBSD-based file and
network services machine. All of the system, network, and database
administration tasks for these machines are carried out by students working
under the direction of the PI and other faculty members.
Much of the software we are currently using would also be used on the
new Beowulf cluster. The basic set of tools in use now includes:
Tools that we have started to experiment with, and plan to evaluate for
the new Beowulf cluster, are:
Software configuration and maintenance would be accomplished using rsync,
with local distributions of the supported software packages kept
on the management console. System monitoring and backup functions would also
be performed using the management console's resources.
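Below is a minimal sketch of this push model, written in Python around the standard rsync command line. The hostnames (node01 through node16) and the /opt/dist/ directory are hypothetical placeholders, not the project's actual layout.

The flags are standard rsync options:
- -a selects archive mode.
- --delete removes destination files that are absent from the source.
- -e ssh sets the remote shell.
- --dry-run reports changes without making them.

```python
import subprocess

# Hypothetical layout: packages staged on the management console under
# /opt/dist/ and mirrored to the same path on every compute node.
SOURCE_DIR = "/opt/dist/"        # trailing slash: copy the directory's contents
DEST_DIR = "/opt/dist/"
NODES = ["node%02d" % i for i in range(1, 17)]   # hypothetical hostnames


def rsync_command(node, dry_run=True):
    """Build the rsync argv that mirrors SOURCE_DIR onto one node."""
    cmd = ["rsync", "-a", "--delete", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change, change nothing
    cmd += [SOURCE_DIR, "%s:%s" % (node, DEST_DIR)]
    return cmd


def push_to_all(nodes=NODES, dry_run=True):
    """Run rsync node by node and return the nodes whose sync failed."""
    failed = []
    for node in nodes:
        result = subprocess.run(rsync_command(node, dry_run))
        if result.returncode != 0:
            failed.append(node)
    return failed


if __name__ == "__main__":
    # Show the commands only; push_to_all() would actually execute them.
    for node in NODES[:2]:
        print(" ".join(rsync_command(node)))
```

The script prints the dry-run command for the first two nodes, for example: rsync -a --delete -e ssh --dry-run /opt/dist/ node01:/opt/dist/

Keeping dry-run as the default lets an administrator preview changes before pushing them. The trailing slash on the source path makes rsync copy the directory's contents rather than nesting a second dist directory on each node.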
Beowulf clusters, a concept first developed by researchers at NASA in early
1994, are increasingly utilized in a wide variety of research settings. The
computational approaches they enable are now an important part of basic research
in all of the natural sciences.
Earlham has a long history of exceptional strength in science education.
The 1998 Baccalaureate Origins Report ranks institutions by the ratio of Ph.D.s
earned by their graduates to bachelor's degrees awarded. By that measure we
ranked 21st in the Science and Engineering category among all institutions of
higher learning. Among small undergraduate colleges, Earlham ranked 12th for
overall Science and Engineering, 5th in the Geosciences, and 6th in the Life
Sciences.
Over time the new science faculty that Earlham needs to attract are more
likely to be familiar with, and in some cases dependent on, computational
resources for their research. This new Beowulf cluster would help form the
basis of the community necessary to support their work.
Why do this project at Earlham? We acknowledge that this project is a big step
for Earlham. There are a number of reasons why we feel the natural science
division at Earlham is particularly well-suited to pursue this project at this
time.
The computer science department designed and built a small Beowulf cluster in
1998 from donated hardware and we've been running small research projects on it since
then. Over time the quantity and depth of our work has grown to the point where
we are ready for a significant boost in computational power and flexibility.
These ongoing research projects can directly build on the additional
computational resources provided by this new Beowulf cluster. Similarly, the
new research efforts we describe can leverage both the additional capacity
of the new cluster and the presence of a development cluster, namely the
prototype cluster we currently support.
Earlham has a tradition of student-faculty research in the natural sciences.
We are also a small college where faculty members from different disciplines can
and do collaborate in their research. These two factors make it possible for
us to support the cross-disciplinary effort that computational research methods
often require.
Currently the College is working on a number of fronts to raise the amount of
funding available for science research. In 1999 we secured a $1,000,000 grant
from the Lilly Foundation for significant building renovations and equipment in
the science complex. This grant, plus $2,200,000 of the College's funds, will
combine to provide improved classroom and laboratory space for all the departments
in the natural science division. These renovations will be completed this summer.
This grant was built on the strength of our programs and activities in the
sciences which attract Indiana high school students to Earlham and provide links
to opportunities for our graduates to stay in the state.
The College recently seeded an endowed fund specifically for the support of
summer science research with a gift of $166,000 from a chemistry alumnus. This
same person donated an additional $166,000 for science scholarships and $208,000
to the science building renovation project described above. Earlham is committed
to building this summer science research fund as one of the highest priorities
during the next Capital Campaign.
We recently submitted a proposal for a $652,000 grant from the W.M. Keck
Foundation to support the development of a multidisciplinary summer research
community at Earlham with an emphasis on computational techniques and hardware
interfacing. The purpose of the proposed project is to prepare students for the
increasingly interdisciplinary and technically sophisticated world of science
research. In each of the first two years, 12 students will receive strong research
experience and develop a good understanding of computer data acquisition and
analysis.
For many years Earlham has relied on upper-level students working with full-time
staff to perform almost all of the system, network, and database administration
duties on campus. The facilities they are responsible for include all of the
workstations and servers in computer science, the existing prototype
cluster, and the production database environment for the administration of the
College. This gives us a pool of qualified students to draw on for different
aspects of this project.
Earlham is committed to pursuing an open source strategy wherever practical.
Open source software, tools, and techniques form the basis of much of the
curriculum in computer science; open source supports general computing in
mathematics and physics as well. Several of the College's core servers,
including the file, web, e-mail, and some database servers, run open source
solutions. Many people in academia and research worldwide contribute to and
depend on open source software.
Initially, each new research project will require a fair amount of technical
and operational support on the part of the PI and student staff. As projects
become more stable, and the researchers are more in control of their computational
experiments on the Beowulf cluster, the PI and student staff will migrate their
focus to the next research project/discipline on the list.
Ongoing projects will require less human time and energy to support than new
ones. Our schedule lets us phase in activities on the cluster as we
concurrently develop and refine our load balancing and resource utilization
knowledge and skills. Job scheduling will be provided as necessary, although we
don't anticipate the need to ration resources during the life of this grant. No
user fees will be charged.
Given our relatively modest research goals we do not anticipate exhausting
the computational resources provided by the new Beowulf cluster. If we do
begin to reach capacity we will quickly move to secure funding for additional
nodes. The flexible architecture and commodity nature of the components,
combined with close monitoring, should give us the ability to react quickly
to resource demands.
The existing prototype cluster will continue to be utilized for
student-initiated projects and development activities for all projects. The new
Beowulf cluster will be used for projects that particularly need the stability
and power it affords. The two clusters will be managed by the PI and a group
of students.
Charles F. Peck, Principal Investigator
Date of birth: October 18, 1962
Citizenship: US
Professional Preparation
Appointments
Began as an adjunct teaching one course per year. Developed the position into a three-quarter-time tenure-track position between 1993 and 2000. During this same period I worked with the Provost and colleagues from Math and Physics to strengthen the Computer Science department through improved course offerings, additional FTEs, expanded enrollment, undergraduate research, and interdisciplinary projects.
Teaching responsibilities have included CS2, database systems, operating systems, principles of computer organization, networks and networking, software engineering, and research seminars.
Worked with a team of five people to design, build, finance, and manage Wayne County's first Internet service provider. Active in the identification and development of local content from a variety of publicly funded and private sources. Worked with a larger community group to develop a state-funded, county-wide community network. Sold the business to a regional ISP in May 2000 in order to devote more time to academic pursuits.
Research and implementation of a cost-based optimization system designed to support distributed query processing. Worked on the team that designed and built the Omni Server RDBMS kernel. Leader of the team that designed and built a series of operating-system-independent software layers for all Sybase server products. Research into high-availability RDBMS architectures utilizing a fail-over server on top of mirrored data schemas.
Collaborators
Date of birth:
Citizenship: US
Professional Preparation
Appointments
Publications
Synergistic Activities
Collaborators
Lew Riley, Major User
Date of birth: November 1, 1970
Citizenship: US
Professional Preparation
Appointments
Publications
Synergistic Activities
Collaborators
Mic Jackson, Major User
Mike Deibel, Major User
Amy Mulnix, Major User
Jim Rogers, Major User
1. Beowulf computational cluster and associated support hardware.
A quote from VA Linux, a leading provider of server hardware and supporter of
open source software, for the rack mounted servers is included in the Supplementary
Documents section of the proposal. The racks and UPSs would be supplied by a local
dealer; their quote is also in the Supplementary Documents section. The remainder
of the equipment is commodity in nature and can be obtained from a variety of
sources at competitive pricing.
2. Administrator services - system, network, and database administration
tasks as performed by undergraduate computer science student employees under
the direction of the PI.
3. Technical support and operation - the PI would provide technical support
and operational support directly to faculty and student researchers across the
natural sciences. This work would be specifically directed towards computational
research utilizing the Beowulf cluster.
Three-Year Budget Summary:
This section is not required for Major Research Instrumentation Proposals.
In the Supplementary Documents section are letters from Earlham's Provost and
Academic Dean committing to cost sharing, letters of support from the major users
of the Beowulf cluster, a quote from VA Linux for the principal hardware
components, and a quote from a local vendor for the racks and UPSs.
Joel C. Adams
Research Activities
Currently using the existing prototype cluster
One senior personnel, six undergraduate students
Supported by Earlham College and private funds
New project
Two senior personnel, six undergraduate students
Supported by Earlham College and the Hughes Institute, additional
outside support pending
Currently using the existing prototype cluster
One senior personnel, three undergraduate students
Supported by Earlham College
New project
Two senior personnel, six undergraduate students
Supported by Earlham College, additional outside support pending
New project
One senior personnel, ten undergraduate students
Supported by Earlham College
New project
Three senior personnel, nine undergraduate students
Supported by Earlham College
New project
One senior personnel, nine undergraduate students
Supported by Earlham College and the Hughes Institute
New project
One senior personnel, six undergraduate students
Supported by Earlham College
Description of the Research Instrumentation and Needs
New Equipment
two 933 MHz Pentium III processors
1GB memory
two 100 Mb/s full-duplex Ethernet ports
two 18 GB 10,000 RPM Ultra Wide SCSI disk drives
1 Gb/s backplane and matrix port
SNMP manageable
support for virtual LANs
integrated cable management
rack mount
manageable via Ethernet with SNMP
rack mount
manageable via Ethernet with SNMP
commodity Pentium III class desktop unit
256 MB memory, monitor, keyboard, mouse
two 100 Mb/s full-duplex Ethernet ports
two 18 GB 10,000 RPM Ultra Wide SCSI disk drives
DDS4 SCSI tape drive
VASNet console management control unit
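The per-node figures above imply the cluster's aggregate resources. The short Python calculation below multiplies them out for the 16 compute nodes, excluding the management console. The disk figure is raw capacity, before any filesystem overhead or mirroring.

```python
# Per-node figures taken from the equipment list above.
CPUS_PER_NODE = 2      # two 933 MHz Pentium III processors
MEM_GB_PER_NODE = 1    # 1 GB memory
DISKS_PER_NODE = 2     # two 18 GB Ultra Wide SCSI drives
DISK_GB = 18


def cluster_totals(nodes=16):
    """Aggregate compute-node resources; raw disk ignores formatting and mirroring."""
    return {
        "cpus": nodes * CPUS_PER_NODE,
        "memory_gb": nodes * MEM_GB_PER_NODE,
        "raw_disk_gb": nodes * DISKS_PER_NODE * DISK_GB,
        "memory_mb_per_cpu": MEM_GB_PER_NODE * 1024 // CPUS_PER_NODE,
    }


if __name__ == "__main__":
    for name, value in cluster_totals().items():
        print(name, value)
```

For 16 nodes it reports 32 CPUs, 16 GB of memory, 576 GB of raw disk, and 512 MB of memory per CPU. Passing a larger node count gives a quick estimate if additional nodes are added later.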
Existing Equipment
Supported Software
Impact of Infrastructure Projects
Project and Management Plans
Year One
Ongoing Projects
Computer science - Parallel, highly available relational database management system
New Projects
Physics - Modeling nuclear contributions to heavy-ion scattering
Computer science - Beowulf Cluster Architectures and Management Strategies for Mixed Workloads
Physics - Parallel data analysis techniques for gamma-ray spectroscopy
Year Two
Ongoing Projects
Computer science - Parallel, highly available relational database management system
New Projects
Computer science - Beowulf Cluster Architectures and Management Strategies for Mixed Workloads
Physics - Modeling nuclear contributions to heavy-ion scattering
Physics - Parallel data analysis techniques for gamma-ray spectroscopy
Computer science - Logic programming for large-scale grammar development
Mathematics - Computational geometry for environmental science
Chemistry - Computational methods to predict the secondary and tertiary structure of proteins
Year Three
Ongoing Projects
Computer science - Parallel, highly available relational database management system
New Projects
Computer science - Beowulf Cluster Architectures and Management Strategies for Mixed Workloads
Computer science - Logic programming for large-scale grammar development
Physics - Modeling nuclear contributions to heavy-ion scattering
Physics - Parallel data analysis techniques for gamma-ray spectroscopy
Mathematics - Computational geometry for environmental science
Chemistry - Computational methods to predict the secondary and tertiary structure of proteins
Chemistry - Molecular modeling to predict reaction outcomes and model transition states for reactions
Biology - Protein modeling of serine protease inhibitors (serpins) present in insect hemolymph
Biographical Sketches
Earlham College, 1993-present.
infocom, incorporated, 1994-2000
Sybase, Incorporated, 1989-1994
Earlham College, 1997-present
Florida State University, summers 1998-2000
Florida State University, 1994-97
Florida State University, 1993-94
Bergische Universität Gesamthochschule Wuppertal, Wuppertal, Germany, 1992-93
Southwest Research Institute, San Antonio, Texas, 1992
1. Zero degree detector for conversion electrons from a heavy-ion-induced reaction, M.P. Metlay, L.A. Riley, P.D. Cottle, J.K. Jewell, and K.W. Kemper, Nucl. Instr. Meth. A385, 112 (1997).
2. Octupole states in ¹⁹⁶Pt(p,p′γ), J.K. Jewell, P.D. Cottle, K.W. Kemper, and L.A. Riley, Phys. Rev. C56, 2440 (1997).
3. Weak coupling and single-particle structure at high spin in ¹⁴³Nd, M. Fauerbach, L.A. Riley, P.D. Cottle, R.A. Kaye, and K.W. Kemper, Phys. Rev. C58, 826 (1998).
4. Collective behavior in ⁷⁸Rb, R.A. Kaye, L.A. Riley, G.Z. Solomon, and S.L. Tabor, Phys. Rev. C58, 3228 (1998).
5. Inverse Kinematics Proton Scattering on ¹⁸Ne and Mirror Symmetry in A = 18 Nuclei, L.A. Riley, J.K. Jewell, P.D. Cottle, T. Glasmacher, K.W. Kemper, N. Alamanos, Y. Blumenfeld, J.A. Carr, M.J. Chromik, R.W. Ibbotson, F. Maréchal, W.E. Ormand, F. Petrovich, H. Scheit, and T. Suomijärvi, Phys. Rev. Lett. 82, 4196 (1999).
6. Intermediate-energy Coulomb excitation of ¹⁹Ne, G. Hackman, Sam M. Austin, T. Glasmacher, T. Aumann, B.A. Brown, R.W. Ibbotson, K. Miller, B. Pritychenko, L.A. Riley, and E. Spears, Phys. Rev. C61, 052801(R) (2000).
7. Conversion electron-γ coincidences and intrinsic reflection asymmetry in ²¹⁹Ra, L.A. Riley, P.D. Cottle, M. Fauerbach, V.S. Griffin, B.N. Guy, K.W. Kemper, G.S. Rajbaidya, and O.J. Tekyi-Mensah, Phys. Rev. C62, 021301(R) (2000).
8. Spectroscopy of the 2₁⁺ State in ²²O and Shell Structure Near the Neutron Drip Line, P.G. Thirolf, B.V. Pritychenko, B.A. Brown, P.D. Cottle, M. Chromik, T. Glasmacher, G. Hackman, R.W. Ibbotson, K.W. Kemper, T. Otsuka, L.A. Riley, H. Scheit, Phys. Lett. 485B, 16 (2000).
9. B(E2; 0⁺gs → 2₁⁺) in ¹⁸Ne and isospin purity in A = 18 nuclei, L.A. Riley, P.D. Cottle, M. Fauerbach, T. Glasmacher, K.W. Kemper, B.V. Pritychenko, H. Scheit, Phys. Rev. C62, 034306 (2000).
10. First Observation of an Excited State in the Neutron-Rich Nucleus ³¹Na, B.V. Pritychenko, T. Glasmacher, B.A. Brown, P.D. Cottle, R.W. Ibbotson, K.W. Kemper, L.A. Riley, H. Scheit, Phys. Rev. C63, 011305 (2001).
Budget and Funding
two 933 MHz Pentium III processors
1GB memory
two 100 Mb/s full-duplex Ethernet ports
two 18 GB 10,000 RPM Ultra Wide SCSI disk drives
CD-ROM drive
three-year parts and service warranty
1 Gb/s backplane and matrix port
SNMP manageable
support for virtual LANs
dedicated cascade port
lifetime parts and service warranty
integrated cable management
rack mount
manageable via Ethernet with SNMP
rack mount
manageable via Ethernet with SNMP
commodity Pentium III class desktop unit
256 MB memory, monitor, keyboard, mouse
two 100 Mb/s full-duplex Ethernet adapters
two 18 GB 10,000 RPM Ultra Wide SCSI disk drives
DDS4 SCSI tape drive
VASNet console management control unit
Current and Pending Support
Teaching support from Earlham College
Research support from Earlham College, private funds, and the Hughes Institute
Teaching support from Earlham College
Research support from Earlham College
Teaching support from Earlham College
Research support from Earlham College
Teaching support from Earlham College
Research support from Earlham College and the Hughes Institute
Teaching support from Earlham College
Research support from Earlham College and the Hughes Institute
Teaching support from Earlham College funds
Research support from Earlham College
Facilities, Equipment, and Other Resources
Letters
List of Suggested Reviewers
Professor of Computer Science, Calvin College
PI for a successful MRI grant funding a similar scale Beowulf cluster
adams@calvin.edu
Supplementary Documents