Today was mostly spent finishing up the last of the synchronization for the Athena cluster. PVFS is up and running properly, so we now have a 55GB drive to work with on the cluster. After manually executing all of the SSH connections to make the changes take effect, I decided to get C3 up and working. This is a really nice tool. I did have to configure the ssh servers on all of the Athena boxes to allow root login, and then build ssh keys for root on noether, but I don't think that we are losing any security by doing this. This model is similar to the one that I was planning on programming for use on the ACL to do the remote triggering of SystemImager. I really think we chose the right tools for synchronization, since it appears that everyone is using the same ones and has already written the wrappers that I was planning on doing when I found the time. The one cool thing that I ran into today was the kernel module for PVFS. What was happening was that insmod would work just fine, and the module would re-appear after a reboot. However, when we re-synchronized the machine, the module would disappear, and we would have to re-run insmod. Turns out that /etc/modules is a file that lists the modules that must be loaded at boot regardless of what is found during probing. So, I just added the PVFS module to that file, and now the module loads every time during boot. No more insmods to get the fs back... ain't that grand? I also learned that PVFS uses (or at least can use) striping to store the data on the IO nodes. That will be useful when we think about redundancy. Didn't get to starting Postgres yet. I still need to finish the compile and do the testing, but that shouldn't be a problem for tomorrow morning.
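The /etc/modules fix from this entry could be scripted so that it survives re-synchronization of the image. What follows is only a minimal sketch: the function name ensure_module_listed is made up for illustration, and the module name "pvfs" is an assumption, since the entry does not give the exact name that insmod was called with.

```python
from pathlib import Path


def ensure_module_listed(modules_file: str, module: str) -> bool:
    """Append `module` to a Debian-style modules file unless it is already listed.

    Returns True if the file was changed, False if the module was already there.
    """
    path = Path(modules_file)
    text = path.read_text() if path.exists() else ""
    # Skip blank lines and comments; the first word on a line is the module name.
    listed = {
        line.split()[0]
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    }
    if module in listed:
        return False
    # Make sure the existing content ends with a newline before appending.
    if text and not text.endswith("\n"):
        text += "\n"
    path.write_text(text + module + "\n")
    return True


if __name__ == "__main__":
    # "pvfs" is an assumed module name; run against a scratch copy first.
    changed = ensure_module_listed("modules.test", "pvfs")
    print("updated" if changed else "already listed")
```

Because the function does nothing when the module is already listed, it should be safe to run on every node after each synchronization.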
Worked on Postgres today. Still didn't get it installed and working. Instead I worked
on putting it on the image so that we could eventually start using it on the rest of
Athena later this summer. Let me just say that I love C3. I got it installed on
ACL0 and put the appropriate files on the rest of the ACL (that we have up and running)
and put the necessary files in the image. This way we can control the entire ACL from
one spot. One thing that I ran into was a small problem with the X servers that we are
running on the ACL and the shutdown command. Because we have X forwarding enabled, the
shutdown command wants a message to post to the display in order to warn the user that
the machine is going down. To get around this I set up the ssh command that runs the
shutdown command so that it would disable X forwarding for that session. Other than
that, it seems to be working just fine. One important thing is that by default c3
will attempt to shut down the entire cluster (all 23 machines). For this summer I have
created a file c3-hostlists/summer-acl which contains a list of the ACL
boxes that we have up and running this summer. You need to pass this filename to the c3
command (via the -l option) or else it will try to reach the ACL boxes that
are not plugged in along with the ones that we have up and running.
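As a rough illustration of the shutdown workaround, the sketch below reads a host list and builds one ssh command per machine with X forwarding switched off (the ssh -x flag). It is not how C3 itself works. The one-hostname-per-line file format, the function names, and the hostnames in the example are all assumptions made for illustration.

```python
from typing import List


def read_hostlist(text: str) -> List[str]:
    """Parse a simple host list: one hostname per line, '#' starts a comment.

    This format is an assumption, not necessarily what C3 expects.
    """
    hosts = []
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def shutdown_commands(hosts: List[str], user: str = "root") -> List[List[str]]:
    """Build one ssh command per host; '-x' disables X11 forwarding for that session."""
    return [
        ["ssh", "-x", f"{user}@{host}", "shutdown", "-h", "now"]
        for host in hosts
    ]


if __name__ == "__main__":
    # Hypothetical host names; only the machines that are plugged in are listed.
    sample = "acl0\nacl1\n# acl2 is unplugged this summer\n"
    for cmd in shutdown_commands(read_hostlist(sample)):
        print(" ".join(cmd))  # print only; pass cmd to subprocess.run to execute
```

Keeping the list of live machines in a file means the unplugged boxes are never contacted, which is the same idea as passing the summer host list to c3.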
Also helped Aaron with the routing table generator that he has been working on. We're
close, but not there yet. We still need to work around a DNS lookup problem where if
we have more than one IP listed for one node, it picks one of them at random to connect
to if we use the hostnames in the route command. Hopefully we can get the
routing working tomorrow. Now I'm off to go bale hay...
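One possible way around the random-address problem is to resolve each hostname once, pick an address by a fixed rule, and hand the numeric address to the route command instead of the name. The sketch below is only an assumption about how that could look, not what the generator actually does. The function name pick_address and the example subnet are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def pick_address(addresses: Iterable[str], preferred_net: str) -> Optional[str]:
    """Return the lowest address that falls inside preferred_net, or None.

    The choice is deterministic: the same input always gives the same answer,
    regardless of the order in which the resolver returned the addresses.
    """
    net = ipaddress.ip_network(preferred_net)
    matches = sorted(
        ipaddress.ip_address(a)
        for a in addresses
        if ipaddress.ip_address(a) in net
    )
    return str(matches[0]) if matches else None


if __name__ == "__main__":
    # Hypothetical addresses for one multi-homed node.
    candidates = ["192.168.5.7", "10.0.1.7"]
    print(pick_address(candidates, "10.0.1.0/24"))
```

In practice the candidate list could come from socket.gethostbyname_ex(hostname)[2], which returns every IPv4 address listed for a name.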
Got Postgres installed today. I'm having some big problems getting the initdb script to work properly. I keep getting a Broken Pipe error that prevents the database from being initialized. I'm wondering if this is being caused in part by the routing. Regardless, Aaron and I figured that it might be good to get the routing working before trying anything more with PVFS, so that we could make sure that the routing was not the problem. Hassan was able to come up with a workaround that involved initializing the database in a different location and then moving it to the proper location. The one attempt I made at that was unsuccessful and produced the same error that I got when I ran initdb in the proper location. Hopefully we can get the routing worked out tomorrow so that we can eliminate it as a possible problem spot.
Well, we got the routing tables populated properly. It still isn't working the way it should. This is probably because we lack a properly working routing daemon. I managed to install the deb for routed, but didn't configure it completely (or at all, for that matter). I'll work on that early next week. I have yet to set up the script that will set up the routing and start PVFS properly when Athena and noether are rebooted. I'll work on that early next week also. We also finally managed to determine how large the block size is for a standard ext2 filesystem. The command dumpe2fs can be used to obtain all kinds of useful information about a formatted filesystem; dumpe2fs -h produces a slightly thinned-down version of the output. The reason that we were looking for this was to look into changing some of the striping parameters for PVFS so that it would be more efficient. Hassan made the changes to the PVFS on the pentagon and ran some of his Postgres tests and determined that there was relatively little improvement. Hopefully we can get the routing figured out sooner rather than later.
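For the block-size lookup, a small parser can pull the number out of the filesystem header dump. This sketch assumes the tool is dumpe2fs from e2fsprogs, whose header output includes a "Block size:" line. The sample text and the 65536-byte stripe size are made-up illustrations, not values from our machines. Checking that a stripe size is a multiple of the block size is just one plausible use; the entry does not say which striping parameters were changed.

```python
import re
from typing import Optional

# Illustrative excerpt in the style of `dumpe2fs -h` output (values are made up).
SAMPLE = """\
Filesystem OS type:       Linux
Block count:              2048000
Block size:               4096
Fragment size:            4096
"""


def parse_block_size(header_text: str) -> Optional[int]:
    """Extract the block size in bytes from dumpe2fs-style header output."""
    match = re.search(r"^Block size:\s*(\d+)\s*$", header_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_block_aligned(stripe_size: int, block_size: int) -> bool:
    """True if the stripe size is a whole multiple of the filesystem block size."""
    return stripe_size % block_size == 0


if __name__ == "__main__":
    block = parse_block_size(SAMPLE)
    print(block, is_block_aligned(65536, block))
```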
Today was the first day that we devoted to taking a break from software and programming to work on some of the hardware projects that we have in our queue. Abby and I have paired up to work on the motion sensor for the glass case that houses the Athena cluster. The goal is to wire the sensor up so that when it is triggered by movement, it produces the necessary signal on a serial port to cause the screensaver to stop and display the desktop for Noether. Some ideas for what should be on the desktop are a browser opened up to show the SNMP data that we collect from the cluster and Noether, or some other display that shows what is happening in the cluster. Abby and I managed to get the sensor wired up, and I was able to solder the wires that connect it to a power supply, but there wasn't much progress beyond that. I think the next step is to figure out what signal we need to produce on the serial port to shut off the screensaver.