Ned's Summer Research Web Page


Weekly Journal - Week 8


Monday, June 25, 2001

Today felt like a really slow day. I spent a good portion of it working on the routing and trying to figure out where things were going wrong. Routed doesn't seem to be doing any good and doesn't appear to be configurable (it just goes out there and does its thing... not what we want). Charlie recommended some reading for me to see if it would help... hopefully it will, since I've completely run out of ideas. The last idea I had today was to change the routing tables so that they treat intermediate hops as gateways rather than as individual hosts. This seems to have worked somewhat: we can now get to at least one of the intermediate nodes during a traceroute. There are still problems that prevent traceroute from going much further than one or two hops before it starts to time out. I think this is a result of the inner structure of the routing itself, which means a packet does not necessarily return on the same path (or interface) that it left on. For instance, when we do a traceroute from athena0 to athena5 (159.28.231.37 is the arriving interface on athena5), traceroute is able to get to athena4 (the intermediate step), but the other hops time out. While the traceroute is still running, a tcpdump on athena4 on either eth3 or eth1 shows that the packets are actually going through athena4, but also that athena5 is not responding for whatever reason. Neither Aaron nor I can figure this out. Hopefully tonight's reading will enlighten me.
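
To make the gateway idea concrete, here is a minimal sketch of how next hops can be computed in a hypercube. It rests on assumptions that the entry does not confirm: that each node number is its hypercube label, that directly linked nodes differ in exactly one bit, and that the cube has three dimensions. The function names are made up for illustration. The rule used here corrects the highest differing bit first, which happens to reproduce the athena0 -> athena4 -> athena5 path described above.

```python
def next_hop(current: int, dest: int) -> int:
    """Return the neighbor to forward to, correcting the highest differing bit first."""
    diff = current ^ dest
    if diff == 0:
        return current  # already at the destination
    highest_bit = 1 << (diff.bit_length() - 1)
    return current ^ highest_bit


def path(src: int, dest: int) -> list:
    """List every node a packet visits, including source and destination."""
    hops = [src]
    while hops[-1] != dest:
        hops.append(next_hop(hops[-1], dest))
    return hops


def routing_table(node: int, dimensions: int) -> dict:
    """Map each destination to the directly connected neighbor used as its gateway."""
    return {
        dest: next_hop(node, dest)
        for dest in range(1 << dimensions)
        if dest != node
    }


if __name__ == "__main__":
    DIMENSIONS = 3  # assumption: an 8-node cube
    print("forward path 0 -> 5:", path(0, 5))
    print("return path  5 -> 0:", path(5, 0))
    for dest, gateway in sorted(routing_table(0, DIMENSIONS).items()):
        print(f"node 0: destination {dest} via gateway {gateway}")
```

Under this particular rule the return path from node 5 to node 0 goes through node 1 rather than node 4, so a reply does not have to retrace the route the request took. Whether that has anything to do with the timeouts seen here is not established; the sketch only shows that asymmetric paths are normal in such a scheme.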

Tuesday, June 26, 2001

Spent a good part of the day reading. I still haven't figured out where the problem lies in the routing configuration. The one thing that I tried today was making all of the routes leaving the node (even the single-hop routes) gateways. This had no effect beyond the results that we achieved yesterday. We can't seem to get the destination node to reply. The other key thing that we noticed was that the packets seemed to die in the same place regardless of where they were going (roughly two or three hops from where they started). Hopefully tonight's reading will shed some light on the matter. I also spent some time looking through the web for other places that had implemented a hypercube routing scheme. I can't seem to find any. All of the places that came close ended up just describing the idea and then stating that, because of the relatively low cost of 100Mb switches, it was better/easier to go with a star topology rather than a hypercube. The concept of a hypercube is excellent, and I know that it should work. I just can't figure it out. On another note, I think we may have stumbled on something rather big in terms of pvfs. When looking into the problems that we are having with postgres and pvfs, we found out where the error is being created and have speculated that it is caused by some bug in the way that pvfs implements hard links. We think this because the error occurs at the creation of a hard link for a log file. The reason that Hassan's kludge (running initdb on a data directory and then copying it to the proper place) worked was that the file didn't maintain its hard-link status as a result of the copy, since cp -r doesn't preserve links. It then occurred to me that when I had tried the same maneuver with a move instead of a copy, it choked (mv preserves links). This has led all of us to think that this is the problem, and we are now looking for solutions/evidence that this is true.
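
The copy-versus-move difference can be checked on an ordinary local filesystem with a small sketch like the one below. It says nothing about pvfs itself, and the directory and file names ("data", "log", "log.link", "moved") are invented for the illustration rather than taken from postgres. It creates a file with a second hard link, then reports the link count seen after a recursive copy and after a rename.

```python
import os
import shutil
import tempfile


def link_count_after(operation) -> int:
    """Build a directory holding a hard-linked file, apply `operation(src, dst)`,
    and return the link count of the file found in the destination."""
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "data")
        os.mkdir(src)
        log = os.path.join(src, "log")
        with open(log, "w") as handle:
            handle.write("entry\n")
        # Second name for the same inode, so the link count becomes 2.
        os.link(log, os.path.join(src, "log.link"))

        dst = os.path.join(root, "moved")
        operation(src, dst)
        return os.stat(os.path.join(dst, "log")).st_nlink


# A recursive copy writes each name out as an independent file.
copied_links = link_count_after(shutil.copytree)
# A rename within one filesystem keeps the same inodes and their links.
moved_links = link_count_after(os.rename)

if __name__ == "__main__":
    print("link count after copy:", copied_links)
    print("link count after move:", moved_links)
```

On a filesystem with working hard links, the copy should come out with a link count of 1 and the move with 2, which matches the behaviour described for cp -r and mv.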

Wednesday, June 27, 2001

Today was the 3rd unsuccessful day this week for my troubleshooting of the hypercube routing. I still can't seem to figure it out. Everywhere I have looked, I have been unable to find any hints as to how to begin to implement a hypercube routing scheme. All of my attempts have failed and I have run out of new ideas. My last idea is to use gated. It appears to be what I need. I've set up the config file for a scheme that I think will work, but I'm still dubious as to its success. I still want to do some more reading before I continue. Actually, I'm just trying to put it off until we find out what the switch's backplane capacity is (since that may determine whether we go from hypercube to switch). I know I shouldn't do this, since there is still the possibility that the switch could end up being slower, but I can't really help it. I'm getting a little tired of doing the routing and can't seem to find the motivation to keep working on it. Today I also started working on the ACL image package listing. This is one of the steps that will hopefully lead to trimming down the ACL image. The process that I'm using is rather slow, but it's a change from the routing.

Thursday, June 28, 2001

Continued working on the listing of packages today. Got it almost finished. I changed my tactic slightly and have started copying all of the actual RPM files that are installed into my home directory so that I can then run rpm2html on them and get a nice HTML archive of them. This would allow me to have package descriptions along with their names. I hope it works. Didn't get much else done today. Comparing package versions with the ones that are installed is rather time-consuming, but it will hopefully be worth it.
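
The version comparison could be automated along the lines of the sketch below. It is only an illustration under assumptions: both listings are taken to be plain text with one "name version" pair per line (for example, output produced with rpm's --queryformat option), and the package names and versions in the sample data are placeholders, not the real ACL image contents.

```python
def parse_listing(text: str) -> dict:
    """Turn lines of 'name version' into a {name: version} dictionary."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, version = line.split(None, 1)
        packages[name] = version
    return packages


def compare(installed: dict, collected: dict) -> dict:
    """Return {name: (installed_version, collected_version)} for every mismatch.
    A missing package shows up as None on the side where it is absent."""
    report = {}
    for name in sorted(set(installed) | set(collected)):
        have = installed.get(name)
        found = collected.get(name)
        if have != found:
            report[name] = (have, found)
    return report


# Placeholder data, not the actual package list.
INSTALLED = """
alpha 1.0-1
beta 2.3-4
gamma 0.9-2
"""
COLLECTED = """
alpha 1.0-1
beta 2.3-5
delta 1.1-1
"""

differences = compare(parse_listing(INSTALLED), parse_listing(COLLECTED))

if __name__ == "__main__":
    for name, (have, found) in differences.items():
        print(f"{name}: installed={have} collected={found}")
```

Packages whose versions agree are left out of the report, so only the entries that need a closer look remain.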

Friday, June 29, 2001

Spent today trying to get rpm2html working. No luck. It turns out there is a bug in the binary that was packaged in the RPM, and I can't seem to get around it. I tried running it on my own box in my room, but found out that the version that I have there requires MySQL to run. Why, I don't know. Besides that, I got some reading done on the gated problem. My goal is to get gated working sometime next week. Need to get motivated... that is the key.


Last modified: Thu Aug 2 13:58:49 EST 2001
Copyright © 2001, Ned Bingham ( binghne@cs.earlham.edu )