Deuce
A two-node Beowulf Cluster running Redhat Linux

Two PCs:
  Master node:
    Pentium II 300
    64M RAM
    40GB hard drive
    2 100bT Ethernet NICs (3Com, Linksys)
  Slave node:
    AMD K6-2 333
    48M RAM
    4GB hard drive
    SMC 100bT NIC

Network:
  Intel 8-port 10bT ethernet hub
  Cat 5 ethernet cables

Software:
  OS: Redhat Linux 9.0 (http://www.redhat.com)
  Message passing: MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/)

"Deuce" Beowulf cluster
Deuce is the third Beowulf cluster I've built. The first
was Mini-wulf, which runs FreeBSD, and
the second is Zeus, which is still being put
together as I write this.
Purpose:
I built Deuce to gain some experience in setting up clusters
based on Redhat Linux. Since Zeus will run this OS, setting up a
minimal cluster with two old computers seemed like a good weekend
project.
Build:
The hardware build for Deuce was pretty straightforward.
I added the second network card in the Pentium II, configured both
systems to boot from their CDROM drives, and cabled things together
with the hub. I made install CDs from Redhat 9.0 ISO images downloaded
from a mirror server, booted each box in turn, and ran the install
program in text mode (since neither system had a mouse attached).
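If you're following along, something like cdrecord will handle the burning; the device address and ISO filename below are placeholders, not a record of what I actually typed:
cdrecord -scanbus                              # find the burner's bus,target,lun address
cdrecord -v dev=0,0,0 shrike-i386-disc1.iso    # repeat for each disc image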
I chose an internal schema of 10.0.0.x for the LAN, since I
was already using 192.168.1.x on my home network, and didn't want to
get the two confused. The Redhat installer didn't make it clear which
ethernet interface corresponded to which physical card, so I configured
both and put off cabling the master node until I could sort that out
later (a sample interface config is sketched below). I used the
'custom' system type in the installer, and
added all the development packages I thought would be needed (gcc,
glibc, etc). The slave node I left pretty bare of compilers, since
none would be needed on it. Both systems got network services they
would need for things like ssh, rsh, and rlogin, as well as NFS and
NTP. Since Deuce lives on my home network, behind a NAT router, I
disabled the firewalls on the master and slave. Normally one would
run a firewall only on the master node, and modify it so the slave
node(s) can attach to NFS, NTP, and rsh on the master.
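For reference, the internal interfaces ended up with static 10.0.0.x addresses; on Redhat 9 that boils down to a config file like the sketch below (the device name and address are illustrative guesses, not the actual values I used):
# /etc/sysconfig/network-scripts/ifcfg-eth1 on the master (illustrative)
DEVICE=eth1
BOOTPROTO=static
ONBOOT=yes
IPADDR=10.0.0.1
NETMASK=255.255.255.0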
After initial setup, I plugged the nodes into the hub and
tried pinging the slave from the master. Trial and error revealed
which ethernet interface was which, and I hooked up my home network
to the 'outside' interface on the master. I next modified the /etc/hosts file on
both nodes, adding the 'master' and 'slave' IP number assignments.
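The entries look something like this (the actual addresses aren't recorded here, so the 10.0.0.1/10.0.0.2 pair is just a plausible layout):
# /etc/hosts on both nodes (addresses illustrative)
127.0.0.1   localhost.localdomain localhost
10.0.0.1    master
10.0.0.2    slave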
The /etc/ntp.conf on the master was pointed at an internet time server,
and the slave's /etc/ntp.conf was pointed at the master.
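That amounts to a single 'server' line on each node, roughly as follows (the time server name is just an example, and I'm omitting the rest of the stock Redhat ntp.conf):
# master /etc/ntp.conf: sync from an internet time server (name is an example)
server pool.ntp.org
# slave /etc/ntp.conf: sync from the master over the cluster LAN
server master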
Home directories were shared by adding:
/home 10.0.0.0/255.255.255.0(rw)
to the /etc/exports
file on the master. I moved the link in /etc/rc3.d from K20nfs
to S20nfs to cause nfs to start on boot.
After running '/etc/rc3.d/S20nfs restart'
to start the nfs server, I modified the slave
/etc/fstab file by adding:
master:/home /home nfs rw 0 0
and mounted the directory with 'mount /home'. I created a personal account
on both machines, making sure to use the same uid, shell, and home
directory path.
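An easy way to keep the uid consistent is to hand it to useradd explicitly on both nodes; the username and uid below are just examples:
# on the master (username and uid are examples)
useradd -u 500 -s /bin/bash user
passwd user
# on the slave: same uid and shell, but skip creating the home directory
# (-M), since /home is already NFS-mounted from the master
useradd -u 500 -M -s /bin/bash user
passwd user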
Now the fun part: getting rsh to work. I added 'master' and 'slave'
to the /etc/hosts.equiv
file on both nodes, which should allow rsh
to run without individual .rhosts files in each home directory. No
dice (it didn't work). I had to edit the
/etc/xinetd.d/rsh
and /etc/xinetd.d/rlogin files,
setting 'disable' to 'no'. In
/etc/pam.d/rlogin I moved the
"auth sufficient pam_rhosts_auth.so" line to the top, which according
to the
Cluster quick start guide allows root to rlogin
without a password (scary). I also verified that
/etc/pam.d/rsh existed.
I did a '/etc/rc3.d/S56xinetd restart' to make the changes take effect, and
verified rsh functionality with 'rsh [node] date', where [node] was
the other cluster node on each machine as I tested it (i.e. I did
'rsh slave date' on the master node).
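For reference, here's roughly what the relevant files look like afterwards (the xinetd fragment is the stock Redhat 9 'shell' service with only the disable line flipped; treat the exact contents as approximate):
# /etc/hosts.equiv on both nodes
master
slave

# /etc/xinetd.d/rsh (approximate)
service shell
{
        disable         = no
        socket_type     = stream
        wait            = no
        user            = root
        log_on_success  += USERID
        log_on_failure  += USERID
        server          = /usr/sbin/in.rshd
}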
I downloaded MPICH 1.2.5 and ran './configure --prefix=/usr/local/mpich-1.2.5'.
'make' and (as root) 'make install' built and installed the package. I
modified the
/usr/local/mpich-1.2.5/share/machines.LINUX file to list
the master and slave nodes, and added /usr/local to
/etc/exports. I
mounted /usr/local on the slave node to allow access to mpich, and
modified my path to include /usr/local/mpich-1.2.5/bin for access to
the executables.
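Those changes amount to roughly the following; the exports and fstab lines are my reconstruction, following the same pattern as the /home share above:
# /usr/local/mpich-1.2.5/share/machines.LINUX
master
slave

# added to /etc/exports on the master
/usr/local 10.0.0.0/255.255.255.0(rw)

# added to /etc/fstab on the slave
master:/usr/local /usr/local nfs rw 0 0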
To test the system, I copied over the
cpi.c program, compiled
it with 'mpicc cpi.c -o cpi', and ran the program with one and two
nodes via 'mpirun -np 1 ./cpi' and 'mpirun -np 2 ./cpi'. Here's
the output:
[user@master ~/pi]$ mpirun -np 1 ./cpi
Process 0 of 1 on master
pi is approximately 3.1415926535897287, Error is 0.0000000000000644
wall clock time = 3.253982
[user@master ~/pi]$ mpirun -np 2 ./cpi
Process 0 of 2 on master
pi is approximately 3.1415926535899814, Error is 0.0000000000001883
wall clock time = 1.634312
Process 1 of 2 on slave
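cpi.c ships with the MPICH examples; for readers who don't have it handy, a minimal sketch of the same idea (midpoint-rule integration of 4/(1+x^2), with the partial sums combined by MPI_Reduce) looks roughly like this:
/* Minimal MPI pi calculation in the spirit of MPICH's cpi.c (illustrative,
   not the exact distributed source). Compile with 'mpicc pi.c -o pi'. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    const int n = 100000;              /* number of intervals (arbitrary) */
    int rank, size, i;
    double h, sum = 0.0, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Midpoint rule for the integral of 4/(1+x^2) over [0,1];
       each process takes every size-th interval. */
    h = 1.0 / (double)n;
    for (i = rank + 1; i <= n; i += size) {
        double x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* Collect the partial results on process 0 and print the answer. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}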
For further fun, I ran the primitive benchmark program (made by
modifying the cpi program) on this cluster. Here are the results:
Number of nodes MFLOPS
--------------- ------
1 39.7
2 45.4
Conclusion:
This experiment taught me some of the specifics of setting up Redhat
Linux 9.0 cluster systems. These lessons will prove valuable in setting up
and troubleshooting future clusters.
Update: May 19, 2003
Replaced 10/100 switch with 10bT hub. Although this cuts down the
bandwidth quite a bit, it did solve a flapping (NIC
speed flip-flopping) problem that was causing packet loss. I also ran the
cpi benchmark on 1 to 20 processes to gauge the scalability
of the cluster. This gave some rather odd results: running three processes gave
the optimum crunching ability on the two-CPU cluster. I'm not sure why this
is.
Update: May 21, 2003
Ran the Pallas Benchmark
on Deuce. Here are the results. It looks like Deuce
is actually maxing out the 10bT hub in some instances, and getting close in others.
The large step discontinuity that Mini-wulf displayed is not evident on Deuce.
Update: October 23, 2003
Deuce has gone the way of Mini-wulf. It had served its purpose (research on
Redhat clustering), so I pulled the second NIC from the master node,
reformatted the disk, and retasked the machine. The compute node
still runs, but with no compiler or other tools, it's not much use. I'll likely
reformat it with Fedora, as a test of
that new packaging of the old Redhat Linux distro.
So now I'm down to one cluster: Zeus. Building clusters out
of old computers is a real kick. I highly recommend it to any geeks out there
with too many computers on their hands. Enjoy!