This code amends the libDAI library to provide:
- A new parameter learning implementation called Age-Layered Expectation Maximization (ALEM), inspired by the ALPS genetic algorithm work of Greg Hornby of NASA Ames.
- A distributed parameter learning implementation using MapReduce and a population structure, running either ALEM or a large random-restart equivalent that I call Multiple Expectation Maximization (MEM).
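The age-layering idea behind ALEM can be sketched as follows. This is a minimal, illustrative Python sketch, not the implementation in this repository: every name here (`alem`, `em_step`, `age_gap`, the toy likelihood surrogate) is hypothetical. A population of EM runs is split into age layers; fresh random restarts enter the bottom layer, runs that outgrow a layer's age limit are promoted upward, and each upper layer keeps only its best runs, so young restarts are protected from competing directly with older, already-converged runs.

```python
import random

def alem(em_step, random_init, layers=3, layer_size=4, age_gap=5,
         iters=60, seed=0):
    """Age-layered EM sketch (hypothetical API, not libDAI's).

    em_step(theta) -> (new_theta, loglik): one EM-style update.
    random_init(rng) -> fresh random parameters (a restart).
    Layer i holds runs younger than (i + 1) * age_gap iterations;
    older runs are promoted to layer i + 1, which keeps its best.
    """
    rng = random.Random(seed)
    pop = [[] for _ in range(layers)]   # each run: [theta, loglik, age]
    best = (float("-inf"), None)
    for _ in range(iters):
        # Refill the bottom layer with fresh random restarts.
        while len(pop[0]) < layer_size:
            pop[0].append([random_init(rng), float("-inf"), 0])
        for i in range(layers):
            for run in pop[i]:
                run[0], run[1] = em_step(run[0])
                run[2] += 1
                if run[1] > best[0]:
                    best = (run[1], run[0])
            if i + 1 < layers:
                # Promote runs that outgrew this layer's age limit.
                limit = (i + 1) * age_gap
                grown = [r for r in pop[i] if r[2] >= limit]
                pop[i] = [r for r in pop[i] if r[2] < limit]
                pop[i + 1].extend(grown)
                # The upper layer keeps only its best layer_size runs.
                pop[i + 1].sort(key=lambda r: r[1], reverse=True)
                del pop[i + 1][layer_size:]
    return best

# Toy stand-in for an EM step: greedy 1-D ascent on a simple
# log-likelihood surrogate whose maximum is at theta = 2.
def toy_step(theta):
    cand = [theta - 0.1, theta, theta + 0.1]
    scores = [-(x - 2.0) ** 2 * (1.5 if x < 0 else 0.5) for x in cand]
    k = max(range(3), key=lambda j: scores[j])
    return cand[k], scores[k]

loglik, theta = alem(toy_step, lambda rng: rng.uniform(-5.0, 5.0))
```

MEM is the degenerate case of the same loop: one layer, no promotion, just many independent restarts kept until the best is returned.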
This code can be used to recreate the results shown in my paper:
Additionally, this work can be used (with more emphasis on high-performance computing) to recreate the algorithms suggested in the following two papers:
```
# working directory: ./mapreduce
make
./main.sh
```

This is useful for testing purposes before deploying on Amazon EC2 or another Hadoop cluster.
- Install Hadoop (tested up to v1.1.1).
- Set up environment variables, e.g.:

  ```
  export HADOOP_PREFIX=/home/erik/hadoop/hadoop-1.1.1
  ```

  Also set up Hadoop for pseudo-distributed operation: http://hadoop.apache.org/docs/r1.1.0/single_node_setup.html
- Run the following (this assumes the Hadoop scripts are on your PATH):

  ```
  # working directory: .../libdai/mapreduce
  make clobber  # WARNING: this clears any existing HDFS data,
                # which seems to cause a bug sometimes when
                # accessing the HDFS
  hadoop namenode -format
  start-all.sh
  ```

- Initialize a BN and some data:

  ```
  ./scripts/init dat/bnets/asia.net 100 4
  ```

- Start Hadoop streaming:

  ```
  make
  ./scripts/streaming dat/in
  ```

- Gander at the results:

  ```
  ls out
  ```

- Stop Hadoop:

  ```
  stop-all.sh
  ```
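Conceptually, each streaming iteration pairs a mapper that emits expected sufficient statistics with a reducer that sums them; the driver then normalizes the summed counts into new CPTs and repeats until convergence. The sketch below is illustrative Python in the usual Hadoop-streaming stdin/stdout style, not the actual scripts in this repository; the record format and function names are assumptions.

```python
import sys
from collections import defaultdict

def map_lines(lines):
    """Mapper: each input line is a data record such as 'rain=1 wet=0'.
    With complete data we emit a unit count per observed assignment;
    with hidden variables a real E-step mapper would instead run
    inference on the current model and emit posterior (expected) counts.
    """
    for line in lines:
        for assignment in line.split():
            yield "%s\t1" % assignment

def reduce_lines(lines):
    """Reducer: sum expected counts per key (Hadoop streaming delivers
    mapper output to the reducer grouped and sorted by key)."""
    totals = defaultdict(float)
    for line in lines:
        key, count = line.strip().split("\t")
        totals[key] += float(count)
    for key in sorted(totals):
        yield "%s\t%g" % (key, totals[key])

if __name__ == "__main__":
    # The same file can serve as both stages: pass 'map' or 'reduce'.
    stage = reduce_lines if "reduce" in sys.argv[1:] else map_lines
    for out in stage(sys.stdin):
        print(out)
```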
```
make
```

- Launch an EC2 instance and send the mapreduce folder to it:

  ```
  scp -rCi ~/Downloads/dai-mapreduce.pem mapreduce ec2-user@ec2-107-21-71-50.compute-1.amazonaws.com:
  ```

- ssh into the instance and launch a cluster:

  ```
  ssh -Ci ~/Downloads/dai-mapreduce.pem ec2-user@ec2-107-21-71-50.compute-1.amazonaws.com
  # Assumes hadoop-ec2 has been configured.
  # Launch a cluster of 10 (small) instances
  hadoop-ec2 launch-cluster dai-mapreduce 10
  # ... wait a really long time for the cluster to initialize
  ```

- Push the mapreduce folder to the cluster master:

  ```
  hadoop-ec2 push dai-mapreduce mapreduce
  ```

- Log in to the cluster master:

  ```
  hadoop-ec2 login dai-mapreduce
  ```

- Initialize a big bnet and start Hadoop streaming:

  ```
  ./scripts/init dat/bnets/water.net 1000 10
  # Do standard EM, pop-size=10, mappers=10
  ./scripts/streaming dat/in -u 10 10
  # ... wait a few hours
  ```

- Collect data!

  ```
  ls out
  ```