This repository contains the framework for a very basic HPC cluster based on Vagrant, Ansible, and OpenHPC. It is just enough to build four nodes, a frontend, and a master node on your laptop or desktop system. From there, you can customize it however you wish!
- Install Vagrant: https://www.vagrantup.com/
- Clone this repository
- Run the `gensshkeys.sh` script to generate ssh keys in the ansible repository
- (Optional) Copy `localenv.sh.in` to `localenv.sh` and populate it with any local environment variables you need during the vagrant provisioning step (HTTP proxy information, for example)
- Run `vagrant up` to fire up the cluster
- Once the cluster is booted, you can run `vagrant ssh master` to log in to the master node, or `vagrant ssh fe1` to log in to the frontend
- Run `sinfo` on the frontend or master node to see if Slurm sees that your nodes are up. If they are not, run `sudo scontrol update nodename=node[01-04] state=resume` to wake them up.
- Start using your cluster! At this point, you should be able to run a simple test across the cluster (`srun -N 4 /bin/hostname`) or run some more complex jobs.
- When you are done, shut down your cluster by logging out of it and running `vagrant halt`.
- If you want to completely rebuild your cluster, run `vagrant destroy`, and then run `vagrant up` again.
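
The steps above can also be strung together in a small script. The following is only a sketch, not something shipped with this repository: the `run` and `bring_up_cluster` helpers and the `DRY_RUN` variable are hypothetical names introduced here, and it assumes `gensshkeys.sh` sits in the directory you run it from. It uses `vagrant ssh master -c` to run the Slurm commands on the master node without an interactive login. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch of the quick-start steps, assuming it is run from the repository root.
set -euo pipefail

# Hypothetical switch: DRY_RUN=1 (the default) only prints each command;
# set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it as given.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

bring_up_cluster() {
  # Generate the ssh keys used by the ansible repository.
  run ./gensshkeys.sh
  # Boot and provision all of the virtual machines.
  run vagrant up
  # Ask Slurm which nodes it can see.
  run vagrant ssh master -c "sinfo"
  # Resume the compute nodes. The README only calls for this when sinfo
  # shows them as not up; this sketch issues it unconditionally.
  run vagrant ssh master -c "sudo scontrol update nodename=node[01-04] state=resume"
  # Simple test job across all four compute nodes.
  run vagrant ssh master -c "srun -N 4 /bin/hostname"
}

bring_up_cluster
```

Saved under a name of your choosing, running it with `DRY_RUN=0` in the environment executes the commands for real. `vagrant halt` and `vagrant destroy` are left out on purpose so that the script never tears anything down.
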

This virtual cluster is built for convenience, not security. It uses Vagrant's default SSH keys, and it contains some private keys (for munge, for example). This is good enough to run on an isolated desktop or laptop for experimentation, but you shouldn't base an actual cluster configuration on its Ansible repository without first doing a thorough security review.