This tutorial describes how to set up an Ansible-based distributed analysis system for proteome analysis. The setup procedure consists of setting up the required SSH keys, configuring the control node, cloning the repository, and starting the analysis pipeline.
Ensure that the lecturer key was used when building the cluster. Then log in to the control node with the following command:
ssh -i path/to/key ec2-user@ec2-35-178-42-199.eu-west-2.compute.amazonaws.com
Next, copy the lecturer key to the control node so that the control node can access the workers. Run this from the machine that holds the key:
scp path/to/lecturer_key ec2-user@ec2-35-178-42-199.eu-west-2.compute.amazonaws.com:~/.ssh/lecturer_key
On the control node, the following commands install Python, pip, Git, Ansible, and pandas:
sudo yum update -y
sudo yum install -y python3 git
sudo yum install -y python3-pip
sudo pip3 install ansible
sudo pip3 install pandas
Clone the repository, which contains the pipeline code, onto the control node:
git clone https://github.com/Dennoh12
Before starting the workflow, place the IDs you wish to analyze in the experiment_ids.txt file.
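Before launching, a quick pre-flight check can catch common setup mistakes. The shell function below is a minimal sketch and is not part of the repository. `preflight` is a hypothetical name, and the function assumes experiment_ids.txt holds one ID per line. It tightens the key's permissions to 600, because ssh rejects private keys that other users can read. It then confirms that the ID file has at least one non-blank line and checks that each named command is on PATH.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of the repository).
# Usage: preflight KEY_FILE IDS_FILE [COMMAND...]
# Returns non-zero and prints a message on the first failed check.
preflight() {
    local key="$1" ids_file="$2"
    shift 2

    # ssh refuses private keys readable by other users, so restrict to 600.
    if [ ! -f "$key" ]; then
        echo "missing key: $key" >&2
        return 1
    fi
    chmod 600 "$key"

    # Assumption: one experiment ID per line; count lines with non-space text.
    if [ ! -f "$ids_file" ] || [ "$(grep -c '[^[:space:]]' "$ids_file")" -eq 0 ]; then
        echo "no experiment IDs in: $ids_file" >&2
        return 1
    fi

    # Every remaining argument is a command that must be on PATH.
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not installed: $tool" >&2
            return 1
        fi
    done
    echo "pre-flight OK"
}
```

For example, assuming experiment_ids.txt sits in the cw0235 directory, you could run `preflight ~/.ssh/lecturer_key experiment_ids.txt git python3 ansible-playbook` from inside that directory. Once the check passes, the ad-hoc command `ansible all -i hosts -m ping --private-key=~/.ssh/lecturer_key` can confirm that the control node reaches every worker in the hosts inventory.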
Launch the distributed analysis with the following commands:
cd cw0235
ansible-playbook --private-key=~/.ssh/lecturer_key -i hosts Biochemistry_Pipeline
After the pipeline completes, the following files in the cw0235 directory will contain the results:
- best_hits.csv
- stats.csv
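To sanity-check the outputs, the hypothetical helper below (not part of the repository) confirms that both result files exist and are non-empty. It also reports how many data rows each file contains. The sketch assumes each CSV starts with a single header row, since the files' column layout is not described here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise the pipeline outputs in a directory.
# Usage: summarise_results [DIR]   (defaults to the current directory)
summarise_results() {
    local dir="${1:-.}" f rows
    for f in best_hits.csv stats.csv; do
        # -s is false for a missing file and for an empty one.
        if [ ! -s "$dir/$f" ]; then
            echo "$f: missing or empty" >&2
            return 1
        fi
        # wc -l counts newline characters; subtract 1 for the assumed header row.
        rows=$(( $(wc -l < "$dir/$f") - 1 ))
        echo "$f data rows: $rows"
    done
}
```

Running `summarise_results` from inside cw0235 prints one line per file. A missing or empty output file usually means the playbook run should be checked for failed tasks.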