Add Ansible playbook and roles to configure the host and deploy Koina server #194
Can you delete this file?
Yes, this PR deletes that file.
```yaml
- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Allow SSH (port 22)
  community.general.ufw:
    rule: allow
    port: '22'
    proto: tcp

- name: Allow HTTP (port 80)
  community.general.ufw:
    rule: allow
    port: '80'
    proto: tcp

- name: Allow HTTPS (port 443)
  community.general.ufw:
    rule: allow
    port: '443'
    proto: tcp

- name: Allow all outgoing traffic
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Deny all other incoming traffic
  community.general.ufw:
    default: deny
    direction: incoming
```
My experience with Ansible is limited, but would it make sense to group these tasks in a role as well?
Not sure what you mean. Would you like all of the firewall rules and changes moved into a dedicated role?
I have created a new role and moved these tasks to that role.
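As a rough sketch of what such a role's task file could look like, the three allow rules can be collapsed into one looped task. The path `roles/firewall/tasks/main.yml` and the variable `firewall_allowed_tcp_ports` are hypothetical names used for illustration and are not taken from this PR. The sketch also enables UFW last, on the assumption that opening SSH before the firewall becomes active lowers the risk of cutting off the Ansible connection.

```yaml
# roles/firewall/tasks/main.yml (hypothetical path)
# firewall_allowed_tcp_ports is an assumed variable, for example defined in
# roles/firewall/defaults/main.yml as: ['22', '80', '443']

- name: Allow incoming TCP ports (SSH, HTTP, HTTPS)
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop: "{{ firewall_allowed_tcp_ports }}"

- name: Allow all outgoing traffic
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Deny all other incoming traffic
  community.general.ufw:
    default: deny
    direction: incoming

- name: Enable UFW
  community.general.ufw:
    state: enabled
```

Keeping the port list in a variable means a deployment that needs an extra port only has to override the list, not edit the tasks.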
```yaml
- name: Install Nvidia driver for GPU # This is not idempotent (need to re-work this).
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  changed_when: "'installed' in nvidia_install.stdout"

- name: Detect installed Nvidia server driver versions
  ansible.builtin.command: bash -c "dpkg -l | awk '/nvidia-compute-utils-[0-9]+-server/{print $2}' | sort -V | tail -n 1"
  register: nvidia_driver_pkg
  changed_when: false

- name: Extract Nvidia driver version number
  ansible.builtin.set_fact:
    nvidia_driver_version: "{{ nvidia_driver_pkg.stdout | regex_search('[0-9]+') }}"

- name: Install matching Nvidia server-utils package
  ansible.builtin.apt:
    name: "nvidia-utils-{{ nvidia_driver_version }}-server"
    state: present
    update_cache: yes
  when: nvidia_driver_version is defined and nvidia_driver_version | length > 0

- name: Reboot if Nvidia driver was installed # This is not idempotent due to the driver install step above (so reboot always runs :( ).
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: "'installed' in nvidia_install.stdout"

- name: Check if nvidia-smi works
  ansible.builtin.command: nvidia-smi
  register: nvidia_smi
  failed_when: nvidia_smi.rc != 0
  changed_when: false
```
Same comment as above, all of these steps could be nicely bundled in a role.
I have created a new role and moved these tasks to that role.
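On the idempotency problem flagged in the task comments, one possible approach is to check for an installed driver package first and run `ubuntu-drivers` only when none is found, so the reboot fires only in runs that actually installed something. This is a sketch, not the PR's implementation: the registered name `nvidia_pkg_check` is hypothetical, and it assumes that an installed `nvidia-compute-utils-<N>-server` package (the same pattern the detection task above matches) is a reliable sign that the driver is present.

```yaml
- name: Check whether an Nvidia server driver package is already installed
  # grep exits non-zero when nothing matches; that is expected, not a failure
  ansible.builtin.shell: dpkg -l | grep -E '^ii +nvidia-compute-utils-[0-9]+-server'
  register: nvidia_pkg_check   # hypothetical variable name
  changed_when: false
  failed_when: false

- name: Install Nvidia driver for GPU only when no driver package is present
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  when: nvidia_pkg_check.rc != 0

# The detection and server-utils tasks from the role would sit here, unchanged.

- name: Reboot only if the driver was installed in this run
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: nvidia_install is changed
```

When the install task is skipped, `nvidia_install is changed` evaluates to false, so a second run against an already configured host should not reboot it.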
```yaml
- role: geerlingguy.docker
  vars:
    docker_users:
      - 'ubuntu'
```
Would this fail if we use a different user for ansible_ssh_user?
Yes. I can update the README so that future users of the playbook know to change the username accordingly.
I have updated the README, moved the value into a variable, and added comments to make this clear.
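For illustration, moving the username into a variable could look roughly like the play below. The variable name `koina_docker_user`, the `hosts: all` target, and the fallback to `ubuntu` are assumptions made for this sketch; the names actually used in the PR may differ.

```yaml
- hosts: all            # assumed target group
  become: true
  vars:
    # Hypothetical variable: the account to add to the docker group.
    # Falls back to 'ubuntu' when ansible_user is not set in the inventory.
    koina_docker_user: "{{ ansible_user | default('ubuntu') }}"
  roles:
    - role: geerlingguy.docker
      vars:
        docker_users:
          - "{{ koina_docker_user }}"
```

Deriving the default from `ansible_user` means that connecting as a different SSH user would not require editing the role invocation.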
```yaml
# - name: Ensure Koina server is running by curl health endpoint
#   ansible.builtin.uri:
#     url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
#     method: GET
#     return_content: true
#     status_code: 200
#   register: koina_health_check
#   retries: 3
#   delay: 10
#   until: koina_health_check.status == 200
```
Is there a reason this is included as comments?
Server startup and loading all of the models takes 15 to 20 minutes or more, so checking readiness from this task is not feasible with the current retry settings. That is why I commented it out.
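If the check were to be re-enabled, one option would be to widen the retry window so that it covers the model-loading time. The values below are assumptions chosen to allow roughly 30 minutes of polling; they are not taken from the PR and would need tuning against real startup times.

```yaml
- name: Wait for the Koina server to report ready
  ansible.builtin.uri:
    url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
    method: GET
    status_code: 200
  register: koina_health_check
  until: koina_health_check.status == 200
  retries: 60   # assumed value
  delay: 30     # seconds; 60 x 30 s gives roughly 30 minutes of polling
```

The trade-off is that the playbook run blocks until the server is ready or the retries are exhausted, which may or may not be acceptable for this deployment.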
This PR adds the Ansible playbook and roles to configure and deploy the Koina server.