@sanjaysrikakulam

This PR adds the Ansible playbook and roles to configure and deploy the Koina server.

Contributor

Can you delete this file?

Author

Yes, this PR deletes that file.

Comment on lines 32 to 62
- name: Enable UFW
  community.general.ufw:
    state: enabled

- name: Allow SSH (port 22)
  community.general.ufw:
    rule: allow
    port: '22'
    proto: tcp

- name: Allow HTTP (port 80)
  community.general.ufw:
    rule: allow
    port: '80'
    proto: tcp

- name: Allow HTTPS (port 443)
  community.general.ufw:
    rule: allow
    port: '443'
    proto: tcp

- name: Allow all outgoing traffic
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Deny all other incoming traffic
  community.general.ufw:
    default: deny
    direction: incoming

Contributor

My experience with Ansible is limited, but would it make sense to group these tasks in a role as well?

Author

Not sure what you mean. Would you like all of the firewall rules and changes moved into a dedicated role?

Author

I have created a new role and moved these tasks to that role.
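
For illustration, a minimal sketch of what such a role's task file could look like. The role name and path (`roles/firewall/tasks/main.yml`) are assumptions, not the names used in this PR. The sketch also adds the allow rules before enabling UFW, which is a design choice made here so that SSH stays reachable, not something taken from the tasks quoted above.

```yaml
# roles/firewall/tasks/main.yml (hypothetical role name and path)
- name: Allow SSH, HTTP and HTTPS
  community.general.ufw:
    rule: allow
    port: "{{ item }}"
    proto: tcp
  loop:
    - '22'
    - '80'
    - '443'

- name: Allow all outgoing traffic
  community.general.ufw:
    default: allow
    direction: outgoing

- name: Deny all other incoming traffic
  community.general.ufw:
    default: deny
    direction: incoming

# Enabled last so the SSH rule already exists when the firewall comes up.
- name: Enable UFW
  community.general.ufw:
    state: enabled
```

The playbook would then list the role under `roles:` in place of the inline tasks.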

Comment on lines 64 to 95
- name: Install Nvidia driver for GPU # This is not idempotent (need to re-work this).
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  changed_when: "'installed' in nvidia_install.stdout"

- name: Detect installed Nvidia server driver versions
  ansible.builtin.command: bash -c "dpkg -l | awk '/nvidia-compute-utils-[0-9]+-server/{print $2}' | sort -V | tail -n 1"
  register: nvidia_driver_pkg
  changed_when: false

- name: Extract Nvidia driver version number
  ansible.builtin.set_fact:
    nvidia_driver_version: "{{ nvidia_driver_pkg.stdout | regex_search('[0-9]+') }}"

- name: Install matching Nvidia server-utils package
  ansible.builtin.apt:
    name: "nvidia-utils-{{ nvidia_driver_version }}-server"
    state: present
    update_cache: yes
  when: nvidia_driver_version is defined and nvidia_driver_version | length > 0

- name: Reboot if Nvidia driver was installed # This is not idempotent due to the driver install step above (so reboot always runs :( ).
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: "'installed' in nvidia_install.stdout"

- name: Check if nvidia-smi works
  ansible.builtin.command: nvidia-smi
  register: nvidia_smi
  failed_when: nvidia_smi.rc != 0
  changed_when: false

Contributor

Same comment as above, all of these steps could be nicely bundled in a role.

Author

I have created a new role and moved these tasks to that role.
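
The code comments in the quoted tasks note that the driver install and the reboot are not idempotent. One possible way to address that, sketched under the assumption that a working `nvidia-smi` means the driver is already in place, is to probe first and skip the install and reboot when the probe succeeds. The register name `nvidia_smi_precheck` is hypothetical.

```yaml
# Probe for an existing driver; never fail or report a change here.
- name: Check whether nvidia-smi already works
  ansible.builtin.command: nvidia-smi
  register: nvidia_smi_precheck
  failed_when: false
  changed_when: false

# Runs only when the probe did not succeed (assumption: rc 0 means the driver is usable).
- name: Install Nvidia driver for GPU
  ansible.builtin.command: ubuntu-drivers install --gpgpu
  register: nvidia_install
  changed_when: true
  when: nvidia_smi_precheck.rc != 0

# Skipped together with the install task on later runs.
- name: Reboot if Nvidia driver was installed
  ansible.builtin.reboot:
    msg: "Rebooting after Nvidia driver installation"
    pre_reboot_delay: 10
  when: nvidia_install is changed
```

On a host that is already configured, the probe returns 0, the install task is skipped, and the reboot condition evaluates to false.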

- role: geerlingguy.docker
  vars:
    docker_users:
      - 'ubuntu'

Contributor

Would this fail if we use a different user for ansible_ssh_user?

Author

@sanjaysrikakulam Nov 28, 2025

Yes. I can update the README so that future users of the playbook know to change the username accordingly.

Author

I have updated the README, moved the value into a variable, and added comments to make this clear.
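
As an illustration of this kind of change, a minimal sketch follows. The variable name `koina_docker_user` and the `group_vars/all.yml` location are assumptions; the actual names used in the PR are not shown in this conversation.

```yaml
# group_vars/all.yml (hypothetical location and variable name)
# Set this to the user that should be added to the docker group.
koina_docker_user: ubuntu
---
# Playbook excerpt: the role reads the user from the variable instead of a literal.
- role: geerlingguy.docker
  vars:
    docker_users:
      - "{{ koina_docker_user }}"
```

With this layout, switching to a different SSH user only requires changing one value.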

Comment on lines +25 to +34
# - name: Ensure Koina server is running by curl health endpoint
#   ansible.builtin.uri:
#     url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
#     method: GET
#     return_content: true
#     status_code: 200
#   register: koina_health_check
#   retries: 3
#   delay: 10
#   until: koina_health_check.status == 200

Contributor

Is there a reason this is included as comments?

Author

Server startup and loading of all the models take 15 to 20 minutes or more, so checking readiness from within this task is not feasible. That is why I commented it out.
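
If a readiness check is still wanted, one option is to keep the same task but widen the retry window to cover the startup time described above. This is only a sketch: the `retries` and `delay` values are assumptions chosen to allow roughly 30 minutes, and the playbook would block for that long in the worst case.

```yaml
- name: Wait for the Koina server to report ready
  ansible.builtin.uri:
    url: "http://localhost:{{ koinahttp_docker_port }}/v2/health/ready"
    method: GET
    status_code: 200
  register: koina_health_check
  retries: 60  # assumed value: 60 attempts spaced 30 seconds apart
  delay: 30
  until: koina_health_check.status == 200
```

The trade-off is a longer playbook run in exchange for a definite signal that the server is up.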
