Add a healthcheck to the docker-compose file for crackq to detect when GPU devices have disappeared #46

@hkelley

In an attempt to address the NVIDIA GPU flakiness (the crackq container sometimes loses its devices, see NVIDIA/nvidia-container-toolkit#48), I'm experimenting with:

  1. Adding a healthcheck to the crackq service in docker-compose to detect when the GPUs go missing
    crackq:
        build:
            context: ./build
            dockerfile: Dockerfile
        image: "nvidia-ubuntu"
        ports:
            - "127.0.0.1:8080:8080"
        depends_on:
            - redis
        healthcheck:
            test: hashcat -I | grep 'Backend Device'
            interval: 5m
            retries: 1
            start_period: 60s
            timeout: 30s
        networks:
            - crackq_net
  2. Once I'm confident the healthcheck is reliable, adding a service for https://hub.docker.com/r/willfarrell/autoheal/ to the docker-compose file. This should be able to restart the crackq container whenever Docker marks it unhealthy. See https://stackoverflow.com/questions/47088261/restarting-an-unhealthy-docker-container-based-on-healthcheck
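
For reference, the autoheal step might look roughly like the fragment below. This is an untested sketch, not a confirmed working configuration: the `autoheal=true` label, the `AUTOHEAL_CONTAINER_LABEL` and `AUTOHEAL_INTERVAL` variables, and the Docker socket mount are taken from my reading of the willfarrell/autoheal README, and the 60-second interval is an arbitrary placeholder. Check them against the image version actually used.

```yaml
services:
    crackq:
        # ...existing crackq settings, including the healthcheck from step 1...
        labels:
            - "autoheal=true"   # assumed opt-in label that autoheal looks for

    autoheal:
        image: willfarrell/autoheal
        restart: always
        environment:
            # Only watch containers carrying the label "autoheal=true"
            - AUTOHEAL_CONTAINER_LABEL=autoheal
            # Seconds between health polls (placeholder value, tune as needed)
            - AUTOHEAL_INTERVAL=60
        volumes:
            # autoheal talks to the Docker daemon to find and restart unhealthy containers
            - /var/run/docker.sock:/var/run/docker.sock
```

With `interval: 5m` and `retries: 1` on the healthcheck, a lost GPU would take up to roughly five minutes to be flagged, plus one autoheal polling interval before the restart is triggered.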

I will update this issue as I make progress.
