This repository was archived by the owner on Apr 24, 2023. It is now read-only.

Scheduler silently fails on malformed ZK URLs #950

@PerilousApricot

Describe the bug
In a couple of places [1] [2], the user is instructed to append a path (a ZK node?), /cook, to the ZK connection string. If the user does this, the scheduler never connects to the Mesos master, and it is not clear why.

[1] https://github.com/twosigma/Cook/blob/master/scheduler/docs/configuration.adoc
[2] https://github.com/twosigma/Cook/blob/master/scheduler/example-prod-config.edn#L15

To Reproduce
Download the latest Cook, build it, and manually set the :connection option under :zookeeper in the config so that it has a trailing /cook. The scheduler begins some preparatory work, then appears to hang, only periodically writing heartbeat messages to the log. I can turn this failure mode on and off by adding or removing that suffix.

Expected behavior
I'd expect an explicit crash in this case. I presume that the scheduler attempts to perform master election and fails because of the invalid ZK hostname. Since I never saw an error, and one of the final lines in the log is from Cook trying to find the Mesos scheduler, I spent my time debugging that interaction when the true failure was elsewhere.
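
To make the "explicit crash" idea concrete, here is a minimal sketch of a fail-fast check on the connection string. It is written in Python purely for illustration and is not Cook's code. The function name parse_zk_connection is hypothetical, and the rule that a bare trailing slash is rejected is an assumption of this sketch. It treats the part after the first slash as a path suffix (ZooKeeper connection strings allow a chroot path there) and raises instead of hanging when the string does not look like host:port[,host:port...][/path].

```python
from typing import List, Optional, Tuple


def parse_zk_connection(conn: str) -> Tuple[List[Tuple[str, int]], Optional[str]]:
    """Split 'host:port,host:port/path' into (servers, path).

    Hypothetical helper: raises ValueError on anything it cannot parse,
    so a bad setting stops startup instead of failing silently.
    """
    # Everything after the first "/" is treated as the path suffix.
    hosts_part, slash, path = conn.partition("/")
    chroot = "/" + path if slash else None

    # Assumption of this sketch: reject empty or doubled path segments.
    if chroot is not None and (chroot == "/" or chroot.endswith("/") or "//" in chroot):
        raise ValueError(f"bad path suffix in ZK connection string: {conn!r}")

    servers = []
    for entry in hosts_part.split(","):
        host, colon, port = entry.strip().rpartition(":")
        if not colon or not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r} in {conn!r}")
        servers.append((host, int(port)))
    return servers, chroot
```

Calling something like this once at startup, and exiting with the error message if it raises, would at least separate "the string is unparseable" from "the string parsed but the connection or election never completed". The second case would still need a timeout and a logged error around the ZK connection itself.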
