Skip to content

Pick the first broker that's not down as preferred#34

Open
psycotica0-shopify wants to merge 1 commit intowvanbergen:masterfrom
psycotica0-shopify:fix_preferred_leader
Open

Pick the first broker that's not down as preferred#34
psycotica0-shopify wants to merge 1 commit intowvanbergen:masterfrom
psycotica0-shopify:fix_preferred_leader

Conversation

@psycotica0-shopify
Copy link
Copy Markdown

If a broker was down, but a partition had it at the head of its replica list, preferred_leader was nil, even when other replicas were up.

Comment thread lib/kazoo/partition.rb

def preferred_leader
@replicas.first
@replicas.compact.first
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the problem lies somewhere else.

Kafka stores the replicas in order of preference. The first one is the preferred replica. E.g. [1,2,3], in which 1 is the preferred one.

In this case, 1 doesn't get set because that broker is down. However, that doesn't change the fact that 1 is the preferred replica. The real issue is: why does it show up as nil when it is down?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I was AFK for a week.
replicas is passed into the constructor for partition in topic.
Around line 171 it gets the list of replica ids from each partition of the topic, and indexes into cluster.brokers[id] to try and turn the id into a broker.

If the cluster doesn't contain that broker object, then that returns nil.

Should it be filtered out at that level, and only up brokers be in the replicas array, or should the cluster know about the down broker, even if it's down?

Copy link
Copy Markdown
Owner

@wvanbergen wvanbergen Nov 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be that we still have a Broker object with the proper ID, but with an (un)available? method. That way we can do things like partition.preferred_replica.available?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants