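At 3 pm I took controller node 1 offline, and node 2 was elected as the new controller. I then noticed that the metadata for partition 1 of topic A is wrong. Its three replicas were originally on 2, 3, 4 (with 2 as leader). But the logs show that the new controller 2 told node 6 that the replicas of topic A partition 1 are 3, 2, 6, without syncing this metadata change to 2, 3, 4. As a result, node 6 gained a replica for no apparent reason, and its fetches from the leader keep failing.
The logs are below. Could anyone help me figure out why this happens? It looks as if taking the controller offline corrupted the metadata.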
Logs on node 6:
(the logs from 2 pm still look normal)
[2021-03-16 14:12:39.979] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[2, 3, 4], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 1 epoch 29 with correlation id 0 (state.change.logger)
(these are the logs after the controller was restarted)
[2021-03-16 15:00:46.600] TRACE [Broker id=6] Cached leader info UpdateMetadataPartitionState(topicName='A', partitionIndex=1, controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to UpdateMetadata request sent by controller 2 epoch 30 with correlation id 0 (state.change.logger)
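One more thing that puzzles me in this log: the controller epoch of the request is 30, but the controllerEpoch inside UpdateMetadataPartitionState is 29, the same as in the 2 pm entry. Is that a problem?
Logs on node 2: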
[2021-03-16 15:07:26,164] WARN [ReplicaManager broker=2] Leader 2 failed to record follower 6's position 11679162 since the replica is not recognized to be one of the assigned replicas 3,4,2 for partition A-1. Empty records will be returned for this partition. (kafka.server.ReplicaManager)
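To make the difference between the two TRACE entries on node 6 easier to see, here is a rough Python sketch that pulls the fields out of each line and compares the replica lists. The regex and the helper names (`parse_state`, `replica_diff`) are my own and not part of Kafka; the pattern only assumes the field order shown in the two entries above and may need adjusting for other Kafka versions.

```python
import re

# Two TRACE entries from node 6, copied from above with the wrapped lines joined.
LINE_BEFORE = (
    "[2021-03-16 14:12:39.979] TRACE [Broker id=6] Cached leader info "
    "UpdateMetadataPartitionState(topicName='A', partitionIndex=1, "
    "controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, "
    "replicas=[2, 3, 4], offlineReplicas=[]) for partition A-1 in response to "
    "UpdateMetadata request sent by controller 1 epoch 29 with correlation id 0 "
    "(state.change.logger)"
)
LINE_AFTER = (
    "[2021-03-16 15:00:46.600] TRACE [Broker id=6] Cached leader info "
    "UpdateMetadataPartitionState(topicName='A', partitionIndex=1, "
    "controllerEpoch=29, leader=2, leaderEpoch=13, isr=[2, 3, 4], zkVersion=25, "
    "replicas=[3, 2, 6], offlineReplicas=[]) for partition A-1 in response to "
    "UpdateMetadata request sent by controller 2 epoch 30 with correlation id 0 "
    "(state.change.logger)"
)

# Hypothetical pattern: assumes the field order seen in the entries above.
STATE_RE = re.compile(
    r"topicName='(?P<topic>[^']+)', partitionIndex=(?P<partition>\d+), "
    r"controllerEpoch=(?P<state_epoch>\d+), leader=(?P<leader>\d+), "
    r"leaderEpoch=(?P<leader_epoch>\d+), isr=\[(?P<isr>[^\]]*)\], "
    r"zkVersion=(?P<zk_version>\d+), replicas=\[(?P<replicas>[^\]]*)\]"
    r".*?sent by controller (?P<controller>\d+) epoch (?P<request_epoch>\d+)"
)


def parse_ids(text):
    """Turn '2, 3, 4' into [2, 3, 4]; an empty string gives []."""
    return [int(part) for part in text.split(",") if part.strip()]


def parse_state(line):
    """Extract the partition state fields from one TRACE line, or None if no match."""
    match = STATE_RE.search(line)
    if match is None:
        return None
    return {
        "topic": match["topic"],
        "partition": int(match["partition"]),
        "state_epoch": int(match["state_epoch"]),      # controllerEpoch inside the state
        "request_epoch": int(match["request_epoch"]),  # epoch of the sending controller
        "controller": int(match["controller"]),
        "leader": int(match["leader"]),
        "isr": parse_ids(match["isr"]),
        "replicas": parse_ids(match["replicas"]),
    }


def replica_diff(before, after):
    """Return (added, removed) broker ids between two parsed states."""
    old, new = set(before["replicas"]), set(after["replicas"])
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    before, after = parse_state(LINE_BEFORE), parse_state(LINE_AFTER)
    added, removed = replica_diff(before, after)
    print("added replicas:", added, "removed replicas:", removed)
    print("state epoch:", after["state_epoch"], "request epoch:", after["request_epoch"])
```

Running the same parser over the state-change logs of nodes 2, 3 and 4 should show whether they ever received the changed replica list from controller 2.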