In a 3-node Kafka cluster, 2 of the Kafka brokers keep failing to restart. Here is the log from one of them:
[2023-09-15 06:16:30,240] INFO [Log partition=poc_alarm_queue-2, dir=/bitnami/kafka/data] Recovering unflushed segment 0 (kafka.log.Log)
[2023-09-15 06:16:30,240] INFO [Log partition=poc_alarm_queue-2, dir=/bitnami/kafka/data] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2023-09-15 06:16:31,981] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)
[2023-09-15 06:16:31,983] INFO [KafkaServer id=1] shutting down (kafka.server.KafkaServer)
[2023-09-15 06:16:31,987] ERROR [KafkaServer id=1] Fatal error during KafkaServer shutdown. (kafka.server.KafkaServer)
java.lang.IllegalStateException: Kafka server is still starting up, cannot shut down!
at kafka.server.KafkaServer.shutdown(KafkaServer.scala:602)
at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:54)
at kafka.Kafka$.$anonfun$main$3(Kafka.scala:80)
at kafka.utils.Exit$.$anonfun$addShutdownHook$1(Exit.scala:38)
at java.base/java.lang.Thread.run(Thread.java:834)
[2023-09-15 06:16:31,990] ERROR Halting Kafka. (kafka.server.KafkaServerStartable)
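In addition, the Kafka cluster connects to a ZooKeeper (zk) cluster. The zk cluster looks healthy: zkServer.sh status reports all 3 nodes as basically OK. However, its log also shows client connection timeouts and failures: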
2023-09-15 06:13:03,026 [myid:3] - WARN [NIOWorkerThread-2:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /192.168.4.12:31708, session = 0x3000004f3460053
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2023-09-15 06:13:06,011 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x10000043eed0061
2023-09-15 06:13:06,188 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x2000003f1530054
2023-09-15 06:13:08,440 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x10000043eed0060, timeout of 18000ms exceeded
2023-09-15 06:13:08,440 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x10000043eed0060
2023-09-15 06:13:20,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x3000004f3460053, timeout of 18000ms exceeded
2023-09-15 06:13:20,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x3000004f3460053
2023-09-15 06:13:38,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x2000003f1530053, timeout of 18000ms exceeded
2023-09-15 06:13:38,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x2000003f1530053
2023-09-15 06:13:50,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x2000003f1530054, timeout of 18000ms exceeded
2023-09-15 06:13:50,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x2000003f1530054
2023-09-15 06:14:04,748 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x3000004f3460054
2023-09-15 06:14:04,791 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x3000004f3460054
2023-09-15 06:14:04,903 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x2000003f1530055
2023-09-15 06:14:28,637 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x10000043eed0062
2023-09-15 06:14:28,807 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x3000004f3460055
2023-09-15 06:14:48,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x2000003f1530055, timeout of 18000ms exceeded
2023-09-15 06:14:48,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x2000003f1530055
2023-09-15 06:14:53,043 [myid:3] - WARN [NIOWorkerThread-1:NIOServerCnxn@364] - Unexpected exception
EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /192.168.4.12:60734, session = 0x3000004f3460055
at org.apache.zookeeper.server.NIOServerCnxn.handleFailedRead(NIOServerCnxn.java:163)
at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:326)
at org.apache.zookeeper.server.NIOServerCnxnFactory$IOWorkRequest.doWork(NIOServerCnxnFactory.java:522)
at org.apache.zookeeper.server.WorkerService$ScheduledWorkRequest.run(WorkerService.java:154)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
2023-09-15 06:14:56,129 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x2000003f1530056
2023-09-15 06:14:56,306 [myid:3] - INFO [CommitProcessor:3:LeaderSessionTracker@104] - Committing global session 0x2000003f1530057
2023-09-15 06:15:06,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x3000004f3460055, timeout of 18000ms exceeded
2023-09-15 06:15:06,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x3000004f3460055
2023-09-15 06:15:40,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x2000003f1530057, timeout of 18000ms exceeded
2023-09-15 06:15:40,441 [myid:3] - INFO [RequestThrottler:QuorumZooKeeperServer@159] - Submitting global closeSession request for session 0x2000003f1530057
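To make the ZooKeeper log above easier to read, here is a minimal sketch that pairs each expired session with the client address from a matching EndOfStreamException line. The function name summarize and the regular expressions are my own assumptions based only on the line formats shown above; other ZooKeeper versions may format these lines differently.

```python
import re

# Patterns derived from the log lines quoted above (assumed format).
EXPIRE_RE = re.compile(r"Expiring session (0x[0-9a-fA-F]+), timeout of (\d+)ms exceeded")
EOS_RE = re.compile(r"address = /([\d.]+):(\d+), session = (0x[0-9a-fA-F]+)")


def summarize(lines):
    """Return (expired session ids in order, {session id: client ip})."""
    expired = []
    dropped = {}
    for line in lines:
        m = EXPIRE_RE.search(line)
        if m:
            expired.append(m.group(1))
            continue
        m = EOS_RE.search(line)
        if m:
            dropped[m.group(3)] = m.group(1)
    return expired, dropped


# Three lines copied from the ZooKeeper log in this post.
sample = [
    "EndOfStreamException: Unable to read additional data from client, it probably closed the socket: address = /192.168.4.12:31708, session = 0x3000004f3460053",
    "2023-09-15 06:13:08,440 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x10000043eed0060, timeout of 18000ms exceeded",
    "2023-09-15 06:13:20,441 [myid:3] - INFO [SessionTracker:ZooKeeperServer@610] - Expiring session 0x3000004f3460053, timeout of 18000ms exceeded",
]

expired, dropped = summarize(sample)
# Sessions that both lost their socket and later expired.
both = [s for s in expired if s in dropped]
for s in both:
    print(s, "closed by client", dropped[s], "and later expired")
```

Running this over the full log should show which client IPs own the sessions that keep expiring, which can then be compared with the addresses of the two brokers that fail to restart.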
How would one generally go about diagnosing and resolving a situation like this?