librdkafka reports REQTMOUT ... Timed out FetchRequest in flight after consuming for a while

wincent   Posted: 2023-02-28   Last updated: 2023-02-28 14:56:30   3,216 views

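Kafka version: 2.6.0

Kafka itself is running normally, in a distributed deployment that spans network domains. The consumer reads from the brokers remotely. After consuming normally for a while, it reports: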

%5|1677555150.963|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out FetchRequest in flight (after 61305ms, timeout #0)
%4|1677555150.963|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
%3|1677555150.967|FAIL|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: 1 request(s) timed out: disconnect (after 251161ms in state UP, 1 identical error(s) suppressed)
% ERROR CALLBACK: rdkafka#consumer-2: Local: Timed out: ssl://IP:9093/0: 1 request(s) timed out: disconnect (after 251161ms in state UP, 1 identical error(s) suppressed), error code [-185]

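The error appears roughly once every 60000 ms (1 min). session.timeout.ms, heartbeat.interval.., request.timeout.ms and the like are all left at their defaults.

Consumption is intermittent: sometimes it works, sometimes it does not.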
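
The log lines above follow librdkafka's default log format (level, epoch timestamp, facility, client name, message), and error code -185 is librdkafka's local "Timed out" error (RD_KAFKA_RESP_ERR__TIMED_OUT). As a minimal sketch for checking the "roughly every 60 s" observation against a captured log, the Python below extracts each timed-out request and how long it had been in flight. The regular expressions and the function name are illustrative, written against the lines quoted in this thread; they are not part of librdkafka.

```python
import re

# librdkafka's default log line: %<level>|<epoch seconds>|<facility>|<client>| <message>
LOG_LINE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<facility>\w+)\|(?P<client>[^|]+)\|\s*(?P<message>.*)$"
)
# Only the per-request lines, e.g. "Timed out FetchRequest in flight (after 61305ms, ..."
IN_FLIGHT = re.compile(r"Timed out (?P<request>\w+Request) in flight \(after (?P<ms>\d+)ms")


def timed_out_requests(lines):
    """Return (epoch timestamp, request type, ms in flight) for each timed-out request."""
    events = []
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if not parsed or parsed.group("facility") != "REQTMOUT":
            continue  # skip FAIL lines, error callbacks and anything else
        hit = IN_FLIGHT.search(parsed.group("message"))
        if hit:
            events.append(
                (float(parsed.group("ts")), hit.group("request"), int(hit.group("ms")))
            )
    return events


# Lines copied from the logs posted in this thread.
SAMPLE_LOG = [
    "%5|1677555150.963|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out FetchRequest in flight (after 61305ms, timeout #0)",
    "%4|1677555150.963|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out 1 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests",
    "%5|1677654459.958|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out FetchRequest in flight (after 61041ms, timeout #0)",
    "%5|1677654459.958|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out MetadataRequest in flight (after 60962ms, timeout #1)",
]

for ts, request, ms in timed_out_requests(SAMPLE_LOG):
    print(f"{ts:.3f} {request} timed out after {ms} ms")
```

Run over a full log file, the same function makes it easy to see whether every timeout clusters just above 60 s, which would point at a fixed client-side timer rather than a variable network delay.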

Posted on 2023-02-28

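If it works on and off, messages are in fact being consumed.

  1. Confirm the Kafka cluster is healthy (every partition has a working leader).
  2. Check connectivity between the two networks (packet loss, firewalls, security policies and so on). Kafka uses long-lived connections, and across network boundaries a security policy may kill a connection after a certain time.
  3. Increase the fetch wait timeout, request.timeout.ms.
  4. Reduce the amount of data fetched per request, fetch.max.bytes.

For more parameters, see the Kafka Consumer configuration reference.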
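
Items 2 to 4 above map onto a handful of client settings. The sketch below collects them as a plain Python dict using librdkafka property names. Two caveats, both taken from librdkafka's CONFIGURATION.md as I understand it rather than from this thread: request.timeout.ms is listed there as a producer-side property, and consumer FetchRequests are described as timing out after fetch.wait.max.ms + socket.timeout.ms, which would fit the roughly 61 s seen in the logs. The values and the helper function are illustrative assumptions, and the defaults used (60000 and 500) should be checked against the librdkafka version in use.

```python
# Keys are librdkafka configuration properties; values are illustrative
# starting points for an experiment, not settings confirmed by the thread.
consumer_conf = {
    "bootstrap.servers": "IP:9093",          # placeholder, as written in the thread
    "security.protocol": "ssl",
    "group.id": "rdkafka_consumer_example",
    "socket.timeout.ms": 120000,             # item 3: allow requests to wait longer
    "fetch.wait.max.ms": 500,                # how long the broker may hold a fetch
    "fetch.max.bytes": 10485760,             # item 4: smaller fetch responses (10 MiB)
    "socket.keepalive.enable": True,         # item 2: TCP keepalives on idle links
}


def fetch_request_timeout_ms(conf):
    """Effective FetchRequest timeout, assuming the documented rule
    fetch.wait.max.ms + socket.timeout.ms and the assumed defaults below."""
    socket_timeout = conf.get("socket.timeout.ms", 60000)  # assumed default
    fetch_wait = conf.get("fetch.wait.max.ms", 500)        # assumed default
    return socket_timeout + fetch_wait


print(fetch_request_timeout_ms({}))             # with the assumed defaults
print(fetch_request_timeout_ms(consumer_conf))  # with the overrides above
```

A longer timeout only changes when the error surfaces. If fetches still time out with a much larger value, the request is probably never being answered at all, and the keepalive and connectivity checks in item 2 become the more interesting lead.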

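wincent -> 半兽人 1 year ago

I have confirmed the Kafka cluster is healthy and has not been interrupted.
A long-running ping between the two networks shows no drops and low latency, and telnet to port 9093 connects.
I increased socket.timeout.ms, metadata.request.timeout.ms and similar settings, with no effect.
I reduced fetch.max.bytes, with no effect.

The consumer client consumes for ten minutes or so and then starts flooding the log with: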

%5|1677654459.958|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out FetchRequest in flight (after 61041ms, timeout #0)
%5|1677654459.958|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out MetadataRequest in flight (after 60962ms, timeout #1)
%4|1677654459.958|REQTMOUT|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: Timed out 2 in-flight, 0 retry-queued, 0 out-queue, 0 partially-sent requests
%3|1677654459.961|FAIL|rdkafka#consumer-2| [thrd:ssl://IP:9093/bootstrap]: ssl://IP:9093/0: 2 request(s) timed out: disconnect (after 61044ms in state UP, 1 identical error(s) suppressed)
% ERROR CALLBACK: rdkafka#consumer-2: Local: Timed out: ssl://IP:9093/0: 2 request(s) timed out: disconnect (after 61044ms in state UP, 1 identical error(s) suppressed), error code [-185]

After a while it can consume again (anywhere from half an hour to two or three hours).

半兽人 -> wincent 1 year ago

Another possibility is a bug in your program.

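wincent -> 半兽人 1 year ago

But after timing out for a while it recovers on its own. And even with my program stopped, consuming with a separate demo also times out.

半兽人 -> wincent 1 year ago

Right, the fact that it recovers on its own is exactly why I suspect that. What I mean is: test with the most bare-bones program possible, with no business logic in it. I have seen cases where the application's own logic was the cause.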

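wincent -> 半兽人 1 year ago

Yes, that is what I did: I wrote a demo with no business logic that just prints whatever it consumes and does nothing else. It also times out. And when I run this demo directly on the server side, it also times out after a while. I am losing my mind.

半兽人 -> wincent 1 year ago

Does the demo work normally when run within the same network segment?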

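wincent -> 半兽人 1 year ago

Same thing: sometimes it works and sometimes it does not. It even happens when consuming on the broker machine itself.

半兽人 -> wincent 1 year ago

If it fails within the same network segment, and even on the local machine, then it is almost certainly either an unhealthy cluster or a problem in the librdkafka client.
Try consuming with the command-line consumer.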

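wincent -> 半兽人 1 year ago

A question: I am using SSL mode. How do I consume from the command line? I only know how to do it in plaintext mode.

wincent -> 半兽人 1 year ago

Why does consuming from the command line fail with an error?

Here is client-ssl.properties: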

security.protocol=SSL
ssl.keystore.location=/usr/local/etc/file-access/certs_ssl/client.keystore.jks
ssl.keystore.password=123456
ssl.key.password=123456
ssl.truststore.location=/usr/local/etc/file-access/certs_ssl/client.truststore.jks
ssl.truststore.password=123456
#ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.keystore.type=JKS
ssl.truststore.type=JKS

Command used to start it:

./bin/kafka-console-consumer.sh --bootstrap-server IP:9093 --topic topic_lanxin --consumer.config ./config/client-ssl.properties

Error from the command-line consumer:

[2023-03-02 14:12:15,275] ERROR [Consumer clientId=consumer-console-consumer-15662-1, groupId=console-consumer-15662] Connection to node -1 (/IP:9093) failed authentication due to: SSL handshake failed (org.apache.kafka.clients.NetworkClient)
[2023-03-02 14:12:15,276] WARN [Consumer clientId=consumer-console-consumer-15662-1, groupId=console-consumer-15662] Bootstrap broker IP:9093 (id: -1 rack: null) disconnected (org.apache.kafka.clients.NetworkClient)
[2023-03-02 14:12:15,334] ERROR Error processing message, terminating consumer process:  (kafka.tools.ConsoleConsumer$)
org.apache.kafka.common.errors.SslAuthenticationException: SSL handshake failed
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem

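The client and the server use the same set of certificates, so how can this happen? Please help.

半兽人 -> wincent 1 year ago

Hostname verification is failing. See: https://www.orchome.com/1299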
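
To make the hostname-verification point concrete: as far as I know, Java clients from Kafka 2.0 onward default ssl.endpoint.identification.algorithm to https, so a properties file that never mentions the key still has the check switched on, and an explicitly empty value switches it off. The sketch below is a small stand-alone Python parser (the function names are hypothetical, and it handles only simple key=value lines) applied to the client-ssl.properties posted earlier in the thread.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def hostname_verification(props):
    """Report whether a Java Kafka client would verify the broker hostname.

    Assumes the Kafka 2.0+ default of "https" when the key is absent.
    """
    algorithm = props.get("ssl.endpoint.identification.algorithm", "https")
    return "enabled" if algorithm else "disabled"


# The file as posted in the thread.
CLIENT_SSL = """\
security.protocol=SSL
ssl.keystore.location=/usr/local/etc/file-access/certs_ssl/client.keystore.jks
ssl.keystore.password=123456
ssl.key.password=123456
ssl.truststore.location=/usr/local/etc/file-access/certs_ssl/client.truststore.jks
ssl.truststore.password=123456
#ssl.client.auth=required
ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
ssl.keystore.type=JKS
ssl.truststore.type=JKS
"""

print(hostname_verification(parse_properties(CLIENT_SSL)))
# Same file with the check explicitly turned off via an empty value.
print(hostname_verification(
    parse_properties(CLIENT_SSL + "ssl.endpoint.identification.algorithm=\n")
))
```

With verification enabled, the handshake fails whenever the name or IP used in --bootstrap-server does not appear in the broker certificate, even if both sides share the same certificate files. Turning the check off is normally reasonable only for testing.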

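wincent -> 半兽人 1 year ago

Help! Here is the current situation. Running the demo, the program and the command-line consumer on the remote side at the same time:

  • demo: reports a timeout after consuming for a few minutes. After a restart, it again fails after a few minutes.
  • program: reports a timeout immediately at startup, then consumes normally after half an hour to two or three hours. If restarted while consumption is working, it keeps working; if restarted while consumption is failing, it keeps failing.
  • command line: consumes 5 to 35 messages, then hangs without reporting an error. After Ctrl-C and a restart it consumes a few more messages, then hangs again.

When consuming with the command line on the server side:

  • command line: consumes 5 to 35 messages, then hangs without reporting an error. After Ctrl-C and a restart it consumes a few more messages, then hangs again.

Here is the key point: when I switch to a different test topic and have a script write data to it continuously, none of the demo, the command line or the program shows the behavior above when consuming remotely.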

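半兽人 -> wincent 1 year ago

Are the demo, the program and the command-line consumer sharing one consumer group name? If so, each of them takes and consumes part of the messages.
Run the following command to see the actual consumption state, and post the output so I can take a look:

## Show consumption details for a consumer group (version 0.10.1.0+)
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-gro

Command taken from: Kafka command reference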

wincent -> 半兽人 1 year ago

The output is as follows:

[root@host-192-168-246-144 kafka_2.12-2.6.0]# ./bin/kafka-consumer-groups.sh --bootstrap-server host-192-168-246-144:9093 --describe --group rdkafka_consumer_example --command-config ./config/client-ssl.properties
[2023-03-03 17:46:36,073] WARN The configuration 'ssl.keystore.type' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.keystore.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.enabled.protocols' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.key.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.keystore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.type' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.endpoint.identification.algorithm' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
Consumer group 'rdkafka_consumer_example' has no active members.
GROUP                    TOPIC         PARTITION   CURRENT-OFFSET LOG-END-OFFSET LAG   CONSUMER-ID HOST CLIENT-ID
rdkafka_consumer_example topic_lanxin  0           44900          45995          1095  -           -    -
rdkafka_consumer_example test_wmy      0           359367         359387         0     -           -    -

wincent -> 半兽人 1 year ago

I have tried using different group.id values, but the result is the same: REQTMOUT at unpredictable intervals after a while.

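半兽人 -> wincent 1 year ago

Has this been solved? Did you look at the consumption details for the consumer group?

wincent -> 半兽人 1 year ago

Still not solved. The consumption details for the consumer group are as follows: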

[root@host-192-168-246-144 kafka_2.12-2.6.0]# ./bin/kafka-consumer-groups.sh --bootstrap-server host-192-168-246-144:9093 --describe --group rdkafka_consumer_example --command-config ./config/client-ssl.properties
[2023-03-03 17:46:36,073] WARN The configuration 'ssl.keystore.type' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.keystore.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.enabled.protocols' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.key.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.keystore.location' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.password' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.truststore.type' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
[2023-03-03 17:46:36,074] WARN The configuration 'ssl.endpoint.identification.algorithm' was supplied but isn't a known config. (org.apache.kafka.clients.admin.AdminClientConfig)
Consumer group 'rdkafka_consumer_example' has no active members.
GROUP                    TOPIC         PARTITION   CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
rdkafka_consumer_example topic_lanxin  0           44900          45995          1095  -          -       -
rdkafka_consumer_example test_wmy      0           359367         359387         0     -          -       -

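The problematic one is the first topic, "topic_lanxin".
I have tried consuming with different group.id values in both the demo and the real code. It only works for a short while, and then REQTMOUT comes back.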
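
As a rough aid for reading the --describe output above, here is a minimal Python sketch that parses the table and recomputes lag as LOG-END-OFFSET minus CURRENT-OFFSET. The function name is illustrative, and it assumes every offset column is numeric (real output can show "-" when a group has no committed offset). Comparing snapshots taken a few minutes apart would show whether the lag on topic_lanxin is shrinking at all during a REQTMOUT episode.

```python
# Table rows copied from the output posted in this thread.
DESCRIBE_OUTPUT = """\
GROUP                    TOPIC         PARTITION   CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
rdkafka_consumer_example topic_lanxin  0           44900          45995          1095  -          -       -
rdkafka_consumer_example test_wmy      0           359367         359387         0     -          -       -
"""


def parse_describe(text):
    """Return one dict per partition with the reported and the recomputed lag."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = dict(zip(header, line.split()))
        rows.append({
            "topic": fields["TOPIC"],
            "partition": int(fields["PARTITION"]),
            "reported_lag": int(fields["LAG"]),
            # Assumes numeric offsets; "-" would raise ValueError here.
            "computed_lag": int(fields["LOG-END-OFFSET"]) - int(fields["CURRENT-OFFSET"]),
        })
    return rows


for row in parse_describe(DESCRIBE_OUTPUT):
    print(row)
```

For topic_lanxin the recomputed lag matches the reported 1095. For test_wmy the pasted offsets differ by 20 while the LAG column says 0, which may simply be a transcription slip in the pasted output.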
