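There are currently two known ways to obtain monitoring metrics from ZooKeeper.
The metrics obtained through these two approaches are largely the same.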
These metrics can be retrieved with a command of the form echo commands | nc ip port, for example:
echo conf | nc 192.168.144.110 2181
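The same request can be made from a script without nc. The following is a minimal sketch, assuming the server accepts the command and closes the connection once it has replied; the function name four_letter_word and its timeout parameter are illustrative and not part of ZooKeeper.

```python
import socket


def four_letter_word(host: str, port: int, command: str, timeout: float = 5.0) -> str:
    """Send a four letter word to a ZooKeeper server and return the reply as text."""
    if len(command) != 4:
        raise ValueError("expected a four letter command such as 'conf' or 'ruok'")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumption: the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


# Example call (needs a reachable server, so it is left commented out):
# print(four_letter_word("192.168.144.110", 2181, "conf"))
```

Reading until the peer closes the socket avoids guessing the reply length, which varies a lot between commands.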
The metrics that can be retrieved are roughly listed below:
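# Returns ZooKeeper's configuration, including the client port
clientPort=2181
# Data and transaction log directories
dataDir=/data/zookeeper/data/version-2
dataLogDir=/data/zookeeper/logs/version-2
# Length of one tick, the basic time unit, in milliseconds
tickTime=2000
# Limit on the number of connections between one server and a single client
maxClientCnxns=60
# Minimum and maximum session timeouts
minSessionTimeout=4000
maxSessionTimeout=40000
# serverId and other settings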
serverId=3
initLimit=10
syncLimit=5
electionAlg=3
electionPort=3883
quorumPort=2882
peerType=0
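During startup, a follower syncs all the latest data from the leader and then determines the starting state from which it can serve requests. The leader allows a follower up to initLimit (measured in ticks) to finish this work.
While running, the leader communicates with every machine in the ZooKeeper ensemble, for example using heartbeats to check that each one is alive. If the leader sends a heartbeat and has still not received a response from a follower after syncLimit (also measured in ticks), it considers that follower offline.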
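Because initLimit and syncLimit are counted in ticks, the actual time allowed depends on tickTime. The sketch below parses conf-style key=value output and converts both limits to milliseconds; the helper names parse_conf and limits_in_ms are made up for this illustration.

```python
def parse_conf(text: str) -> dict:
    """Parse conf output (key=value lines) into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key] = value
    return settings


def limits_in_ms(settings: dict) -> dict:
    """Convert the tick-based limits to milliseconds using tickTime."""
    tick = int(settings["tickTime"])
    return {
        "initLimit_ms": int(settings["initLimit"]) * tick,
        "syncLimit_ms": int(settings["syncLimit"]) * tick,
    }


sample = """tickTime=2000
initLimit=10
syncLimit=5"""
print(limits_in_ms(parse_conf(sample)))
# → {'initLimit_ms': 20000, 'syncLimit_ms': 10000}
```

With the values shown in this post, a follower gets 20 seconds to complete its initial sync and is treated as offline after 10 seconds without a response.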
$ echo conf | nc 10.23.134.136 2181
clientPort=2181
dataDir=/data/zookeeper/data/version-2
dataLogDir=/data/zookeeper/logs/version-2
tickTime=2000
maxClientCnxns=60
minSessionTimeout=4000
maxSessionTimeout=40000
serverId=3
initLimit=10
syncLimit=5
electionAlg=3
electionPort=3883
quorumPort=2882
peerType=0
[root@10-23-134-136 zookeeper-3.4.9]# echo cons | nc 10.23.134.136 2181
/10.23.85.215:35330[1](queued=0,recved=182545,sent=182545,sid=0x35f04f1c5270134,lop=PING,est=1511928522848,to=30000,lcxid=0xb9,lzxid=0x200ee5e25,lresp=1513755154871,llat=0,minlat=0,avglat=0,maxlat=34)
/10.23.82.55:34912[1](queued=0,recved=222210,sent=222210,sid=0x35f04f1c527014f,lop=PING,est=1511531568580,to=30000,lcxid=0x106,lzxid=0x200ee5e25,lresp=1513755155979,llat=0,minlat=0,avglat=0,maxlat=65)
/10.23.219.193:60010[1](queued=0,recved=174375,sent=174632,sid=0x35f04f1c52701d5,lop=PING,est=1512031236706,to=30000,lcxid=0x846,lzxid=0x200ee5e2f,lresp=1513755161205,llat=0,minlat=0,avglat=0,maxlat=4)
/10.23.235.137:59582[1](queued=0,recved=399,sent=399,sid=0x35f04f1c5270228,lop=PING,est=1513751396322,to=30000,lcxid=0x16,lzxid=0x200ee5e2f,lresp=1513755162695,llat=1,minlat=0,avglat=0,maxlat=1)
/10.23.80.170:44620[1](queued=0,recved=8357,sent=8357,sid=0x35f04f1c5270220,lop=PING,est=1513674079298,to=30000,lcxid=0xfe,lzxid=0x200ee5e25,lresp=1513755155678,llat=1,minlat=0,avglat=0,maxlat=3)
Looking only at the connection list, here is one of its entries:
/10.23.85.215:35330[1](queued=0,recved=182545,sent=182545,sid=0x35f04f1c5270134,lop=PING,est=1511928522848,to=30000,lcxid=0xb9,lzxid=0x200ee5e25,lresp=1513755154871,llat=0,minlat=0,avglat=0,maxlat=34)
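An entry like this is easier to work with once it is split into fields. The sketch below is one way to do that, assuming IPv4 client addresses and the exact layout shown above. The field names appear to abbreviate: queued (outstanding requests), recved/sent (packets), sid (session id), lop (last operation), est (time the session was established), to (session timeout), lcxid/lzxid (last cxid and zxid), lresp (time of last response), and llat/minlat/avglat/maxlat (last, minimum, average and maximum latency). Treat these readings as my interpretation rather than an official definition. The bracketed value after the port is kept as-is under the made-up key flag.

```python
import re

# Assumes an IPv4 address; an IPv6 address would need a different pattern.
CONS_LINE = re.compile(
    r"^\s*/(?P<ip>[^:]+):(?P<port>\d+)\[(?P<flag>[0-9a-fA-F]+)\]\((?P<fields>.*)\)\s*$"
)


def parse_cons_line(line: str) -> dict:
    """Split one cons entry into a dict; purely numeric values become ints."""
    match = CONS_LINE.match(line)
    if match is None:
        raise ValueError(f"not a cons entry: {line!r}")
    entry = {"ip": match["ip"], "port": int(match["port"]), "flag": match["flag"]}
    for pair in match["fields"].split(","):
        key, _, value = pair.partition("=")
        entry[key] = int(value) if value.isdigit() else value
    return entry


line = ("/10.23.85.215:35330[1](queued=0,recved=182545,sent=182545,"
        "sid=0x35f04f1c5270134,lop=PING,est=1511928522848,to=30000,lcxid=0xb9,"
        "lzxid=0x200ee5e25,lresp=1513755154871,llat=0,minlat=0,avglat=0,maxlat=34)")
entry = parse_cons_line(line)
print(entry["ip"], entry["sid"], entry["lop"], entry["maxlat"])
```

Hex values such as sid and lzxid are left as strings so they can be matched directly against the session ids printed by other commands.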
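crst resets the connection statistics, clearing the counters kept for every connection/session. It is an "execute" operation rather than a "select" one: it changes state instead of only reading it, and returns a status message once it has run: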
$ echo crst|nc 10.0.21.56 2181
Connection stats reset.
dump outputs the sessions in the pending queue together with their ephemeral nodes.
$ echo dump|nc 10.0.21.56 2181
0x24b3673bb140000:
/magpie/workerbeats/11011599
0x14b36741ee41de4:
/phenix/servers/px0000000816
/phenix/myGroups/1
0x14b049fe56b89e5:
/hbase/rs/hhz111,60021,1422454057830
0x4b305d40f92989:
/hbase/rs/hhz115,60021,1422521527024
0x14b36741ee41edc:
/magpie/workerbeats/3502573
0x24b3673bb141dc6:
/magpie/workerbeats/3002570
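To find out which session owns which ephemeral nodes, the output above can be grouped by session id. This is a rough sketch that assumes the layout shown here (a line with a session id ending in a colon, followed by that session's paths); parse_dump is an illustrative name.

```python
def parse_dump(text: str) -> dict:
    """Group ephemeral node paths by the session id that owns them."""
    sessions = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("0x") and line.endswith(":"):
            current = line[:-1]          # drop the trailing colon
            sessions[current] = []
        elif line.startswith("/") and current is not None:
            sessions[current].append(line)
    return sessions


sample = """0x24b3673bb140000:
/magpie/workerbeats/11011599
0x14b36741ee41de4:
/phenix/servers/px0000000816
/phenix/myGroups/1"""
for sid, paths in parse_dump(sample).items():
    print(sid, len(paths), paths)
```

Lines that match neither pattern, such as any header lines a server prints before the list, are ignored.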
$ echo envi|nc 10.0.21.56 2181
zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
host.name=hhz112
java.version=1.7.0_60
java.vendor=Oracle Corporation
java.home=/export/servers/jdk1.7.0_60/jre
java.class.path=/export/servers/zookeeper-3.4.6/bin/../build/classes:/export/servers/zookeeper-3.4.6/bin/../build/lib/*.jar:/export/servers/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/export/servers/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/export/servers/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/export/servers/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/export/servers/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/export/servers/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/export/servers/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/export/servers/zookeeper-3.4.6/bin/../conf:/export/servers/zookeeper-3.4.6/bin/../build/classes:/export/servers/zookeeper-3.4.6/bin/../build/lib/*.jar:/export/servers/zookeeper-3.4.6/bin/../lib/slf4j-log4j12-1.6.1.jar:/export/servers/zookeeper-3.4.6/bin/../lib/slf4j-api-1.6.1.jar:/export/servers/zookeeper-3.4.6/bin/../lib/netty-3.7.0.Final.jar:/export/servers/zookeeper-3.4.6/bin/../lib/log4j-1.2.16.jar:/export/servers/zookeeper-3.4.6/bin/../lib/jline-0.9.94.jar:/export/servers/zookeeper-3.4.6/bin/../zookeeper-3.4.6.jar:/export/servers/zookeeper-3.4.6/bin/../src/java/lib/*.jar:/export/servers/zookeeper-3.4.6/bin/../conf:.:/export/servers/jdk1.6.0_25/lib/dt.jar:/export/servers/jdk1.6.0_25/lib/tools.jar
java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
java.io.tmpdir=/tmp
java.compiler=<NA>
os.name=Linux
os.arch=amd64
os.version=2.6.32-358.el6.x86_64
user.name=hhz
user.home=/home/hhz
user.dir=/export/servers/zookeeper-3.4.6
ruok checks whether the server is currently running normally; if it is, the server replies imok:
imok
srst is likewise an "execute" operation rather than a "select" one; it resets the server statistics:
$ echo srst|nc 10.0.21.56 2181
Server stats reset.
$ echo srvr | nc 192.168.144.110 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Latency min/avg/max: 0/0/182
Received: 97182
Sent: 97153
Connections: 22
Outstanding: 8
Zxid: 0x68000af381
Mode: follower
Node count: 101065
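For alerting, it helps to turn this output into typed values. The following sketch assumes the "Key: value" layout shown above; parse_srvr and the latency_* key names are invented for this example.

```python
def parse_srvr(text: str) -> dict:
    """Parse srvr output into a dict, converting the numeric fields."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        stats[key.strip()] = value.strip()
    if "Latency min/avg/max" in stats:
        lo, avg, hi = stats.pop("Latency min/avg/max").split("/")
        stats["latency_min"] = int(lo)
        stats["latency_avg"] = float(avg)  # float, in case a server prints decimals
        stats["latency_max"] = int(hi)
    for key in ("Received", "Sent", "Connections", "Outstanding", "Node count"):
        if key in stats:
            stats[key] = int(stats[key])
    return stats


sample = """Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Latency min/avg/max: 0/0/182
Received: 97182
Sent: 97153
Connections: 22
Outstanding: 8
Zxid: 0x68000af381
Mode: follower
Node count: 101065"""
stats = parse_srvr(sample)
print(stats["Mode"], stats["Outstanding"], stats["latency_max"])
```

Only the first colon on each line is used as the separator, so the build timestamp in the version line stays intact.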
stat returns status and connection information, a combination of some of the output shown earlier:
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Clients:
/192.168.147.102:56168[1](queued=0,recved=41,sent=41)
/192.168.144.102:34378[1](queued=0,recved=54,sent=54)
/192.168.162.16:43108[1](queued=0,recved=40,sent=40)
/192.168.144.107:39948[1](queued=0,recved=1421,sent=1421)
/192.168.162.16:43112[1](queued=0,recved=54,sent=54)
/192.168.162.16:43107[1](queued=0,recved=54,sent=54)
/192.168.162.16:43110[1](queued=0,recved=53,sent=53)
/192.168.144.98:34702[1](queued=0,recved=41,sent=41)
/192.168.144.98:34135[1](queued=0,recved=61,sent=65)
/192.168.162.16:43109[1](queued=0,recved=54,sent=54)
/192.168.147.102:56038[1](queued=0,recved=165313,sent=165314)
/192.168.147.102:56039[1](queued=0,recved=165526,sent=165527)
/192.168.147.101:44124[1](queued=0,recved=162811,sent=162812)
/192.168.147.102:39271[1](queued=0,recved=41,sent=41)
/192.168.144.107:45476[1](queued=0,recved=166422,sent=166423)
/192.168.144.103:45100[1](queued=0,recved=54,sent=54)
/192.168.162.16:43133[0](queued=0,recved=1,sent=0)
/192.168.144.107:39945[1](queued=0,recved=1825,sent=1825)
/192.168.144.107:39919[1](queued=0,recved=325,sent=325)
/192.168.144.106:47163[1](queued=0,recved=17891,sent=17891)
/192.168.144.107:45488[1](queued=0,recved=166554,sent=166555)
/172.17.36.11:32728[1](queued=0,recved=54,sent=54)
/192.168.162.16:43115[1](queued=0,recved=54,sent=54)
Latency min/avg/max: 0/0/599
Received: 224869
Sent: 224817
Connections: 23
Outstanding: 0
Zxid: 0x68000af707
Mode: follower
Node count: 101081
wchs reports the number of connections that hold watches, the number of watched paths, and the total number of watches:
13 connections watching 102 paths
Total watches:172
wchc lists all the paths watched by each connection (consider combining this with the output of the cons command):
0x24b3673bb14001f
/hbase/root-region-server
/hbase/master
wchp lists which connections are watching each path (consider combining this with the output of the cons command):
/dubbo/FeedInterface/configurators
0x4b3673ce4a1a4d
/dubbo/UserInterface/providers
0x14b36741ee41b17
0x4b3673ce4a1a4d
0x24b3673bb1401d2
0x4b3673ce4a1ab7
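One way to combine this with cons is to replace each session id with the client address that owns it. The sketch below assumes the path-then-session-ids layout shown above and takes a session-to-client mapping that would be built from the sid field of the cons output; the address used in the example is a placeholder, and the names parse_wchp and resolve_clients are hypothetical.

```python
def parse_wchp(text: str) -> dict:
    """Map each watched path to the list of session ids watching it."""
    watchers = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            current = line
            watchers.setdefault(current, [])
        elif line.startswith("0x") and current is not None:
            watchers[current].append(line)
    return watchers


def resolve_clients(watchers: dict, sid_to_client: dict) -> dict:
    """Swap session ids for client addresses where known; keep the id otherwise."""
    return {
        path: [sid_to_client.get(sid, sid) for sid in sids]
        for path, sids in watchers.items()
    }


sample = """/dubbo/FeedInterface/configurators
0x4b3673ce4a1a4d
/dubbo/UserInterface/providers
0x14b36741ee41b17
0x4b3673ce4a1a4d"""
# Placeholder mapping; in practice it would come from the sid field of cons.
sid_to_client = {"0x4b3673ce4a1a4d": "192.0.2.10:40000"}
print(resolve_clients(parse_wchp(sample), sid_to_client))
```

Sessions that are missing from the mapping, for instance because they are connected to a different server, are left as raw session ids.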
mntr outputs a range of metrics for monitoring the health of the ZooKeeper server:
$ echo mntr | nc 192.168.144.110 2181
zk_version 3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency 0
zk_max_latency 2155
zk_min_latency 0
zk_packets_received 64610660
zk_packets_sent 64577070
zk_num_alive_connections 42
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count 101125
zk_watch_count 315
zk_ephemerals_count 633
zk_approximate_data_size 27753592
zk_open_file_descriptor_count 72
zk_max_file_descriptor_count 4096
zk_followers 2
zk_synced_followers 2
zk_pending_syncs 0
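These key/value pairs lend themselves to simple automated checks. The sketch below parses the output and flags a few conditions; the thresholds (max_outstanding, fd_ratio_limit) and the choice of checks are assumptions for illustration, not recommended values.

```python
def parse_mntr(text: str) -> dict:
    """Parse mntr output into a dict; purely numeric values become ints."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # key, then the rest of the line
        if len(parts) != 2:
            continue
        key, value = parts[0], parts[1].strip()
        metrics[key] = int(value) if value.isdigit() else value
    return metrics


def health_warnings(metrics: dict, max_outstanding: int = 10,
                    fd_ratio_limit: float = 0.8) -> list:
    """Return human-readable warnings; thresholds are illustrative assumptions."""
    warnings = []
    if metrics.get("zk_outstanding_requests", 0) > max_outstanding:
        warnings.append("too many outstanding requests")
    max_fds = metrics.get("zk_max_file_descriptor_count", 0)
    open_fds = metrics.get("zk_open_file_descriptor_count", 0)
    if max_fds and open_fds / max_fds > fd_ratio_limit:
        warnings.append("file descriptor usage is high")
    if metrics.get("zk_server_state") == "leader":
        # The follower counters appear only in the leader's output.
        if metrics.get("zk_synced_followers", 0) < metrics.get("zk_followers", 0):
            warnings.append("some followers are not synced")
    return warnings


sample = """zk_outstanding_requests 0
zk_server_state leader
zk_open_file_descriptor_count 72
zk_max_file_descriptor_count 4096
zk_followers 2
zk_synced_followers 2"""
print(health_warnings(parse_mntr(sample)))
```

Splitting on any whitespace rather than a specific separator keeps the parser tolerant of how the output was copied or piped.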
The above covers all the metrics that the four letter words can expose in ZooKeeper 3.4.