Kafka uses Yammer Metrics for metrics reporting in the server, while the Java clients use Kafka's own built-in metrics library. Both can be configured to report stats through pluggable stats reporters that hook into your monitoring system.
The easiest way to see the available metrics is to fire up jconsole and point it at a running Kafka client or server; this will allow browsing all metrics with JMX.
We do graphing and alerting on the following metrics:
Description | Mbean name | Normal value |
---|---|---|
Message in rate | kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | |
Byte in rate | kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | |
Request rate | kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce\|FetchConsumer\|FetchFollower} | |
Byte out rate | kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | |
Log flush rate and time | kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | |
Number of under-replicated partitions (ISR size < number of assigned replicas) | kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | 0 |
Is the controller active on this broker | kafka.controller:type=KafkaController,name=ActiveControllerCount | only one broker in the cluster should have 1 |
Leader election rate | kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs | non-zero when there are broker failures |
Unclean leader election rate | kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec | 0 |
Partition counts | kafka.server:type=ReplicaManager,name=PartitionCount | mostly even across brokers |
Leader replica counts | kafka.server:type=ReplicaManager,name=LeaderCount | mostly even across brokers |
ISR shrink rate | kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | If a broker goes down, the ISR of some partitions will shrink. When that broker is up again, the ISR will expand once its replicas have fully caught up. Other than that, the expected value for both the ISR shrink rate and the ISR expansion rate is 0. |
ISR expansion rate | kafka.server:type=ReplicaManager,name=IsrExpandsPerSec | See above |
Max lag in messages between follower and leader replicas | kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica | < replica.lag.max.messages |
Lag in messages per follower replica | kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+) | < replica.lag.max.messages |
Requests waiting in the producer purgatory | kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize | non-zero if acks=-1 is used |
Requests waiting in the fetch purgatory | kafka.server:type=FetchRequestPurgatory,name=PurgatorySize | size depends on fetch.wait.max.ms in the consumer |
Request total time | kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | broken into queue, local, remote and response send time |
Time the request waits in the request queue | kafka.network:type=RequestMetrics,name=QueueTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Time the request is being processed at the leader | kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Time the request waits for the follower | kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | non-zero for produce requests when acks=-1 |
Time to send the response | kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce\|FetchConsumer\|FetchFollower} | |
Number of messages the consumer lags behind the producer by | kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+) | |
Average fraction of time the network processors are idle | kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent | between 0 and 1, ideally > 0.3 |
Average fraction of time the request handler threads are idle | kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent | between 0 and 1, ideally > 0.3 |
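The "Normal value" column can be turned into simple alert rules. The following is a minimal Python sketch, not part of Kafka: it assumes some collector (for example a JMX client or exporter) has already gathered each broker's latest values into a dict keyed by the MBean `name` property. The function name `check_cluster` and the snapshot shape are hypothetical.

```python
from typing import Dict, List

# Hypothetical snapshot shape: broker id -> {MBean "name" property -> latest value}.
# How the values are collected (JMX client, exporter, ...) is out of scope here.
Snapshot = Dict[int, Dict[str, float]]

IDLE_FLOOR = 0.3  # both idle-percent gauges should ideally stay above 0.3


def check_cluster(snapshots: Snapshot) -> List[str]:
    """Compare broker metrics against the 'Normal value' column of the table."""
    problems: List[str] = []

    # Exactly one broker in the cluster should report ActiveControllerCount == 1.
    controllers = [b for b, m in snapshots.items() if m.get("ActiveControllerCount") == 1]
    if len(controllers) != 1:
        problems.append(f"expected 1 active controller, found {len(controllers)}")

    for broker, m in sorted(snapshots.items()):
        # The normal value for both of these is 0.
        if m.get("UnderReplicatedPartitions", 0) != 0:
            problems.append(f"broker {broker}: under-replicated partitions")
        if m.get("UncleanLeaderElectionsPerSec", 0) != 0:
            problems.append(f"broker {broker}: unclean leader elections")
        # Idle fractions lie between 0 and 1; flag anything not above the floor.
        for gauge in ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent"):
            if gauge in m and m[gauge] <= IDLE_FLOOR:
                problems.append(f"broker {broker}: {gauge}={m[gauge]:.2f}")
    return problems


if __name__ == "__main__":
    snapshot = {
        1: {"ActiveControllerCount": 1, "UnderReplicatedPartitions": 0,
            "RequestHandlerAvgIdlePercent": 0.8},
        2: {"ActiveControllerCount": 0, "UnderReplicatedPartitions": 3,
            "NetworkProcessorAvgIdlePercent": 0.1},
    }
    for line in check_cluster(snapshot):
        print(line)
```

With the sample snapshot this reports that broker 2 has under-replicated partitions and a network processor idle fraction of 0.10. Only the table's own thresholds are encoded; anything beyond them would need tuning for a given cluster.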
Common monitoring metrics for producer/consumer/connect
The following metrics are available on producer/consumer/connector instances. For specific metrics, please see the following sections.
Metric/Attribute name | Description | MBean name |
---|---|---|
connection-close-rate | Connections closed per second in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
connection-creation-rate | New connections established per second in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
network-io-rate | The average number of network operations (reads or writes) on all connections per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
outgoing-byte-rate | The average number of outgoing bytes sent per second to all servers. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
request-rate | The average number of requests sent per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
request-size-avg | The average size of all requests in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
request-size-max | The maximum size of any request sent in the window. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
incoming-byte-rate | Bytes/second read off all sockets. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
response-rate | Responses received per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
select-rate | Number of times the I/O layer checked for new I/O to perform per second. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
io-wait-time-ns-avg | The average length of time the I/O thread spent waiting for a socket ready for reads or writes, in nanoseconds. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
io-wait-ratio | The fraction of time the I/O thread spent waiting. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
io-time-ns-avg | The average length of time for I/O per select call, in nanoseconds. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
io-ratio | The fraction of time the I/O thread spent doing I/O. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
connection-count | The current number of active connections. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-metrics,client-id=([-.\w]+) |
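The MBean name column uses regular-expression placeholders such as `([-.\w]+)` for the client id, and square brackets for alternatives. A minimal sketch, assuming you want to match concrete MBean names (for example as listed by jconsole) against these patterns in Python; `parse_client_mbean` is a hypothetical helper, and the requirement that domain and type prefix agree is an assumption.

```python
import re
from typing import Optional, Tuple

# The tables write alternatives as [producer|consumer|connect]. In a real
# regular expression that must be a group (a|b|c); a [...] character class
# would match single characters instead.
CLIENT_METRICS = re.compile(
    r"kafka\.(producer|consumer|connect)"
    r":type=(producer|consumer|connect)-metrics"
    r",client-id=([-.\w]+)"
)


def parse_client_mbean(name: str) -> Optional[Tuple[str, str]]:
    """Return (client kind, client id) for a client-level metrics MBean name."""
    m = CLIENT_METRICS.fullmatch(name)
    # Assumption: domain and type prefix agree, e.g. kafka.producer + producer-metrics.
    if m is None or m.group(1) != m.group(2):
        return None
    return m.group(1), m.group(3)


print(parse_client_mbean("kafka.producer:type=producer-metrics,client-id=producer-1"))
# ('producer', 'producer-1')
```

Because `fullmatch` is used, a more specific group such as `consumer-fetch-manager-metrics` does not match this pattern and needs its own expression.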
Common per-broker metrics for producer/consumer/connect
The following per-broker (per-node) metrics are available on producer/consumer/connector instances. For specific metrics, please see the following sections.
Metric/Attribute name | Description | MBean name |
---|---|---|
outgoing-byte-rate | The average number of outgoing bytes sent per second for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
request-rate | The average number of requests sent per second for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
request-size-avg | The average size of all requests in the window for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
request-size-max | The maximum size of any request sent in the window for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
incoming-byte-rate | The average number of bytes received per second for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
request-latency-avg | The average request latency in ms for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
request-latency-max | The maximum request latency in ms for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
response-rate | Responses received per second for a node. | kafka.[producer\|consumer\|connect]:type=[producer\|consumer\|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+) |
Producer monitoring
The following metrics are available on producer instances.
Metric/Attribute name | Description | MBean name |
---|---|---|
waiting-threads | The number of user threads blocked waiting for buffer memory to enqueue their records. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
buffer-total-bytes | The maximum amount of buffer memory the client can use (whether or not it is currently used). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
buffer-available-bytes | The total amount of buffer memory that is not being used (either unallocated or in the free list). | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
bufferpool-wait-time | The fraction of time an appender waits for space allocation. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
batch-size-avg | The average number of bytes sent per partition per request. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
batch-size-max | The maximum number of bytes sent per partition per request. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
compression-rate-avg | The average compression rate of record batches. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-queue-time-avg | The average time in ms record batches spent in the record accumulator. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-queue-time-max | The maximum time in ms record batches spent in the record accumulator. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
request-latency-avg | The average request latency in ms. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
request-latency-max | The maximum request latency in ms. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-send-rate | The average number of records sent per second. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
records-per-request-avg | The average number of records per request. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-retry-rate | The average per-second number of retried record sends. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-error-rate | The average per-second number of record sends that resulted in errors. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-size-max | The maximum record size. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-size-avg | The average record size. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
requests-in-flight | The current number of in-flight requests awaiting a response. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
metadata-age | The age in seconds of the current producer metadata being used. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
record-send-rate | The average number of records sent per second for a topic. | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
byte-rate | The average number of bytes sent per second for a topic. | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
compression-rate | The average compression rate of record batches for a topic. | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
record-retry-rate | The average per-second number of retried record sends for a topic. | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
record-error-rate | The average per-second number of record sends that resulted in errors for a topic. | kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
produce-throttle-time-max | The maximum time in ms a request was throttled by a broker. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
produce-throttle-time-avg | The average time in ms a request was throttled by a broker. | kafka.producer:type=producer-metrics,client-id=([-.\w]+) |
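Several of the producer metrics above are easier to read together. `buffer-total-bytes` is the maximum buffer memory (the producer's `buffer.memory` setting) and `buffer-available-bytes` is the unused part, so their difference is the memory currently holding records; `waiting-threads` above 0 means send calls are already blocking on buffer space. The sketch below combines them; the class, the function names and the 0.9 high-water mark are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProducerBufferSample:
    # Values as read from kafka.producer:type=producer-metrics,client-id=...
    buffer_total_bytes: float      # buffer-total-bytes
    buffer_available_bytes: float  # buffer-available-bytes
    waiting_threads: float         # waiting-threads


def buffer_utilization(s: ProducerBufferSample) -> float:
    """Fraction of the producer's buffer memory currently in use (0.0 to 1.0)."""
    if s.buffer_total_bytes <= 0:
        return 0.0
    used = s.buffer_total_bytes - s.buffer_available_bytes
    return used / s.buffer_total_bytes


def buffer_is_exhausted(s: ProducerBufferSample, high_water: float = 0.9) -> bool:
    """Hypothetical rule: flag when threads are blocked or usage exceeds high_water."""
    return s.waiting_threads > 0 or buffer_utilization(s) > high_water
```

For example, a 32 MiB buffer (33554432 bytes) with 8 MiB still available is 75% used and is not flagged unless threads are already waiting.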
Consumer monitoring
The following metrics are available on new consumer instances.
Consumer group metrics
Metric/Attribute name | Description | MBean name |
---|---|---|
commit-latency-avg | The average time taken for a commit request. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
commit-latency-max | The maximum time taken for a commit request. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
commit-rate | The number of commit calls per second. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
assigned-partitions | The number of partitions currently assigned to this consumer. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
heartbeat-response-time-max | The maximum time taken to receive a response to a heartbeat request. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
heartbeat-rate | The average number of heartbeats per second. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
join-time-avg | The average time taken for a group rejoin. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
join-time-max | The maximum time taken for a group rejoin. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
join-rate | The number of group joins per second. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
sync-time-avg | The average time taken for a group sync. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
sync-time-max | The maximum time taken for a group sync. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
sync-rate | The number of group syncs per second. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
last-heartbeat-seconds-ago | The number of seconds since the last coordinator heartbeat. | kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+) |
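A sketch of how a few of the coordinator metrics might be combined into alerts, assuming the attribute values have been read into a dict and that you know the consumer's configured `session.timeout.ms`. The function `coordinator_alerts` and its messages are hypothetical.

```python
from typing import Dict, List


def coordinator_alerts(metrics: Dict[str, float], session_timeout_ms: int) -> List[str]:
    """Flag common group-coordination symptoms.

    `metrics` maps consumer-coordinator-metrics attribute names to values;
    `session_timeout_ms` is the consumer's configured session.timeout.ms.
    """
    alerts: List[str] = []
    since_hb = metrics.get("last-heartbeat-seconds-ago")
    # A consumer that has not heartbeated within the session timeout is likely
    # to be removed from the group by the coordinator.
    if since_hb is not None and since_hb * 1000 > session_timeout_ms:
        alerts.append("no heartbeat within session timeout")
    # Any group joins in the window indicate recent rebalances.
    if metrics.get("join-rate", 0) > 0:
        alerts.append("recent group rebalance")
    # Zero assigned partitions can be normal (more consumers than partitions),
    # so treat this one as informational.
    if metrics.get("assigned-partitions") == 0:
        alerts.append("no partitions assigned")
    return alerts
```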
Fetch metrics
Metric/Attribute name | Description | MBean name |
---|---|---|
fetch-size-avg | The average number of bytes fetched per request. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-size-max | The maximum number of bytes fetched per request. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
bytes-consumed-rate | The average number of bytes consumed per second. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
records-per-request-avg | The average number of records in each request. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
records-consumed-rate | The average number of records consumed per second. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-latency-avg | The average time taken for a fetch request. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-latency-max | The maximum time taken for a fetch request. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-rate | The number of fetch requests per second. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
records-lag-max | The maximum lag in terms of number of records for any partition in this window. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-throttle-time-avg | The average throttle time in ms. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
fetch-throttle-time-max | The maximum throttle time in ms. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+) |
Topic-level fetch metrics
Metric/Attribute name | Description | MBean name |
---|---|---|
fetch-size-avg | The average number of bytes fetched per request for a specific topic. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
fetch-size-max | The maximum number of bytes fetched per request for a specific topic. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
bytes-consumed-rate | The average number of bytes consumed per second for a specific topic. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
records-per-request-avg | The average number of records in each request for a specific topic. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
records-consumed-rate | The average number of records consumed per second for a specific topic. | kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+) |
Others
We recommend monitoring GC time and other stats, as well as various server stats such as CPU utilization, I/O service time, etc. On the client side, we recommend monitoring the message/byte rate (global and per topic) and the request rate/size/time; on the consumer side, monitor the max lag in messages among all partitions and the min fetch request rate. For a consumer to keep up, the max lag needs to be less than a threshold and the min fetch rate needs to be larger than 0.
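The consumer rule above (max lag below a threshold, min fetch rate above 0) maps onto the `records-lag-max` and `fetch-rate` fetch-manager metrics. A minimal sketch, assuming per-consumer samples keyed by client id; the threshold is a value you choose per application, and `lagging_consumers` is a hypothetical helper.

```python
from typing import Dict, List


def lagging_consumers(samples: Dict[str, Dict[str, float]],
                      max_lag_threshold: float) -> List[str]:
    """Return the client ids that are not keeping up.

    `samples` maps a client id to its consumer-fetch-manager-metrics values.
    A consumer keeps up only if records-lag-max < threshold and fetch-rate > 0;
    missing values are treated as 0, so a silent consumer is reported.
    """
    behind: List[str] = []
    for client_id, m in sorted(samples.items()):
        lag = m.get("records-lag-max", 0.0)
        fetch_rate = m.get("fetch-rate", 0.0)
        if lag >= max_lag_threshold or fetch_rate <= 0:
            behind.append(client_id)
    return behind
```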
Audit
The final alerting we do is on the correctness of the data delivery. We audit that every message that is sent is consumed by all consumers and measure the lag for this to occur. For important topics, we alert if a certain completeness is not achieved within a certain time period. The details of this are discussed in KAFKA-260.
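KAFKA-260 describes the actual audit design; the following is only a hypothetical Python illustration of a completeness check of the kind described above. It assumes messages are counted per fixed time window (keyed by window start, in epoch seconds) both when produced and when consumed by one consumer, and flags windows that are still below a completeness target once a grace period has passed.

```python
from typing import Dict, List


def incomplete_windows(produced: Dict[int, int],
                       consumed: Dict[int, int],
                       now: int,
                       min_completeness: float,
                       grace_seconds: int) -> List[int]:
    """Return start times of audit windows still incomplete after the grace period."""
    late: List[int] = []
    for window_start, sent in sorted(produced.items()):
        if now - window_start < grace_seconds or sent == 0:
            continue  # too early to judge, or nothing to deliver
        completeness = consumed.get(window_start, 0) / sent
        if completeness < min_completeness:
            late.append(window_start)
    return late
```

For example, with 200 messages produced in a window but only 150 consumed after the grace period, that window's completeness is 0.75 and it is reported.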
KafkaOffsetMonitor
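KafkaOffsetMonitor is a tool for monitoring, in real time, the consumers in a Kafka cluster and their positions (offsets) in the queue.
You can see the current consumer groups and, for each topic, how every partition is being consumed. It shows at a glance whether the messages in each partition are being consumed promptly and how fast each queue is growing. This helps you debug Kafka producers and consumers, so you know exactly what is happening in your system.
The web UI keeps a history of partition offsets and consumer lag (how many days to keep is configurable at startup), so you can easily review how consumers have behaved over the past few days.
KafkaOffsetMonitor is written in Scala. Historical data is stored in a database file named offsetapp.db, a very lightweight SQLite file. Although you can set the refresh frequency and the retention period when starting KafkaOffsetMonitor, avoid refreshing very frequently or keeping large amounts of data: the charts may then render very slowly, or the program may run out of memory.
Message offsets, Kafka cluster information and so on are all read from ZooKeeper, while the log size is computed.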
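The fields shown in the KafkaOffsetMonitor view mean the following:
topic: the name the topic was given when it was created
partition: the partition number
offset: how many messages of this partition have been consumed
logSize: how many messages have been written to this partition
Lag: how many messages have not yet been consumed
Owner: the consumer that owns the partition
Created: when the partition was created
Last Seen: when the consumption status was last refreshed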
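Lag is the difference between logSize and offset. A minimal sketch of that arithmetic, assuming you already have both numbers per partition (for example read from the UI or from offset storage); the `PartitionOffsets` class and the clamping at 0 are illustrative choices, not KafkaOffsetMonitor code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PartitionOffsets:
    topic: str
    partition: int
    offset: int    # messages consumed so far
    log_size: int  # messages written so far

    @property
    def lag(self) -> int:
        # Lag = messages written but not yet consumed. Clamp at 0 in case the
        # two values were sampled at slightly different times.
        return max(self.log_size - self.offset, 0)


def total_lag(rows: List[PartitionOffsets]) -> int:
    """Sum of Lag over all partitions of one consumer group."""
    return sum(r.lag for r in rows)
```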
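Kafka manages offsets flexibly: you can store them in whatever storage and format you choose. KafkaOffsetMonitor supports several of the common offset storage formats, selected at startup with --offsetStorage.
Each running KafkaOffsetMonitor instance supports only a single storage type.
You can download the KafkaOffsetMonitor source code from GitHub:
https://github.com/quantifind/KafkaOffsetMonitor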
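Command to build KafkaOffsetMonitor:
sbt/sbt assembly
That said, I don't recommend building it yourself: the compiled jar loads its CSS and JS from external (overseas) URLs, so the UI only works with Internet access, and you would have to change the JS paths before building. I have already done that, so you can simply download my build:
Baidu Netdisk: https://pan.baidu.com/s/1kUZJrCV
After building, a jar file named something like KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar is generated under the KafkaOffsetMonitor project directory. It bundles all dependencies, so we can start it directly: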
java -cp KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar \
  com.quantifind.kafka.offsetapp.OffsetGetterWeb \
  --offsetStorage kafka \
  --zk zk-server1,zk-server2 \
  --port 8080 \
  --refresh 10.seconds \
  --retain 2.days
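The `--refresh` and `--retain` values determine how much history ends up in offsetapp.db, which is why very frequent refreshes and long retention are discouraged. A rough estimate in Python, assuming one stored point per partition per refresh (an assumption about KafkaOffsetMonitor's storage, not documented behaviour) and supporting only the `N.unit` duration form used in the command above; both helper names are hypothetical.

```python
import re

# Duration strings as written on the command line above, e.g. "10.seconds" or
# "2.days". Only this "N.unit" form is handled here.
_UNITS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}
_DURATION = re.compile(r"(\d+)\.(second|minute|hour|day)s?")


def to_seconds(text: str) -> int:
    """Convert a duration such as "10.seconds" to a number of seconds."""
    m = _DURATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def estimated_points(refresh: str, retain: str, partitions: int) -> int:
    """Rough number of stored offset points, assuming one point per
    partition per refresh (an assumption, not documented behaviour)."""
    return to_seconds(retain) // to_seconds(refresh) * partitions
```

With the values above (10.seconds, 2.days) and a hypothetical group consuming 12 partitions, this estimate gives 207,360 points.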
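Option 2: create a startup script. Since you may have more than one Kafka cluster, scripts let you start several instances.
vim mobile_start_en.sh
nohup java -Xms512M -Xmx512M -Xss1024K -XX:PermSize=256m -XX:MaxPermSize=512m -cp KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb \
  --offsetStorage kafka \
  --zk 127.0.0.1:2181 \
  --port 8080 \
  --refresh 10.seconds \
  --retain 2.days 1>mobile-logs/stdout.log 2>mobile-logs/stderr.log &
The mobile-logs directory must exist before the script runs; otherwise the output redirections fail.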
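Meaning of each parameter:
--offsetStorage: where consumer offsets are stored (kafka in the examples above)
--zk: the ZooKeeper address(es)
--port: the port the web UI listens on
--refresh: how often data is refreshed and stored in offsetapp.db
--retain: how long historical data is kept in offsetapp.db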