[Kafka Tutorial Series 41] Kafka Monitoring - Go语言中文社区



Monitoring

Kafka uses Yammer Metrics for metrics reporting in the server; the Java clients use Kafka's own built-in metrics library (Kafka Metrics). Both expose metrics via JMX and can be configured to report stats through pluggable stats reporters that hook into your monitoring system.

 

The easiest way to see the available metrics is to fire up jconsole and point it at a running Kafka client or server; this lets you browse all metrics with JMX.

 

We do graphing and alerting on the following metrics:

 

 

Each entry below gives the description, the MBean name and, where one is given, the normal value.

Message in rate
  MBean name: kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec

Byte in rate
  MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec

Request rate
  MBean name: kafka.network:type=RequestMetrics,name=RequestsPerSec,request={Produce|FetchConsumer|FetchFollower}

Byte out rate
  MBean name: kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

Log flush rate and time
  MBean name: kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs

Number of under-replicated partitions (|ISR| < |all replicas|)
  MBean name: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
  Normal value: 0

Is the controller active on this broker
  MBean name: kafka.controller:type=KafkaController,name=ActiveControllerCount
  Normal value: exactly one broker in the cluster should report 1

Leader election rate
  MBean name: kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs
  Normal value: non-zero when there are broker failures

Unclean leader election rate
  MBean name: kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec
  Normal value: 0

Partition counts
  MBean name: kafka.server:type=ReplicaManager,name=PartitionCount
  Normal value: mostly even across brokers

Leader replica counts
  MBean name: kafka.server:type=ReplicaManager,name=LeaderCount
  Normal value: mostly even across brokers

ISR shrink rate
  MBean name: kafka.server:type=ReplicaManager,name=IsrShrinksPerSec
  Normal value: if a broker goes down, the ISR of some partitions shrinks; when that broker is up again, the ISR expands once its replicas have fully caught up. Other than that, the expected value for both the ISR shrink rate and the ISR expansion rate is 0.

ISR expansion rate
  MBean name: kafka.server:type=ReplicaManager,name=IsrExpandsPerSec
  Normal value: see ISR shrink rate

Max lag in messages between follower and leader replicas
  MBean name: kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica
  Normal value: < replica.lag.max.messages

Lag in messages per follower replica
  MBean name: kafka.server:type=FetcherLagMetrics,name=ConsumerLag,clientId=([-.\w]+),topic=([-.\w]+),partition=([0-9]+)
  Normal value: < replica.lag.max.messages

Requests waiting in the producer purgatory
  MBean name: kafka.server:type=ProducerRequestPurgatory,name=PurgatorySize
  Normal value: non-zero if acks=-1 is used

Requests waiting in the fetch purgatory
  MBean name: kafka.server:type=FetchRequestPurgatory,name=PurgatorySize
  Normal value: size depends on fetch.wait.max.ms in the consumer

Request total time
  MBean name: kafka.network:type=RequestMetrics,name=TotalTimeMs,request={Produce|FetchConsumer|FetchFollower}
  Normal value: broken into queue, local, remote and response send time

Time the request waits in the request queue
  MBean name: kafka.network:type=RequestMetrics,name=QueueTimeMs,request={Produce|FetchConsumer|FetchFollower}

Time the request is being processed at the leader
  MBean name: kafka.network:type=RequestMetrics,name=LocalTimeMs,request={Produce|FetchConsumer|FetchFollower}

Time the request waits for the follower
  MBean name: kafka.network:type=RequestMetrics,name=RemoteTimeMs,request={Produce|FetchConsumer|FetchFollower}
  Normal value: non-zero for produce requests when acks=-1

Time to send the response
  MBean name: kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request={Produce|FetchConsumer|FetchFollower}

Number of messages the consumer lags behind the producer by
  MBean name: kafka.consumer:type=ConsumerFetcherManager,name=MaxLag,clientId=([-.\w]+)

The average fraction of time the network processors are idle
  MBean name: kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent
  Normal value: between 0 and 1, ideally > 0.3

The average fraction of time the request handler threads are idle
  MBean name: kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent
  Normal value: between 0 and 1, ideally > 0.3
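Several of the normal values above translate directly into alert rules. The following is a minimal sketch, assuming you already have the current value of each broker metric in a dict keyed by broker id and by the MBean's name property (how those values are read over JMX is left out). The function name check_cluster and the sample numbers are hypothetical.

```python
from typing import Dict, List

# Threshold taken from the "ideally > 0.3" note for the idle-percent metrics.
IDLE_TARGET = 0.3


def check_cluster(readings: Dict[int, Dict[str, float]]) -> List[str]:
    """Return alerts for readings outside the 'Normal value' column.

    `readings` maps a broker id to {MBean name property: current value}.
    """
    alerts: List[str] = []
    # Exactly one broker in the cluster should report ActiveControllerCount = 1.
    active = sum(r.get("ActiveControllerCount", 0) for r in readings.values())
    if active != 1:
        alerts.append(f"cluster: ActiveControllerCount sums to {active:g}, expected 1")
    for broker, r in sorted(readings.items()):
        if r.get("UnderReplicatedPartitions", 0) > 0:
            alerts.append(f"broker {broker}: UnderReplicatedPartitions = {r['UnderReplicatedPartitions']:g}")
        if r.get("UncleanLeaderElectionsPerSec", 0) > 0:
            alerts.append(f"broker {broker}: unclean leader elections are happening")
        for name in ("NetworkProcessorAvgIdlePercent", "RequestHandlerAvgIdlePercent"):
            if name in r and r[name] <= IDLE_TARGET:
                alerts.append(f"broker {broker}: {name} = {r[name]:.2f}, want > {IDLE_TARGET}")
    return alerts


snapshot = {
    1: {"ActiveControllerCount": 1, "UnderReplicatedPartitions": 0,
        "NetworkProcessorAvgIdlePercent": 0.72},
    2: {"ActiveControllerCount": 0, "UnderReplicatedPartitions": 3,
        "NetworkProcessorAvgIdlePercent": 0.21},
}
for line in check_cluster(snapshot):
    print(line)
# broker 2: UnderReplicatedPartitions = 3
# broker 2: NetworkProcessorAvgIdlePercent = 0.21, want > 0.3
```

Summing ActiveControllerCount across brokers, rather than checking each broker alone, is what catches both "no controller" and "two controllers".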

 

 

 

Common monitoring metrics for producer/consumer/connect

 

The following metrics are available on producer/consumer/connector instances. For metrics specific to each client type, see the following sections.

 

MBEAN NAME (all metrics below): kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-metrics,client-id=([-.\w]+)

METRIC/ATTRIBUTE NAME: DESCRIPTION

connection-close-rate: Connections closed per second in the window.
connection-creation-rate: New connections established per second in the window.
network-io-rate: The average number of network operations (reads or writes) on all connections per second.
outgoing-byte-rate: The average number of outgoing bytes sent per second to all servers.
request-rate: The average number of requests sent per second.
request-size-avg: The average size of all requests in the window.
request-size-max: The maximum size of any request sent in the window.
incoming-byte-rate: Bytes/second read off all sockets.
response-rate: Responses received per second.
select-rate: Number of times the I/O layer checked for new I/O to perform per second.
io-wait-time-ns-avg: The average length of time, in nanoseconds, the I/O thread spent waiting for a socket to be ready for reads or writes.
io-wait-ratio: The fraction of time the I/O thread spent waiting.
io-time-ns-avg: The average length of time, in nanoseconds, for I/O per select call.
io-ratio: The fraction of time the I/O thread spent doing I/O.
connection-count: The current number of active connections.
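The client-id=(...) parts of these MBean names are regular-expression groups standing for the actual client id (the intended character class is [-.\w]: hyphen, dot and word characters), while [producer|consumer|connect] is meant as a choice between three words. A minimal sketch of turning the pattern into a matcher with Python's re module; the function name parse_client_mbean and the sample ObjectName strings are hypothetical.

```python
import re

# [producer|consumer|connect] in the table is a list of alternatives, so it is
# written here as a regex alternation rather than a character class.
COMMON_PATTERN = (
    r"kafka\.(producer|consumer|connect):type=(producer|consumer|connect)-metrics,"
    r"client-id=([-.\w]+)"
)


def parse_client_mbean(object_name: str):
    """Return (client_type, client_id) if object_name is a common-metrics MBean."""
    m = re.fullmatch(COMMON_PATTERN, object_name)
    # Reject non-matches and mismatched domain/type such as kafka.producer:type=consumer-metrics.
    if m is None or m.group(1) != m.group(2):
        return None
    return m.group(1), m.group(3)


print(parse_client_mbean("kafka.consumer:type=consumer-metrics,client-id=consumer-1"))
# ('consumer', 'consumer-1')
print(parse_client_mbean("kafka.producer:type=producer-node-metrics,client-id=p1,node-id=node-1"))
# None
```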

 

Common per-broker metrics for producer/consumer/connect

 

The following metrics are available on producer/consumer/connector instances. For metrics specific to each client type, see the following sections.
 

MBEAN NAME (all metrics below): kafka.[producer|consumer|connect]:type=[producer|consumer|connect]-node-metrics,client-id=([-.\w]+),node-id=([0-9]+)

METRIC/ATTRIBUTE NAME: DESCRIPTION

outgoing-byte-rate: The average number of outgoing bytes sent per second for a node.
request-rate: The average number of requests sent per second for a node.
request-size-avg: The average size of all requests in the window for a node.
request-size-max: The maximum size of any request sent in the window for a node.
incoming-byte-rate: The average number of bytes received per second for a node.
request-latency-avg: The average request latency in ms for a node.
request-latency-max: The maximum request latency in ms for a node.
response-rate: Responses received per second for a node.
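Because these metrics are broken out per node, they can point at a single slow broker that a client-wide average would hide. A minimal sketch, assuming you have collected request-latency-avg for each node-id into a dict; the function name slowest_node and the 20 ms limit are hypothetical choices for illustration.

```python
from typing import Dict, Optional, Tuple


def slowest_node(latency_avg_by_node: Dict[str, float],
                 limit_ms: float) -> Optional[Tuple[str, float]]:
    """Return (node_id, latency) of the node with the highest request-latency-avg,
    or None if every node is at or below limit_ms."""
    if not latency_avg_by_node:
        return None
    node, latency = max(latency_avg_by_node.items(), key=lambda kv: kv[1])
    return (node, latency) if latency > limit_ms else None


print(slowest_node({"1": 4.2, "2": 38.0, "3": 5.1}, limit_ms=20.0))  # ('2', 38.0)
```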

 

 

Producer monitoring

 

The following metrics are available on producer instances.

 

METRIC/ATTRIBUTE NAME: DESCRIPTION

MBEAN NAME: kafka.producer:type=producer-metrics,client-id=([-.\w]+)

waiting-threads: The number of user threads blocked waiting for buffer memory to enqueue their records.
buffer-total-bytes: The maximum amount of buffer memory the client can use (whether or not it is currently used).
buffer-available-bytes: The total amount of buffer memory that is not being used (either unallocated or in the free list).
bufferpool-wait-time: The fraction of time an appender waits for space allocation.
batch-size-avg: The average number of bytes sent per partition per request.
batch-size-max: The maximum number of bytes sent per partition per request.
compression-rate-avg: The average compression rate of record batches.
record-queue-time-avg: The average time in ms record batches spent in the record accumulator.
record-queue-time-max: The maximum time in ms record batches spent in the record accumulator.
request-latency-avg: The average request latency in ms.
request-latency-max: The maximum request latency in ms.
record-send-rate: The average number of records sent per second.
records-per-request-avg: The average number of records per request.
record-retry-rate: The average per-second number of retried record sends.
record-error-rate: The average per-second number of record sends that resulted in errors.
record-size-max: The maximum record size.
record-size-avg: The average record size.
requests-in-flight: The current number of in-flight requests awaiting a response.
metadata-age: The age in seconds of the current producer metadata being used.

MBEAN NAME: kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+),topic=([-.\w]+)

record-send-rate: The average number of records sent per second for a topic.
byte-rate: The average number of bytes sent per second for a topic.
compression-rate: The average compression rate of record batches for a topic.
record-retry-rate: The average per-second number of retried record sends for a topic.
record-error-rate: The average per-second number of record sends that resulted in errors for a topic.

MBEAN NAME: kafka.producer:type=producer-topic-metrics,client-id=([-.\w]+)

produce-throttle-time-max: The maximum time in ms a request was throttled by a broker.
produce-throttle-time-avg: The average time in ms a request was throttled by a broker.
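waiting-threads, buffer-total-bytes and buffer-available-bytes together show whether the producer's buffer (sized by the buffer.memory setting) is running out. A minimal sketch, assuming the three values have already been read for one producer; the class and function names and the 90% "high" level are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProducerBufferMetrics:
    waiting_threads: float         # waiting-threads
    buffer_total_bytes: float      # buffer-total-bytes (the producer's buffer.memory)
    buffer_available_bytes: float  # buffer-available-bytes


def buffer_pressure(m: ProducerBufferMetrics, high_fraction: float = 0.9) -> str:
    """Classify producer buffer usage as 'blocked', 'high' or 'ok'."""
    if m.buffer_total_bytes <= 0:
        raise ValueError("buffer-total-bytes must be positive")
    if m.waiting_threads > 0:
        # User threads are already blocked waiting for buffer memory.
        return "blocked"
    used = 1.0 - m.buffer_available_bytes / m.buffer_total_bytes
    return "high" if used >= high_fraction else "ok"


# 32 MiB buffer with only 1 MiB left: about 96.9% used, no threads blocked yet.
print(buffer_pressure(ProducerBufferMetrics(0, 33554432, 1048576)))  # high
```

Checking waiting-threads first matters: once threads are blocked, the available-bytes ratio alone no longer tells you how far behind the producer is.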

New consumer monitoring

The following metrics are available on new consumer instances.

 

Consumer Group Metrics

MBEAN NAME (all metrics below): kafka.consumer:type=consumer-coordinator-metrics,client-id=([-.\w]+)

METRIC/ATTRIBUTE NAME: DESCRIPTION

commit-latency-avg: The average time taken for a commit request.
commit-latency-max: The maximum time taken for a commit request.
commit-rate: The number of commit calls per second.
assigned-partitions: The number of partitions currently assigned to this consumer.
heartbeat-response-time-max: The maximum time taken to receive a response to a heartbeat request.
heartbeat-rate: The average number of heartbeats per second.
join-time-avg: The average time taken for a group rejoin.
join-time-max: The maximum time taken for a group rejoin.
join-rate: The number of group joins per second.
sync-time-avg: The average time taken for a group sync.
sync-time-max: The maximum time taken for a group sync.
sync-rate: The number of group syncs per second.
last-heartbeat-seconds-ago: The number of seconds since the last coordinator heartbeat.
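If a consumer stops heartbeating for longer than its session.timeout.ms, the group coordinator removes it from the group and a rebalance follows (which then shows up in join-rate and sync-rate). A minimal early-warning sketch on last-heartbeat-seconds-ago; the function name heartbeat_stale and the default of warning at half the session timeout are hypothetical choices.

```python
def heartbeat_stale(last_heartbeat_seconds_ago: float, session_timeout_ms: int,
                    fraction: float = 0.5) -> bool:
    """True if the last heartbeat is older than `fraction` of session.timeout.ms."""
    return last_heartbeat_seconds_ago > fraction * session_timeout_ms / 1000.0


print(heartbeat_stale(6, session_timeout_ms=10000))  # True: 6 s > 5 s
print(heartbeat_stale(2, session_timeout_ms=10000))  # False
```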

 

Consumer Fetch Metrics

MBEAN NAME (all metrics below): kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+)

METRIC/ATTRIBUTE NAME: DESCRIPTION

fetch-size-avg: The average number of bytes fetched per request.
fetch-size-max: The maximum number of bytes fetched per request.
bytes-consumed-rate: The average number of bytes consumed per second.
records-per-request-avg: The average number of records in each request.
records-consumed-rate: The average number of records consumed per second.
fetch-latency-avg: The average time taken for a fetch request.
fetch-latency-max: The maximum time taken for a fetch request.
fetch-rate: The number of fetch requests per second.
records-lag-max: The maximum lag in terms of number of records for any partition in this window.
fetch-throttle-time-avg: The average throttle time in ms.
fetch-throttle-time-max: The maximum throttle time in ms.

 

Topic-level Fetch Metrics

MBEAN NAME (all metrics below): kafka.consumer:type=consumer-fetch-manager-metrics,client-id=([-.\w]+),topic=([-.\w]+)

METRIC/ATTRIBUTE NAME: DESCRIPTION

fetch-size-avg: The average number of bytes fetched per request for a specific topic.
fetch-size-max: The maximum number of bytes fetched per request for a specific topic.
bytes-consumed-rate: The average number of bytes consumed per second for a specific topic.
records-per-request-avg: The average number of records in each request for a specific topic.
records-consumed-rate: The average number of records consumed per second for a specific topic.

Others

We recommend monitoring GC time and other stats, as well as various server stats such as CPU utilization and I/O service time. On the client side, we recommend monitoring the message/byte rate (global and per topic) and the request rate/size/time; on the consumer side, monitor the max lag in messages among all partitions and the min fetch request rate. For a consumer to keep up, max lag needs to be less than a threshold and min fetch rate needs to be larger than 0.
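The keep-up rule above combines two conditions across all instances of a consumer application. A minimal sketch, assuming you have collected records-lag-max and fetch-rate (from the consumer fetch metrics) for every instance; the function name consumer_keeping_up and the threshold value in the usage lines are hypothetical.

```python
from typing import Iterable


def consumer_keeping_up(records_lag_max: Iterable[float],
                        fetch_rate: Iterable[float],
                        lag_threshold: float) -> bool:
    """Max lag below the threshold AND min fetch rate above zero, across all instances."""
    lags = list(records_lag_max)
    rates = list(fetch_rate)
    if not lags or not rates:
        return False  # no data at all is treated as "not keeping up"
    return max(lags) < lag_threshold and min(rates) > 0


print(consumer_keeping_up([10, 250, 40], [5.0, 3.2, 4.1], lag_threshold=1000))  # True
print(consumer_keeping_up([10, 250, 40], [5.0, 0.0, 4.1], lag_threshold=1000))  # False
```

Using the minimum fetch rate rather than the average is what catches a single stuck instance: one consumer fetching at 0 can hide behind healthy peers in an average.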

 

Audit

The final alerting we do is on the correctness of data delivery. We audit that every message that is sent is consumed by all consumers, and measure the lag for this to occur. For important topics, we alert if a certain completeness is not achieved within a certain time period. The details are discussed in KAFKA-260.
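The audit design itself is in KAFKA-260 and is not reproduced here. As a rough illustration only, the following sketch assumes producers and each consumer group report how many messages of a topic they handled per time window (these audit counts, the function name audit_alerts and the 99.9% target are all hypothetical). Completeness per window is consumed divided by produced, and a window is only judged once it is older than a grace period.

```python
from typing import Dict, List


def audit_alerts(produced: Dict[int, int],
                 consumed_by_group: Dict[str, Dict[int, int]],
                 now: int, grace_seconds: int, target: float = 0.999) -> List[str]:
    """produced: window start (epoch seconds) -> messages produced in that window.
    consumed_by_group: consumer group -> the same kind of per-window counts."""
    alerts = []
    for window, sent in sorted(produced.items()):
        if now - window < grace_seconds or sent == 0:
            continue  # too recent to judge, or nothing to deliver
        for group, counts in sorted(consumed_by_group.items()):
            completeness = counts.get(window, 0) / sent
            if completeness < target:
                alerts.append(f"{group} window {window}: {completeness:.1%} delivered")
    return alerts


print(audit_alerts({0: 1000, 60: 1000},
                   {"billing": {0: 1000, 60: 990}, "search": {0: 1000, 60: 1000}},
                   now=700, grace_seconds=600))
# ['billing window 60: 99.0% delivered']
```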



 


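KafkaOffsetMonitor: monitoring consumers and their lag

KafkaOffsetMonitor monitors the consumers in a Kafka cluster in real time, together with their positions (offsets) in each partition.

You can view the current consumer groups and how every partition of each topic is being consumed. It shows at a glance whether the messages in each partition are being consumed promptly and how fast each partition is growing. This helps when debugging Kafka producers and consumers, because you can see exactly what is happening in your system.

The web UI keeps a history of partition offsets and consumer lag (how many days to keep is configurable at startup), so you can easily review how consumers have behaved over the last few days.

KafkaOffsetMonitor is written in Scala. Its history is stored in a database file named offsetapp.db, a very lightweight SQLite file. Although both the refresh frequency and the retention period can be set when KafkaOffsetMonitor starts, avoid refreshing very frequently or retaining large amounts of data: the charts will render slowly, or the application may run out of memory.

All information about message offsets and the Kafka cluster is read from ZooKeeper; the log size is calculated.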

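Consumer group list

[screenshot]

Topic list of a consumer group

[screenshot]

The columns in the screenshot mean:

topic: the name the topic was created with
partition: the partition number
offset: how many messages of this partition have been consumed (the group's current offset)
logSize: how many messages have been written to this partition
Lag: how many messages have not been consumed yet (logSize minus offset)
Owner: the consumer that owns this partition
Created: when this partition was created
Last Seen: when the consumption status was last refreshed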
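The Lag column is simply logSize minus offset for each partition. A minimal sketch of that arithmetic, plus the per-topic totals you might alert on; the PartitionRow type, the topic_lag function and the sample numbers are hypothetical, and clamping negative values to 0 is an assumption to cope with offset and logSize being sampled at slightly different times.

```python
from typing import List, NamedTuple, Tuple


class PartitionRow(NamedTuple):
    # Mirrors the partition, offset and logSize columns described above.
    partition: int
    offset: int
    log_size: int

    @property
    def lag(self) -> int:
        # Messages written to the partition but not yet consumed by this group.
        return max(self.log_size - self.offset, 0)


def topic_lag(rows: List[PartitionRow]) -> Tuple[int, int]:
    """Return (total lag, worst single-partition lag) for one topic."""
    lags = [r.lag for r in rows]
    return sum(lags), max(lags, default=0)


rows = [PartitionRow(0, 1500, 1520), PartitionRow(1, 980, 980), PartitionRow(2, 300, 1300)]
print(topic_lag(rows))  # (1020, 1000)
```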

Offset history of a topic

[screenshot]

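Offset storage

Kafka manages offsets flexibly: you can choose any storage and format for them. KafkaOffsetMonitor currently supports the following popular storage options:

  • Kafka 0.8 and earlier: offsets are stored in ZooKeeper by default (ZooKeeper-based)
  • Kafka 0.9 and later: offsets are stored in an internal topic by default (based on Kafka's internal __consumer_offsets topic)
  • Storm Kafka Spout (ZooKeeper-based by default)

Each running KafkaOffsetMonitor instance supports only one storage type.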

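Download

The KafkaOffsetMonitor source code can be downloaded from GitHub:

https://github.com/quantifind/KafkaOffsetMonitor

Build KafkaOffsetMonitor with:

sbt/sbt assembly

That said, I don't recommend building it yourself: the compiled jar references external CSS and JS hosted on servers outside China, so the page only works with internet access, and you would also have to change the JS paths before building. I have already done that, so you can simply download the result:

Baidu Netdisk: https://pan.baidu.com/s/1kUZJrCV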

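Starting

After the build, a jar file named something like KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar is generated in the KafkaOffsetMonitor root directory. It bundles all dependencies, so you can start it directly: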

java -cp KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar \
     com.quantifind.kafka.offsetapp.OffsetGetterWeb \
     --offsetStorage kafka \
     --zk zk-server1,zk-server2 \
     --port 8080 \
     --refresh 10.seconds \
     --retain 2.days

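Option 2: create a startup script. You may have more than one Kafka cluster, and with scripts you can start one monitor per cluster (give each one its own --port and log directory).

vim mobile_start_en.sh

        # Create the log directory first; the redirects below fail if it does not exist.
        mkdir -p mobile-logs
        # -XX:PermSize/-XX:MaxPermSize only apply to Java 7 and earlier; Java 8 ignores them
        # with a warning and newer JVMs may refuse to start, so drop them there.
        nohup java -Xms512M -Xmx512M -Xss1024K -XX:PermSize=256m -XX:MaxPermSize=512m \
          -cp KafkaOffsetMonitor-assembly-0.3.0-SNAPSHOT.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb \
          --offsetStorage kafka \
          --zk 127.0.0.1:2181 \
          --port 8080 \
          --refresh 10.seconds \
          --retain 2.days 1>mobile-logs/stdout.log 2>mobile-logs/stderr.log &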

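Parameters:

  • offsetStorage: valid options are "zookeeper", "kafka" and "storm". From 0.9 on, offsets are stored in Kafka.
  • zk: the ZooKeeper address
  • port: the port the web UI listens on
  • refresh: how often to refresh the data and write it to the DB
  • retain: how long to keep data in the DB
  • dbName: where to store the history (default 'offsetapp')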
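The --refresh and --retain values are Scala-style durations such as 10.seconds and 2.days, and together they decide how much history accumulates in offsetapp.db, which is why the warning about refreshing too often or retaining too much matters. A rough sizing sketch, assuming one stored sample per partition per refresh (a simplified model, not KafkaOffsetMonitor's actual schema); parse_duration only handles the unit spellings listed in it, and both function names are hypothetical.

```python
import re

# Seconds per unit for the suffixes handled by this sketch only.
_UNITS = {"second": 1, "seconds": 1, "minute": 60, "minutes": 60,
          "hour": 3600, "hours": 3600, "day": 86400, "days": 86400}


def parse_duration(text: str) -> int:
    """Convert strings like '10.seconds' or '2.days' to seconds."""
    m = re.fullmatch(r"(\d+)\.(\w+)", text.strip())
    if m is None or m.group(2) not in _UNITS:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


def estimated_rows(refresh: str, retain: str, partitions: int) -> int:
    """Samples kept if every partition is recorded once per refresh for `retain`."""
    return parse_duration(retain) // parse_duration(refresh) * partitions


print(estimated_rows("10.seconds", "2.days", partitions=100))  # 1728000
```

With 100 partitions, the example settings already keep about 1.7 million samples; raising --refresh to 1.minute cuts that by a factor of six.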



 

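Copyright notice: this article comes from CSDN; thanks to the blogger for the original article. It is licensed under CC 4.0 BY-SA; when reposting, include a link to the original source and this notice.
Original link: https://blog.csdn.net/dcm19920115/article/details/93389356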