- Scalar: The simplest type: a single numeric value with no group associated with it. Keep in mind that an empty group, “{}”, is still a group.
- NumberSet: A number set is a group of tagged numeric values with one value per unique grouping. As a special case, a scalar may be used in place of a numberSet with a single member with an empty group.
- SeriesSet: A series set is a collection of series. Each series is an array of timestamp-value pairs with an associated group.
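The three types above can be pictured with a small Python model. This is only an illustrative sketch: the class names `TaggedNumber` and `Series`, the helper `scalar_as_number_set`, and the sample values are assumptions made for this example and are not part of Bosun.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A group is a set of tag key/value pairs; {} is the empty group.
Group = Dict[str, str]


@dataclass
class TaggedNumber:
    """One member of a number set: a value tagged with its group."""
    group: Group
    value: float


@dataclass
class Series:
    """One member of a series set: timestamp-value pairs plus a group."""
    group: Group
    points: List[Tuple[int, float]]  # (unix timestamp, value)


# Scalar: a bare number with no group at all.
scalar = 42.0

# Number set: one value per unique group.
number_set = [
    TaggedNumber({"host": "vs123"}, 61.5),
    TaggedNumber({"host": "vs124"}, 48.0),
]

# Series set: one series per unique group.
series_set = [
    Series({"host": "vs123"}, [(1500000000, 60.0), (1500000060, 63.0)]),
]


def scalar_as_number_set(x: float) -> List[TaggedNumber]:
    """Model the special case: a scalar acts like a number set with a
    single member whose group is empty."""
    return [TaggedNumber({}, x)]
```

Calling `scalar_as_number_set(scalar)` yields a one-member list whose group is `{}`, which mirrors the special case described for NumberSet.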
Operator precedence, from highest to lowest:
1. (), and the unary operators ! and -
1. *,/,%
1. +,-
1. ==,!=,>,>=,<,<=
1. &&
1. ||
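To see how this ordering groups a mixed expression, the sketch below uses Python purely as a stand-in. For these binary operators Python happens to follow the same order, with `and`/`or` playing the role of `&&`/`||`. The `!` operator is left out because Python's `not` binds more loosely than comparisons, so it is not a faithful analogue.

```python
# Written without parentheses: precedence decides the grouping.
implicit = 1 + 2 * 3 > 6 and 10 % 4 == 2

# The same expression with the grouping spelled out:
# *, /, % first, then +, -, then comparisons, then "and", then "or".
explicit = ((1 + (2 * 3)) > 6) and ((10 % 4) == 2)

# Unary minus binds tighter than multiplication and addition.
unary_implicit = -2 * 3 + 1
unary_explicit = ((-2) * 3) + 1

# "and" binds tighter than "or".
logic_implicit = 0 > 1 or 2 > 1 and 3 > 2
logic_explicit = (0 > 1) or ((2 > 1) and (3 > 2))

print(implicit == explicit, unary_implicit == unary_explicit,
      logic_implicit == logic_explicit)
```

When in doubt in a real expression, adding parentheses makes the intended grouping explicit.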
q("avg:os.mem.used{host=*}", "1m", "")
The result column shows the memory usage of each matching host as a set of numeric values. The result is as follows:
avg(q("avg:os.mem.used{host=vs123}", "1m", ""))
The result is as follows:
max(q("avg:os.mem.used{host=vs123}", "1m", ""))
The result is as follows:
q("avg:os.mem.used{host=vs123}", "1m", "")
sum(q("avg:os.mem.used{host=vs123}", "1m", ""))
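The reductions used above (avg, max, sum) each collapse a series into a single number per group. A minimal Python sketch of that idea follows. The data layout, the helper `reduce_series`, and the sample points are assumptions for illustration; they do not reproduce Bosun's implementation or real query output.

```python
from statistics import mean

# A toy series set: (group, [(timestamp, value), ...]) pairs.
series_set = [
    ({"host": "vs123"}, [(0, 40.0), (60, 50.0), (120, 60.0)]),
]


def reduce_series(series_set, fn):
    """Apply a reduction to the values of each series, giving one
    (group, number) pair per series, i.e. a number set."""
    return [(group, fn([value for _, value in points]))
            for group, points in series_set]


avg_result = reduce_series(series_set, mean)
max_result = reduce_series(series_set, max)
sum_result = reduce_series(series_set, sum)

print(avg_result, max_result, sum_result)
```

Each result keeps the group of the series it came from, which is why a reduced query can still be reported per host.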
avg(q("avg:os.mem.used{host=vs12*}", "1m", ""))
After applying the transpose function t:
t(avg(q("avg:os.mem.used{host=vs12*}", "1m", "")),"")
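As a rough mental model, and assuming that t with an empty group string gathers every member of a number set into one series with an empty group, the transpose step might be sketched in Python as below. The function name `transpose`, the index-based timestamps, and the sample values are assumptions for this example only.

```python
from statistics import mean

# A toy number set, e.g. the per-host averages: (group, value) pairs.
number_set = [
    ({"host": "vs120"}, 40.0),
    ({"host": "vs121"}, 55.0),
]


def transpose(number_set):
    """Turn N single values into one series of length N with an empty
    group. The position in the list stands in for the timestamp."""
    points = [(i, value) for i, (_, value) in enumerate(number_set)]
    return [({}, points)]


series_set = transpose(number_set)

# Once the values live in one series, a reduction can run across hosts.
across_hosts = mean(value for _, value in series_set[0][1])
print(series_set, across_hosts)
```

The practical benefit is that per-host numbers become a single series, so the series reductions can then be applied across all hosts at once.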
filter(q("sum:os.cpu{host=regexp(^vs)}", "1m", ""),limit(sort(avg(q("sum:os.cpu{host=regexp(^vs)}", "1m", "")),"desc"),10))
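The expression above combines four steps: reduce each series to its average, sort the averages in descending order, keep the first 10, and then keep only the series whose group survived. A hedged Python sketch of that top-N pattern is shown here. It uses a limit of 2 and made-up hosts and values, and the helper names are hypothetical.

```python
# Toy series set keyed by host: host -> list of sampled values.
series_set = {
    "vs1": [10.0, 20.0],
    "vs2": [90.0, 70.0],
    "vs3": [50.0, 50.0],
}


def avg(series_set):
    """Reduce every series to its mean, giving a number set."""
    return {host: sum(vals) / len(vals) for host, vals in series_set.items()}


def sort_desc(number_set):
    """Order the number set from largest to smallest value."""
    return sorted(number_set.items(), key=lambda kv: kv[1], reverse=True)


def limit(pairs, n):
    """Keep only the first n entries."""
    return pairs[:n]


def filter_series(series_set, keep):
    """Keep only the series whose group also appears in `keep`."""
    return {host: series_set[host] for host, _ in keep}


top = limit(sort_desc(avg(series_set)), 2)
top_series = filter_series(series_set, top)
print(top, top_series)
```

The filtered result still contains full series, so it can be graphed, while the sorted and limited number set only decides which hosts are included.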
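An alerting configuration is made up of five kinds of sections: alert, template, lookup, notification, and macro. The body of each section is enclosed in “{}”. A basic alert needs three of them: template, alert, and notification (the email settings).
Definition rule: starts with “
A template controls the format in which an alert message is sent. For example, when an alert is delivered by email, the subject and body are rendered from the specified template, so the email goes out in the style you defined.
Simple template example:
# Template name: unknownTemp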
template unknownTemp {
# Template subject
subject = {{.Name}}: {{.Group | len}} unknown alerts
# Template body (similar to HTML)
body = `
<p>Time: {{.Time}}
<p>Name: {{.Name}}
<p>Alerts: {{range .Group}}
<br>{{.}}
{{end}}`
}
The alert section holds the alerting expressions and triggers actions such as sending email or writing logs.
Available parameters:
notification email {
# Multiple email addresses can be listed, separated by commas
email = email.email1@example.com, email.email2@example.com
print = true
}
alert{
……
# Refer to the notification by name
critNotification = email
warnNotification = email
}
alert{
ignoreUnknown = true
}
template cpuTemplate {
subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
body = `<p>Notes:{{.Alert.Vars.notes }}</p>
<p>Alert: {{.Alert.Name}} triggered on {{.Group.host}}
<hr>
<p><strong>Computation</strong>
<table>
{{range .Computations}}
<tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
{{end}}
</table>
<p><strong>All Hosts CPU Information</strong>
<p>(Red means unhealthy, green means healthy)</p>
<table>
{{range $f := .EvalAll .Alert.Vars.avgcpu}}
<tr><td>{{ $f.Group.host}}</td>
{{if gt $f.Value 70.0}}
<td style="color: red;">
{{else}}
<td style="color: green;">
{{end}}
{{ $f.Value | printf "%.0f" }}</td></tr>
{{end}}
</table>
<hr>
{{ .GraphAll .Alert.Vars.filteResult }}
<hr>
<p><strong>Relevant Tags</strong>
<table>
{{range $k, $v := .Group}}
<tr><td>{{$k}}</td><td>:</td><td>{{$v}}</td></tr>
{{end}}
</table>
<p>Attention: The time in the graph is <font color="red">UTC</font> time</p>
<p>The X axis means the time from now to {{.Alert.Vars.queryTime}} ago.</p>`
}
alert cpu.is.too.high {
template = cpuTemplate
$notes = This alert monitors the percentage of cpu against the cpu limit in haproxy (maxconn) and alerts when we are getting close to that limit and will need to raise that limit. This alert was created due to a socket outage we experienced for that reason
$queryTime = 1h
$limit = 10
$metric = q("sum:rate{counter,,1}:os.cpu{host=regexp(^vs)}", "$queryTime", "")
$avgcpu = avg($metric)
$orderCPU = limit(sort($avgcpu, "desc"), $limit)
$filteResult = filter($metric, $orderCPU)
crit = $avgcpu > 80
warn = $avgcpu > 70
ignoreUnknown = true
critNotification = email
warnNotification = email
}
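The crit and warn lines in the alert above are evaluated per group, and a host that crosses the crit threshold is reported as critical rather than warning. The Python sketch below models only that threshold logic. The function `status`, the status labels, and the sample averages are assumptions for illustration, not Bosun output.

```python
def status(value, crit=80.0, warn=70.0):
    """Classify one averaged CPU value. The crit check comes first so
    that a value above both thresholds is reported as critical."""
    if value > crit:
        return "critical"
    if value > warn:
        return "warning"
    return "normal"


# Hypothetical per-host averages, standing in for $avgcpu.
avg_cpu = {"vs1": 85.0, "vs2": 72.5, "vs3": 30.0}

statuses = {host: status(value) for host, value in avg_cpu.items()}
print(statuses)
```

For alerts where a low value is the problem, such as free disk space or free memory, the comparisons would point the other way (`<` instead of `>`).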
template diskTemplate {
subject = {{.Last.Status}}: {{.Alert.Name}} on {{.Group.host}}
body = `<p>Notes:{{.Alert.Vars.notes }}</p>
<p>Alert: {{.Alert.Name}} triggered on {{.Group.host}}
<hr>
<p><strong>Computation</strong>
<table>
{{range .Computations}}
<tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
{{end}}
</table>
<p><strong>All Hosts Disk Information</strong>
<p>(Red means unhealthy, green means healthy)</p>
<table>
{{range $f := .EvalAll .Alert.Vars.avgDiskPercent}}
<tr><td>{{ $f.Group.host}}</td>
{{if lt $f.Value 10.0}}
<td style="color: red;">
{{else}}
<td style="color: green;">
{{end}}
{{ $f.Value | printf "%.0f" }}</td></tr>
{{end}}
</table>
<hr>
{{ .GraphAll .Alert.Vars.filteResult }}
<hr>
<p><strong>Relevant Tags</strong>
<table>
{{range $k, $v := .Group}}
<tr><td>{{$k}}</td><td>:</td><td>{{$v}}</td></tr>
{{end}}
</table>
<p>Attention: The time in the graph is <font color="red">UTC</font> time</p>
<p>The X axis means the time from now to {{.Alert.Vars.queryTime}} ago.</p>`
}
alert disk.free.space.is.too.small {
template = diskTemplate
$notes = This alert monitors the percentage of disk free space
$queryTime = 1h
$limit = 10
$diskPercentFree = q("avg:os.disk.fs.percent_free{host=regexp(^vs)}", "$queryTime", "")
$avgDiskPercent = avg($diskPercentFree)
$orderDisk = limit(sort($avgDiskPercent, "asc"), $limit)
$filteResult = filter($diskPercentFree, $orderDisk)
ignoreUnknown = true
crit = $avgDiskPercent < 5
warn = $avgDiskPercent < 10
critNotification = email
warnNotification = email
}
template memroyTemplate {
body = `{{if .Alert.Vars.notes}}
<p>Notes: {{.Alert.Vars.notes}}
{{end}}
{{if .Group.host}}
{{end}}
<hr>
<p><strong>Alert definition:</strong>
<table>
<tr>
<td>Name:</td>
<td>{{replace .Alert.Name "." " " -1}}</td></tr>
<tr>
<td>Warn:</td>
<td>{{.Alert.Warn}}</td></tr>
<tr>
<td>Crit:</td>
<td>{{.Alert.Crit}}</td></tr>
</table>
<hr>
<p><strong>All Hosts Memory Information</strong>
<p>(Red means unhealthy, green means healthy)</p>
<table>
{{range $f := .EvalAll .Alert.Vars.avgfree}}
<tr><td>{{ $f.Group.host}}</td>
{{if lt $f.Value 30.0}}
<td style="color: red;">
{{else}}
<td style="color: green;">
{{end}}
{{ $f.Value | printf "%.0f" }}</td></tr>
{{end}}
</table>
<p><strong>Tags</strong>
<table>
{{range $k, $v := .Group}}
{{if eq $k "host"}}
<tr><td>{{$k}}</td><td>:</td><td><a href="{{$.HostView $v}}">{{$v}}</a></td></tr>
{{else}}
<tr><td>{{$k}}</td><td>{{$v}}</td></tr>
{{end}}
{{end}}
</table>
<p><strong>Computation</strong>
<table>
{{range .Computations}}
<tr><td><a href="{{$.Expr .Text}}">{{.Text}}</a></td><td>{{.Value}}</td></tr>
{{end}}
</table>
<hr>
{{ .GraphAll .Alert.Vars.filteResult }}
<hr>
<p>Attention: The time in the graph is <font color="red">UTC</font> time</p>
<p>The X axis means the time from now to {{.Alert.Vars.queryTime}} ago.</p>`
subject = {{.Last.Status}}: {{replace .Alert.Name "." " " -1}}: {{.Eval .Alert.Vars.avgfree | printf "%.2f"}}{{if .Alert.Vars.unit_string}}{{.Alert.Vars.unit_string}}{{end}} on {{.Group.host}}
}
alert os.low.memory {
template = memroyTemplate
$notes = In Linux, Buffers and Cache are considered "Free Memory". This alert monitors the percentage of free memory.
$unit_string = % Free Memory
$queryTime = 1h
$limit = 10
$memory = q("avg:os.mem.percent_free{host=regexp(^vs)}", "$queryTime", "")
$avgfree = avg($memory)
$orderMemory = limit(sort($avgfree, "asc"), $limit)
$filteResult = filter($memory, $orderMemory)
ignoreUnknown = true
crit = $avgfree < 20
warn = $avgfree < 30
critNotification = email
warnNotification = email
}
template unknownTemp {
subject = {{.Name}}: {{.Group | len}} unknown alerts
body = `
<p>Time: {{.Time}}
<p>Name: {{.Name}}
<p>Alerts: {{range .Group}}
<br>{{.}}
{{end}}`
}
unknownTemplate = unknownTemp
smtpHost = mail.example.com:25
emailFrom = username@163.com
smtpUsername= username@163.com
smtpPassword= password
notification email {
email = example1@example1.com, example2@example2.com
print = true
}