普拉多VX


Zabbix monitoring alerts (email)

Email alerts

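Email alerting is quite practical for ops work: it is free and easy to configure. The catch is that when too many alert emails pile up, some alerts can be missed, so the ops team needs to keep this in mind!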

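1. Log in to QQ Mail, open Account settings, and obtain the standalone password that Zabbix will use for SMTP login

2. Add a media type named QQemail

3. Add the media to the user and specify who receives the messages

4. Enable the trigger action

In newer versions, the action category (such as trigger actions) is chosen from a selector at the top of the Actions page.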

Testing the alert

Manually trigger our alert rule with zabbix_sender:

[root@iZ2zecgq3cou36re3sxh4bZ ~]# zabbix_sender -s "webserver.roddypy.com" -z 47.93.184.140 -p 10051 -k "UsersCount" -o 601 -vv
zabbix_sender [23894]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000029"}]
Response from "47.93.184.140:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000029"
sent: 1; skipped: 0; total: 1
[root@iZ2zecgq3cou36re3sxh4bZ ~]# zabbix_sender -s "webserver.roddypy.com" -z 47.93.184.140 -p 10051 -k "UsersCount" -o 603 -vv
zabbix_sender [23899]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000078"}]
Response from "47.93.184.140:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000078"
sent: 1; skipped: 0; total: 1
[root@iZ2zecgq3cou36re3sxh4bZ ~]# zabbix_sender -s "webserver.roddypy.com" -z 47.93.184.140 -p 10051 -k "UsersCount" -o 604 -vv
zabbix_sender [23901]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000087"}]
Response from "47.93.184.140:10051": "processed: 1; failed: 0; total: 1; seconds spent: 0.000087"
sent: 1; skipped: 0; total: 1
[root@iZ2zecgq3cou36re3sxh4bZ ~]#
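zabbix_sender talks to the server's trapper port (10051 above) over a small TCP protocol. For anyone who wants to push test values from code instead of the CLI, here is a minimal Python sketch of that exchange. It assumes the Zabbix 4.0+ header layout (`ZBXD`, a flags byte, then a 4-byte data length and 4 reserved bytes, little-endian) and a trapper item such as UsersCount on the host. The function names are hypothetical and not part of any Zabbix library.

```python
import json
import socket
import struct

ZBX_MAGIC = b"ZBXD"
ZBX_FLAGS = 0x01  # standard (uncompressed) protocol
# 4-byte magic, 1-byte flags, 4-byte data length, 4 reserved bytes; little-endian
ZBX_HEADER = struct.Struct("<4sBII")


def build_sender_packet(host: str, key: str, value) -> bytes:
    """Frame one 'sender data' request, like zabbix_sender -s HOST -k KEY -o VALUE."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    return ZBX_HEADER.pack(ZBX_MAGIC, ZBX_FLAGS, len(payload), 0) + payload


def parse_response(packet: bytes) -> dict:
    """Strip the header from a server reply and decode its JSON body."""
    magic, _flags, length, _reserved = ZBX_HEADER.unpack_from(packet)
    if magic != ZBX_MAGIC:
        raise ValueError("not a Zabbix protocol packet")
    start = ZBX_HEADER.size
    return json.loads(packet[start:start + length].decode("utf-8"))


def send_value(server: str, port: int, host: str, key: str, value,
               timeout: float = 5.0) -> dict:
    """Send one value to the trapper port and return the decoded reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(build_sender_packet(host, key, value))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))
```

For example, `send_value("47.93.184.140", 10051, "webserver.roddypy.com", "UsersCount", 605)` should return a dict shaped like the DEBUG answer in the transcript, with `"response": "success"`.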

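The dashboard already shows the problem.

Open the mailbox to check the alert message.

Why does the email arrive as raw HTML text? It looks like this is governed by the Message format we selected in the media type. Let's switch it to HTML and see.

The result after the change: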

Problem started at 15:36:56 on 2020.09.22
Problem name: UsersCount
Host: webserver.roddypy.com
Severity: Average
Operational data: 608 个
Original problem ID: 74

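Customizing the alert templates

The default template produces the English message above, and its subject and body carry little detail. Can we define our own template? Yes.

Templates are set in the media type under Message templates; remember to click Update afterward.

Note: if you use the text templates below, set Message format to "Plain text"; if you want HTML, use the HTML templates.

Text

Problem notification template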
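The Message format setting ultimately decides the MIME Content-Type of the email: with plain text, the mail client shows `<b>` and `<br>` tags literally; with HTML, it renders them. A minimal sketch with Python's standard `email` package makes the difference visible. The `html` flag stands in for the Message format option, and the addresses are placeholders.

```python
from email.message import EmailMessage


def build_alert(subject: str, body: str, sender: str, recipient: str,
                html: bool = False) -> EmailMessage:
    """Build an alert mail; `html` plays the role of the Message format option."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    # subtype "html"  -> Content-Type: text/html  (tags are rendered)
    # subtype "plain" -> Content-Type: text/plain (tags are shown literally)
    msg.set_content(body, subtype="html" if html else "plain")
    return msg


BODY = "<b>Alert host:</b>webserver.roddypy.com<br>"
as_text = build_alert("UsersCount", BODY, "zabbix@example.com", "ops@example.com")
as_html = build_alert("UsersCount", BODY, "zabbix@example.com", "ops@example.com", html=True)
```

The body is identical in both messages; only the declared content type changes, which is why an HTML template sent as plain text shows its markup.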

Subject:
Problem: {EVENT.NAME} fault {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} failed!
Body:

Alert host: {HOSTNAME1}
Alert time: {EVENT.DATE} {EVENT.TIME}
Severity: {TRIGGER.SEVERITY}
Alert message: {TRIGGER.NAME}
Item key: {TRIGGER.KEY1}
Details: {ITEM.NAME}: {ITEM.VALUE}
Current status: {TRIGGER.STATUS}: {ITEM.VALUE1}
Event ID: {EVENT.ID}

Recovery notification template

Subject:
Recovered {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has recovered!
Body:
Alert host: {HOSTNAME1} (problem resolved)
Alert time: {EVENT.DATE} {EVENT.TIME}
Severity: {TRIGGER.SEVERITY}
Alert message: {TRIGGER.NAME}
Item key: {TRIGGER.KEY1}
Details: {ITEM.NAME}: {ITEM.VALUE}
Current status: {TRIGGER.STATUS}: {ITEM.VALUE1}
Event ID: {EVENT.ID}
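Before saving a template, it can help to preview how it reads with realistic values. The sketch below does naive `{MACRO}` substitution from a dict. It is only a local preview aid and not how the Zabbix server resolves macros; the sample values are taken from the test alert earlier in this post, and unknown macros are deliberately left visible.

```python
import re

# Matches built-in macros such as {HOSTNAME1} or {EVENT.DATE}; user macros like {$X} are left alone.
MACRO = re.compile(r"\{([A-Z][A-Z0-9_.]*)\}")

# Sample values taken from the test alert earlier in this post.
SAMPLE_VALUES = {
    "HOSTNAME1": "webserver.roddypy.com",
    "EVENT.DATE": "2020.09.22",
    "EVENT.TIME": "15:36:56",
    "TRIGGER.SEVERITY": "Average",
    "TRIGGER.NAME": "UsersCount",
    "EVENT.ID": "74",
}


def render(template: str, values: dict) -> str:
    """Replace each {MACRO} with its sample value; unknown macros stay as-is."""
    return MACRO.sub(lambda m: str(values.get(m.group(1), m.group(0))), template)


TEXT_BODY = """Alert host: {HOSTNAME1}
Alert time: {EVENT.DATE} {EVENT.TIME}
Severity: {TRIGGER.SEVERITY}
Alert message: {TRIGGER.NAME}
Event ID: {EVENT.ID}"""

print(render(TEXT_BODY, SAMPLE_VALUES))
```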

HTML

Problem notification template

Subject:
Problem: {EVENT.NAME} fault {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} failed!
Body:

<b>Alert host:</b>{HOSTNAME1}<br>
<b>Alert time:</b>{EVENT.DATE} {EVENT.TIME}<br>
<b>Severity:</b>{TRIGGER.SEVERITY}<br>
<b>Alert message: </b>{TRIGGER.NAME}<br>
<b>Item key:</b>{TRIGGER.KEY1}<br>
<b>Details:</b>{ITEM.NAME}: {ITEM.VALUE}<br>
<b>Current status:</b>{TRIGGER.STATUS}: {ITEM.VALUE1}<br>
<b>Event ID:</b>{EVENT.ID}<br>

Recovery notification template

Subject:
Recovered {TRIGGER.STATUS}, server {HOSTNAME1}: {TRIGGER.NAME} has recovered!

Body:
<b>Problem resolved</b><br>
<b>Alert host:</b>{HOSTNAME1}<br>
<b>Alert time:</b>{EVENT.DATE} {EVENT.TIME}<br>
<b>Severity:</b>{TRIGGER.SEVERITY}<br>
<b>Alert message:</b> {TRIGGER.NAME}<br>
<b>Item key:</b>{TRIGGER.KEY1}<br>
<b>Details:</b>{ITEM.NAME}: {ITEM.VALUE}<br>
<b>Current status:</b>{TRIGGER.STATUS}: {ITEM.VALUE1}<br>
<b>Event ID:</b>{EVENT.ID}<br>

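Result:

Troubleshooting with the server log

If the dashboard shows the problem but no alert email arrives, check zabbix_server.log.

In my case, the port was misconfigured at first: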

[root@iZ2ze7k1pc9lk7pay5rcawZ ~]# tailf /var/log/zabbix/zabbix_server.log
28695:20200922:135040.315 housekeeper [deleted 0 hist/trends, 0 items/triggers, 0 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 0.002269 sec, idle for 1 hour(s)]
28695:20200922:145040.534 executing housekeeper
28695:20200922:145040.537 housekeeper [deleted 0 hist/trends, 0 items/triggers, 0 events, 0 problems, 0 sessions, 0 alarms, 0 audit, 0 records in 0.002225 sec, idle for 1 hour(s)]
28719:20200922:151620.874 watchdog: 1 recipient(s) found for database down messages
28722:20200922:151922.945 failed to send email: Couldn't connect to server: Failed connect to mail.qq.com:25; Operation now in progress
28720:20200922:152007.968 failed to send email: Couldn't connect to server: Failed connect to mail.qq.com:25; Operation now in progress
28721:20200922:152052.989 failed to send email: Couldn't connect to server: Failed connect to mail.qq.com:25; Operation now in progress
[root@iZ2ze7k1pc9lk7pay5rcawZ ~]#
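The log lines above share a `pid:date:time message` layout, so a short script can pull out every failed send and the host:port it tried. Below is a minimal sketch that assumes this layout holds for your Zabbix version; it also includes a plain TCP reachability check. Here the target was mail.qq.com:25. QQ Mail's SMTP service is usually configured as smtp.qq.com on port 465 with SSL, and many cloud hosts block outbound port 25, so checking both from the Zabbix server is a quick way to see which one actually gets through.

```python
import re
import socket
from collections import Counter

# zabbix_server.log lines look like: "28722:20200922:151922.945 failed to send email: ..."
LOG_LINE = re.compile(r"^(?P<pid>\d+):(?P<date>\d{8}):(?P<time>\d{6}\.\d{3}) (?P<msg>.*)$")
# Picks the "host:port" out of "... Failed connect to mail.qq.com:25; ..."
TARGET = re.compile(r"connect to (?P<host>[\w.-]+):(?P<port>\d+)")


def email_failures(lines):
    """Yield one dict per 'failed to send email' line."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or not m.group("msg").startswith("failed to send email"):
            continue
        t = TARGET.search(m.group("msg"))
        yield {
            "pid": int(m.group("pid")),
            "when": f"{m.group('date')} {m.group('time')}",
            "target": f"{t.group('host')}:{t.group('port')}" if t else None,
            "message": m.group("msg"),
        }


def summarize_log(path="/var/log/zabbix/zabbix_server.log"):
    """Count failed sends per SMTP target in a server log."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return Counter(f["target"] for f in email_failures(fh))


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Plain TCP reachability check (no SMTP handshake, no TLS)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the server, `summarize_log()` would count the three failures above under `mail.qq.com:25`, and `can_connect("smtp.qq.com", 465)` tells you whether the SSL port is reachable at all.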

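The method above uses the email media type built into Zabbix. Another option is to send mail from a Python script. Both work, so use whichever is more convenient. Email is still one of the most widely used alerting channels in the industry.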
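For the script-based option, here is a minimal Python sketch of an alert script. It assumes a script media type that passes {ALERT.SENDTO}, {ALERT.SUBJECT} and {ALERT.MESSAGE} as the three script parameters, a file placed in the server's AlertScriptsPath, and QQ Mail reached at smtp.qq.com:465 over SSL. The account, password and file name are placeholders.

```python
import smtplib
import sys
from email.message import EmailMessage

# Placeholder settings (assumptions): replace with your own QQ Mail account
# and the password obtained in step 1.
SMTP_HOST = "smtp.qq.com"
SMTP_PORT = 465  # SMTP over SSL
SMTP_USER = "zabbix-alerts@example.com"
SMTP_PASSWORD = "change-me"


def build_message(sendto: str, subject: str, body: str, html: bool = False) -> EmailMessage:
    """Wrap the subject/body that Zabbix already rendered into an email."""
    msg = EmailMessage()
    msg["From"] = SMTP_USER
    msg["To"] = sendto
    msg["Subject"] = subject
    msg.set_content(body, subtype="html" if html else "plain")
    return msg


def send(msg: EmailMessage) -> None:
    """Log in over SSL and hand the message to the SMTP server."""
    with smtplib.SMTP_SSL(SMTP_HOST, SMTP_PORT, timeout=10) as smtp:
        smtp.login(SMTP_USER, SMTP_PASSWORD)
        smtp.send_message(msg)


def main(argv: list) -> int:
    """argv[1:] = {ALERT.SENDTO} {ALERT.SUBJECT} {ALERT.MESSAGE}, as set in the media type."""
    if len(argv) != 4:
        print("usage: mail_alert.py SENDTO SUBJECT MESSAGE", file=sys.stderr)
        return 2
    send(build_message(argv[1], argv[2], argv[3]))
    return 0
```

To install it as, say, `mail_alert.py`, add `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard and make the file executable; pass `html=True` in `main` if you use the HTML templates.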
