博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
阿里云环境迁移记录 - 服务监控及报警
阅读量:6246 次
发布时间:2019-06-22

本文共 8353 字,大约阅读时间需要 27 分钟。

服务监控的方案有很多,譬如naigos,zabbix这种,不但可以监控服务,还可以监控cpu、内存、磁盘、网络流量、服务端口等,关于naigos和zabbix的搭建配置,需要另外篇幅介绍,这里使用服务器自身的定时任务+脚本+邮件功能完成一个简单的监控。

Part1 邮件服务搭建

安装mailx

yum -y install mailx

############################

##qq个人邮箱配置
############################

vim /etc/mail.rc

添加如下配置:

set from=xxxxxx@qq.comset smtp=smtps://smtp.qq.com:465set smtp-auth-user=xxxxxx@qq.comset smtp-auth-password=你的 QQ 邮箱授权码 (登录qq邮箱到账户设置中,打开smtp服务时,提示的验证码,此验证码非密码。)set smtp-auth=login#set smtp-use-starttls 这里是不需要配置的,很多地方没说明,配置了反而会验证失败,所以我注释掉;set ssl-verify=ignoreset nss-config-dir=/root/.certs

##创建证书

mkdir -p /root/.certs/cd /root/.certs/echo -n | openssl s_client -connect smtp.qq.com:465 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > ~/.certs/qq.crtcertutil -A -n "GeoTrust SSL CA" -t "C,," -d ~/.certs -i ~/.certs/qq.crtcertutil -A -n "GeoTrust Global CA" -t "C,," -d ~/.certs -i ~/.certs/qq.crtcertutil -L -d /root/.certs##认证certutil -A -n "GeoTrust SSL CA - G3" -t "Pu,Pu,Pu" -d ./ -i qq.crt

#返回如下提示即可:

Notice: Trust flag u is set automatically if the private key is present.

#发送主题为“邮箱测试”,内容为当前目录下 message_fiel.txt 文件内容到 xxxx@qq.com 邮箱。

mailx -s "邮箱测试" xxxx@qq.com < message_file.txt

############################

##qq企业邮箱配置
############################

vim /etc/mail.rc

添加如下配置:

set from=noreply@88gongxiang.comset smtp=smtps://smtp.exmail.qq.com:465set smtp-auth-user=noreply@88gongxiang.comset smtp-auth-password=*****(登录密码,不同于个人邮箱的授权码)set smtp-auth=loginset ssl-verify=ignoreset nss-config-dir=/etc/pki/nssdb/

cd /etc/pki/nssdb/

#生成证书

echo -n | openssl s_client -connect smtp.qq.com:465 | sed -ne '/-BEGIN CERTIFICATE-/,/-END CERTIFICATE-/p' > /etc/pki/nssdb/qq.crt certutil -A -n "GeoTrust SSL CA" -t "C,," -d /etc/pki/nssdb/ -i /etc/pki/nssdb/qq.crt certutil -A -n "GeoTrust Global CA" -t "C,," -d /etc/pki/nssdb/ -i /etc/pki/nssdb/qq.crt certutil -L -d /etc/pki/nssdb/ certutil -A -n "GeoTrust SSL CA - G3" -t "Pu,Pu,Pu" -d ./ -i qq.crt #认证

同样,认证完会返回如下提示:

Notice: Trust flag u is set automatically if the private key is present.

##测试

echo "this email come from centos 172.26.27.71"|mail -v -s "mysql check test" weiyonglai@88gongxiang.com

Part2 监控脚本准备

  1. mysql监控脚本

大致想法:mysql监控脚本分别运行在两个实例上,如果当前实例宕机,则重启本机mysql服务,如果其他服务器上的mysql连接不上则邮件通知。

#!/bin/bashnotify_addr='weiyonglai@88gongxiang.com,393369540@qq.com'error_log="/opt/script/logs/check_mysql.err"###定义一个简单判断mysql是否可用的函数function excute_query {    echo -e "`date "+%F  %H:%M:%S"`    -----checking mysql instance $1 by querying -----" >> ${error_log}    /usr/local/mysql/bin/mysql -uroot -p88gongxiangMYSQL -h $1 --port 30468 -e "select 1;" 2>> ${error_log}}###定义无法执行查询,且mysql服务异常时的处理函数function service_error {    echo -e "`date "+%F  %H:%M:%S"`    -----mysql service error,notify manager now-----" >> ${error_log}    systemctl restart mysql.service    echo "$1 无法连接并被重启"|mail -s "MYSQL $1 实例正在被重启, 请及时登录查看状态!" ${notify_addr} 2>> ${error_log}    echo -e "\n---------------------------------------------------------\n" >> ${error_log}}###定义无法执行查询,但mysql服务正常的处理函数function query_error {    echo -e "`date "+%F  %H:%M:%S"`    -----mysql instance $1 query error, retry after 30s-----" >> ${error_log}    sleep 30    excute_query $1    if [ $? -ne 0 ];then        echo -e "`date "+%F  %H:%M:%S"`    -----mysql instance $1 still can't execute query-----" >> ${error_log}        echo "mysql isntance $1 is down"|mail -s "MYSQL $1 无法连接查询, 请及时处理!from(172.26.27.70)" ${notify_addr} 2>> ${error_log}    else        echo -e "`date "+%F  %H:%M:%S"`    -----mysql instance $1 query ok after 10s-----" >> ${error_log}        echo -e "\n---------------------------------------------------------\n" >> ${error_log}    fi}###监控本机mysql状态excute_query 172.26.27.70if [ $? -ne 0 ];then     systemctl status mysql.service  &>/dev/null    if [ $? -ne 0 ];then        service_error 172.26.27.70    else        query_error 172.26.27.70    fi  else     echo -e "\n-----------mysql instance 172.26.27.70 is ok for query-------------\n" >> ${error_log}fi###监控备机mysql状态excute_query 172.26.27.71if [ $? -ne 0 ];then      query_error 172.26.27.71    else     echo -e "\n-----------mysql instance 172.26.27.71 is ok for query-------------\n" >> ${error_log}fi
  1. mongo监控脚本

大致思想:通过mongo命令登录或者mongostat判断节点是否正常运行。

notify_addr='weiyonglai@88gongxiang.com,393369540@qq.com'error_log="/opt/script/logs/check_mongo.err"###定义一个简单判断mysql是否可用的函数function connect_db {    echo -e "`date "+%F  %H:%M:%S"`    -----checking mongo instance $1 by login -----" >> ${error_log}    echo "db.serverStatus().mem" | /usr/local/mongodb/bin/mongo admin -uroot -p88gongxiangds --host $1 --port 20467  2>> ${error_log}}function replication_stat_query {    echo -e "`date "+%F  %H:%M:%S"`    -----checking mongo instance $1 by mongostat -----" >> ${error_log}    /usr/local/mongodb/bin/mongostat --uri=mongodb://suroot:88gongxiangds@$1:20467/admin  2>> ${error_log}}###定义无法执行查询,且mysql服务异常时的处理函数function service_error {    echo -e "`date "+%F  %H:%M:%S"`    -----mongo service $1 error,notify manager now-----" >> ${error_log}    ##/usr/local/mongo/bin/mongod -f /etc/mongo.conf --shutdown    echo "$1 mongo连接失败,请及时处理"|mail -s "Mongo $1 实例无法连接, 请及时登录处理!from(172.26.27.70)" ${notify_addr} 2>> ${error_log}    echo -e "\n---------------------------------------------------------\n" >> ${error_log}}###监控本机mongo node 状态function monitor_node {connect_db $1 if [ $? -ne 0 ];then    service_error $1 #else #replication_stat_query $1 #if [ $? -ne 0 ];then #service_error $1 #else #echo -e "\n-----------mongostat of node $1 is ok! -------------\n" >> ${error_log}           echo -e "\n-----------mongo connection to  node $1 is ok! -------------\n" >> ${error_log} #fifi}###监控本机mongo node 状态monitor_node 172.26.27.70monitor_node 172.26.27.71monitor_node 172.26.27.72
  1. redis监控脚本

大致思想: 通过redis-cli登录并检索clusterinfo是否enable来判断该节点及集群是否正常工作。

#!/bin/bashnotify_addr='weiyonglai@88gongxiang.com,393369540@qq.com'error_log="/opt/script/logs/check_redis.err"###定义无法执行查询,且mysql服务异常时的处理函数function service_error {    echo -e "`date "+%F  %H:%M:%S"`    -----redis service $1:$2 error,notify manager now-----" >> ${error_log}    ##/usr/local/mongo/bin/mongod -f /etc/mongo.conf --shutdown    echo "$1 redis连接异常,请及时处理"|mail -s "Redis $1:$2 节点连接失败, 请及时登录处理!(from 172.26.27.70)" ${notify_addr} 2>> ${error_log}    echo -e "\n---------------------------------------------------------\n" >> ${error_log}}###监控redis 状态function monitor_node {echo -e "`date "+%F  %H:%M:%S"`    -----checking mongo redis  $1:$2 by cli -----" >> ${error_log}/usr/local/bin/redis-cli -h $1 -p $2 -a 88gongxiangrds info |grep  cluster_enabledif [ $? -ne 0 ];then    service_error $1 $2    echo -e "\n-----------redis connection to  node $1:$2 is ok! -------------\n" >> ${error_log}fi}###监控本机mongo node 状态monitor_node 172.26.27.70 6239monitor_node 172.26.27.70 6339monitor_node 172.26.27.71 6239monitor_node 172.26.27.71 6339monitor_node 172.26.27.72 6239monitor_node 172.26.27.72 6339
  1. rabbitmq监控脚本

通过rabbitmqctl查看集群状态或者节点状态

#!/bin/bashnotify_addr='weiyonglai@88gongxiang.com,393369540@qq.com'error_log="/opt/script/logs/check_redis.err"###定义无法执行查询,且mysql服务异常时的处理函数function service_error {    echo -e "`date "+%F  %H:%M:%S"`    -----rabbitmq service  error,notify manager now-----" >> ${error_log}    #ps -ef | grep ^rabbitmq | awk '{print $2}' | xargs kill -9    #service rabbitmq-server start    echo "$1 rabbitmq服务异常, 请及时处理"|mail -s " $1 RabbitMQ服务异常, 请及时登录处理!(from $2)" ${notify_addr} 2>> ${error_log}    echo -e "\n---------------------------------------------------------\n" >> ${error_log}}###监控rabbitmq 状态function monitor_node {echo -e "`date "+%F  %H:%M:%S"`    -----checking mongo redis  $1:$2 by cli -----" >> ${error_log}#/usr/lib/rabbitmq/bin/rabbitmqctl  cluster_status |grep cluster_name /usr/sbin/rabbitmqctl cluster_status | grep cluster_nameif [ $? -ne 0 ];then    service_error $1 $2    echo -e "\n-----------redis connection to  node $1:$2 is ok! -------------\n" >> ${error_log}fi}monitor_node 172.26.27.72 172.26.27.72

Part3 定时任务配置

crontab -e*/1 * * * *    /opt/script/check_mysql.sh  > /opt/script/logs/cron_result.log 2>&1*/3 * * * *    /opt/script/check_mongo.sh  > /opt/script/logs/cron_result.log 2>&1*/5 * * * *    /opt/script/check_redis.sh  > /opt/script/logs/cron_result.log 2>&1*/5 * * * *    /opt/script/check_rabbitmq.sh  > /opt/script/logs/cron_result.log 2>&1

转载于:https://blog.51cto.com/10705830/2351870

你可能感兴趣的文章
快速排序
查看>>
排版与缩写
查看>>
C#使用xpath查找xml节点信息
查看>>
简单的语句统计所有用户表尺寸大小
查看>>
作业四:个人项目---小学四则运算
查看>>
漂亮的按钮样式-button
查看>>
post请求方式的翻页爬取内容及思考
查看>>
VC++ MFC如何生成一个可串行化的类
查看>>
php 变量引用,函数引用
查看>>
NET生成缩略图
查看>>
微软企业库5.0 学习之路——第二步、使用VS2010+Data Access模块建立多数据库项目...
查看>>
渗流稳定性分析(MATLAB实现)
查看>>
POJ2253 Frogger(最短路径)
查看>>
动画总结?
查看>>
HDU 2044 一只小蜜蜂 *
查看>>
Java 斜杠 与 反斜杠
查看>>
垂直居中
查看>>
idea下maven项目,样式css、js更新后,页面不显示更新内容
查看>>
bzoj 1001 平面图转对偶图 最短路求图最小割
查看>>
php 记住密码自动登录
查看>>