GNU Linux Server (tested on CentOS Linux 7) – monitor software raid – mail notification on failure

06.Sep.2019


monitoring the integrity of a RAID is critical for fast replacement of failed harddisks (ideally one has at least one spare drive already in the machine). hardware and software RAID should therefore send a mail on failure, so admins can react in time, before data loss happens.

watch out: shingled (SMR) hdd are not good for RAID! their slow sustained writes can make a rebuild take very long.

1.) create a new mail address like server@domain.com

# tested on
hostnamectl 
  Operating System: CentOS Linux 7 (Core)
       CPE OS Name: cpe:/o:centos:centos:7
      Architecture: x86-64

2.) hit the terminal:

smart:

to include smart report on the drives:

# install stuff
yum install smartmontools.x86_64 sendemail
vim /scripts/monitor/smart.sh
#!/bin/bash
# write drive info (-i) and health status (-H) of both drives to /var/log/smart.log
smartctl -i /dev/sda |tail -n +3| tee /var/log/smart.log;
smartctl -H /dev/sda |tail -n +3| tee -a /var/log/smart.log;
smartctl -i /dev/sdb |tail -n +3| tee -a /var/log/smart.log;
smartctl -H /dev/sdb |tail -n +3| tee -a /var/log/smart.log;
:wq # save and quit
chmod +x /scripts/monitor/smart.sh
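
the script above only writes the raw smartctl output to a log. as a minimal sketch of how that output could be evaluated: the helper smart_health_ok below is hypothetical (not part of smartmontools). it reads the output of smartctl -H on stdin and succeeds only if it finds the usual ATA result line ("PASSED") or the usual SCSI line ("SMART Health Status: OK"). the exact wording of those lines is an assumption based on typical smartctl output.

```shell
#!/bin/bash
# hypothetical helper: reads `smartctl -H` output on stdin,
# returns 0 if the drive reports healthy, non-zero otherwise
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# usage (as root), left commented out so the block has no side effects:
# if ! smartctl -H /dev/sda | smart_health_ok; then
#     echo "WARNING: /dev/sda reports a SMART problem"
# fi
```

anything that is not an explicit "healthy" line counts as a problem, so unexpected output (missing drive, permission error) also triggers the warning.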

# this config file might be at a different place in one's linux distribution
find / -name mdadm.conf
vim /etc/mdadm.conf

DEVICE partitions
MAILADDR server@domain.com
MAILFROM server@domain.com
ARRAY /dev/md/0  metadata=1.2 UUID=79ca7eed:4918b281:6dc9eca8:ac59ab2e name=rescue:0
ARRAY /dev/md/1  metadata=1.2 UUID=ae11003b:ae851ec7:6bbab8f0:63bda310 name=rescue:1
ARRAY /dev/md/2  metadata=1.2 UUID=7243e6cb:b0b4c802:853aef9b:5b12027d name=rescue:2
ARRAY /dev/md/3  metadata=1.2 UUID=d2c6c88a:8420f0a0:5f9b7d1a:a0d19152 name=rescue:3
PROGRAM /scripts/monitor/raid_status_mail.sh
:wq # write and quit
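
before testing, it can help to confirm which address mdadm will actually mail to. a minimal sketch, assuming the config file path is passed as the first argument; mdadm_mailaddr is a hypothetical helper name, not an mdadm feature.

```shell
#!/bin/bash
# hypothetical helper: print the MAILADDR value(s) found in an mdadm.conf-style file
# (the keyword is compared case-insensitively)
mdadm_mailaddr() {
    awk 'toupper($1) == "MAILADDR" { print $2 }' "$1"
}

# usage, left commented out:
# mdadm_mailaddr /etc/mdadm.conf
```

the ARRAY lines themselves do not need to be typed by hand: mdadm --detail --scan prints them for the arrays that are currently assembled.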

# check if mdadm is running
ps aux | grep mdadm 
root      1205  0.0  0.0   7328  1456 ?        Ss   Sep04   0:00 /sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid

# tls is optional as long as the mail does not leave the system; the script below uses tls=yes (see TLS section)
vim /scripts/monitor/raid_status_mail.sh

#!/bin/bash
# send mails via external-mail-server that supports smtp-authentication via password
# behind -xu needs to be the username that is used to smtp-login to the server, in this case it is equal to sender's mail address, but does not necessarily need to be the same!

LOGFILE=/scripts/monitor/raid_status_mail_$(date '+%Y-%m').log;

# read raid status to file
echo "===== raid status script started on: $(date '+%Y-%m-%d-%H:%M:%S') ====="|tee -a $LOGFILE

echo "logfile used: $LOGFILE";

# run smart check on every drive from /dev/sda to /dev/sdz that exists
echo "=== smart status of all drives ==="| tee -a $LOGFILE

for x in {a..z}
do
if test -b "/dev/sd$x"; then
	/usr/sbin/smartctl -H "/dev/sd$x" | tee -a $LOGFILE
fi
done

echo "=== raid status ==="| tee -a $LOGFILE
cat /proc/mdstat | tee -a  $LOGFILE
df -h | tee -a $LOGFILE

echo "== backup raid config from /etc/mdadm.conf to /etc/mdadm.conf.backup ==" | tee -a $LOGFILE
cp -rv /etc/mdadm.conf /etc/mdadm.conf.backup;

echo "== general info about the system ==" | tee -a $LOGFILE
lshw -class disk | tee -a $LOGFILE

echo "=== print nice overview over all harddisks and filesystems: ===" | tee -a $LOGFILE
lsblk -o "NAME,MAJ:MIN,RM,SIZE,RO,FSTYPE,MOUNTPOINT,UUID"| tee -a $LOGFILE

mypublicip=$(dig -4 TXT +short o-o.myaddr.l.google.com @ns1.google.com | awk -F'"' '{ print $2}');
echo " == server's public ip: $mypublicip =="| tee -a $LOGFILE
mail_subject="RAID STATUS of $(hostname)@$mypublicip";

sendemail -o tls=yes -s ip.of.ones.mail.server -f server@domain.com -t server@domain.com -u "$mail_subject" -xu server@domain.com -xp "df873odf7d" -m "$mail_subject" -a $LOGFILE

chmod +x /scripts/monitor/raid_status_mail.sh
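
the script above carries the smtp password in clear text behind -xp. a minimal sketch of one alternative: keep the password in a separate file that only root can read, and refuse to use it if the permissions are too open. the helper name read_smtp_password and the path /root/.smtp_pass are hypothetical; stat -c is the GNU coreutils syntax.

```shell
#!/bin/bash
# hypothetical helper: print the first line of a password file,
# but only if the file mode is 600 or 400 (owner-only access)
read_smtp_password() {
    local file="$1" mode
    mode=$(stat -c '%a' "$file") || return 1
    case "$mode" in
        600|400) head -n 1 "$file" ;;
        *) echo "refusing $file: mode $mode, expected 600 or 400" >&2
           return 1 ;;
    esac
}

# usage inside raid_status_mail.sh, left commented out:
# sendemail ... -xp "$(read_smtp_password /root/.smtp_pass)" ...
```

this way the script itself can be shared or backed up without leaking the password.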

# highly recommended: run a diagnostic "monitor all logs" script on both the mail server and the raid server (the one that sends the mail)
vim /scripts/monitor/mon_all_logs.sh
# follow all log files under /var/log, except compressed ones
find /var/log/ -type f ! -path '*.gz*' -exec tail -n0 -f {} +

# simulate RAID failure (this should trigger the sending of one's test mail)
mdadm --monitor --scan --test -1
.... wait.... wait.... 
ps uax|grep mdadm
# there are now two mdadm processes: the test run and the regular monitor
root       328  0.0  0.0   7332  2404 pts/3    S+   23:28   0:00 mdadm --monitor --scan --test -1
root      1205  0.0  0.0   7328  1500 ?        Ss   Sep04   0:00 /sbin/mdadm --monitor --scan -f --pid-file=/var/run/mdadm/mdadm.pid

# diagnostics "monitor all logs" script should show on raid(mail sending)server:
sendemail[32375]: Email was sent successfully!
# this message will not be logged to any /var/log/file
# CONGRATULATIONS :) check one's inbox
# one can now abort the second test mdadm process with
Ctrl+C

# crontab it
crontab -e
0 0 * * * /scripts/monitor/raid_status_mail.sh # daily midnight raid and harddisk smart status report
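
the nightly report is sent no matter what state the arrays are in. to spot a degraded array at a glance, one could parse /proc/mdstat and put the result into the mail subject. a minimal sketch, assuming the usual mdstat layout where the status field looks like [UU] for a healthy array and [U_] when a member is missing; mdstat_degraded is a hypothetical helper name.

```shell
#!/bin/bash
# hypothetical helper: reads /proc/mdstat-style text on stdin and prints the
# name of every array whose status field (e.g. [U_]) shows a missing member
mdstat_degraded() {
    awk '
        /^md[0-9]+ :/ { name = $1 }   # remember the array this block belongs to
        /\[[U_]*_[U_]*\]/ && name != "" { print name }   # underscore = missing member
    '
}

# usage inside raid_status_mail.sh, left commented out:
# degraded=$(mdstat_degraded < /proc/mdstat | tr '\n' ' ')
# [ -n "$degraded" ] && mail_subject="RAID DEGRADED ( $degraded) on $(hostname)"
```

the function only reads text, so it can be tried against a saved copy of /proc/mdstat without touching a real array.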

TLS:

with tls=yes the script works as shown above, but sendemail prints this warning:

*******************************************************************
 Using the default of SSL_verify_mode of SSL_VERIFY_NONE for client
 is deprecated! Please set SSL_verify_mode to SSL_VERIFY_PEER
 possibly with SSL_ca_file|SSL_ca_path for verification.
 If you really don't want to verify the certificate and keep the
 connection open to Man-In-The-Middle attacks please set
 SSL_verify_mode explicitly to SSL_VERIFY_NONE in your application.
*******************************************************************
  at /bin/sendemail line 1933.
Sep 08 23:47:24 domain sendemail[8781]: Email was sent successfully!

# in this file the default behaviour is defined as SSL_VERIFY_NONE
grep -B 5 "SSL_verify_mode => SSL_VERIFY_NONE" /usr/share/perl5/vendor_perl/IO/Socket/SSL.pm

    # default for SSL_verify_mode should be SSL_VERIFY_PEER for client
    # for now we keep the default of SSL_VERIFY_NONE but complain, if 
    # somebody uses this implicit default
    # SSL_verify_mode => undef,  # set to undef to enable secure default
    SSL_verify_mode => SSL_VERIFY_NONE,

# but when one changes the line to "SSL_verify_mode => undef" (the secure default), the certificate check fails:
sendemail[9720]: ERROR => TLS setup failed:
SSL connect attempt failed with unknown error error:14090086:
SSL routines:ssl3_get_server_certificate:certificate verify failed

very much recommended to encrypt mail transport when leaving the system.

sendemail says it needs these packages, then one can use -o tls=yes or -o tls=auto:

perl-Crypt-SSLeay.x86_64 : Crypt::SSLeay - OpenSSL glue that provides LWP https support
perl-Net-SSLeay.x86_64 : Perl extension for using OpenSSL

Links:

http://caspian.dotconf.net/menu/Software/SendEmail/

https://cromwell-intl.com/open-source/sendmail-ssl.html

man pages:

sendemail.man.txt

mdadm.man.txt


admin