Software RAID monitoring and recovery in Linux

Software RAID 1 management in RHEL

Status monitoring

Information on all RAID arrays:

   # more /proc/mdstat

or

   # cat /proc/mdstat

or

   # watch -n .1 cat /proc/mdstat
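The member map at the end of each array entry in /proc/mdstat (for example [UU] or [U_]) shows which components are up; an underscore marks a missing one. As a minimal sketch, assuming the usual mdstat layout and arrays named md<number>, a small helper can list only the degraded arrays. The function name degraded_arrays is made up for this example.

```shell
#!/bin/sh
# List md arrays whose member map (e.g. [U_]) shows a missing component.
# Reads a file in /proc/mdstat format; defaults to /proc/mdstat itself.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }    # remember the array this block describes
        match($0, /\[[U_]+\]/) {       # member map such as [UU] or [U_]
            if (index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }
    ' "${1:-/proc/mdstat}"
}
```

Called without arguments, it reads the live /proc/mdstat and prints nothing when all arrays are healthy.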

Information on a specific disk partition:

   # mdadm -E /dev/sd<a-b><1-10>

for example:

   # mdadm -E /dev/sdb2

Rebuilding

After a non-fatal failure, add the disk partitions back to their arrays one at a time:

   # mdadm -a /dev/md<0-6> /dev/sd<a-b><1-10>

for example:

   # mdadm -a /dev/md0 /dev/sdb1

Be careful with the partition numbers: each partition must go back into the array it belongs to.
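The "one at a time" rule can be sketched as a loop that adds a partition and then waits for the resync to finish before touching the next array. This is only an illustration: the function name readd_partitions, the "array:partition" argument format and the MDADM override (handy for a dry run with MDADM=echo) are assumptions of this example, not part of mdadm.

```shell
#!/bin/sh
# MDADM is a hypothetical override so the sequence can be dry-run (MDADM=echo).
MDADM=${MDADM:-mdadm}

# Each argument is an "array:partition" pair, e.g. /dev/md0:/dev/sdb1.
readd_partitions() {
    for pair in "$@"; do
        array=${pair%%:*}
        part=${pair#*:}
        echo "re-adding $part to $array" >&2
        $MDADM "$array" --add "$part" || return 1
        # Block until the resync of this array has finished. The exit status is
        # ignored because --wait reports failure when there was nothing to wait for.
        $MDADM --wait "$array" || true
    done
}
```

For example, readd_partitions /dev/md0:/dev/sdb1 /dev/md1:/dev/sdb2, where the md1/sdb2 pairing is just a guess at a typical layout; check mdadm -E output for the real one.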

If an error occurs, remove the component from the array with the command:

   # mdadm -r /dev/md0 /dev/sdb1

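This does not always work: the device can be busy. A component that is still active in the array usually has to be marked as failed first (mdadm -f) before it can be removed.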
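A common way around the "device busy" case is to mark the component as failed and only then remove it. The following is a sketch under that assumption; the function name remove_component and the MDADM override for dry runs are made up for this example.

```shell
#!/bin/sh
# MDADM is a hypothetical override so the sequence can be dry-run (MDADM=echo).
MDADM=${MDADM:-mdadm}

# Mark a component as failed, then remove it from the array.
remove_component() {
    array=$1   # e.g. /dev/md0
    part=$2    # e.g. /dev/sdb1
    $MDADM "$array" --fail "$part" || return 1
    $MDADM "$array" --remove "$part"
}
```

Running it with MDADM=echo first prints the two mdadm command lines without touching the array.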

Replacing the drive

1. Shut down the computer and replace the drive

2. Turn on the computer and identify partitions on both drives:

   # fdisk -l /dev/sd<a-b>

3. Using fdisk, create partitions on the new disk identical to those on the original.

Before mirroring, flag the boot partition on the new drive (sda1 or sdb1) as bootable.

The swap partitions are not mirrored in this software RAID setup.

4. Run the status monitoring and rebuilding steps described above.
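Instead of recreating the partitions by hand in fdisk, the partition table can also be dumped from the surviving disk and written to the new one with sfdisk. This is a different technique from the fdisk procedure above, shown only as a sketch; the function name clone_partition_table and the DRY_RUN switch are assumptions of this example. Writing to the wrong disk destroys its partition table, so try the dry run first.

```shell
#!/bin/sh
# DRY_RUN is a hypothetical switch: when set to 1 the command is printed, not run.
clone_partition_table() {
    src=$1   # surviving disk, e.g. /dev/sda
    dst=$2   # new, empty disk, e.g. /dev/sdb
    if [ "$src" = "$dst" ]; then
        echo "source and target are the same disk" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sfdisk -d $src | sfdisk $dst"
    else
        # Dump the source partition table and replay it onto the target disk.
        sfdisk -d "$src" | sfdisk "$dst"
    fi
}
```

For example, DRY_RUN=1 clone_partition_table /dev/sda /dev/sdb only prints the pipeline that would be run.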

Alert settings

Monitoring runs hourly via crond.

The /etc/cron.hourly folder contains an mdRAIDmon file with the following command (-1 is short for --oneshot: check the arrays once and exit):

   # mdadm --monitor --scan -1 --mail=<email address>

Add the --test option to check that alert messages are delivered:

   # mdadm --monitor --scan -1 --mail=<email address> --test

After placing the job file in the folder, make it executable.

If checks are needed more often, the simplest way is to add a line to /etc/crontab using the "/" step notation, for example:

   */5 * * * * root run-parts /etc/cron.my5min

Create the /etc/cron.my5min folder and place the mdRAIDmon file there.

Of course, you can try other ways of job scheduling, such as atd or batch.
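Creating the job folder, copying the job file into it and setting the execute permissions can be done in one small step. The sketch below assumes the install utility from coreutils is available; the function name install_cron_job is made up for this example.

```shell
#!/bin/sh
# Copy a job script into a cron folder and make it executable.
install_cron_job() {
    job=$1   # path to the job script, e.g. ./mdRAIDmon
    dir=$2   # target folder, e.g. /etc/cron.my5min
    mkdir -p "$dir" || return 1
    # install copies the file and sets mode rwxr-xr-x in a single step
    install -m 0755 "$job" "$dir/"
}
```

For example, install_cron_job ./mdRAIDmon /etc/cron.my5min, run as root.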

Simulating a drive fault was easy for me: the server, an SR1425BK1, has a hot-swap drive cage.