Linux 101 : Repairing a Linux RAID array - the "mdadm" tool -



As RAID could fail on a Linux system, the system administrator needs to get notified about the issue.

System administrators could setup the "mdadm" - the program that manages the RAID array - to send a notification via email to the root user using the "MAILADDR" parameter in the configuration file of the "mdadm" program: - "/etc/mdadm/mdadm.conf" -.


After each change to the configuration file, we would need to run the below command to update the initramfs with the recent changes:


The "initramfs" is a "temporary" root filesystem image that the system uses to load elements necessary for booting, like the driver modules, before the "real" filesystem gets mounted. 

We could display the "/proc/mdstat" file to check the status of our RAID array:



The disk "sdb1" is tagged with a "failed" sign (F).

The RAID array has now, only two running disks as we can see ([3/2] [UU_]).

We will need to remove the failed disk "sdb1" from the RAID array "/dev/md0".
We could do so using the below command:


Remark:

If we want to replace a drive that is not failing - for whatever reason -, we will need to tag it as "failed" as follows:


Then we could proceed with the swapping.

The "/proc/mdstat" file reflects all the changes done to the RAID array. 

We only see two disks now after removing the failed one, "sdb1":


The replacement drive needs to be of the same size or bigger than the failed one, so we could create a partition on it that matches the size of the other partitions in the array.

After physically replacing the failed drive and partitioning it, we could add it to the RAID array "md0" as follows:


The "mdadm" program will rebuild the data on the new drive.

We could "watch" the progress of the rebuilding process by periodically checking the changes in the "mdstat" file using the below command:


-n :
 time in seconds - three in our example -.

Comments

Leave as a comment:

Archive