Fedora Guide

Setting up software RAID 1 in Fedora 16

These steps should work in any RPM-based Linux distribution, but this guide is specific to Fedora 16, especially the parts regarding Grub 2. I managed to get this to work on CentOS 6.2 using grub 1 commands to copy the boot loader over, but I had other issues and decided to use Fedora 16 as the base OS for my workstation & VM server host.

I also made this RAID 1 mirror array while installing a clean copy of the OS. If you haven't played around with RAID and configuring disks, I strongly suggest starting with a fresh system with any important data backed up, as the drives will be erased to initialize the array. There is a way to create a RAID 1 mirror from an existing system, but that is not what I did since this was a fresh setup anyway. This machine was running Windows 7 and has a Highpoint Rocketraid 2300 card installed with a three-drive 2TB RAID 5 array and one hot spare. To be sure I didn't make any mistakes I backed up the important data from that array to drives in other machines, then powered the computer down and pulled the card from the system so I couldn't accidentally delete my large storage array.

Setting up the array is pretty straightforward. Go through the anaconda installer and pick Basic Storage Devices; software RAID is considered basic, while specialized storage is for network drives. Continue normally, and when you get to the Installation Type screen select “Create Custom Layout” in order to set up any software RAID. If you let the installer pick automatically it defaults to JBOD (just a bunch of disks), sometimes referred to as a “linear” RAID, which means you have NO redundancy and defeats the purpose!

I used this guide to setup my partitions in my array: http://wiki.centos.org/HowTos/SoftwareRAIDonCentOS5

To create a software RAID 1, first make a RAID partition of the same size and type on each of the drives you are mirroring. Delete any partitions that may already be set up, then click the Create button and select software RAID. First I made what would become the /boot partition: a 502MB software RAID partition on each drive. Then I made matching 10242MB software RAID partitions on each drive for my swap space. Finally I made one last pair of software RAID partitions filling the rest of the drive space (I had to type in an exact size; anaconda crashed when I told it to fill the maximum available space).

The reason for the odd numbers is that the partition metadata takes space. Fedora 16 defaults to 500MB for the boot partition due to Grub 2, so I made those partitions 502MB; when the RAID device is built the result is a 500MB mirror. Likewise 10242MB becomes 10240MB, or 10GB. (I have 8GB of RAM in this machine, and the recommended swap size is N + 2GB where N = RAM.) To create each mirror, click Create again and this time select RAID Device. Here you can choose the type of device to create (including LVM) and which partitions to use. For the boot mirror select /boot from the mount point drop-down list, then check the two 502MB partitions. Repeat for swap, and the third device becomes /. I don't need a separate /home mount point for what I am doing.
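For reference, this is roughly the command-line equivalent of the RAID devices anaconda creates here. It is only a sketch of the idea, not what the installer actually runs; the partition numbers match the fdisk output shown later in this guide (sda1/sdb1 for /, sda2/sdb2 for swap, sda3/sdb3 for /boot), and the metadata versions match what ended up on my arrays:

# /boot mirror: 1.0 metadata keeps the RAID superblock at the end of the
# partition so the boot loader can still read the filesystem normally
mdadm --create /dev/md10 --level=1 --raid-devices=2 --metadata=1.0 /dev/sda3 /dev/sdb3

# swap and root mirrors with the default 1.2 metadata
mdadm --create /dev/md11 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
mdadm --create /dev/md12 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

# format the md devices, not the raw partitions
mkfs.ext4 /dev/md10    # /boot
mkswap /dev/md11       # swap
mkfs.ext4 /dev/md12    # /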


After configuring all the RAID devices your screen should look similar to mine. (Sorry for the lines; this is connected to an old CRT and taking a picture for my records was a last-minute idea.)


Next is the boot loader. Linux does not support booting directly from a software RAID, so let the default of installing the boot loader onto the Master Boot Record (MBR) of the first drive stand. The system will boot into the mirror, but the boot loader itself is not redundant: if that one drive is lost, the system will no longer boot. This will be fixed next.

Grub 2 bug:

After installing Fedora 16 and taking all that time to configure the software RAID1 array and partitions, I was miffed that grub gave an unknown device error and my system would not boot!

A DuckDuckGo search revealed a bug where grub 2 will not work if /boot is on a software RAID 1 partition! Sheesh. The “official” solution is to make /boot a standard partition on one drive and mirror the rest. That would make for a surprise when your system wouldn't boot after losing the single drive that grub was installed on! That is definitely not how I want to configure things: I want grub to be updated through yum and copied onto both mirrored drives whenever a kernel update is performed, so there is no surprise if the drive containing the /boot partition fails.

From the Fedora wiki: http://fedoraproject.org/wiki/Common_F16_bugs

Cannot boot with /boot partition on a software RAID array

Bugzilla: #750794

Attempting to boot after installing Fedora 16 with the /boot partition on a software RAID array will fail, as the software RAID modules for the grub2 bootloader are not installed. Having the /boot partition on a RAID array has never been a recommended configuration for Fedora, but up until Fedora 16 it has usually worked.

To work around this issue, do not put the /boot partition on the RAID array. Create a simple BIOS boot partition and a /boot partition on one of the disks, and place the other system partitions on the RAID array. Alternatively, you can install the appropriate grub2 modules manually from anaconda's console before rebooting from the installer, or from rescue mode. Edit the file /mnt/sysimage/boot/grub2/grub.cfg and add the lines:

insmod raid
insmod mdraid09
insmod mdraid1x

Now run these commands:
chroot /mnt/sysimage
grub2-install /dev/sda
grub2-install /dev/sdb

Adjust the device names as appropriate to the disks used in your system.
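One extra check that is my own addition, not part of the wiki's instructions: a rough way to confirm both drives now carry GRUB's boot code is to dump the first sector of each disk and look for the GRUB string (if strings isn't available in the installer environment, this can wait until the installed system is up):

dd if=/dev/sda bs=512 count=1 2>/dev/null | strings | grep -i grub
dd if=/dev/sdb bs=512 count=1 2>/dev/null | strings | grep -i grub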

I added the new lines right above the timeout line, so mine looks like this:

### BEGIN /etc/grub.d/00_header ###
if [ -s $prefix/grubenv ]; then
  load_env
fi
set default="0"
if [ "${prev_saved_entry}" ]; then
  set saved_entry="${prev_saved_entry}"
  save_env saved_entry
  set prev_saved_entry=
  save_env prev_saved_entry
  set boot_once=true
fi

function savedefault {
  if [ -z "${boot_once}" ]; then
    saved_entry="${chosen}"
    save_env saved_entry
  fi
}

function load_video {
  insmod vbe
  insmod vga
  insmod video_bochs
  insmod video_cirrus
}

### I ADDED THESE 3 LINES FOR RAID1 FUNCTION ###

insmod raid
insmod mdraid09
insmod mdraid1x

set timeout=2
### END /etc/grub.d/00_header ###

BTW, if you find the default 5-second timer too long, this is a good time to drop it down; I made it two, as seen above. I suggest you not set the value to zero, so that if a kernel update fails you can still select the older kernel version.
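One thing to keep in mind: grub.cfg is a generated file, so a hand edit like this timeout can get wiped out whenever the config is regenerated with grub2-mkconfig. The lasting way to change the timeout is to set it in /etc/default/grub and rebuild the config, something like this:

# edit /etc/default/grub and set: GRUB_TIMEOUT=2
grub2-mkconfig -o /boot/grub2/grub.cfg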

Working with mdadm.

After getting the RAID set up in anaconda, configuring grub2 to see the mirrored /boot partition, and updating the MBR on both drives, it was finally time to fully test the redundancy.

To simulate a failed drive I shut down the computer and physically disconnected the SATA cable from sdb. Upon boot grub worked and the system came up normally! You can run sudo cat /proc/mdstat to see that sdb is reported as missing from the arrays.

I repeated the test by powering off the computer again. This time I plugged sdb back in and instead unplugged sda. Upon power up the computer again booted normally. Success! This means the mirrored RAID did its job.
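By the way, if you would rather not pull cables, mdadm can simulate this kind of failure in software. A rough sketch, using md10 and its sdb3 member as the example (any array member would do):

mdadm /dev/md10 --fail /dev/sdb3      # mark the member as faulty
mdadm /dev/md10 --remove /dev/sdb3    # pull it out of the array
cat /proc/mdstat                      # md10 should now show [_U]
mdadm /dev/md10 --add /dev/sdb3       # add it back and let it resync

Of course this doesn't exercise the boot loader the way physically unplugging a drive does, which is exactly what I wanted to test here.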

I powered off the computer again and plugged sda's SATA cable back in. The computer booted normally, which seemed like a good sign. Too good.

I ran another sudo cat /proc/mdstat. As seen here, two of the software RAID partitions on disk sda are missing:

[root@workstation retarp]# cat /proc/mdstat
Personalities : [raid1]

md12 : active raid1 sdb1[1]
477382520 blocks super 1.2 [2/1] [_U]
bitmap: 3/4 pages [12KB], 65536KB chunk

md11 : active raid1 sda2[0] sdb2[1]
10486712 blocks super 1.2 [2/2] [UU]

md10 : active raid1 sdb3[1]
514036 blocks super 1.0 [2/1] [_U]

unused devices: <none>

sda1 and sda3 are missing from the RAID arrays. Notice md11 shows [UU] where md10 and md12 show [_U]. We can use fdisk to verify the partitions on sda:

[root@workstation retarp]# fdisk /dev/sda -l

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0006720d

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1            2048   954769407   477383680   fd  Linux raid autodetect
/dev/sda2       954769408   975745023    10487808   fd  Linux raid autodetect
/dev/sda3   *   975745024   976773119      514048   fd  Linux raid autodetect

Partitions on sdb should be identical and they are:

[root@workstation retarp]# fdisk /dev/sdb -l
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders, total 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0008ddf2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1            2048   954769407   477383680   fd  Linux raid autodetect
/dev/sdb2       954769408   975745023    10487808   fd  Linux raid autodetect
/dev/sdb3   *   975745024   976773119      514048   fd  Linux raid autodetect

Now to try adding sda's partitions back into the arrays:

[root@workstation retarp]# mdadm /dev/md10 --add /dev/sda3
mdadm reports an error:
mdadm: /dev/sda3 reports being an active member for /dev/md10, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sda3 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sda3" first.

I wasn't sure, but since I couldn't re-add sda3 or add it as it was, I decided to go ahead and make it a spare. This turned out to be the solution. The first step is to zero the superblock to clear the partition:

[root@workstation retarp]# mdadm --zero-superblock /dev/sda3

Then I added it back to the array:

[root@workstation retarp]# mdadm /dev/md10 --add /dev/sda3
mdadm: added /dev/sda3

The partition should then be added and rebuild automatically; I verified this with the command mdadm --detail /dev/md10:

[root@workstation retarp]# mdadm --detail /dev/md10
/dev/md10:
Version : 1.0
Creation Time : Tue Jan 3 17:06:51 2012
Raid Level : raid1
Array Size : 514036 (502.07 MiB 526.37 MB)
Used Dev Size : 514036 (502.07 MiB 526.37 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Tue Jan 3 02:42:27 2012
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

Rebuild Status : 91% complete

Name : workstation.retarp.com:10 (local to host workstation.retarp.com)
UUID : 71919c10:c6d4e9e5:f42b121a:59d30683
Events : 72

    Number   Major   Minor   RaidDevice State
       2       8        3        0      spare rebuilding   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3

Next I did the same with md12. First, another look at /proc/mdstat:

[root@workstation retarp]# cat /proc/mdstat
Personalities : [raid1]
md12 : active raid1 sdb1[1]
477382520 blocks super 1.2 [2/1] [_U]
bitmap: 3/4 pages [12KB], 65536KB chunk
md11 : active raid1 sda2[0] sdb2[1]
10486712 blocks super 1.2 [2/2] [UU]
md10 : active raid1 sda3[2] sdb3[1]
514036 blocks super 1.0 [2/2] [UU]

unused devices: <none>

Above you can see the fixed and now healthy md10 array.
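While an array is resyncing like this, a handy way to keep an eye on the rebuild is to refresh /proc/mdstat every few seconds:

watch -n 5 cat /proc/mdstat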

Viewing the details confirms the missing partition on md12:

[root@workstation retarp]# mdadm --detail /dev/md12
/dev/md12:
Version : 1.2
Creation Time : Tue Jan 3 17:06:41 2012
Raid Level : raid1
Array Size : 477382520 (455.27 GiB 488.84 GB)
Used Dev Size : 477382520 (455.27 GiB 488.84 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Jan 3 03:01:41 2012
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

Name : workstation.retarp.com:12 (local to host workstation.retarp.com)
UUID : bd68e1b0:bd908077:cffd0922:6d9ff451
Events : 1779

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

On this array, just adding the partition back did the trick:

[root@workstation retarp]# mdadm /dev/md12 --add /dev/sda1
mdadm: re-added /dev/sda1

Verified:

[root@workstation retarp]# mdadm --detail /dev/md12
/dev/md12:
Version : 1.2
Creation Time : Tue Jan 3 17:06:41 2012
Raid Level : raid1
Array Size : 477382520 (455.27 GiB 488.84 GB)
Used Dev Size : 477382520 (455.27 GiB 488.84 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Jan 3 03:03:59 2012
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

Rebuild Status : 7% complete

Name : workstation.retarp.com:12 (local to host workstation.retarp.com)
UUID : bd68e1b0:bd908077:cffd0922:6d9ff451
Events : 1790

    Number   Major   Minor   RaidDevice State
       0       8        1        0      spare rebuilding   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

Success! Now it's happy:

[root@workstation retarp]# mdadm --detail /dev/md12
/dev/md12:
Version : 1.2
Creation Time : Tue Jan 3 17:06:41 2012
Raid Level : raid1
Array Size : 477382520 (455.27 GiB 488.84 GB)
Used Dev Size : 477382520 (455.27 GiB 488.84 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Jan 3 03:04:36 2012
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Name : workstation.retarp.com:12 (local to host workstation.retarp.com)
UUID : bd68e1b0:bd908077:cffd0922:6d9ff451
Events : 1805

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1

A final check of the array shows all is well again:

[root@workstation retarp]# cat /proc/mdstat
Personalities : [raid1]
md12 : active raid1 sda1[0] sdb1[1]
477382520 blocks super 1.2 [2/2] [UU]
bitmap: 0/4 pages [0KB], 65536KB chunk

md11 : active raid1 sda2[0] sdb2[1]
10486712 blocks super 1.2 [2/2] [UU]

md10 : active raid1 sda3[2] sdb3[1]
514036 blocks super 1.0 [2/2] [UU]

unused devices: <none>

There we go: after a few hours and a lot of reading online I have a ready-to-go software RAID 1. Now if a hard drive fails, my system won't go down, and it will still boot!

Like this page? See the rest of my Fedora guides!