preface and setup layout
After installing a fresh system based on Kali Linux with software RAID and LVM, I had some fun.
The setup consisted of four HDDs, with partitions for swap and some others for testing purposes, mostly btrfs; /boot was on ext2.
The first two hard disks were designated as the system RAID, the other two as the data RAID.
The harddisks were plugged into the SATA ports 1 to 4 in the right order (ports can always be identified via the prints on the motherboard), which was a good idea as we will see later. Out of habit I also took a photo of the partitioning scheme during the install when I was done, as this was a more complex setup. Both RAID levels were RAID1, nothing fancy.
Each of the RAID devices was in turn used as an LVM volume group, and each of the partitions mentioned above was a single logical volume.
/boot was an LVM logical volume on top of a software RAID.
Well, I simply hoped this would boot after this setup was chosen. ;)
excursus on the RAID levels used
On a sidenote, usually I only put RAID1 (mirrored) and RAID10 (striped sets of mirrors) levels to use. RAID5 allows one disk to fail in the array, RAID6 two. With the current sizes of two, three or even four terabytes, and six also being already shipped, just think of the amount of time needed to rebuild a RAID5 with 10TB, which should take quite a while, when two TB already take days to finish.
Considering most people do not mix harddisks but just take them one after another out of the box the mailman sent them, these are very likely quite similar. Same model, from the same production unit or time slot, with likely similar life expectancies. Rebuilds further take their toll on the hardware, as they impose an intense workload upon the disks. Besides, in a RAID 10 data is copied straight from one disk to its partner, whereas in a RAID5 ALL disks are read, plus parity has to be calculated. This fucks up the performance of the drives during rebuilds.
I do not feel good about a rebuild stressing the array for what may be weeks until it finishes, while everything sits on top of a lousy RAID5 where one more missing disk means all is lost.
With RAID5, the failure probability increases with each added disk, as does the time to rebuild. RAID6 mitigates this somewhat, but just think of the time and work the rebuild takes. And if your data goes down the drain, think of what the customer will tell you when he's missing 20TB.
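To put rough numbers on the rebuild times, here is a back-of-the-envelope sketch. The throughput figures are made up for illustration; a rebuild on an array that is serving customers will usually run far below what an idle disk can do.

```shell
# rebuild_hours SIZE_TB MB_PER_S
# Prints the hours needed to copy SIZE_TB terabytes at a sustained
# rate of MB_PER_S. Both values are assumptions you supply yourself.
rebuild_hours() {
    awk -v tb="$1" -v mbs="$2" 'BEGIN {
        seconds = (tb * 1000000) / mbs   # 1 TB = 1,000,000 MB
        printf "%.1f\n", seconds / 3600
    }'
}

rebuild_hours 2 100    # idle array, hypothetical 100 MB/s
rebuild_hours 2 10     # array under load, hypothetical 10 MB/s
rebuild_hours 10 10    # five times the data at the same throttled rate
```

The time grows linearly with the amount of data, so a throttled rebuild that takes days at 2TB takes well over a week at 10TB.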
I have already experienced a RAID10 with two failed disks; both went out in quick succession, within about two days.
Lucky me, they were on different legs of the RAID0. So what did the situation feel like?
All data was backed up. The backups are actively tested and thus working most of the time. Total storage capacity was just six TB. And these were only 2TB drives, which were synced within days, not the two weeks a RAID5/6 setup would have taken.
I still dread the memory, it was a Hypervisor for several customers. Brave new virtualized world.
You may ask, why no external storage? Getting a dothill or an EMC2 storage is simply several thousand euros, and why not use an already existing 8bay Server with local storage? RAID1 for the system, leaves six disks for data, with two TB drives sums up to 6TB capacity, which is a nice use case for slightly aged hardware.
Besides, these setups can also be sold more easily, they are simply cost efficient. Plus you do not have two digit terabyte amounts of data to sync.
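The capacity arithmetic for the levels mentioned in this post can be sketched like this. It assumes equally sized disks, whole terabytes, and two-way mirrors for RAID10; the function name is just something I made up.

```shell
# raid_usable_tb LEVEL DISKS SIZE_TB
# Prints the usable capacity in TB for a few common RAID levels.
raid_usable_tb() {
    level=$1
    disks=$2
    size=$3
    case $level in
        1)  echo "$size" ;;                     # every disk holds the same data
        10) echo $(( disks / 2 * size )) ;;     # half the disks hold mirror copies
        5)  echo $(( (disks - 1) * size )) ;;   # one disk's worth of parity
        6)  echo $(( (disks - 2) * size )) ;;   # two disks' worth of parity
        *)  echo "unsupported level: $level" >&2; return 1 ;;
    esac
}
```

`raid_usable_tb 10 6 2` gives the 6TB from above; the same six disks as RAID5 would give 10TB, which is the capacity you trade away for the quicker, simpler rebuild.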
Here's a link from Jan 2014 to show the level of importance storage already had last year.
Some time after this posting I found some additional info on this, from someone I have never heard of:
(You may have to have mdadm installed, though I do not know for sure.)
But I digress, back to the story.
boot failed, ofc
Booting the system afterwards failed with errors, and the root partition was formatted as btrfs, too.
The RAID status not being OK was a minor issue, as the RAID was simply not synced yet. This, however, was really a problem:
fsck: fsck.btrfs: not found
fsck: error 2 while executing fsck.btrfs for /root/rundev
fsck: died with exit status 8
failed (code 8).
The missing btrfs-tools package was the culprit.
You can find this out by having a look at the installed fsck tools, seeing that no btrfs stuff is present, and googling the problem.
Google also helps with finding the name of the package we have to install.
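Looking for missing fsck helpers can be scripted. This is a small sketch, assuming the helpers follow the usual `fsck.<type>` naming and are on the PATH of whoever runs it (which may not include /sbin for a normal user).

```shell
# missing_fsck_helpers TYPE...
# Prints every filesystem type for which no fsck.TYPE helper is found on PATH.
missing_fsck_helpers() {
    for fstype in "$@"; do
        command -v "fsck.$fstype" >/dev/null 2>&1 || echo "$fstype"
    done
}
```

Run inside the installed system, something like `missing_fsck_helpers ext2 btrfs` should have printed `btrfs` in my case.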
get to know the storage geometry
I rebooted with a live disk, and having written down / photographed my layout previously, I knew where to start.
Figure this out, in case you have not paid attention to your cabling or have no info on partitions or software raids:
root@kali:~# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 233.8G  0 disk
`-sda1   8:1    0 232.9G  0 part
sdb      8:16   0 233.8G  0 disk
`-sdb1   8:17   0 232.9G  0 part
sdc      8:32   0 298.1G  0 disk
`-sdc1   8:33   0   298G  0 part
sdh      8:112  0   3.8G  0 disk
|-sdh1   8:113  0   2.9G  0 part /lib/live/mount/medium
`-sdh2   8:114  0  61.9M  0 part /media/Kali Live
sdi      8:128  0 372.6G  0 disk
`-sdi1   8:129  0   298G  0 part
sr0     11:0    1  1024M  0 rom
loop0    7:0    0   2.6G  1 loop /lib/live/mount/rootfs/filesystem.squashfs
root@kali:~#
Knowing we have two software raids, sda1 and sdb1 seem to be related, as are sdc1 and sdi1.
The actual device sizes don't help you much, as mixed hardware was used.
Something you'll also encounter out there in the wild, standard procedure.
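Since sizes are only a weak hint, a more reliable way to spot candidates is the filesystem signature: md members are reported as `linux_raid_member`. A minimal sketch, assuming your lsblk supports the `-r` (raw), `-n` (no headings) and `-o` (columns) options; the function name is mine.

```shell
# raid_members
# Reads "NAME FSTYPE" pairs on stdin and prints the device path of
# every entry whose signature marks it as an md RAID member.
raid_members() {
    awk '$2 == "linux_raid_member" { print "/dev/" $1 }'
}
```

Usage would be `lsblk -rn -o NAME,FSTYPE | raid_members`. This only tells you which partitions belong to some array, not which ones belong together.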
This may lead to interesting situations:
Like at 4am in the night, with you of course being on call:
You already pulled out and set up new hardware and then you realize the system just won't boot. You can either restore too-many terabytes from backup, or just get the system back in order. This is your problem at hand, 'GO! I cannot tell you anything, I have no clue of the setup either...'
Fun times. ;) But back to the broken install.
Mounting a RAID drive directly won't work:
root@kali:~# mkdir asdf
root@kali:~# mount /dev/sda1 asdf
mount: unknown filesystem type 'linux_raid_member'
root@kali:~# mdadm -E /dev/sda1
bash: mdadm: command not found
mdadm would tell us more, when it is installed, that is. After all, this is the live stick I used to set up the installation, so it must be somewhere:
root@kali:~# find / -iname mdadm
/lib/live/mount/rootfs/filesystem.squashfs/usr/share/bash-completion/completions/mdadm
/lib/live/mount/medium/pool/main/m/mdadm
/usr/share/bash-completion/completions/mdadm
root@kali:~# /lib/live/mount/medium/pool/main/m/mdadm
bash: /lib/live/mount/medium/pool/main/m/mdadm: Is a directory
root@kali:~# ls -alh /lib/live/mount/medium/pool/main/m/mdadm
total 749k
dr-xr-xr-x 1 root root 2.0K Mar 12 18:26 .
dr-xr-xr-x 1 root root 2.0K Mar 12 18:26 ..
-r--r--r-- 1 root root 192K Mar 12 18:26 mdadm-udeb_3.2.5-5_i386.udeb
-r--r--r-- 1 root root 553K Mar 12 18:26 mdadm_3.2.5-5_i386.deb
Let's install the Debian package:
root@kali:~# dpkg -i /lib/live/mount/medium/pool/main/m/mdadm/mdadm_3.2.5-5_i386.deb
Now back to the problem:
root@kali:~# mdadm -E /dev/sda1
/dev/sda1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : ab74df56:e0745791:d5cc011e:3792070a
Name : vdr:0
Creation Time : Sat May 2 15:32:29 2015
Raid Level : raid1
Raid Devices : 2
Available Dev Size : 488017920 (232.71 GiB 249.87 GB)
Array Size : 244008768 (232.70 GiB 249.86 GB)
Used Dev Size : 488017536 (232.70 GiB 249.86 GB)
Data Offset : 262144 sectors
Super offset : 8 sectors
State : active
Device UUID : 6f44a60f:d035d2d9:643a3f9c:a5bb21ef
Update Time : Sat May 2 16:24:42 2015
Checksum : 5a0897b6 - correct
Events : 12
Device role : Active device 0
Array State : AA ('A' == active, '.' == missing)
This certainly looks better. For fun, you can look up the other devices if you know your layout; if you don't know the layout, you will have to anyway.
Try this, copy-paste, it's the easiest way:
mdadm -E /dev/sd?? | grep -i -e /dev/ -e name -e device\ role -e raid\ devices -e state
Gives me this nice overview:
mdadm: No md superblock detected on /dev/sdh1.
/dev/sda1:
Name : vdr:0
Raid Devices : 2
Device role : Active device 0
Array State : AA ('A' == active, '.' == missing)
/dev/sdb1:
Name : vdr:0
Raid Devices : 2
Device role : Active device 1
Array State : AA ('A' == active, '.' == missing)
/dev/sdc1:
Name : vdr:1
Raid Devices : 2
Device role : Active device 0
Array State : AA ('A' == active, '.' == missing)
/dev/sdh1:
/dev/sdi1:
Name : vdr:1
Raid Devices : 2
Device role : Active device 1
Array State : AA ('A' == active, '.' == missing)
Name is, by the way, the name of the host the array was created on (here vdr), followed by the name or number of the array.
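If you have more than a handful of partitions, you can let awk do the grouping. This is a sketch that relies on the `mdadm -E` output looking like the one above, with a `/dev/...:` header line per device and a `Name : ...` line below it; the function name is my own.

```shell
# group_raid_members
# Reads "mdadm -E" output on stdin and prints one line per array:
# the array name followed by all member devices that carry it.
group_raid_members() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev); next }
        $1 == "Name" && $2 == ":" {
            name = $3
            if (!(name in members)) order[++n] = name   # remember first-seen order
            members[name] = members[name] " " dev
        }
        END { for (i = 1; i <= n; i++) print order[i] members[order[i]] }
    '
}
```

Usage would be `mdadm -E /dev/sd?? 2>/dev/null | group_raid_members`. Grouping by the `Array UUID` line instead would be more robust if two arrays happen to share a name.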
get the raid up so you can work on it
-A will assemble the raid,
-R makes it available as soon as it has enough drives to run,
-S stops it again.
You can only assemble fitting parts anyway:
root@kali:~# mdadm -A -R /dev/md0 /dev/sda1 /dev/sdi1
mdadm: superblock on /dev/sdi1 doesn't match others - assembly aborted
root@kali:~# mdadm -A -R /dev/md0 /dev/sda1 /dev/sdb1
mdadm: /dev/md0 has been started with 2 drives.
root@kali:~# mdadm -A -R /dev/md1 /dev/sdc1 /dev/sdi1
mdadm: /dev/md1 has been started with 2 drives.
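Once you know which partitions belong together, the assemble commands can be generated instead of typed. A sketch that only prints the commands so you can review them first; it expects lines of the form `<label> <member> <member> ...` and simply numbers the md devices from 0 in input order, which is my assumption and not something mdadm requires.

```shell
# assemble_cmds
# Reads "<label> <member>..." lines on stdin and prints one
# "mdadm -A -R" command per array, without running anything.
assemble_cmds() {
    awk 'NF >= 2 {
        cmd = sprintf("mdadm -A -R /dev/md%d", n++)
        for (i = 2; i <= NF; i++) cmd = cmd " " $i
        print cmd
    }'
}
```

For example, `printf 'vdr:0 /dev/sda1 /dev/sdb1\n' | assemble_cmds` prints the first working command from above. mdadm also has `--assemble --scan`, which tries to do the discovery on its own.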
This is better. Now we have the raids back up:
root@kali:~# lsblk
NAME      MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda         8:0    0 233.8G  0 disk
`-sda1      8:1    0 232.9G  0 part
  `-md0     9:0    0 232.7G  0 raid1
sdb         8:16   0 233.8G  0 disk
`-sdb1      8:17   0 232.9G  0 part
  `-md0     9:0    0 232.7G  0 raid1
sdc         8:32   0 298.1G  0 disk
`-sdc1      8:33   0   298G  0 part
  `-md1     9:1    0 232.7G  0 raid1
sdh         8:112  0   3.8G  0 disk
|-sdh1      8:113  0   2.9G  0 part  /lib/live/mount/medium
`-sdh2      8:114  0  61.9M  0 part  /media/Kali Live
sdi         8:128  0 372.6G  0 disk
`-sdi1      8:129  0   298G  0 part
  `-md1     9:1    0 232.7G  0 raid1
sr0        11:0    1  1024M  0 rom
loop0       7:0    0   2.6G  1 loop  /lib/live/mount/rootfs/filesystem.squashfs
root@kali:~#
In my case, I'd only need the md0 device, as I know that root is on there.
But this is handled as if we knew nothing about the layout, for illustration purposes.
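To check whether the assembled arrays are complete or still degraded, the kernel's status file /proc/mdstat is the place to look. A small sketch that boils it down to one line per array, assuming the usual layout where the `blocks` line ends in a status such as `[UU]` (all members up) or `[U_]` (one missing).

```shell
# md_health
# Reads /proc/mdstat-style text on stdin and prints "<array> <status>"
# for every array whose blocks line carries a [UU]-style member status.
md_health() {
    awk '
        /^md/ { name = $1 }
        / blocks / && $NF ~ /^\[[U_]+\]$/ { print name, $NF }
    '
}
```

Usage would be `md_health < /proc/mdstat`.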
Now have a look at what pandora's box has in store for you:
root@kali:~# mkdir asdf0
root@kali:~# mount /dev/md0 asdf0
mount: unknown filesystem type 'LVM2_member'
Oh well. Deja vu.
get LVM back up so you can work on it
Get an overview with pvscan and lvscan:
root@kali:~# pvscan
PV /dev/md1 VG vg_data lvm2 [297.96 GiB / 111.70 GiB free]
PV /dev/md0 VG vg_system lvm2 [232.70 GiB / 34.81 GiB free]
Total: 2 [530.66 GiB] / in use: 2 [530.66 GiB] / in no VG: 0 [0 ]
root@kali:~# lvscan
inactive '/dev/vg_data/lv_data_var_backup' [93.13 GiB] inherit
inactive '/dev/vg_data/lv_data_var_nfs' [93.13 GiB] inherit
inactive '/dev/vg_system/lv_system_boot' [476.00 MiB] inherit
inactive '/dev/vg_system/lv_system_root' [46.56 GiB] inherit
inactive '/dev/vg_system/lv_system_var' [74.50 GiB] inherit
inactive '/dev/vg_system/lv_system_var_test' [74.50 GiB] inherit
inactive '/dev/vg_system/lv_system_swap' [1.86 GiB] inherit
For more information there are also the more verbose display commands such as lvdisplay, whose output looks like this:
root@kali:~# lvdisplay
--- Logical volume ---
LV Path /dev/vg_data/lv_data_var_backup
LV Name lv_data_var_backup
VG Name vg_data
LV UUID f0C3o2-XUB1-5xkq-om5W-w0Kh-YwcX-752gkE
LV Write Access read/write
LV Creation host, time vdr, 2015-05-02 15:38:48 +0000
LV Status NOT available
LV Size 93.13 GiB
Current LE 23841
Segments 1
Allocation inherit
Read ahead sectors auto
--- Logical volume ---
...
There is also lvs, which provides rather short output, too, so you have even more options to choose from.
In our case life is easy, since I have the habit of naming the logical volumes like lv_system_var or lv_data_var_backup (volume group first, then the mountpoint), so there is less to keep in mind and to mix up. Plus you know which LV is home to what mountpoint. As far as I know, no conventions exist for this?
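With such a naming scheme, the mountpoint can even be derived mechanically. A sketch under the assumption that names look like `lv_<vg>_<path>` with underscores standing in for slashes, and that `root` and `swap` are special cases; this is my reading of my own convention, not anything LVM knows about.

```shell
# lv_to_mountpoint LV_NAME
# Turns a name like lv_system_var_test into the mountpoint it hints at.
lv_to_mountpoint() {
    rest=${1#lv_}       # drop the "lv_" prefix
    rest=${rest#*_}     # drop the volume group part up to the first "_"
    case $rest in
        root) echo / ;;
        swap) echo swap ;;
        *)    printf '/%s\n' "$(printf '%s' "$rest" | tr '_' '/')" ;;
    esac
}
```

For example, `lv_to_mountpoint lv_data_var_backup` prints `/var/backup`. A mountpoint that itself contains an underscore would break this, so treat it as a convenience only.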
Above, all the logical volumes were marked as 'inactive', so we first have to activate them:
root@kali:~# vgchange -a y
2 logical volume(s) in volume group "vg_data" now active
5 logical volume(s) in volume group "vg_system" now active
root@kali:~# lvscan
ACTIVE '/dev/vg_data/lv_data_var_backup' [93.13 GiB] inherit
ACTIVE '/dev/vg_data/lv_data_var_nfs' [93.13 GiB] inherit
ACTIVE '/dev/vg_system/lv_system_boot' [476.00 MiB] inherit
ACTIVE '/dev/vg_system/lv_system_root' [46.56 GiB] inherit
ACTIVE '/dev/vg_system/lv_system_var' [74.50 GiB] inherit
ACTIVE '/dev/vg_system/lv_system_var_test' [74.50 GiB] inherit
ACTIVE '/dev/vg_system/lv_system_swap' [1.86 GiB] inherit
To deactivate them again, in case you need it, use vgchange -a n.
These commands can also be used for a single volume group by passing the VG name as a parameter.
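To see at a glance which volumes still need activating, the lvscan output can be filtered. A sketch that assumes the output format shown above, with the state in the first column and the path in single quotes.

```shell
# inactive_lvs
# Reads lvscan output on stdin and prints the path of every
# logical volume that is reported as inactive.
inactive_lvs() {
    awk -F"'" '$1 ~ /inactive/ { print $2 }'
}
```

Usage would be `lvscan | inactive_lvs`. Activating just the system volume group then looks like `vgchange -a y vg_system`.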
mount and chroot into the installation to repair it
Now let's just mount the LVs needed to fix the install:
mkdir asdf-root
mount /dev/vg_system/lv_system_root asdf-root
chroot asdf-root
When trying to install the btrfs tools, another error occurs:
root@kali:/# apt-get install btrfs-tools -y
E: Could not open lock file /var/lib/dpkg/lock - open (2: No such file or directory)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?
Well, after a quick look at /var with ls -alh and seeing it was empty, we might have to mount another LV.
So exit the chroot, then let's mount the missing LV and chroot into the installation again:
mount /dev/vg_system/lv_system_var asdf-root/var
chroot asdf-root
Now ls -alh /var shows us something, and we should be able to run apt-get install btrfs-tools -y.
After a reboot and removing the boot stick, the system should work now.
If there were still problems, bind-mounting the 'system' filesystems into the chroot might help. Since Kali is Debian-based, see this:
mount -o bind /dev /mnt/rescue/dev
mount -o bind /dev/pts /mnt/rescue/dev/pts
mount -o bind /proc /mnt/rescue/proc
mount -o bind /run /mnt/rescue/run
mount -o bind /sys /mnt/rescue/sys
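The same list can be generated for whatever directory you actually chrooted into (asdf-root in my case). A sketch that only prints the commands so nothing gets mounted by accident; the function name is made up.

```shell
# bind_mount_cmds CHROOT_DIR
# Prints the bind-mount commands for the usual virtual filesystems,
# in an order where /dev is handled before /dev/pts.
bind_mount_cmds() {
    root=${1%/}     # strip one trailing slash, if any
    for fs in dev dev/pts proc run sys; do
        echo "mount -o bind /$fs $root/$fs"
    done
}
```

Review the output of `bind_mount_cmds asdf-root` and, if it looks right, pipe it into `sh` as root.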
The reboot actually worked; the only problem was this message, shown after the GRUB welcome message at the top and before the GRUB menu:
error: fd0 read error.
... several times. Grub still boots, so this is not really an issue, not yet at least.
GRUB can natively boot off RAIDs, but I strongly suspect that if /dev/sda dies, the system will not boot, as it seems the bootloader is installed on just one disk.
grub and booting off software raid devices
Verifying this is easy: Turn off the computer and remove the SATA cable to the first hdd. Sure enough more things broke:
GRUB loading...
Welcome to GRUB!
error: out of partition.
Entering rescue mode...
grub rescue>
Awwww. To double-check, let's power off the machine, plug in the first disk again and pull the plug on the second disk, and sure enough, it worked this time.
So, fixing this is probably not as easy as the other stuff until now?
Googling turns up plenty of results but few good hits; this problem is a mess to research.
A solution I found, here: (Beware, it's in German.)
grub-mkdevicemap -n
update-grub
grub-install /dev/sda
grub-install /dev/sdb
Maybe just installing grub to the second disk might have been enough after all, but now sadly this doesn't cut it either.
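It did not fix the boot problem here, but if you want to repeat the grub-install step for every disk behind an array, the list of commands can be derived from the member partitions. A sketch that assumes classic `sdXN` names, where stripping the trailing digits yields the disk; it would get names like nvme0n1p1 wrong.

```shell
# grub_install_cmds
# Reads member partitions (one per line) on stdin and prints one
# grub-install command per underlying disk, without running anything.
grub_install_cmds() {
    sed 's/[0-9]*$//' | sort -u | sed 's/^/grub-install /'
}
```

For example, `printf '/dev/sda1\n/dev/sdb1\n' | grub_install_cmds` prints the two grub-install lines from above.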
Looks like I should not trust debian-based installers any more than I trust redhat-based ones. (Which I absolutely don't, since any slightly more complex setup will fail in anaconda...)
final result: all is in vain
Now it seems the problem can only be fixed with a manual reinstall, as there are several caveats when running a bootable software RAID.
The RAID superblock, which contains all the information on how the RAID is constructed and which is written to each of the member disks, is created here in version 1.2 of the metadata format, which places it near the start of the device. This creates problems with GRUB.
When doing a manual install, mdadm even asks if metadata version 0.90 should be used for a bootable device.
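Checking which metadata version an existing member carries is a one-liner on top of `mdadm -E`. A sketch that assumes the `Version : ...` line looks like the one in the examine output earlier in this post.

```shell
# md_metadata_version
# Reads "mdadm -E" output on stdin and prints the superblock
# metadata version of the first device it describes.
md_metadata_version() {
    awk '$1 == "Version" && $2 == ":" { print $3; exit }'
}
```

Usage would be `mdadm -E /dev/sda1 | md_metadata_version`, which for my install would print 1.2. When creating an array by hand, mdadm's `--metadata` option (for example `--metadata=0.90`) selects the format.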
Oh well, fuck installers.
There will be another post coming, where the partitioning will be done by hand.