Posts tagged btrfs

BTRFS: filesystem rebalance issues
posted on 2016-04-13 06:38

Sometimes a btrfs filesystem will show wrong usage when df -h is used. This can easily be several GB's. (In my particular case I had 1GB space left, after a rebalance it were 6GB...)

Usually a

btrfs filesystem balance /<path-if-needed>

is all you need to do.


root@sjas:/# btrfs fi bal /
ERROR: error during balancing '/': No space left on device
There may be more info in syslog - try dmesg | tail

May stop you short.


root@sjas:/# btrfs fi balance start -dusage=10 /
Done, had to relocate 0 out of 33 chunks

Little more explanation provided by the man page:

    usage=<percent>, usage=<range>
       Balances only block groups with usage under the given percentage. The value of 0 is allowed and will clean up
       completely unused block groups, this should not require any new work space allocated. You may want to use usage=0 in
       case balance is returnin ENOSPC and your filesystem is not too full.

       The argument may be a single value or a range. The single value N means at most N percent used, equivalent to ..N
       range syntax. Kernels prior to 4.4 accept only the single value format. The minimum range boundary is inclusive,
       maximum is exclusive.

Afterwards try re-running btrfs fi bal / for possibly even more available space, or use a bigger value for the usage flag.

btrfs subvolume folder list
posted on 2016-03-27 12:16

A list of good folder candidates for being placed within subvolumes being separate from the root filesystem is here:

  • /boot/grub2/*
  • /opt
  • /srv
  • /tmp
  • /usr/local
  • /var/crash
  • /var/lib/{mailman,named,pgsql,mysql}
  • /var/log
  • /var/opt
  • /var/spool
  • /var/tmp
linux software raid, raid levels, LVM, btrfs and Kali Linux
posted on 2015-05-02 16:27:18

preface and setup layout

After having had installed a fresh system based on Kali Linux with software raid and LVM, I had some fun. The setup consisted of four hdd's, partitions for /boot, /, /var , swap and some others for testing purposes, mostly btrfs, /boot was on a ext2. First two harddisks were designated as the system RAID, second two were to be the data RAID.

The harddisks were plugged into the SATA ports 1 to 4 in the right order (ports can always be identified via the prints on the motherboard), which was a good idea as we will see later. Out of habit I also took a photo of the partitioning scheme during the install when I was done, as this was a more complex setup. Both RAID levels were RAID1, nothing fancy.

Each of the RAID devices was in turn used as a LVM volume group, and each of the partitions mentioned above were a single logical volume. So /boot was a LVM partition on top of a software raid.

Well, I simply hoped this would boot after this setup was chosen. ;)

excourse on used RAID levels

On a sidenote, usually I only put RAID1 (mirrored) and RAID10 (striped sets of mirrors) levels to use. RAID5 allows one disk to fail in the array, RAID6 two. With the current sizes of two, three or even four terabytes, and six also being already shipped, just think of the amount of time needed to rebuild a RAID5 with 10TB, which should take quite a while, when two TB already take days to finish.

Considering most people do not mix harddisks but just take them one after another out of the box the mailman sent them, these are very likely quite similar. Same model, from the same production unit or time slot, with likely similar life expectancies. Rebuilds further take their toll on the hardware, as they impose an intense workload upon the disks. Besides, in a RAID 10 data is copied straight from one disk to its partner, whereas in a RAID5 ALL disks are read, plus parity has to be calculated. This fucks up the performance of the drives during rebuilds.

I do not feel good about a rebuild stressing the array over a time span of like weeks which it takes the system until it finishes, sitting only on top of a lousy RAID5 during the process, where another missing disk means all is lost.

A RAID5, where the failure probability increases with each disk, as does the time to rebuild. RAID6 will mitigate this somehow, but just think of the time and work the rebuild takes. And if your data goes down the drain, think of what the customer will tell you when he's missing 20TB?

A RAID10 with two failed disks is already among my experiences, both went out in quick order in that case, like within two days.

Lucky me, their were on different legs of the RAID0. So what did the situation feel like?

All data was backupped. The backups are actively being tested and thus working most of the time. All storage capacity summed up to just six TB. And these were only 2TB drives, which were synced within days, not within two weeks, compared to if it had been a RAID5/6 setup.

I still dread the memory, it was a Hypervisor for several customers. Brave new virtualized world.

You may ask, why no external storage? Getting a dothill or an EMC2 storage is simply several thousand euros, and why not use an already existing 8bay Server with local storage? RAID1 for the system, leaves six disks for data, with two TB drives sums up to 6TB capacity, which is a nice use case for slightly aged hardware.

Besides, these setups can also be sold more easily, they are simply cost efficient. Plus you do not have two digit terabyte amounts of data to sync.

Here's a link from Jan 2014 to show the level of importance storage already had last year.

Some time after this posting I found some additional info on this, from someone I have never heard of:

zless /usr/share/doc/mdadm/RAID5_versus_RAID10.txt.gz

(You may have to have mdadm installed, though I do not know for sure.)

But I disgress, back to the story.

boot failed, ofc

Booting the system afterwards failed with errors, and the root partition was formatted as btrfs, too.

That the RAID status was not ok was a minor issue, as the RAID was just not synced yet.


fsck: fsck.btrfs: not found
fsck: error2 while executing fsck.btrfs for /root/rundev
fsck: died with exit status 8
failed (code 8).

was really a problem.

Sadly, the btrfs-tools package being missing was the culprit. This could be found out through having a look at the fsck tools, seeing that not btrfs stuff is present, and googling the problem. Google also helps for finding the right package name, we have to install.


get to know the storage geometry

Reboot with a live disk, and having written down / photographed my layout previously, I knew where to start.

Figure this out, in case you have not watched your cabling or no info on partitions or software raids:

root@kali:~# lsblk
sda      8:0    0 233.8G  0 disk
`-sda1   8:1    0 232.9G  0 part
sdb      8:16   0 233.8G  0 disk
`-sdb1   8:17   0 232.9G  0 part
sdc      8:32   0 298.1G  0 disk
`-sdc1   8:33   0   298G  0 part
sdh      8:112  0   3.8G  0 disk
|-sdh1   8:113  0   2.9G  0 part /lib/live/mount/medium
`-sdh2   8:114  0  61.9M  0 part /media/Kali Live
sdi      8:128  0 372.6G  0 disk
`-sdi1   8:129  0   298G  0 part
sr0     11:0    1  1024M  0 rom
loop0    7:0    0   2.6G  1 loop /lib/live/mount/rootfs/filesystem.squashfs

Knowing we have two software raids, sda1 and sdb1 seem to be related, as are sdc1 and sdi1. The actual device sizes don't help you much, as mixed hardware was used. Something you'll also encounter out there in the wild, standard procedure.

This may lead to interesting situations:

Like at 4am in the night, with you of course being on call:
You already pulled out and set up new hardware and then you realize the system just won't boot. You can either restore too-many terabytes from backup, or just get the system back in order. This is your problem at hand, 'GO! I cannot tell you anything, I have no clue of the setup either...'

Fun times. ;) But back to the broken install.

Mounting a RAID drive directly won't work:

root@kali:~# mkdir asdf
root@kali:~# mount /dev/sda1 asdf
mount: unknown filesystem type 'linux_raid_member'

mdadm helps:

root@kali:~# mdadm -E /dev/sda1
bash: madadm: command not found

When it is installed, that is. After all, this is the live stick I used to setup the installation, so it must be somewhere:

root@kali:~# find / -iname mdadm
root@kali:~# /lib/live/mount/medium/pool/main/m/mdadm
bash: /lib/live/mount/medium/pool/main/m/mdadm: Is a directory
root@kali:~# ls -alh /lib/live/mount/medium/pool/main/m/mdadm
total 749k
dr-xr-xr-x 1 root root 2.0K Mar 12 18:26 .
dr-xr-xr-x 1 root root 2.0K Mar 12 18:26 ..
-r--r--r-- 1 root root 192K Mar 12 18:26 mdadm-udeb_3.2.5-5_i386.udeb
-r--r--r-- 1 root root 553K Mar 12 18:26 mdadm_3.2.5-5_i386.deb

Lets install the debian package:

root@kali:~# dpgk -i /lib/live/mount/medium/pool/main/m/mdadm/mdadm_3.2.5-5_i386.deb 

Now back to the problem:

root@kali:~# mdadm -E /dev/sda1
             Magic : a92b4efc
           Version : 1.2
       Feature Map : 0x0
        Array UUID : ab74df56:e0745791:d5cc011e:3792070a
              Name : vdr:0
     Creation Time : Sat May  2 15:32:29 2015
        Raid Level : raid1
      Raid Devices : 2

Available Dev Size : 488017920 (232.71 GiB 249.87 GB)
        Array Size : 244008768 (232.70 GiB 249.86 GB)
     Used Dev Size : 488017536 (232.70 GiB 249.86 GB)
       Data Offset : 262144 sectors
      Super offset : 8 sectors
             State : active
       Device UUID : 6f44a60f:d035d2d9:643a3f9c:a5bb21ef

       Update Time : Sat May  2 16:24:42 2015
          Checksum : 5a0897b6 - correct
            Events : 12

       Device role : Active device 0
       Array State : AA ('A' == active, '.' == missing)

This looks certainly better. For fun, you can look up the others if you know your layout, if you don't the layout you will have to anyway.

Try this, copy-paste, its the easiest way:

mdadm -E /dev/sd?? | grep -i -e /dev/ -e name -e device\ role -e raid\ devices -e state

Gives me this nice overview:

mdadm: No md superblock detected on /dev/sdh1.
              Name : vdr:0
      Raid Devices : 2
       Device role : Active device 0
       Array State : AA ('A' == active, '.' == missing)
              Name : vdr:0
      Raid Devices : 2
       Device role : Active device 1
       Array State : AA ('A' == active, '.' == missing)
              Name : vdr:1
      Raid Devices : 2
       Device role : Active device 0
       Array State : AA ('A' == active, '.' == missing)
              Name : vdr:1
      Raid Devices : 2
       Device role : Active device 1
       Array State : AA ('A' == active, '.' == missing)

Name is the array name, by the way, followed by the number of the array.

get the raid up so you can work on it

-A will assemble the raid, -R makes it available as soon as it has enough drives to run, -S stops it again. You can only assemble fitting parts anyway:

root@kali:~# mdadm -A -R /dev/md0 /dev/sda1 /dev/sdi1
mdadm: superblock on /dev/sdi1 doesn't match others - assembly arborted


root@kali:~# mdadm -A -R /dev/md0 /dev/sda1 /dev/sdb1
mdadm: /dev/md0 has been started with 2 drives.
root@kali:~# mdadm -A -R /dev/md1 /dev/sdc1 /dev/sdi1
mdadm: /dev/md1 has been started with 2 drives.

This is better. Now we have the raids back up:

root@kali:~# lsblk
sda      8:0    0 233.8G  0 disk
`-sda1   8:1    0 232.9G  0 part
  `-md0  9:0    0 232.7G  0 raid1
sdb      8:16   0 233.8G  0 disk
`-sdb1   8:17   0 232.9G  0 part
  `-md0  9:0    0 232.7G  0 raid1
sdc      8:32   0 298.1G  0 disk
`-sdc1   8:33   0   298G  0 part
  `-md1  9:1    0 232.7G  0 raid1
sdh      8:112  0   3.8G  0 disk
|-sdh1   8:113  0   2.9G  0 part /lib/live/mount/medium
`-sdh2   8:114  0  61.9M  0 part /media/Kali Live
sdi      8:128  0 372.6G  0 disk
`-sdi1   8:129  0   298G  0 part
  `-md1  9:1    0 232.7G  0 raid1
sr0     11:0    1  1024M  0 rom
loop0    7:0    0   2.6G  1 loop /lib/live/mount/rootfs/filesystem.squashfs

In my case, I'd only need the md0 device, as I know that root is on there. But this is handled as if we knew nothing about, for illustration purposes.

Now have a look at what pandora's box has in store for you:

root@kali:~# mkdir asdf0
root@kali:~# mount /dev/md0 asdf0
mount: unknown filesystem type 'LVM2_member'

Oh well. Deja vu.

get LVM back up so you can work on it

Get an overview with pvscan, vgscan and lvscan:

root@kali:~# pvscan
  PV dev/md1   VG vg_data     lvm2 [297.96 GiB / 111.70 GiB free]
  PV dev/md0   VG vg_system   lvm2 [232.70 GiB / 34.81 GiB free]
  Total: 2 [530.66 GiB] / in user: 2 [530.66 GiB] / in no VG: 0 [0   ]

root@kali:~# lvscan
  inactive          '/dev/vg_data/lv_data_var_backup' [93.13 GiB] inherit
  inactive          '/dev/vg_data/lv_data_var_nfs' [93.13 GiB] inherit
  inactive          '/dev/vg_system/lv_system_boot' [476.00 MiB] inherit
  inactive          '/dev/vg_system/lv_system_root' [46.56 GiB] inherit
  inactive          '/dev/vg_system/lv_system_var' [74.50 GiB] inherit
  inactive          '/dev/vg_system/lv_system_var_test' [74.50 GiB] inherit
  inactive          '/dev/vg_system/lv_system_swap' [1.86 GiB] inherit

For more information there are also pvdisplay, vgdisplay and lvdisplay, which are like this:

root@kali:~# lvdisplay
  --- Logical volume ---
  LV Path                /dev/vg_data/lv_data_var_backup
  LV Name                lv_data_var_backup
  VG Name                vg_data
  LV UUID                f0C3o2-XUB1-5xkq-om5W-w0Kh-YwcX-752gkE
  LV Write Access        read/write
  LV Creation host, time vdr, 2015-05-02 15:38:48 +0000
  LV Status              NOW available
  LV Size                93.13 GiB
  Current LE             23841
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto

  --- Logical volume ---


There is also pvs, vgs and lvs, providing rather short output, too, so you have even more options to chose from.

In our case life is easy, since I have the habit to name the Logical Volumes like


so there is less to keep in mind and to mix up. Plus you know which LV is home to what mountpoint. As far as I am concerned, there do no conventions exist?

Above all the logical volumes were marked as 'inactive', so we first have to activate them:

root@kali:~# vgchange -a y
    2 logical volume(s) in volume group "vg_data" now active
    5 logical volume(s) in volume group "vg_system" now active

root@kali:~# lvscan
  ACTIVE            '/dev/vg_data/lv_data_var_backup' [93.13 GiB] inherit
  ACTIVE            '/dev/vg_data/lv_data_var_nfs' [93.13 GiB] inherit
  ACTIVE            '/dev/vg_system/lv_system_boot' [476.00 MiB] inherit
  ACTIVE            '/dev/vg_system/lv_system_root' [46.56 GiB] inherit
  ACTIVE            '/dev/vg_system/lv_system_var' [74.50 GiB] inherit
  ACTIVE            '/dev/vg_system/lv_system_var_test' [74.50 GiB] inherit
  ACTIVE            '/dev/vg_system/lv_system_swap' [1.86 GiB] inherit

To disable would be, in case you'd need it: vgchange -a n.

These commands can also be used for single volume groups, this is done by passing the VG name as a parameter.

mount and chroot into the installation to repair it

Now lets just mount the LV's needed to fix the install:

mkdir asdf-root
mount /dev/vg_system/lv_system_root asdf-root
chroot asdf-root

When trying to install the btrfs tools, another error occurs:

root@kali:/# apt-get install btrfs-tools -y
E: Could not open lock file /var/lib/dpkg/log -open (2: No such file or directory)
E: Unable to lock the administration directory (/var/lib/dpkg/), are you root?

Well, after a quick look at /var via ls -alh and seeing it was empty, we might have to mount another LV.

Exit the chroot via exit, and then lets mount the other missing LV, and chroot into it again:

mount /dev/vg_system/lv_system_var asdf-root/var
chroot asdf-root

Now ls -alh /var shows us something, and we should be able to apt-get install btrfs-tools -y.

Followed by an exit and reboot plus removing the boot stick, the system should work now.

If still there were problems, mounting the 'system' partition might help. Since Kali is debian-based, see this:

mount -o bind /dev /mnt/rescue/dev
mount -o bind /dev/pts /mnt/rescue/dev/pts
mount -o bind /proc /mnt/rescue/proc
mount -o bind /run /mnt/rescue/run 
mount -o bind /sys /mnt/rescue/sys

another test

Reboot actually worked, only problem was, after the Grub welcome message on top, and before the grub menu:

error: fd0 read error.

... several times. Grub still boots, so this is not really an issue, not yet at least.

Grub can natively boot of raids, but I strongly suspect if /dev/sda dies, the system will not boot, as it seems the bootloader is just installed on one partition.

grub and booting off software raid devices

Verifying this is easy: Turn off the computer and remove the SATA cable to the first hdd. Sure enough more things broke:

GRUB loading...
Welcome to GRUB!

error: out of partition.
Entering rescue mode...
grub rescue>

Awwww. To double test, lets power off the machine, plug in the first disk again and pull the plug on the second disk, and sure enough again, it worked this time.

So, fixing this is probably not as easy as the other stuff until now?
You cannot google much, or rather google much but not have good hits, its gross with this problem.

A solution I found, here: (Beware, its in German.)

grub-mkdevicemap -n
grub-install /dev/sda
grub-install /dev/sdb

Maybe just installing grub to the second disk might have been enough after all, but now sadly this doesn't cut it either.

Looks like I should not trust debian-based installers any more than I trust redhat-based ones. (Which I absolutely don't, since any slightly more complex setup will fail in anaconda...)

final result: all is in vain

Now it seems, the problem can only be fixed with a manual reinstall, as there are several caveats when running a bootable software raid.

The RAID superblock, which contains all information on how the RAID is constructed, and which is written on each of the member disks, will be created in the v1.2 of the metadata, which will be written to the head of the disk. This creates problems with grub.

When doing a manual install, mdadm even asks if metadata version 0.90 should be used for a bootable device. Oh well, fuck installers.

There will be another post coming, where the partitioning will be done by hand.

This blog covers .csv, .htaccess, .pfx, .vmx, /etc/crypttab, /etc/network/interfaces, /etc/sudoers, /proc, 10.04, 14.04, AS, ASA, ControlPanel, DS1054Z, GPT, HWR, Hyper-V, IPSEC, KVM, LSI, LVM, LXC, MBR, MTU, MegaCli, PHP, PKI, R, RAID, S.M.A.R.T., SNMP, SSD, SSL, TLS, TRIM, VEEAM, VMware, VServer, VirtualBox, Virtuozzo, XenServer, acpi, adaptec, algorithm, ansible, apache, apachebench, apple, arcconf, arch, architecture, areca, arping, asa, asdm, awk, backup, bandit, bar, bash, benchmarking, binding, bitrate, blackarmor, blowfish, bochs, bond, bonding, booknotes, bootable, bsd, btrfs, buffer, c-states, cache, caching, ccl, centos, certificate, certtool, cgdisk, cheatsheet, chrome, chroot, cisco, clamav, cli, clp, clush, cluster, coleslaw, colorscheme, common lisp, console, container, containers, controller, cron, cryptsetup, csync2, cu, cups, cygwin, d-states, database, date, db2, dcfldd, dcim, dd, debian, debug, debugger, debugging, decimal, desktop, df, dhclient, dhcp, diff, dig, display manager, dm-crypt, dmesg, dmidecode, dns, docker, dos, drivers, dtrace, dtrace4linux, du, dynamictracing, e2fsck, eBPF, ebook, efi, egrep, emacs, encoding, env, error, ess, esx, esxcli, esxi, ethtool, evil, expect, exportfs, factory reset, factory_reset, factoryreset, fail2ban, fbsd, fedora, file, filesystem, find, fio, firewall, firmware, fish, flashrom, forensics, free, freebsd, freedos, fritzbox, fsck, fstrim, ftp, ftps, g-states, gentoo, ghostscript, git, git-filter-branch, github, gitolite, gnutls, gradle, grep, grml, grub, grub2, guacamole, hardware, haskell, hdd, hdparm, hellowor, hex, hexdump, history, howto, htop, htpasswd, http, httpd, https, i3, icmp, ifenslave, iftop, iis, imagemagick, imap, imaps, init, innoDB, inodes, intel, ioncube, ios, iostat, ip, iperf, iphone, ipmi, ipmitool, iproute2, ipsec, iptables, ipv6, irc, irssi, iw, iwconfig, iwlist, iwlwifi, jailbreak, jails, java, javascript, javaws, js, juniper, junit, kali, kde, kemp, kernel, keyremap, kill, kpartx, krypton, lacp, lamp, languages, ldap, ldapsearch, less, leviathan, liero, lightning, links, linux, linuxin3months, lisp, list, livedisk, lmctfy, loadbalancing, locale, log, logrotate, looback, loopback, losetup, lsblk, lsi, lsof, lsusb, lsyncd, luks, lvextend, lvm, lvm2, lvreduce, lxc, lxde, macbook, macro, magento, mailclient, mailing, mailq, manpages, markdown, mbr, mdadm, megacli, micro sd, microsoft, minicom, mkfs, mktemp, mod_pagespeed, mod_proxy, modbus, modprobe, mount, mouse, movement, mpstat, multitasking, myISAM, mysql, mysql 5.7, mysql workbench, mysqlcheck, mysqldump, nagios, nas, nat, nc, netfilter, networking, nfs, nginx, nmap, nocaps, nodejs, numberingsystem, numbers, od, onyx, opcode-cache, openVZ, openlierox, openssl, openvpn, openvswitch, openwrt, oracle linux, org-mode, os, oscilloscope, overview, parallel, parameter expansion, parted, partitioning, passwd, patch, pdf, performance, pfsense, php, php7, phpmyadmin, pi, pidgin, pidstat, pins, pkill, plesk, plugin, posix, postfix, postfixadmin, postgres, postgresql, poudriere, powershell, preview, profiling, prompt, proxmox, ps, puppet, pv, pvecm, pvresize, python, qemu, qemu-img, qm, qmrestore, quicklisp, r, racktables, raid, raspberry pi, raspberrypi, raspbian, rbpi, rdp, redhat, redirect, registry, requirements, resize2fs, rewrite, rewrites, rhel, rigol, roccat, routing, rs0485, rs232, rsync, s-states, s_client, samba, sar, sata, sbcl, scite, scp, screen, scripting, seafile, seagate, security, sed, serial, serial port, setup, sftp, sg300, shell, shopware, shortcuts, showmount, signals, slattach, slip, slow-query-log, smbclient, snmpget, snmpwalk, software RAID, software raid, softwareraid, sophos, spacemacs, spam, specification, speedport, spi, sqlite, squid, ssd, ssh, ssh-add, sshd, ssl, stats, storage, strace, stronswan, su, submodules, subzone, sudo, sudoers, sup, swaks, swap, switch, switching, synaptics, synergy, sysfs, systemd, systemtap, tar, tcpdump, tcsh, tee, telnet, terminal, terminator, testdisk, testing, throughput, tmux, todo, tomcat, top, tput, trafficshaping, ttl, tuning, tunnel, tunneling, typo3, uboot, ubuntu, ubuntu 16.04, udev, uefi, ulimit, uname, unetbootin, unit testing, upstart, uptime, usb, usbstick, utf8, utm, utm 220, ux305, vcs, vgchange, vim, vimdiff, virtualbox, virtualization, visual studio code, vlan, vmstat, vmware, vnc, vncviewer, voltage, vpn, vsphere, vzdump, w, w701, wakeonlan, wargames, web, webdav, weechat, wget, whois, wicd, wifi, windowmanager, windows, wine, wireshark, wpa, wpa_passphrase, wpa_supplicant, x2x, xfce, xfreerdp, xmodem, xterm, xxd, yum, zones, zsh

View posts from 2017-02, 2017-01, 2016-12, 2016-11, 2016-10, 2016-09, 2016-08, 2016-07, 2016-06, 2016-05, 2016-04, 2016-03, 2016-02, 2016-01, 2015-12, 2015-11, 2015-10, 2015-09, 2015-08, 2015-07, 2015-06, 2015-05, 2015-04, 2015-03, 2015-02, 2015-01, 2014-12, 2014-11, 2014-10, 2014-09, 2014-08, 2014-07, 2014-06, 2014-05, 2014-04, 2014-03, 2014-01, 2013-12, 2013-11, 2013-10

Unless otherwise credited all material Creative Commons License by sjas