2009-02-06

Booting fakeraid RAID5 Linux, the half-assed way

OK, by now you have a whole Linux system sitting idly on one of your RAID5 device mapper partitions. The first thing you want to do then is edit its etc/fstab so that the root filesystem points to the right device (in our case /dev/mapper/nvidia_aeejfdbep2). Not that it's actually necessary considering how we are going to boot, but it can't hurt.
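For reference, that line would look something like this (just a sketch, assuming an ext3 root filesystem - adjust the type and options to whatever you actually use):
/dev/mapper/nvidia_aeejfdbep2    /    ext3    defaults    1    1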

If you've done your groundwork, you've found out by now that using GRUB or LILO as-is won't be of much help, as neither of them is able to handle a RAID5 device mapper array. We have no choice here but to run dmraid -ay before we try to access the disks, and of course, that means we need to build an initrd image.
As a side note, RAID1 or RAID0 shouldn't be an issue for GRUB, so in that case you can probably follow this tutorial, complete the bootloader installation on your main RAID array, and have it work.

Now, our final solution here will still require an external partition with a /boot directory that the bootloader can refer to at boot time. But for the sake of this exercise, and also with an eye on our final solution, which will no longer require an external non-RAID5 partition (more about that in the next post), we're gonna set up our RAID5 Linux system as close to standalone as possible. What we ultimately want is the ability to run the bootloader from our RAID5 system, so that the day we have a bootloader that properly handles RAID5, we can just install it on the RAID5 MBR rather than the external disk, and ditch the latter in a heartbeat.
Unfortunately, this means that we'll need to use GRUB rather than LILO, so we'll break things into 2 parts: first we'll configure initrd and set up LILO from the non-RAID Linux to boot our RAID, and once we're there, we'll set up GRUB to boot our RAID from the RAIDed Linux (but still using the non-RAIDed /boot). If you're confused, just hang on.

Part 1: Setting up initrd to boot the RAID5 Linux

Well, time and time again, I find more compelling reasons to use Slackware, the latest one being /boot/README.initrd, which is installed by default, and in which Patrick Volkerding tells you everything you want to know about how to create an initrd. Not that there is much you need to know in the end, as:
root@stella# cd /boot
root@stella# mkinitrd -c
is all you need for now. mkinitrd will have created a brand new initrd-tree and initrd.gz for you - isn't that nice?

Now, obviously, we need to add our dmraid executable to the initrd-tree and recreate initrd.gz. But if you're thinking "fine, we'll just pick up the exec", think again!
The dmraid executable we built was the dynamically linked version, so if you strace its file accesses, you'll see that we're gonna have to copy a whole bunch of libraries as well:
root@stella# strace -e trace=file dmraid -ay 2>&1 | more
execve("/sbin/dmraid", ["dmraid", "-ay"], [/* 35 vars */]) = 0
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
open("/lib/libdevmapper.so.1.02", O_RDONLY) = 3
open("/lib/libc.so.6", O_RDONLY) = 3
open("/proc/mounts", O_RDONLY) = 3
(...)
RAID set "nvidia_aeejfdbe" already active
RAID set "nvidia_aeejfdbep1" already active
RAID set "nvidia_aeejfdbep2" already active
RAID set "nvidia_aeejfdbep3" already active
NB: The removed output above has to do with /proc, /dev or /sys, which won't be an issue.
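Incidentally, if you just want the list of shared libraries without the rest of the noise, ldd gives you the same information (you'll find libdevmapper and libc in there, plus the dynamic loader):
root@stella# ldd /sbin/dmraid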

The smart way (or lazy way, which is even better) is to compile dmraid as a static binary, so that we don't have to care about those pesky libraries. Therefore:
root@stella# cd /usr/src/dmraid/1.0.0.rc15/
root@stella# make clean
root@stella# ./configure --enable-static_link
root@stella# make
root@stella# ls -alF tools/dmraid /sbin/dmraid
-rwxr-xr-x 1 root root 204658 2009-02-05 23:01 /sbin/dmraid*
-rwxr-xr-x 1 root root 816664 2009-02-06 16:47 tools/dmraid*
root@stella# cp tools/dmraid /boot/initrd-tree/sbin/
That's about 600 KB more than the dynamic version right there, but at least we know that we have everything we need.
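If you want to double check that the binary you just copied really is static, ldd should refuse to have anything to do with it:
root@stella# ldd /boot/initrd-tree/sbin/dmraid
        not a dynamic executable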
Now, all that's left is editing the init script in /boot/initrd-tree to call our command.
Depending on your distro, the name & content of the init script could be very different, so you might have to be creative. In the case of Slackware, the init script is called "init" (which makes sense, because then you don't have to specify it as a kernel parameter), and in the
if [ "$RESCUE" = "" ]; then
section, which already contains some disk detection routines, we're gonna add:
  # Initialize DMRAID:
  if [ -x /sbin/dmraid ]; then
    /sbin/dmraid -ay
  fi
Now, we shall rebuild our initrd:
mkinitrd -r /dev/mapper/nvidia_aeejfdbep2
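If you're the paranoid type, you can also check that dmraid made it into the rebuilt image - Slackware's initrd.gz is just a gzipped cpio archive, so something like this should show sbin/dmraid in the listing:
root@stella# zcat /boot/initrd.gz | cpio -it | grep dmraid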

The stage is now set to see if we can boot our RAID5 system using LILO, by adding the following section to /etc/lilo.conf:
image = /boot/vmlinuz
initrd = /boot/initrd.gz
root = /dev/mapper/nvidia_aeejfdbep2
label = RAID5_Linux
read-only
Now, reinstall LILO:
root@stella# lilo
Warning: '/proc/partitions' does not match '/dev' directory structure.
Name change: '/dev/dm-0' -> '/dev/disk/by-name/nvidia_aeejfdbe'
Warning: Name change: '/dev/dm-1' -> '/dev/disk/by-label/Vista64'
Warning: Name change: '/dev/dm-2' -> '/dev/disk/by-name/nvidia_aeejfdbep2'
Warning: Name change: '/dev/dm-3' -> '/dev/disk/by-label/Media'
Added RAID5_Linux *
Added Slack_New
Added Slack_Old
Added Rescue_1
4 warnings were issued.

Don't worry too much about the warnings. Just reboot and, yay, it works!... Err, well... kinda: while you should have seen the dmraid disks being mapped, and you do end up with everything running as it should from the RAID5 partition, you might get a boot where the Slackware rc.d init scripts are not displayed on the console at boot time as they should be.
If you look in /var/log/messages, you will see that all the scripts do indeed run, but you're left with a very silent screen right up until the login prompt.

The problem is actually due to udev screwing up the /dev directory after the root filesystem is mounted. The solution to that? Keep the /dev maintained by the kernel even after root is mounted, by compiling your kernel with:
Device Drivers ---> Generic Driver Options ---> "Create a kernel maintained /dev tmpfs (EXPERIMENTAL)" and "Automount devtmpfs at /dev" both selected.
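For reference, on kernels that carry this feature, those two menu entries should correspond to the following .config options (double check in your own tree, since the feature was still experimental at the time):
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y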

Part 2: Setting up GRUB on the RAID5 Linux partition

There are 2 (well 3) reasons why we want to replace LILO with GRUB here:
1. GRUB is more likely to be patched for full RAID5 support compared to LILO, so when that happens, we want to be ready
2. If you're using RAID0/RAID1 instead of RAID5, GRUB should actually be able to install the bootloader on your RAID array
3. Having GRUB handle RAID takes some trickery which you probably want to read about.

Once more, we'll be using GRUB 0.97. However, if you use the vanilla version, no matter what you do or where your /boot partition is located (even on a standard non-RAID disk), you might end up with the infamous:
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... no

Error 2: Bad file or directory type
AFAIK, this could be due to GRUB 0.97 being unable to handle ext3 filesystems with 256-byte inodes, OR it could happen if you have more than 2 GB of RAM, or it could have to do with the way 2.6 kernels report disk geometry. All I know is that one of the GRUB 0.97 patches from Debian fixes the problem. So, let's rebuild and reinstall GRUB:
root@stella# wget ftp://alpha.gnu.org/gnu/grub/grub-0.97.tar.gz
root@stella# tar -xvzf grub-0.97.tar.gz
root@stella# cd grub-0.97
root@stella# wget http://ftp.de.debian.org/debian/pool/main/g/grub/grub_0.97-47lenny2.diff.gz
root@stella# gunzip grub_0.97-47lenny2.diff.gz
root@stella# patch -p1 < grub_0.97-47lenny2.diff
root@stella# cat debian/patches/00list
OK, that last line gives us the order in which the patches should be applied (starting with cvs-sync.patch and ending with use_grub-probe_in_grub-install.diff), so from there on, for each of them in that order, you just need to run:
patch -p1 < whatever.patch
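If you'd rather not type each of those out by hand, a small loop does the trick (just a sketch, assuming the patch files live in debian/patches/, which is where the Debian diff puts them):
for p in $(cat debian/patches/00list); do
    patch -p1 < debian/patches/$p
done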
Once you're through the list, just compile and install grub so that we can move on to the final phase.
Now, we'll still tell GRUB to use our /boot directory on /dev/sdc1, because we don't really have a choice here (if you don't believe me, you can try installing on the RAID5 array and watch your 'setup (hd#)' command fail miserably), but we will also tell it how to "see" our RAID5 array. To be able to do that, we will need to know our disk geometry, which we can get from fdisk. What we want are the C(ylinders), H(eads) and S(ectors) values:
root@stella# fdisk /dev/mapper/nvidia_aeejfdbe
Command (m for help): p

Disk /dev/mapper/nvidia_aeejfdbe: 2000.4 GB, 2000409722880 bytes
255 heads, 63 sectors/track, 243202 cylinders
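As a quick sanity check, the cylinder count is simply the total number of 512-byte sectors divided by heads x sectors per track, which matches what fdisk reports:
root@stella# echo $(( 2000409722880 / 512 / (255 * 63) ))
243202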
These days, most disks report 255 heads and 63 sectors per track anyway (which are the largest values you can set), so what you really need is the number of cylinders. This is what we will use to provide the geometry of our RAID5 "disk" to GRUB, in C H S order, because it is unable to figure it out by itself. Also, since we are doing a GRUB installation from scratch, we have to copy the stage1 & stage2 files to the /boot/grub directory (which is information that the clueless people using ready-made packages are apparently unable to provide - damn you Ubuntu!):
mount /dev/sdc1 /mnt/hd
mkdir /mnt/hd/boot/grub
cp /usr/local/lib/grub/i386-pc/* /mnt/hd/boot/grub/
grub --device-map=/dev/null
grub> device (hd0) /dev/sdc

grub> device (hd1) /dev/mapper/nvidia_aeejfdbe

grub> geometry (hd1) 243202 255 63
drive 0x81: C/H/S = 243202/255/63, The number of sectors = -387927166, /dev/mapper/nvidia_aeejfdbe
Partition num: 0, Filesystem type unknown, partition type 0x7
Partition num: 1, Filesystem type is ext2fs, partition type 0x83
Partition num: 2, Filesystem type unknown, partition type 0x7

grub> find /boot/grub/stage1
(hd0,0)

grub> root (hd0,0)
Filesystem type is ext2fs, partition type 0x83

grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/e2fs_stage1_5" exists... yes
Running "embed /boot/grub/e2fs_stage1_5 (hd0)"... 16 sectors are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+16 p (hd0,0)/boot/grub/stage2
/boot/grub/menu.lst"... succeeded
Done.
Good, now we're ready to create our boot menu:
vi /mnt/hd/boot/grub/menu.lst

default 0
timeout 3

title Vista (64 bit)
rootnoverify (hd1,0)
chainloader +1

title Slackware 12.2 (RAID5)
root (hd1,1)
kernel (hd0,0)/boot/vmlinuz root=/dev/mapper/nvidia_aeejfdbep2
initrd (hd0,0)/boot/initrd.gz
This allows us to boot both Vista and Slackware on the RAID5 array, using the /boot partition on the non-RAID disk. Note that at this stage, it is probably a good idea to duplicate the /boot directory from the non-RAID to the RAID partition, and recreate a small boot partition from scratch on the non-RAID disk.
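If you do duplicate /boot right away, something along these lines should do it, assuming you're still booted into the RAIDed system with the non-RAID disk mounted on /mnt/hd as above (a rough sketch - adapt the paths to your setup):
root@stella# cp -a /mnt/hd/boot/* /boot/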

Of course, as mentioned before, because we still need a non-RAID HDD, this is a half-assed solution. In the next post, we'll see a better-assed solution, where we do away with that extra HDD, and where we'll explore some interesting new stuff...
