
7. Performance, Tools & General Bone-headed Questions

  1. Q: I've created a RAID-0 device on /dev/sda2 and /dev/sda3. The device is a lot slower than a single partition. Isn't md a pile of junk?
    A: To have a RAID-0 device running at full speed, you must have partitions from different disks; striping two partitions of the same disk only forces the drive head to seek back and forth between them. (Similarly, putting the two halves of a RAID-1 mirror on the same disk fails to give you any protection whatsoever against disk failure.)
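    A minimal sketch with the old-style md tools (the device names are illustrative): stripe across partitions on two different disks,
        mdadd /dev/md0 /dev/sda2 /dev/sdb2   # sda and sdb are separate disks
        mdrun -p0 /dev/md0                   # run the striped (RAID-0) device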
  2. Q: I have 2 Brand X super-duper hard disks and a Brand Y controller, and am considering using md. Does it significantly increase the throughput? Is the performance really noticeable?
    A: The answer depends on the configuration that you use.
    Linux MD RAID-0 (striping) performance:

    A large request that spans a full stripe must wait for all of the disks to read or write their piece of the stripe; independent smaller requests can be serviced by different disks in parallel.

    Linux MD RAID-1 (mirroring) read performance:

    MD implements read balancing. In a low-IO situation, this won't change performance. But, with two disks in a high-IO environment, this could as much as double the read performance. For N disks in the mirror, this could improve performance N-fold.

    Linux MD RAID-1 (mirroring) write performance:

    Must wait for the write to occur to all of the disks in the mirror.
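
    To see what you actually get from your hardware, a crude sequential-read comparison can be made with dd (a sketch; the device names and sizes are illustrative, and repeated runs will be inflated by the buffer cache):
        dd if=/dev/md0 of=/dev/null bs=1024k count=256    # read 256MB from the array
        dd if=/dev/sda2 of=/dev/null bs=1024k count=256   # compare with one component disk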

  3. Q: What is the optimal RAID-5 configuration for performance?
    A: Since RAID-5 attempts to equally distribute the I/O load across several drives, the best performance will be obtained when the RAID set is balanced by using identical drives, identical controllers, and the same (low) number of drives on each controller. Note, however, that using identical components might raise the probability of multiple simultaneous drive failures.
  4. Q: What is the optimal block size for a RAID-4/5 array?
    A: When using the current (November 1997) RAID-4/5 implementation, it is strongly recommended that the file system be created with mke2fs -b 4096 instead of the default 1024-byte filesystem block size. This is because the current RAID-5 implementation allocates one 4K memory page per disk block; with 1K disk blocks, 75% of the memory which RAID-5 allocates for pending I/O goes unused. With a 4096-byte block size, it can potentially queue four times as much pending I/O to the low-level drivers without allocating additional memory.
    Note: the 4K memory page size applies to the Intel x86 architecture; memory pages are believed to be 8K on Alpha and Sparc, in which case the above figures should be adjusted accordingly.
    Note: if your file system has a lot of small files (files less than 10 KB in size), a considerable fraction of the disk space might be wasted, because disk space is allocated in multiples of the block size, and allocating large blocks for small files clearly results in a waste of disk space.
    Note: the above remarks do NOT apply to Software RAID-0/1/linear.
    Note: most ''typical'' systems do not have that many small files. That is, although there might be thousands of small files, this would lead to only some 10 to 100 MB of wasted space, which is probably an acceptable tradeoff for performance on a multi-gigabyte disk. For news servers, however, there might be tens or hundreds of thousands of small files, and in such cases the smaller block size may be more important than the improved performance.
    Note: there exists an experimental file system for Linux which packs small files and file chunks onto a single block. It apparently has some very positive performance implications when the average file size is much smaller than the block size.
    Note: future versions may implement schemes that make the above discussion obsolete. However, this is difficult to implement, since dynamic run-time allocation can lead to deadlocks; the current implementation performs a static pre-allocation.
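    For example, to create the file system with 4 KB blocks on an array assumed to be /dev/md0:
        mke2fs -b 4096 /dev/md0    # 4 KB blocks match the page-sized buffers used by RAID-5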
  5. Q: How does the chunk size influence the speed of my RAID device?
    A: The chunk size is the amount of data that is contiguous on the virtual device and also contiguous on the physical device. The best results are obtained when the chunk size matches the size of a typical I/O request: then two successive requests have a good chance of landing on different disks and being serviced at the same time. Finding the right value requires some testing of different chunk sizes against your average request size to get the best performance.
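    For example (the numbers are purely illustrative): on a two-disk RAID-0 with a 32 KB chunk size, a single aligned 64 KB request covers one chunk on each disk, so both disks transfer in parallel; a stream of independent 32 KB requests will, on average, alternate between the two disks and likewise be serviced in parallel.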
  6. Q: Where can I put the md commands in the startup scripts, so that everything will start automatically at boot time?
    A: Rod Wilkens < rwilkens@border.net> writes:
    What I did is put ``mdadd -ar'' in the ``/etc/rc.d/rc.sysinit'' right after the kernel loads the modules, and before the ``fsck'' disk check. This way, you can put the ``/dev/md?'' device in the ``/etc/fstab''. Then I put the ``mdstop -a'' right after the ``umount -a'' unmounting the disks, in the ``/etc/rc.d/init.d/halt'' file.
    For raid-5, you will want to look at the return code for mdadd, and if it failed, do a
    ckraid --fix /etc/raid5.conf
                
    
    to repair any damage.
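    Putting this together, the relevant fragments might look like the following (a sketch; exact file names and placement vary between distributions):
        # In /etc/rc.d/rc.sysinit, after the kernel modules load and before fsck:
        mdadd -ar || {
            ckraid --fix /etc/raid5.conf   # RAID-5 only: repair damage, then retry
            mdadd -ar
        }

        # In /etc/rc.d/init.d/halt, right after ``umount -a'':
        mdstop -a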
  7. Q: I was wondering if it's possible to set up striping with more than 2 devices in md0? This is for a news server, and I have 9 drives... Needless to say I need much more than two. Is this possible?
    A: Yes. md places no special limit on the number of devices in a RAID-0 array; simply list all nine partitions when creating the device, as in the sketch below.
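    A minimal sketch with the old-style md tools used elsewhere in this document (device names are illustrative; check mdadd(8) on your system):
        mdadd /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 \
              /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
        mdrun -p0 /dev/md0      # -p0 selects the RAID-0 (striping) personality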
  8. Q: When is Software RAID superior to Hardware RAID?
    A: Normally, Hardware RAID is considered superior to Software RAID, because hardware controllers often have a large cache and can do a better job of scheduling operations in parallel. However, integrated Software RAID can (and does) gain certain advantages from being close to the operating system: for example, it can cache reconstructed blocks in the buffer cache, where they remain available for later reads without touching the disks again. On a dual PPro SMP system, it has been reported that Software-RAID performance exceeds the performance of a well-known hardware-RAID board vendor by a factor of 2 to 5.
    Software RAID is also a very interesting option for high-availability redundant server systems. In such a configuration, two CPU's are attached to one set of SCSI disks. If one server crashes or fails to respond, then the other server can mdadd, mdrun and mount the software RAID array, and take over operations. This sort of dual-ended operation is not always possible with many hardware RAID controllers, because of the configuration state that the hardware controllers maintain.
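    In such a takeover, the surviving server might need to do no more than the following (a sketch; the mount point is illustrative):
        mdadd -ar               # add and run the arrays on the shared SCSI disks
        mount /dev/md0 /export  # resume serving the data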
  9. Q: If I upgrade my version of raidtools, will it have trouble manipulating older raid arrays? In short, should I recreate my RAID arrays when upgrading the raid utilities?
    A: No, not unless the major version number changes. An MD version x.y.z consists of three sub-versions:
         x:      Major version.
         y:      Minor version.
         z:      Patchlevel version.
                
    
    Version x1.y1.z1 of the RAID driver supports a RAID array with version x2.y2.z2 in case (x1 == x2) and (y1 >= y2). For example, a driver at version 1.2.5 can handle an array created by version 1.1.0 or 1.2.3, but not one created by 1.3.0 or 2.0.0. Different patchlevel (z) versions for the same (x.y) version are designed to be mostly compatible. The minor version number is increased whenever the RAID array layout is changed in a way which is incompatible with older versions of the driver. New versions of the driver will maintain compatibility with older RAID arrays. The major version number will be increased if it no longer makes sense to support old RAID arrays in the new kernel code. For RAID-1, it is not likely that either the disk layout or the superblock structure will change anytime soon. Almost any optimization or new feature (reconstruction, multithreaded tools, hot-plug, etc.) does not affect the physical layout.
  10. Q: The command mdstop /dev/md0 says that the device is busy.
    A: There's a process that has a file open on /dev/md0, or /dev/md0 is still mounted. Terminate the process or umount /dev/md0.
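    For example, fuser can identify the offending process (a sketch):
        fuser -v /dev/md0    # list processes holding the device open
        umount /dev/md0      # if the device was merely still mounted
        mdstop /dev/md0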
  11. Q: Are there performance tools?
    A: There is a utility called iotrace in the linux/iotrace directory. It reads /proc/io-trace and analyses/plots its output. If you feel your system's block IO performance is too low, just look at the iotrace output.
  12. Q: I was reading the RAID source, and saw the value SPEED_LIMIT defined as 1024K/sec. What does this mean? Does this limit performance?
    A: SPEED_LIMIT is used to limit RAID reconstruction speed during automatic reconstruction. Basically, automatic reconstruction allows you to e2fsck and mount immediately after an unclean shutdown, without first running ckraid. Automatic reconstruction is also used after a failed hard drive has been replaced. In order to avoid overwhelming the system while reconstruction is occurring, the reconstruction thread monitors the reconstruction speed and slows it down if it is too fast. The 1M/sec limit was arbitrarily chosen as a reasonable rate: fast enough for the reconstruction to finish reasonably promptly, yet light enough that other processes are not interfered with.
  13. Q: What about ''spindle synchronization'' or ''disk synchronization''?
    A: Spindle synchronization is used to keep multiple hard drives spinning at exactly the same speed, so that their disk platters are always perfectly aligned. This is used by some hardware controllers to better organize disk writes. However, for software RAID, this information is not used, and spindle synchronization might even hurt performance.

