Detect and swap out a failing Linux hard drive

Detecting

  • Configure smartd to email you about a device by adding a line like this to /etc/smartd.conf:
/dev/sda -a -m youremail@example.com
  • Configure your server to send email through msmtp: put a .msmtprc with your mail server settings in /root, then point mail at msmtp:
#/etc/mail.rc

set sendmail="/usr/bin/msmtp -t"
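
For the .msmtprc itself, here is a minimal sketch of a helper that writes one. Everything specific in it is an assumption to replace with your own settings: smtp.example.com, port 587, the CHANGE_ME password, and the tls_trust_file path (that path is typical of Debian-style systems). write_msmtprc is a hypothetical helper name, not something msmtp ships.

```shell
# Write a minimal msmtp config to the given path (default: /root/.msmtprc).
# Host, port, credentials and the CA bundle path are placeholders.
write_msmtprc() {
    target=${1:-/root/.msmtprc}
    cat > "$target" <<'EOF'
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt

account        default
host           smtp.example.com
port           587
from           youremail@example.com
user           youremail@example.com
password       CHANGE_ME
EOF
    # msmtp refuses a config that holds a password unless only the owner can read it
    chmod 600 "$target"
}
```

Run write_msmtprc as root, then edit the placeholders. To check the whole chain, adding -M test to the smartd.conf line makes smartd send a single test message when it starts.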

Oh no, you're getting errors

  • Offline uncorrectable sectors and current pending sectors: the drive is losing places it can reliably read or write, and you likely have data corruption
  • If the error count keeps going up, replace the drive
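
To watch those counters by hand, you can pull their raw values out of smartctl -A. This is a minimal sketch, assuming an ATA/SATA drive whose attribute table has the usual ten columns with the raw value last (NVMe drives report differently). smart_raw is a hypothetical helper name, and /dev/sda is a placeholder.

```shell
# Print the raw value of one SMART attribute from `smartctl -A` output on stdin.
# Assumes the standard ATA table: attribute name in column 2, raw value in column 10.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}

# Example (needs root and smartmontools):
#   smartctl -A /dev/sda | smart_raw Current_Pending_Sector
#   smartctl -A /dev/sda | smart_raw Offline_Uncorrectable
```

Run it a few days apart: a raw value that stays at zero is fine, one that keeps climbing is the signal to start the replacement.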

The process

  • Buy a new hard drive
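  • Have a USB-to-SATA adapter of some sort, so the new drive can be attached alongside the old one
  • Partition the new drive to match the old drive, and create a filesystem on each new partition
    • Swap partition? mkswap /dev/swap-partition
  • Mount the new partitions somewhere convenient (like /mnt)
  • rsync -av --progress /old/partition/ /new/partition/
    • The trailing slash on the source copies the directory's contents, dotfiles included
  • Run blkid to get the UUIDs (or PARTUUIDs) of the new partitions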
  • Edit /etc/fstab and replace the existing UUIDs or PARTUUIDs with the new ones
  • Reboot the machine
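  • Everything should work, with the new partitions mounted in place of the old ones
    • This works because /etc/fstab identifies partitions by UUID rather than by /dev/sd? name, which can change depending on how the drive is connected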
  • Power down the machine
  • Swap the drives
    • Bonus: use a label maker to label the new drive with its partitions
  • Power up the machine
  • You should be good to go
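
The /etc/fstab edit in the steps above is the easiest one to get wrong by hand, so here is a minimal sketch of it. swap_fstab_id is a hypothetical helper, and the device names in the comments are placeholders. It assumes GNU sed (for -i) and that each identifier appears in the file only where you want it replaced.

```shell
# Replace an old UUID/PARTUUID with a new one in an fstab-style file.
# Usage: swap_fstab_id OLD_ID NEW_ID [FILE]   (FILE defaults to /etc/fstab)
swap_fstab_id() {
    old=$1
    new=$2
    file=${3:-/etc/fstab}
    cp -p "$file" "$file.bak"          # keep a copy to fall back on
    sed -i "s/$old/$new/g" "$file"     # UUIDs are hex digits and dashes, safe in a sed pattern
}

# Example (as root), reading both IDs with blkid:
#   old=$(blkid -s UUID -o value /dev/sda1)
#   new=$(blkid -s UUID -o value /dev/sdb1)
#   swap_fstab_id "$old" "$new"
```

Repeat it once per partition, and diff /etc/fstab against /etc/fstab.bak before rebooting.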

Destroying the old drive

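  • shred -z /dev/sdX should be good enough (3 passes of random data, then one pass of zeroes), unless you're really paranoid
    • Substitute the old drive's real device name for /dev/sdX: typed literally, /dev/sd? is a shell glob that matches every /dev/sda, /dev/sdb, and so on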
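
After the final zero pass you can spot-check the result. This is a minimal sketch, assuming GNU cmp (for -n). is_zeroed is a hypothetical helper, and it only looks at the start of the device, so it is a sanity check rather than proof that the whole drive was wiped.

```shell
# Succeed if the first BYTES bytes of FILE are all zero (default: the first 1 MiB).
# Usage: is_zeroed FILE [BYTES]
is_zeroed() {
    bytes=${2:-1048576}
    # cmp -s is silent; -n limits the comparison to the first $bytes bytes
    cmp -s -n "$bytes" "$1" /dev/zero
}

# Example (as root, with the old drive's real device name in place of /dev/sdX):
#   is_zeroed /dev/sdX && echo "start of drive is zeroed"
```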