Hetzner - DokuWiki

Hot-Swapping

Hot-swapping makes it possible to replace hard drives while the server is still running in order to minimize downtime in the event of a hard drive failure.

Compatibility

The following servers are hot-swappable, which means their hard drives can be replaced while the system is running:

  • DX-Server
  • EX41(-SSD)
  • EX40 (in DC17*)
  • SX131
  • SX291

*To find out which data center your server is in, go to “Server overview” → “Main functions; server” in the Robot administration interface.

Important note

Always remove the hard drive that you want replaced from the RAID array before you submit a support request asking our technicians to swap it. This helps prevent further damage during the swap. In your support request, also make sure to state the correct serial number of the drive you want replaced.
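To double-check the serial number before writing the support request, a small helper like the following can be used. It is a minimal sketch, assuming smartctl (from the smartmontools package) is installed and the drive is visible to the operating system directly, as with software RAID; drives behind a hardware RAID controller are usually queried with the controller's own tool instead, as in the controller examples further down. The script and the function name drive_serial are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the serial number reported by "smartctl -i".
# Assumes smartmontools is installed; usually needs to run as root.
SMARTCTL=${SMARTCTL:-smartctl}

drive_serial() {
  # smartctl prints e.g. "Serial Number:    XXXX" (ATA/NVMe) or
  # "Serial number: XXXX" (SCSI), so the match is case-insensitive.
  "$SMARTCTL" -i "$1" | awk '
    tolower($0) ~ /^serial number:/ {
      sub(/^[^:]*:[ \t]*/, "")
      print
      exit
    }'
}

# Only act when called with arguments, e.g.: sh drive_serial.sh /dev/sda /dev/sdb
if [ $# -gt 0 ]; then
  for dev in "$@"; do
    printf '%s\t%s\n' "$dev" "$(drive_serial "$dev")"
  done
fi
```

Printing the serial numbers of all drives side by side makes it easier to spot which one belongs to the defective disk.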

Process

Hardware RAID

A hardware RAID controller enables our technicians to hot-swap hard drives regardless of the operating system that you are running on the above-listed servers. Currently, we can install Adaptec and LSI RAID controllers.

You can find more information about the controllers in the following articles:

To request a hard drive replacement, please fill out a support request as usual.

Below are a few examples of hot-swapping procedures.

Disclaimer: Please note that these are only examples. Adapt the instructions, and especially the commands, to your own system!

LSI Controller

This example is for a Debian 8.2 installation on a RAID-1 array with two SSDs.

  • You will need the command-line tool MegaCli64, which you can download from http://download.hetzner.de/tools/LSI/tools/MegaCLI/8.07.10_MegaCLI_Linux.zip (the RPM package it contains can be converted to a .deb package and installed with alien).
  • The tool accepts several notations for its parameters: they can be written with or without a leading hyphen, and capitalization does not matter (for example, -PDList and pdlist are equivalent).
  • An alias makes the commands shorter:
alias megacli='/opt/MegaRAID/MegaCli/MegaCli64'

In this example, the defective SSD is in slot 0 of enclosure 252 on adapter 0, which is why the commands address it as physdrv[252:0] and a0.

1) You can determine the state and serial number (inquiry data) of the SSD with the following example command:

megacli pdlist a0 | grep -Ei 'enclosure|slot|firmware state|inquiry'

2) If the firmware state of the defective SSD is not yet 'Offline', set it offline:

megacli pdoffline physdrv[252:0] a0

3) Next, mark the SSD as missing …

megacli pdmarkmissing physdrv[252:0] a0

4) … and prepare it for removal:

megacli pdprprmv physdrv[252:0] a0

5) You should now request the swap in a support request.

6) Check the firmware state again after we have successfully swapped the SSD:

megacli pdlist a0 | grep -Ei 'enclosure|slot|firmware state|inquiry'

7) If the rebuild does not start automatically, you will need to start it manually.
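Steps 2 to 4 can be wrapped in a small shell function so that enclosure and slot only have to be typed once. This is a minimal sketch, assuming the MegaCli64 path from the alias above and enclosure 252 / adapter 0 as in this example; the function lsi_prepare_swap and the variables DRY_RUN and SKIP_OFFLINE are hypothetical. By default it only prints the commands, so you can review them before running them with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: prepare a drive behind an LSI controller for removal (steps 2-4).
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: lsi_prepare_swap ENCLOSURE SLOT [ADAPTER]
# Set SKIP_OFFLINE=1 if step 1 already shows the drive as Offline.
lsi_prepare_swap() {
  enc=$1
  slot=$2
  adapter=${3:-0}
  # Quoting "$dev" keeps the shell from treating [...] as a glob pattern.
  dev="physdrv[$enc:$slot]"
  if [ "${SKIP_OFFLINE:-0}" != 1 ]; then
    run "$MEGACLI" pdoffline "$dev" "a$adapter"
  fi
  run "$MEGACLI" pdmarkmissing "$dev" "a$adapter"
  run "$MEGACLI" pdprprmv "$dev" "a$adapter"
}

# Only act when called with at least ENCLOSURE and SLOT.
if [ $# -ge 2 ]; then
  lsi_prepare_swap "$@"
fi
```

For example, `sh lsi_prepare_swap.sh 252 0` prints the three commands for this example's SSD; `DRY_RUN=0 sh lsi_prepare_swap.sh 252 0` runs them.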

Adaptec Controller

This example is for a Debian 8.2 installation on a RAID-1 array with two hard drives.

The defective drive in this example is connected to slot 0 (channel 0, device 0 on controller 1 in the arcconf commands).

1) You can determine the state and serial number (inquiry data) of the HDD with the following example command:

arcconf getconfig 1 pd | grep -E "Device #|State\>|Reported Location|Reported Channel|Serial|S.M.A.R.T. warnings"

2) If the defective hard drive is not yet marked as failed, mark it as failed (state DDD) with the following command:

arcconf setstate 1 device 0 0 ddd

3) You should now request the swap in a support request.

4) After we have swapped the HDD, check the drive's state again:

arcconf getconfig 1 pd | grep -E "Device #|State\>|Reported Location|Reported Channel|Serial|S.M.A.R.T. warnings"

5) The rebuild usually starts automatically. If it does not, you will need to start it manually.
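Steps 1 and 4 run the same report before and after the swap, so saving the first report and comparing it with the second makes the change easy to see. A minimal sketch, assuming controller 1 as in this example and GNU grep (the `\>` word boundary in the filter is a GNU extension); the functions pd_report, pd_snapshot and pd_compare and the snapshot file are hypothetical.

```shell
#!/bin/sh
# Sketch: save the physical-device report before the swap and diff it
# against the report afterwards. CONTROLLER=1 matches the example above.
ARCCONF=${ARCCONF:-arcconf}
CONTROLLER=${CONTROLLER:-1}

# Same filter as in steps 1 and 4.
pd_report() {
  "$ARCCONF" getconfig "$CONTROLLER" pd |
    grep -E 'Device #|State\>|Reported Location|Reported Channel|Serial|S\.M\.A\.R\.T\. warnings'
}

# Usage: pd_snapshot FILE   (before requesting the swap)
pd_snapshot() {
  pd_report > "$1"
}

# Usage: pd_compare FILE    (after the swap)
# Prints the changed lines in diff format; prints nothing if nothing changed.
pd_compare() {
  pd_report | diff "$1" -
}
```

After the swap, the diff should show the new serial number and the changed state of the replaced drive.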

Software RAID

In principle, drives on SATA controllers can also be hot-swapped: as soon as the new HDD is connected, the operating system should detect the change in connection state on the respective controller. The procedure for hot-swapping with software RAID depends on your operating system and configuration. Below are a few examples:

Disclaimer: Please note that these are only examples. Adapt the instructions, and especially the commands, to your own system!

Linux

Information and a detailed example scenario for replacing drives in a Linux software RAID can be found in our article on exchanging drives in a software RAID.
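For orientation, the core mdadm commands of such a replacement can be sketched as follows. This is a minimal sketch, not a complete procedure: it assumes the defective drive is sdb, that md0 and md1 consist of its first and second partitions (SATA-style names such as sdb1; NVMe partitions are named differently, e.g. nvme1n1p1), and that the partition table has already been copied to the new drive before the partitions are added again. The functions md_remove_disk and md_add_disk and the variables MD_ARRAYS and DRY_RUN are hypothetical; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: remove a drive's partitions from Linux md arrays before the swap
# and add the new drive's partitions afterwards.
# MD_ARRAYS lists "array:partition-number" pairs (hypothetical layout).
MD_ARRAYS=${MD_ARRAYS:-"md0:1 md1:2"}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: md_remove_disk sdb   (before the support request)
md_remove_disk() {
  for pair in $MD_ARRAYS; do
    md=${pair%%:*}
    part=${pair#*:}
    run mdadm --manage "/dev/$md" --fail "/dev/$1$part"
    run mdadm --manage "/dev/$md" --remove "/dev/$1$part"
  done
}

# Usage: md_add_disk sdb      (after the swap and partitioning)
md_add_disk() {
  for pair in $MD_ARRAYS; do
    md=${pair%%:*}
    part=${pair#*:}
    run mdadm --manage "/dev/$md" --add "/dev/$1$part"
  done
}
```

The resync progress can then be followed in /proc/mdstat.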

Windows

Important: In Windows, the plex* from which the system was started cannot be hot-swapped. You must therefore boot the system from the intact plex before the defective disk is replaced. (*Microsoft uses the term plex/plexing for mirroring; a plex is one copy of a mirrored volume.)

The following example is based on a Hetzner standard installation of Windows Server 2012 R2 in UEFI mode with two hard drives and mirroring. The defective hard drive is disk 1 (the secondary plex). The system was started from the primary plex.

1) Remove the HDD from the RAID array.

Open the context menu of volume C: in Disk Management and select 'Remove Mirror'.

2) Determine the serial number of the defective or intact HDD with diskid32.exe.

3) You should now request the hot-swap in a support request.

4) After the HDD has been successfully swapped, start diskpart.

5) Prepare the new drive and set up its partitions to match the intact HDD.

  • If the new HDD is not recognized, use:
DISKPART> rescan
  • If the defective hard drive is marked as M1 (missing), use:
DISKPART> select disk M1
DISKPART> delete disk
  • Convert the replacement HDD to a dynamic disk using GPT.
  • Create and format an EFI partition and assign it the drive letter E.
  • Add the HDD to the mirroring of C and wait until the synchronization is complete.
DISKPART> select disk 1
DISKPART> convert gpt
DISKPART> create partition efi size=200
DISKPART> format fs=fat32 quick
DISKPART> assign letter=e
DISKPART> convert dynamic
DISKPART> select volume c
DISKPART> add disk 1 wait
  • Assign the drive letter X to the EFI partition of the intact HDD.
DISKPART> select disk 0
DISKPART> select part 1
DISKPART> assign letter=x
DISKPART> exit

6) EFI partitions and boot manager

In the following example, these drive letters are assigned to the EFI partitions:
x: already existing EFI partition
e: newly created EFI partition on the replacement hard drive

  • Next, we recommend backing up the system BCD store (here to the file BCD_backup in the current directory) so that you can undo any changes later with bcdedit /import if needed.
bcdedit /export BCD_backup
  • Now copy the contents of the EFI partition recursively, skipping the BCD store and the 'System Volume Information' directory:
robocopy x:\ e:\ * /e /copyall /dcopy:t /xf BCD.* /xd "System Volume Information"
  • Now export the system BCD store with bcdedit to the EFI partition of the replacement hard drive:
bcdedit /export e:\EFI\Microsoft\Boot\BCD

The system can now be started from either plex, using either of the two boot managers.

In some cases, additional adjustments to the BCD store are needed (for example, if an orphaned boot menu entry remains). Further information on Windows Server 2012 is available online and in this Microsoft document:

http://download.microsoft.com/download/6/E/E/6EE26977-FAA0-41CC-8BDA-7A0C5E6EB9CC/Configuring%20Disk%20Mirroring%20for%20Windows%20Server%202012.docx

FreeBSD

gmirror + UFS

This example is for a FreeBSD installation with UFS and gmirror, using the following mirrors:

/dev/mirror/boot (ada0p1 + ada1p1)

/dev/mirror/swap (ada0p2 + ada1p2)

/dev/mirror/root (ada0p3 + ada1p3)

The defective HDD is ada1.

1) Remove the defective HDD from the RAID array.

  • Check the HDD's state.
# gmirror status
  • If necessary, deactivate the defective HDD's partitions.
# gmirror deactivate boot ada1p1
# gmirror deactivate swap ada1p2
# gmirror deactivate root ada1p3
  • Mark the defective HDD as “forgotten”.
# gmirror forget boot
# gmirror forget swap
# gmirror forget root

2) Determine the serial number of the defective HDD.

  • Do this, for example, by using smartctl from the smartmontools package.
# smartctl -a /dev/ada1 | grep -i serial
  • Or you can use camcontrol:
# camcontrol identify /dev/ada1 | grep -i serial

3) You should now request the hot-swap in a support request.

4) After the HDD has been swapped, copy the partition table from ada0 to ada1.

# gpart backup ada0 | gpart restore ada1

Please note: A bug in FreeBSD 11 appears to prevent the system from booting from the new disk after the partition table has been restored. If you run into this problem, see this FreeBSD forum thread: [1]

5) Add the partitions of the replacement HDD to gmirror.

# gmirror insert boot ada1p1
# gmirror insert swap ada1p2
# gmirror insert root ada1p3

6) Install bootcode on the replacement HDD.

# gpart bootcode -b /boot/pmbr -p /boot/gptboot -i 1 ada1
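Because the same commands are repeated for every mirror, steps 1, 4, 5 and 6 can be scripted. A minimal sketch, assuming the mirror names and partition indexes of this example (boot = p1, swap = p2, root = p3) and the gptboot boot code; the functions gm_remove and gm_restore and the variables MIRRORS, BOOTCODE and DRY_RUN are hypothetical. By default the commands are only printed; with DRY_RUN=0 they are executed via eval (needed for the gpart pipeline), so only pass trusted disk names such as ada1.

```shell
#!/bin/sh
# Sketch: gmirror steps from this example, parameterised by disk name.
# MIRRORS lists "mirror:partition-index" pairs.
MIRRORS=${MIRRORS:-"boot:1 swap:2 root:3"}
BOOTCODE=${BOOTCODE:-/boot/gptboot}
DRY_RUN=${DRY_RUN:-1}

# Takes a single command string so that pipelines can be run via eval.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

# Step 1. Usage: gm_remove ada1   (defective disk)
gm_remove() {
  for pair in $MIRRORS; do
    run "gmirror deactivate ${pair%%:*} ${1}p${pair#*:}"
  done
  for pair in $MIRRORS; do
    run "gmirror forget ${pair%%:*}"
  done
}

# Steps 4-6. Usage: gm_restore ada0 ada1   (intact disk, replacement disk)
gm_restore() {
  run "gpart backup $1 | gpart restore $2"
  for pair in $MIRRORS; do
    run "gmirror insert ${pair%%:*} ${2}p${pair#*:}"
  done
  run "gpart bootcode -b /boot/pmbr -p $BOOTCODE -i 1 $2"
}
```

For the boot and swap mirrors of the ZFS example below, the same functions would be called with MIRRORS="boot:1 swap:2" and BOOTCODE=/boot/gptzfsboot.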

ZFS

This example is for a FreeBSD 10.2 installation with ZFS, using the following arrays:

/dev/mirror/boot (ada0p1 + ada1p1)

/dev/mirror/swap (ada0p2 + ada1p2)

ZFS pool zroot with mirroring via gpt/root0 (GPT label for ada0p3) and gpt/root1 (GPT label for ada1p3)

The defective HDD is ada0.

(The two gmirror mirrors boot and swap are handled using the same approach as above.)

1) Because ZFS handles the mirroring of the root pool here, also check the state of the ZFS mirror before you request the hot-swap and, if necessary, set the affected partition (in this example, gpt/root0) offline:

# zpool status
 pool: zroot
state: ONLINE
 scan: none requested
config:
       NAME           STATE     READ WRITE CKSUM
       zroot          ONLINE       0     0     0
         mirror-0     ONLINE       0     0     0
           gpt/root0  ONLINE       0     0     0
           gpt/root1  ONLINE       0     0     0
# zpool offline zroot gpt/root0
# zpool status
 pool: zroot
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
       Sufficient replicas exist for the pool to continue functioning in a
       degraded state.
action: Online the device using 'zpool online' or replace the device with
       'zpool replace'.
 scan: none requested
config:
       NAME                     STATE     READ WRITE CKSUM
       zroot                    DEGRADED     0     0     0
         mirror-0               DEGRADED     0     0     0
           8894732708877724737  OFFLINE      0     0     0  was /dev/gpt/root0
           gpt/root1            ONLINE       0     0     0

# gmirror deactivate boot ada0p1
# gmirror deactivate swap ada0p2
# gmirror forget boot
# gmirror forget swap

2) If GPT labels are used as in this example, you can use gpart to determine which label belongs to which hard drive.

# gpart list | grep -Ei 'geom|label'
Geom name: ada0
label: boot0
label: swap0
label: root0
Geom name: ada1
label: boot1
label: swap1
label: root1

3) Now determine the serial number of the defective HDD.

  • You can use smartctl from the smartmontools package to do this, for example:
# smartctl -a /dev/ada0 | grep -i serial
  • Or you can use camcontrol:
# camcontrol identify /dev/ada0 | grep -i serial

4) Now request the hot-swap in a support request, making sure to state the correct serial number of the hard drive you want replaced. After we have swapped the HDD, copy the partition table with gpart, repair the gmirror mirrors and install the boot code:

# gpart backup ada1 | gpart restore ada0
# gmirror insert boot ada0p1
# gmirror insert swap ada0p2
# gpart bootcode -b /boot/pmbr -p /boot/gptzfsboot -i 1 ada0

5) Next, set the GPT label of the replacement drive's ZFS partition (here the third partition, ada0p3) to root0, so that it is available as gpt/root0 again:

# gpart modify -i 3 -l root0 ada0

6) Now you can replace the failed part of the mirror:

# zpool replace zroot gpt/root0
# zpool status -x
all pools are healthy

More detailed information on configuring and administering the ZFS file system can be found in Oracle's documentation: http://docs.oracle.com/cd/E19253-01/819-5461/
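The label lookup from step 2 and the ZFS part of steps 1, 5 and 6 can also be scripted. A minimal sketch, assuming pool zroot, the gpt/rootN labels and partition index 3 as in this example; the functions label_to_disk, zfs_offline_label and zfs_replace_label and the variables POOL, INDEX and DRY_RUN are hypothetical. label_to_disk reads the "Geom name:" and "label:" lines of gpart list, as shown in step 2; the zpool and gpart changes are only printed unless DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: ZFS side of the replacement. POOL and INDEX follow the example.
POOL=${POOL:-zroot}
INDEX=${INDEX:-3}
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the disk (geom) that carries the given GPT label, e.g. root0 -> ada0.
label_to_disk() {
  gpart list | awk -v want="$1" '
    $1 == "Geom" && $2 == "name:" { geom = $3 }
    $1 == "label:" && $2 == want  { print geom; exit }'
}

# Step 1 (ZFS part). Usage: zfs_offline_label root0
zfs_offline_label() {
  run zpool offline "$POOL" "gpt/$1"
}

# Steps 5 and 6. Usage: zfs_replace_label root0 ada0
zfs_replace_label() {
  run gpart modify -i "$INDEX" -l "$1" "$2"
  run zpool replace "$POOL" "gpt/$1"
}
```

Afterwards, `zpool status -x` should report that all pools are healthy once the resilver has finished.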



© 2018 Hetzner Online GmbH. All rights reserved.