Hetzner - DokuWiki

Drive serial numbers and information on defective drives


How to determine the serial number of a drive

In Windows

On Windows, a tool called "DiskID32" lets you read the serial numbers of one or more drives. The program is open source.

The tool can be downloaded here.

First, open a command prompt in the program directory and enter the following command:

diskid32.exe

The program prints detailed information about your drive(s).
The serial number of each drive is shown under "Drive Serial Number".

Example:

Windows Tool


In Linux

There are two ways to determine the serial number. The first uses udevadm:

 /sbin/udevadm info --query=property --name=sda | grep ID_SERIAL

The second uses hdparm. Open a terminal and enter the following command:

sudo hdparm -i /dev/sda | grep SerialNo

"sudo" grants the administrator rights needed to read information from the drive. "hdparm" is a program that reports information about the drive.

The parameter "-i /dev/sda" tells "hdparm" to print the identification data of the drive /dev/sda.

The device path depends on the interface of the drive:

  • IDE / ATA device: "-i /dev/hd[a-t]"
  • SCSI / SATA device: "-i /dev/sd[a-z]"

Finally, " | grep SerialNo" filters the serial number of the drive from the output.

If this command returns an error, hdparm itself probably needs to be installed first (shown here for Debian-based systems):

sudo apt-get install hdparm

Example:

Linux drive serial number
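
The two Linux commands above can be wrapped into small filters that print only the serial number. This is a minimal sketch: the function names are hypothetical, and it assumes the usual "SerialNo=" field in the "hdparm -i" output and the udev property "ID_SERIAL_SHORT" (which "grep ID_SERIAL" also matches).

```shell
#!/bin/sh
# Extract the serial number from `hdparm -i` output read on stdin.
# hdparm prints a line such as:  Model=..., FwRev=..., SerialNo=XYZ
serial_from_hdparm() {
    sed -n 's/.*SerialNo=\([^,]*\).*/\1/p' | sed 's/[[:space:]]*$//'
}

# Extract the serial number from `udevadm info --query=property` output on stdin.
# ID_SERIAL_SHORT holds the bare serial; ID_SERIAL also contains the model name.
serial_from_udevadm() {
    sed -n 's/^ID_SERIAL_SHORT=//p'
}

# Usage on a real system (hdparm needs administrator rights):
#   sudo hdparm -i /dev/sda | serial_from_hdparm
#   /sbin/udevadm info --query=property --name=sda | serial_from_udevadm
```

Keeping the parsing separate from the hardware access makes it possible to try the filters on saved output before running them on a server.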

In FreeBSD

The following command can be used in FreeBSD:

smartctl -a /dev/ad0 | grep Serial

"smartctl" is a program which enables you to read drive information.

With the "-a" parameter, all available SMART information for the specified drive (here the first ATA drive, /dev/ad0) is displayed.

Here too, the device path depends on the interface of the drive:

  • IDE / ATA device: "-a /dev/ad[0-9]+"
  • SCSI device: "-a /dev/da[0-9]+" OR "-a /dev/pass[0-9]+"
  • SATA device: "-a /dev/ada[0-9]+"

With " | grep Serial" you are able to filter the serial number of the drive from the information.

Example:

FreeBSD drive serial number
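
If only the bare serial number is needed, for example in a script, the "grep" step can be replaced by a small filter. The following is a sketch under the assumption that smartctl prints the serial as a "Serial Number:" line (some SCSI drives use "Serial number:"); the function name is hypothetical.

```shell
#!/bin/sh
# Print only the serial number from `smartctl -a` (or `smartctl -i`)
# output read on stdin. Stops at the first matching line.
serial_from_smartctl() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Usage on FreeBSD (administrator rights required):
#   smartctl -a /dev/ada0 | serial_from_smartctl
```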


Information on defective drives

To detect failing drives, it is advisable to use a tool that reads and evaluates the error information recorded by the drive.

We use Smartmontools on Windows, Linux and FreeBSD.

In Windows

If you have not yet installed Smartmontools on your Windows Server 2008, the latest version can be downloaded here.

ATTENTION: During the setup you must put a check next to "PATH variable". Alternatively, you can set the path for this program in the environment variables.

Once the tool has been installed successfully, open the command prompt.

Enter the following command to check whether the program works:

smartctl -h

A list of the options accepted by "smartctl" should appear in the command prompt.

As "smartctl" behaves in exactly the same way as it does in Linux, the same commands can be used in Windows. The device path names also remain the same.

Therefore, you will need to use the same parameters in Windows as those for Linux.

Be aware of the different interfaces for devices:

  • IDE / ATA devices: "-H /dev/hd[a-t]"
  • SCSI / SATA devices: "-H /dev/sd[a-z]"

WARNING: Normal drive names such as "c:" do not work!

In Linux

With Linux systems, messages from the kernel as well as Smartmontools provide information about a defective drive.

Kernel messages can be displayed with "dmesg". Pay particular attention to messages beginning with "ata". The command "dmesg | grep ata" can be used to filter for them, for example.
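
Since "dmesg | grep ata" also matches harmless lines such as link-up messages, a narrower filter can help. The sketch below keeps only "ata" lines that contain a problem keyword; the keyword list and the function name are assumptions, not a complete catalogue of kernel error messages.

```shell
#!/bin/sh
# Filter kernel messages on stdin down to "ataN" / "ataN.MM" lines
# that mention a problem. The keyword list is an assumption.
ata_problems() {
    grep -E '(^|[^[:alnum:]])ata[0-9]+(\.[0-9]+)?: .*(exception|error|failed|timeout)'
}

# Usage:
#   dmesg | ata_problems
```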

First, Smartmontools needs to be installed on Linux systems. The package name can differ between distributions:

  • Debian: "smartmontools"
  • Fedora: "smartmontools" (older releases shipped it as part of "kernel-utils")

Administrator rights are required for the installation.

Next, you can continue work in the console.

(Please note that administrator rights are needed for the entire process)

Enter the following command in the console:

smartctl -H /dev/sda

Be aware of the different interfaces for devices:

  • IDE / ATA devices: "-H /dev/hd[a-t]"
  • SCSI / SATA devices: "-H /dev/sd[a-z]"

This command queries the overall health status of your drive. The output tells you whether the drive can continue to be used.

If "FAILED!" is shown, then something is wrong with your drive.

If "PASSED" appears, then your drive is OK.
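
The PASSED/FAILED check can be automated with a small filter. This sketch assumes the result line that smartctl prints for ATA/SATA drives ("SMART overall-health self-assessment test result: ..."); drives that word the line differently are reported as UNKNOWN. The function name is hypothetical.

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin and print the overall verdict:
# PASSED, FAILED or UNKNOWN. Returns 0 only for PASSED.
smart_health() {
    result=$(sed -n 's/^SMART overall-health self-assessment test result: *//p')
    case $result in
        PASSED*) echo PASSED;  return 0 ;;
        FAILED*) echo FAILED;  return 1 ;;
        *)       echo UNKNOWN; return 2 ;;
    esac
}

# Usage (administrator rights required):
#   smartctl -H /dev/sda | smart_health
```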

For a more detailed result, look at the attribute table which the same command prints under "Failed Attributes:".

An explanation of the attributes can be found in the section "Measured Values for Attributes".

Evaluating the table against these measured values gives you an overview of the errors on your drive.

To display all attributes of your drive, not only the failed ones, you can use the command "smartctl -A /dev/sda".

(Use the device path that matches your device type here.)

This time all available attributes are shown. You can now evaluate the table.

An explanation of the attributes can be found in the section "List of Attributes".

In FreeBSD

It is worth taking a look at the kernel messages (dmesg) in FreeBSD to find out more about any drive defects.

As with Windows and Linux, you need to install Smartmontools first, here via the package management.

The following command can be used for this:

pkg install smartmontools

(Please note that administrator rights are required for the entire process)

As with Linux, you can use the same "smartctl" commands in FreeBSD.
However, the path names of the drives are different.

Instead of the Linux paths "/dev/hd[a-t]" and "/dev/sd[a-z]",
use the FreeBSD paths: "/dev/ad[0-9]+" for IDE / ATA devices, "/dev/ada[0-9]+" for SATA devices and "/dev/da[0-9]+" for SCSI devices.

List of Attributes

Parameter Name

Description

Raw Read Error Rate

Critical. A lower value points to uncorrectable read errors related to the drive surface or the magnetic heads.

Throughput Performance

Critical. A general indicator of throughput performance. Lower values show that the disk is no longer able to work at full speed.

Spin Up Time

Average period of time taken by the drive to spin its disks up to operating speed. Poor values may point to problems with the bearings, which are often caused by excessive temperatures.

Start/Stop Count

Non-critical. Counts the number of start/stop cycles for the drive.

Reallocated Sector Count

Very critical. Counts how many reserve sectors have been allocated by the hard drive. Points to media problems.

Read Channel Margin

Indicates how much bandwidth is used on average for read operations. The exact description is not documented.

Seek Error Rate

Critical. Counts the frequency of errors during head positioning (seek) operations, which depend on the condition of the positioning system or the surface.

Seek Time Performance

General value which describes the performance of the seek operations of the magnetic heads. Lower values point to mechanical problems.

Power On Hours Count

Counts the number of hours in power-on state. The format mostly depends on the manufacturer.

Spin Retry Count

Critical. Indicates the number of attempts needed to start so that the disk can reach its fully operational speed.

Recalibration Retries

Critical. Counts how often the disk needs to recalibrate the read/write heads. Points to mechanical malfunction.

Device Power Cycle Count

Shows statistics for how often the hard drive is switched on and off.

Soft Read Error Rate

Indicates how often the operating system has reported an error when reading data from a disk.

G-Sense Error Rate

G-Sense stands for Shock-Sensor, which measures strong vibrations during operation.

Power-Off Retract Cycle

Shows a count of how often the hard disk was powered down.

Load/Unload Cycle Count

Indicates how often the disk has put its read/write heads into landing zone position.

Temperature

Specifies the temperature for the drive. Not important, as the values are usually very inaccurate for most devices.

Reallocation Events Count

Very critical. Counts every attempt by the disk to remap sectors even if this does not succeed.

Current Pending Sector Count

Very critical. Shows the number of unstable sectors which are waiting to be moved to a special reserved area.

Uncorrectable Sector Count

Very critical. The number of defective sectors which the internal logic of the drive cannot restore and move to the reserved area.

UltraDMA CRC Error Rate

Critical. The number of CRC errors during data transfer. May point to defective cables, driver conflicts or to overclocking problems.

Write Error Rate

Critical. Counts the frequency of errors on writing sectors.

Disk Shift

Very critical. This value shows whether an imbalance has occurred due to problems with temperature or the effects of shock.

Loaded Hours

Indicates how long the disk has spent under data load, measured by the movement of the magnetic head actuator.

Load/Unload Retry Count

Undocumented unit count for the number of loading retries when the read/write heads change position.

Load Friction

A statistical value for the level of friction caused by load on the drive.

Load-in Time

Indicates how long the magnetic heads actuator was not in the landing zone position.

Torque Amplification Count

Counts the number of attempts by the internal logic of the drive to bring the rotation into line.

GMR Head Amplitude

A purely statistical value describing the distance of repetitive forward/reverse motion covered by the read/write heads.

(Source: http://en.wikipedia.org/wiki/S.M.A.R.T.)
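
The sector-related attributes marked "Very critical" above can be checked automatically. The sketch below assumes the usual ten-column table printed by "smartctl -A" (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and the attribute IDs 5, 196, 197 and 198; the function name is hypothetical, and a non-zero raw value is only treated as a hint that the drive deserves a closer look.

```shell
#!/bin/sh
# Read the `smartctl -A` attribute table on stdin and print every
# sector-related counter whose RAW_VALUE (column 10) is non-zero.
# IDs: 5 Reallocated Sector Count, 196 Reallocation Events Count,
#      197 Current Pending Sector Count, 198 Uncorrectable Sector Count.
critical_counters() {
    awk '$1 ~ /^(5|196|197|198)$/ && $10 + 0 > 0 {
             printf "%s (ID %s): raw value %s\n", $2, $1, $10
         }'
}

# Usage (administrator rights required):
#   smartctl -A /dev/sda | critical_counters
```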

Measured Values for Attributes

VALUE is a normalized measured value, which mostly counts backwards (the lower, the worse).
WORST is the worst value recorded up to now.
THRESHOLD is the limit below which the value should not drop.
TYPE stands for the category of the parameter: "Pre-fail" is a warning of imminent failure, while "Old age" means that it is generally a matter of progressive aging. (The current temperature does not necessarily fall into either category.)
UPDATED shows whether the value is updated continuously ("Always") or only during a self test of the "Offline data collection" type.
RAW_VALUE is the actual measured value, i.e. the measured temperature or the error count.

(Source: http://en.wikipedia.org/wiki/S.M.A.R.T.)
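
The relationship between VALUE and THRESHOLD can be checked with a short filter. This is a sketch assuming the ten-column "smartctl -A" table, where column 4 is VALUE, column 6 is the threshold (printed as THRESH) and column 7 is TYPE; it treats an attribute as failed when VALUE is less than or equal to a non-zero threshold. The function name is hypothetical.

```shell
#!/bin/sh
# Read the `smartctl -A` attribute table on stdin and print every
# attribute whose normalized VALUE has reached its threshold.
# A threshold of 0 means the attribute cannot fail and is skipped.
below_threshold() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 {
             value = $4 + 0; thresh = $6 + 0
             if (thresh > 0 && value <= thresh)
                 printf "%s: VALUE %d <= THRESH %d (%s)\n", $2, value, thresh, $7
         }'
}

# Usage (administrator rights required):
#   smartctl -A /dev/sda | below_threshold
```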

Creating a complete SMART log

To create a complete SMART log, use the command smartctl with the option "-x". The drive is specified in the same way as explained under "Information on defective drives".
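
To keep such a log, the output can be redirected to a file. The following sketch writes one file per drive; the function name, the file naming scheme and the example output directory are assumptions, and the SMARTCTL variable exists only so the function can be tried without real hardware.

```shell
#!/bin/sh
# Command to run; override SMARTCTL to try the function without real hardware.
SMARTCTL=${SMARTCTL:-smartctl}

# Write the full `smartctl -x` report for one device into a directory
# and print the path of the log file.
save_smart_log() {
    dev=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    logfile="$outdir/smart-$(basename "$dev")-$(date +%Y%m%d).log"
    "$SMARTCTL" -x "$dev" > "$logfile"
    # smartctl's exit status is a bit mask, so a non-zero status does not
    # necessarily mean the log is empty; the path is reported either way.
    echo "$logfile"
}

# Usage (administrator rights required; the directory is hypothetical):
#   save_smart_log /dev/sda /root/smart-logs
```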

Start a SMART Self Test

The self test of the drive can be started with smartctl and the option "-t short" or "-t long". The drive is specified in the same way as explained under "Information on defective drives".

This self test is a manufacturer-specific test, which is performed by the drive firmware. The server should not be used during the test, as this could stop the test.
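
Once a self test has finished, its result appears in the self-test log, which can be printed with "smartctl -l selftest". The sketch below judges the newest entry of that log; it assumes the ATA log layout in which the newest entry is numbered "# 1" and a clean run reads "Completed without error". The function name is hypothetical.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin and judge the newest entry
# (the line numbered "# 1"). Returns 0 if it completed without error,
# 1 otherwise (including when no entry is found).
last_selftest_ok() {
    awk '/^# *1 / { found = 1; ok = /Completed without error/; exit }
         END { exit (found && ok) ? 0 : 1 }'
}

# Usage (administrator rights required):
#   smartctl -t short /dev/sda      # start the test, then wait for it to finish
#   smartctl -l selftest /dev/sda | last_selftest_ok && echo "self test OK"
```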

Drive test with hardware check

In the Rescue System you can use the tool hwcheck to check the drives to the same standards that Hetzner applies. There is a short test (choice G), which takes approximately 15 - 30 minutes, and a long test (choice 1), which can last more than 6 hours. No other actions should be performed on the server while the tests are running. RAID controllers are detected automatically by the test.

Functioning of the hardware checks

The test begins with a reading test on each drive. Approximately 100GB of data is read from each drive. It checks whether there are read errors. The data itself is not investigated and is discarded immediately.

Once the reading test is complete, the SMART values of the drives are read. These are reviewed and examined for abnormalities that could indicate a defect.

When the SMART values have been checked, the long or the short test is started, depending on the choice made.

In a short test only parts of the drive are examined. This shortens the duration of the test.
The long test checks all sectors of the drive multiple times.

Exactly which checks a self test carries out depends on the manufacturer and is usually not known in detail.

Once the self test is complete, the result is checked and the SMART values are examined again.

Finally, the obtained results are evaluated and a log file with the final result is created for each drive (hddtest-[serial number].log). These files can be found under /root/hwcheck-logs/.

Drives with RAID controllers

In Windows

Adaptec has developed an administration tool for extracting drive information from a RAID system. This program has a graphical user interface and is called "Adaptec Storage Manager". It can be downloaded here. For this, please use the user data contained in your confirmation email.

Install and start the program on your server. A graphical user interface will appear.

Next, right click in the box on "Direct Attached Storage". At right, look in the drop-down menu and double click on the operating system installed with the corresponding IP and system. A request for login data should follow. After you have logged in, a message will appear stating that a RAID controller has been found. Confirm with "Register Later" and then double click on RAID controller.

A list of all the drives contained in the RAID controller will appear. Double click on your selected drive. A window opens where you can now retrieve the drive information.

In Linux

smartctl usually shows the serial number for the drive, however there are special programs for various controllers:

  • for 3ware Controllers with tw_cli

tw_cli should be available in the package sources of the distributions. Start tw_cli without specifying parameters:

tw_cli

Then query the serial number as follows:

/cx/py show serial

x stands for the controller number and y for the number of the drive.

  • for Adaptec Controller with arcconf

In Linux you will need a tool called "arcconf". This program can be downloaded here.

Extract the file. Then move it to /usr/local/bin under the name "arcconf":

mv arcconf-64 /usr/local/bin/arcconf

Next, make "arcconf" executable:

chmod +x /usr/local/bin/arcconf

Now, execute the file:

/usr/local/bin/arcconf

You may also need to install "libstdc++5", as "arcconf" requires this package. Should this be the case, you can download the file here.

If a list of possible command parameters appears, the program is working correctly.

Now you can read the serial number of the drive using the command "/usr/local/bin/arcconf getconfig 1". The digit "1" indicates which RAID controller is queried.

Please note that the drives in the RAID are listed under "Physical Device information". The serial number of each drive can be found under "Serial number".

In addition to the serial numbers, this tool provides further useful information regarding your drives.
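
To pick the drive serial numbers out of the long "getconfig" output, a filter like the following can be used. This is a sketch: it assumes the output contains a "Physical Device information" heading followed by "Serial number : ..." lines, as described above, and the function name is hypothetical. Lines before that heading, such as a serial number of the controller itself, are ignored.

```shell
#!/bin/sh
# Print the drive serial numbers listed under "Physical Device information"
# in `arcconf getconfig <controller>` output read on stdin.
arcconf_drive_serials() {
    awk '/Physical Device information/ { in_pd = 1; next }
         in_pd && /^[[:space:]]*Serial number[[:space:]]*:/ {
             sub(/^[^:]*:[[:space:]]*/, "")
             print
         }'
}

# Usage (administrator rights required):
#   /usr/local/bin/arcconf getconfig 1 | arcconf_drive_serials
```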

In FreeBSD

To obtain the drive serial number in FreeBSD, first install "arcconf" from the ports collection by entering the following commands in the terminal:

portsnap fetch update
cd /usr/ports/sysutils/arcconf
make install clean && rehash

The command "/usr/local/sbin/arcconf getconfig 1" then lists the drives. Please note that the digit following "getconfig" indicates the RAID controller.

As with Linux, the "Serial number" and various other information about each drive can be found under "Physical Device information".



© 2019. Hetzner Online GmbH. All rights reserved.