Linux Hardware Monitoring

2. SMART Status (Page 2)

The information will depend on your exact hard drive model and manufacturer, but certain items are always present. Of particular note are the items underlined in light blue, but any error should generally be considered suspicious. The power-on time is a good measure of your hard drive's operating lifetime: most drives that fail do so either before about 3000 hours (usually within warranty, due to faulty construction) or after about 20000 hours (old age), which translates to roughly 4 months and a little over 2 years of continuous use, respectively. Temperature is generally not very important unless extreme values are noted. A good operating temperature for most drives is between 10 and 40 °C. In the example shown above, a 12 cm fan blowing directly on the drive cools it to 15 °C, which is OK.
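
The hour figures are easy to check. The small shell function below (poh_summary is a hypothetical name) converts power-on hours into days, months and years of round-the-clock use. It assumes 30-day months and 365-day years, so the month and year figures are approximate.

```shell
#!/bin/sh
# Convert power-on hours into days, months and years of continuous
# (24 hours a day) use. Assumption: 30-day months and 365-day years.
poh_summary() {
    hours=$1
    awk -v h="$hours" 'BEGIN {
        days = h / 24
        printf "%d hours = %.0f days = %.1f months = %.2f years\n", h, days, days / 30, days / 365
    }'
}

poh_summary 3000    # → 3000 hours = 125 days = 4.2 months = 0.34 years
poh_summary 20000   # → 20000 hours = 833 days = 27.8 months = 2.28 years
```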

Useful commands are:

View information for drive /dev/sda. Note that the -d ata switch may be needed if your drive is SATA, but it is not required for parallel ATA (IDE) drives.

smartctl -d ata -a /dev/sda
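
When you only want one number, such as the power-on hours or the temperature, the attribute table printed by smartctl -A (also part of the smartctl -a output) can be filtered. This is a sketch assuming the usual table layout, "ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE", in which the raw value is the 10th column; the function name smart_raw_value is hypothetical.

```shell
#!/bin/sh
# Print the raw value of one SMART attribute, reading the attribute table
# from standard input. Assumption: the raw value is the 10th
# whitespace-separated field; only its first word is printed.
smart_raw_value() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Typical use (needs root and a real drive):
#   smartctl -d ata -A /dev/sda | smart_raw_value Power_On_Hours
#   smartctl -d ata -A /dev/sda | smart_raw_value Temperature_Celsius
```

A temperature raw value such as "35 (Min/Max 20/45)" is reduced to its first word, 35, because awk splits the line on whitespace.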

Enable SMART:

smartctl -d ata -s on /dev/sda

Run a short offline test (use smartctl -a, as shown above, to view the results):

smartctl -d ata -t short /dev/sda

Run a long offline test:

smartctl -d ata -t long /dev/sda
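
These tests run inside the drive, so smartctl returns as soon as a test has been started. A sketch that starts a short test on several drives, waits, and then prints only their self-test logs (smartctl -l selftest) could look like this. The function name short_test_all, the WAIT_SECS variable and the 150-second default are assumptions; smartctl -c reports the recommended polling time for your own drive.

```shell
#!/bin/sh
# Start a short self-test on each drive named on the command line, wait,
# then print the self-test log of every drive whose test was started.
# Assumptions: run as root, "-d ata" suits these drives, a nonzero exit
# status from smartctl means the test did not start, and 150 seconds
# (override with WAIT_SECS) is long enough for a short test.
short_test_all() {
    started=""
    for dev in "$@"; do
        if smartctl -d ata -t short "$dev"; then
            started="$started $dev"
        else
            echo "could not start a short test on $dev" >&2
        fi
    done
    [ -n "$started" ] || return 1
    sleep "${WAIT_SECS:-150}"
    # Device paths contain no spaces, so plain word splitting is safe here.
    for dev in $started; do
        echo "== $dev =="
        smartctl -d ata -l selftest "$dev"
    done
}

# Example (as root): short_test_all /dev/sda /dev/sdb
```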

Admittedly, the most interesting part of the smartmontools suite is the ability to run SMART monitoring as a Unix daemon. The daemon is named “smartd” and can be started during the boot process by your initialization scripts (check your distribution for how to do this; it may already be started). Configuring smartd is very simple and requires a single configuration file named “smartd.conf”, which usually resides in /etc/smartd.conf.

Here is an example of a very, very simple smartd.conf file:

# Sample configuration file for smartd. See man smartd.conf.
# Home page is: http://smartmontools.sourceforge.net
# $Id: smartd.conf,v 1.33 2004/01/13 16:53:06 ballen4705 Exp $
# smartd will re-read the configuration file if it receives a HUP
# signal
# The file gives a list of devices to monitor using smartd, with one
# device per line. Text after a hash (#) is ignored, and you may use
# spaces and tabs for white space. You may use '\' to continue lines.
# You can usually identify which hard disks are on your system by
# looking in /proc/ide and in /proc/scsi.
# The word DEVICESCAN will cause any remaining lines in this
# configuration file to be ignored: it tells smartd to scan for all
# ATA and SCSI devices. DEVICESCAN may be followed by any of the
# Directives listed below, which will be applied to all devices that
# are found. Most users should comment out DEVICESCAN and explicitly
# list the devices that they wish to monitor.
DEVICESCAN

The only line of interest is the last one (all previous lines are comments), which tells smartd to scan for SMART devices and enable monitoring wherever possible. On its own, this directive does not schedule any tests.

We can do better than that with a custom smartd.conf file, as follows (file contains only these two lines):

/dev/sda -d ata -s L/../../3/06
/dev/sdb -d ata -s L/../../7/06

This tells smartd that there are two SMART devices, /dev/sda and /dev/sdb, which must be monitored. The “-d ata” switch may be necessary for SATA devices. The last argument, “-s L/../../3/06”, requests a background hard disk test whenever the expression matches the current date and time. The format is “L” or “S” for a long or short test, respectively, followed by “/MM/DD/d/HH” (month, day of the month, day of the week from 1 = Monday to 7 = Sunday, and hour in 24-hour format); each dot matches any single character. In this case, /dev/sda is tested on the third day of the week (Wednesday) at 06:00 every week, while /dev/sdb is tested on Sundays, again at 06:00.
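
Because the schedule is an extended regular expression, you can check by hand whether it would fire at a given time. The sketch below tests a pattern against a T/MM/DD/d/HH string with grep -E; the function name schedule_matches is hypothetical, and anchoring the pattern with ^ and $ (so it must match the whole string) is an assumption of this sketch rather than a description of smartd's internals.

```shell
#!/bin/sh
# Does a smartd "-s" schedule match a given T/MM/DD/d/HH string?
# T = test type (L or S), MM = month, DD = day of the month,
# d = day of the week (1 = Monday ... 7 = Sunday), HH = hour (00-23).
# Assumption: the pattern must match the whole string, so it is anchored
# with ^ and $; grep -E stands in for smartd's own regex matching.
schedule_matches() {
    printf '%s\n' "$2" | grep -Eq "^($1)\$"
}

# Would a long test be due this hour under "-s L/../../3/06"?
# date's %u prints the day of the week with 1 = Monday, as smartd expects.
now="L/$(date +%m/%d/%u/%H)"
if schedule_matches 'L/../../3/06' "$now"; then
    echo "$now: long test due"
else
    echo "$now: no long test this hour"
fi
```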

Daily short tests can be enabled in addition to weekly long tests. This example runs a short test on drive /dev/sda at 02:00 every day AND a long test every Saturday at 03:00.

/dev/sda -d ata -s (S/../.././02|L/../../6/03)
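
To see at a glance when a combined schedule fires, the hypothetical list_slots function below tries every hour of every day of the week against the pattern, for both test types. It fixes the month and day of the month at 01/01 because this pattern matches those fields with dots, and it approximates smartd's matching with an anchored grep -E.

```shell
#!/bin/sh
# List every (day of week, hour, test type) slot in which a smartd "-s"
# pattern would start a test. Month and day of the month are fixed at
# 01/01 because the pattern used here matches those fields with dots.
# Assumption: whole-string matching, approximated with anchored grep -E.
list_slots() {
    for d in 1 2 3 4 5 6 7; do          # 1 = Monday ... 7 = Sunday
        h=0
        while [ "$h" -lt 24 ]; do
            hh=$(printf '%02d' "$h")
            for t in S L; do
                if printf '%s\n' "$t/01/01/$d/$hh" | grep -Eq "^($1)\$"; then
                    echo "day $d $hh:00 $t"
                fi
            done
            h=$((h + 1))
        done
    done
}

list_slots 'S/../.././02|L/../../6/03'
```

For this pattern it prints eight slots: a short test (S) at 02:00 on each of days 1 to 7, plus a long test (L) on day 6 (Saturday) at 03:00.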

You can also ask for an email warning if the SMART status reports an error. The following example tells smartd to watch the drive's error log (-l error) and to send an email to the given address whenever it detects a problem.

/dev/sda -d ata -s L/../../6/03 -l error -m l33t_hax0r@mymail.net
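
As the comments in the sample smartd.conf point out, smartd re-reads its configuration file when it receives a HUP signal, so after editing the file you can signal the running daemon instead of restarting it. A minimal sketch, assuming the pidof utility is installed; the function name reload_smartd is hypothetical, and your distribution's init system may offer its own reload command.

```shell
#!/bin/sh
# Ask every running smartd process to re-read smartd.conf by sending it
# SIGHUP. Assumptions: run as root and pidof is installed.
reload_smartd() {
    pids=$(pidof smartd) || { echo "smartd is not running" >&2; return 1; }
    # $pids may hold several PIDs separated by spaces; leave it unquoted.
    kill -HUP $pids
}

# Example (as root, after editing /etc/smartd.conf): reload_smartd
```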

The output in your log file (syslog) will be rather verbose, but should not cause any worries. Here is an excerpt from my /var/log/messages file:

Apr 1 20:32:16 hagakure smartd[4430]: Device: /dev/sda, SMART Usage Attribute: 194 Temperature_Celsius changed from 193 to 199
Apr 1 21:02:16 hagakure smartd[4430]: Device: /dev/sdb, SMART Usage Attribute: 190 Unknown_Attribute changed from 70 to 71
Apr 1 21:02:16 hagakure smartd[4430]: Device: /dev/sdb, SMART Usage Attribute: 194 Temperature_Celsius changed from 30 to 29

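Note that /dev/sda and /dev/sdb report temperatures differently: smartd logs the normalized attribute values, which each manufacturer scales in its own way, so 193 and 199 are not degrees Celsius. Smartd also detected a change in attribute 190, which this version of smartmontools could not name.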
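
To pull the attribute changes out of a busy log, each message can be reduced to device, attribute ID, attribute name, old value and new value. A sketch with sed, assuming the message format shown in the excerpt above (other lines are skipped); attr_changes is a hypothetical name.

```shell
#!/bin/sh
# Reduce smartd "SMART Usage Attribute: ID NAME changed from A to B"
# messages to "DEVICE ID NAME A B". Lines in any other format are dropped.
attr_changes() {
    sed -n 's/.*Device: \([^,]*\), SMART Usage Attribute: \([0-9]*\) \([^ ]*\) changed from \([0-9]*\) to \([0-9]*\).*/\1 \2 \3 \4 \5/p'
}

# Example: grep smartd /var/log/messages | attr_changes
```

Fed the three lines above, it prints "/dev/sda 194 Temperature_Celsius 193 199", "/dev/sdb 190 Unknown_Attribute 70 71" and "/dev/sdb 194 Temperature_Celsius 30 29".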