By Ionut Ilascu, March 24, 2020 06:26 PM
Hewlett Packard Enterprise (HPE) is once again warning its customers that certain Serial-Attached SCSI (SAS) solid-state drives (SSDs) will fail after 40,000 hours of operation unless a critical firmware patch is applied.
The company made a similar announcement in November 2019, when a firmware defect caused drives to fail after 32,768 hours of operation.
Affected drives
The current issue affects drives in HPE server and storage products such as HPE ProLiant, Synergy, Apollo 4200, Synergy Storage Modules, D3000 Storage Enclosure, and StoreEasy 1000 Storage.

HPE Model Number | HPE SKU    | HPE SKU Description               | HPE Spare Part SKU | HPE Firmware Fix Date
EK0800JVYPN      | 846430-B21 | HPE 800GB 12G SAS WI-1 SFF SC SSD | 846622-001         | 3/20/2020
EO1600JVYPP      | 846432-B21 | HPE 1.6TB 12G SAS WI-1 SFF SC SSD | 846623-001         | 3/20/2020
MK0800JVYPQ      | 846434-B21 | HPE 800GB 12G SAS MU-1 SFF SC SSD | 846624-001         | 3/20/2020
MO1600JVYPR      | 846436-B21 | HPE 1.6TB 12G SAS MU-1 SFF SC SSD | 846625-001         | 3/20/2020
HPE says this is a comprehensive list of the impacted SSDs it sells. However, the issue is not unique to HPE and may also be present in drives from other manufacturers.
If an SSD in these products runs a firmware version older than HPD7, it will fail after 40,000 hours of power-on time. That works out to 4 years, 206 days and 16 hours, about half a year short of the extended warranty available for some of these drives.
When the failure point is reached, neither the data nor the drive can be recovered, so avoiding data loss depends on having backups in place.
HPE learned about the firmware bug from an SSD manufacturer and warns that SSDs installed and put into service at the same time are likely to fail almost simultaneously.
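The conversion from 40,000 hours to years, days and hours can be checked with a few lines of Python. This is a minimal sketch that assumes 365-day years and ignores leap days, which is how the 4 years, 206 days, 16 hours figure works out; the function name is made up for illustration.

```python
# Convert a power-on-hours figure into (years, days, hours).
# Assumption: every year is 365 days long (leap days are ignored).
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def hours_to_ydh(hours: int) -> tuple[int, int, int]:
    """Split a number of hours into whole years, leftover days and leftover hours."""
    days, rem_hours = divmod(hours, HOURS_PER_DAY)
    years, rem_days = divmod(days, DAYS_PER_YEAR)
    return years, rem_days, rem_hours


if __name__ == "__main__":
    years, days, hours = hours_to_ydh(40_000)
    print(f"{years} years, {days} days, {hours} hours")  # 4 years, 206 days, 16 hours
```

The same function applied to the November 2019 bug's 32,768-hour limit gives 3 years, 270 days and 8 hours.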
“Restoration of data from backup will be required in non-fault tolerance modes (e.g., RAID 0) and in fault tolerance RAID mode if more drives fail than what is supported by the fault tolerance RAID mode logical drive [e.g. RAID 5 logical drive with two failed SSDs]” - HPE advisory
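The advisory's point comes down to how many simultaneous drive failures each RAID level survives. The sketch below uses the standard tolerances for RAID 0, 1, 5 and 6, assuming RAID 1 is a two-drive mirror; the table and function name are illustrative and not part of HPE's tooling.

```python
# Number of simultaneous drive failures each RAID level can survive.
# Assumption: RAID 1 is a two-drive mirror. Nested levels such as RAID 10
# are omitted because their tolerance depends on which drives fail.
FAULT_TOLERANCE = {
    "RAID 0": 0,
    "RAID 1": 1,
    "RAID 5": 1,
    "RAID 6": 2,
}


def needs_restore_from_backup(raid_level: str, failed_drives: int) -> bool:
    """Return True if the logical drive is lost and must be restored from backup."""
    return failed_drives > FAULT_TOLERANCE[raid_level]


if __name__ == "__main__":
    # The advisory's example: a RAID 5 logical drive with two failed SSDs.
    print(needs_restore_from_backup("RAID 5", 2))  # True
    print(needs_restore_from_backup("RAID 6", 2))  # False
```

Because drives put into service together are likely to reach the limit at almost the same time, every drive in an array can fail at once, which exceeds the tolerance of any of these levels.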
The new firmware can be installed by using the online flash component for VMware ESXi, Windows, and Linux.
Not as bad as last time
There is some good news, though. Based on HPE's shipping dates and the 40,000-hour limit, no affected SSD has failed yet because of this firmware bug.
HPE estimates that unpatched SSDs will begin to fail as early as October 2020, which leaves admins plenty of time to apply the corrected firmware.
Back in November, reports of storage drive failures poured in on social media and forums, with users complaining about devices failing in bulk, minutes apart.
Admins can check the uptime of an affected drive with the Smart Storage Administrator (SSA) utility, which reports the power-on time of every drive installed in the system.
Alternatively, users can run scripts that check whether the firmware on their SSDs has the 40,000 power-on-hours failure issue. The scripts work for certain HPE SAS SSDs and are available for Linux, VMware, and Windows.
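To see roughly when a given drive would reach the limit, add 40,000 hours to the moment it was first powered on. This is a minimal sketch that assumes the drive has run around the clock since that moment (a drive that was switched off for a while reaches the limit later); the function name and the April 1, 2016 start date are hypothetical.

```python
from datetime import datetime, timedelta

# Power-on hours after which unpatched drives fail (firmware older than HPD7).
FAILURE_HOURS = 40_000


def projected_failure(first_power_on: datetime) -> datetime:
    """Estimate the failure time, assuming continuous operation since first_power_on."""
    return first_power_on + timedelta(hours=FAILURE_HOURS)


if __name__ == "__main__":
    # Hypothetical drive first powered on at midnight on April 1, 2016.
    print(projected_failure(datetime(2016, 4, 1)))  # 2020-10-23 16:00:00
```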
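Once a drive's power-on time is known, the headroom left before the 40,000-hour mark is simple arithmetic. A minimal sketch, assuming the reading has already been converted to whole hours and the drive keeps running 24/7; the function names and the 90-day warning threshold are illustrative choices, not HPE recommendations.

```python
# Power-on hours at which unpatched drives fail.
FAILURE_HOURS = 40_000
# Illustrative threshold: flag drives with fewer than 90 days of runtime left.
WARN_HOURS = 90 * 24


def hours_remaining(power_on_hours: int) -> int:
    """Hours left before the failure point (0 if it has already been reached)."""
    return max(FAILURE_HOURS - power_on_hours, 0)


def status(power_on_hours: int) -> str:
    """Summarize how close a drive is to the 40,000-hour limit."""
    left = hours_remaining(power_on_hours)
    if left == 0:
        return "limit reached"
    if left < WARN_HOURS:
        return f"urgent: {left} hours (~{left // 24} days) left"
    return f"ok: {left} hours (~{left // 24} days) left"


if __name__ == "__main__":
    print(status(35_000))  # ok: 5000 hours (~208 days) left
    print(status(39_000))  # urgent: 1000 hours (~41 days) left
```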
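At their core, such checks compare a drive's model number and firmware revision against the advisory. The sketch below is not HPE's script: it uses the four model numbers from the table of affected drives and assumes firmware revisions are written as "HPD" followed by a decimal number (as in HPD7), which may not hold for every drive. The model "OTHERMODEL" is a made-up placeholder.

```python
# Model numbers listed in HPE's advisory for the 40,000-hour issue.
AFFECTED_MODELS = {"EK0800JVYPN", "EO1600JVYPP", "MK0800JVYPQ", "MO1600JVYPR"}
# First firmware revision that fixes the bug (HPD7).
FIXED_REVISION = 7


def firmware_number(revision: str) -> int:
    """Parse a revision such as 'HPD7' into 7.

    Assumption: revisions are 'HPD' followed by a decimal number.
    """
    if not revision.startswith("HPD"):
        raise ValueError(f"unexpected firmware revision: {revision!r}")
    return int(revision[3:])


def is_vulnerable(model: str, revision: str) -> bool:
    """True if the drive is on the affected list and runs firmware older than HPD7."""
    return model in AFFECTED_MODELS and firmware_number(revision) < FIXED_REVISION


if __name__ == "__main__":
    print(is_vulnerable("MK0800JVYPQ", "HPD6"))  # True
    print(is_vulnerable("MK0800JVYPQ", "HPD7"))  # False
    print(is_vulnerable("OTHERMODEL", "HPD3"))   # False
```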
JohnC_21
The company said in a bulletin that the “issue is not unique to HPE and potentially affects all customers that purchased these drives.” HPE has not identified the SSD maker and refused to do so, saying: “We’re not confirming manufacturers.”
However, a Dell EMC urgent firmware update issued last month also mentioned SSDs failing after 40,000 operating hours and specifically identified SanDisk SAS drives. The update included firmware version D417 as a fix.
The fault fixed by the Dell EMC firmware concerns an Assert function that used a bad check to validate a circular buffer's index value. Instead of checking for a maximum value of N, it checked for N-1. The fix corrects the assert check to use N as the maximum value.
It seems likely that the HPE drives are SanDisk drives as well.
https://blocksandfiles.com/2020/03/24/hpe-enterprise-ssd-40k-hours-flaw/
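The report does not show the firmware code, so the following is only an illustration of the kind of off-by-one it describes: a hypothetical index check where the largest legitimate value is N, but the buggy assert accepts values only up to N-1. The value of N, the lower bound of 0 and all names are made up.

```python
N = 8  # hypothetical maximum legitimate index value


def buggy_check(index: int) -> bool:
    # Wrong bound: treats N-1 as the maximum, so a legitimate index of N fails.
    return 0 <= index <= N - 1


def fixed_check(index: int) -> bool:
    # Corrected bound: the maximum value is N, as described for the Dell EMC fix.
    return 0 <= index <= N


def assert_index(index: int, check) -> None:
    """Mimic a firmware Assert: a failed check stops processing with an error."""
    if not check(index):
        raise AssertionError(f"index {index} failed validation")


if __name__ == "__main__":
    for idx in range(N + 1):
        assert_index(idx, fixed_check)  # every value from 0 to N passes
    try:
        assert_index(N, buggy_check)
    except AssertionError as err:
        print("buggy check:", err)  # buggy check: index 8 failed validation
```

The buggy and fixed checks agree on every value except N itself, which is why such a bug can stay hidden until the index finally reaches its maximum.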