HPE now has a fix out for an issue it has been working through with customers: four SSD models can fail at 40,000 hours of operation. A few months ago, HPE found that some SAS SSDs it used would fail at 32,768 hours of operation, or exactly 3 years, 270 days, 8 hours. Here is a link to that bulletin. It turns out this is a different issue, where drives will fail at 40,000 hours, or 4 years, 206 days, 16 hours. Here is the new HPE bulletin.
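For anyone who wants to check the arithmetic behind those two figures, here is a minimal sketch. It assumes 365-day years and ignores leap days, which is how the figures above work out; the function name is ours and not anything from HPE.

```python
def split_hours(total_hours, days_per_year=365):
    """Break a power-on-hours count into (years, days, hours).

    Assumption: every year has 365 days; leap days are ignored.
    """
    days, hours = divmod(total_hours, 24)
    years, days = divmod(days, days_per_year)
    return years, days, hours


for limit in (32_768, 40_000):
    years, days, hours = split_hours(limit)
    print(f"{limit:,} hours = {years} years, {days} days, {hours} hours")
```

Both limits count power-on time, so a drive that spends part of its life powered off will reach them later on the calendar.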
HPE SSDs Failing at 40,000 Hours
Four SSD models will fail at 40,000 hours, all of them 800GB or 1.6TB SAS SSDs. Here is the official list:
HPE Model Number | HPE SKU | HPE SKU Description | HPE Spare Part SKU | HPE Firmware Fix Date
EK0800JVYPN | 846430-B21 | HPE 800GB 12G SAS WI-1 SFF SC SSD | 846622-001 | 3/20/2020
EO1600JVYPP | 846432-B21 | HPE 1.6TB 12G SAS WI-1 SFF SC SSD | 846623-001 | 3/20/2020
MK0800JVYPQ | 846434-B21 | HPE 800GB 12G SAS MU-1 SFF SC SSD | 846624-001 | 3/20/2020
MO1600JVYPR | 846436-B21 | HPE 1.6TB 12G SAS MU-1 SFF SC SSD | 846625-001 | 3/20/2020
HPE says that the drives need to be updated to firmware HPD7 or later. The drive in our cover image is running HPD6, which is still vulnerable.
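If you have an inventory of model numbers and firmware revisions, the check is simple to script. The sketch below is illustrative only: the model numbers come from the list above, but the function name and the assumption that firmware revisions always look like "HPD" followed by a number are ours, not HPE's. HPE's own detection tools remain the authoritative way to identify affected drives.

```python
import re

# Model numbers from the HPE list above for the 40,000-hour issue.
AFFECTED_MODELS = {"EK0800JVYPN", "EO1600JVYPP", "MK0800JVYPQ", "MO1600JVYPR"}
FIXED_FIRMWARE = 7  # HPD7 or later, per HPE


def needs_update(model, firmware):
    """Return True if a listed drive runs firmware older than HPD7.

    Assumption: firmware revisions look like "HPD<number>" and sort numerically.
    """
    if model.strip().upper() not in AFFECTED_MODELS:
        return False
    match = re.fullmatch(r"HPD(\d+)", firmware.strip().upper())
    if match is None:
        # Unknown format: flag for a manual check rather than assume it is safe.
        return True
    return int(match.group(1)) < FIXED_FIRMWARE


print(needs_update("MO1600JVYPR", "HPD6"))  # the revision shown on our cover image drive
print(needs_update("MO1600JVYPR", "HPD7"))
```

Treating an unrecognized firmware string as "needs a look" is a deliberate choice here, since a false alarm costs far less than a missed drive.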
The hardest part about these failures is that the death timer starts counting when a drive is first powered on. As a result, if you have an array of these drives that all tick over from 39,999 hours to 40,000 hours together, you will see the entire array fail at the same time. Even if you mixed EO1600JVYPP and MO1600JVYPR drives in an array, if they were all powered on and ran together, this could again lead to catastrophic failure.
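To make the array risk concrete, here is a small sketch that takes power-on hours per drive and reports which drives would hit the limit within a day of the first failure. The bay names and hour counts are made up for illustration, and the 24-hour window is an arbitrary choice on our part.

```python
FAILURE_HOURS = 40_000


def hours_remaining(power_on_hours):
    """Hours left before a drive reaches the 40,000-hour limit (never negative)."""
    return max(FAILURE_HOURS - power_on_hours, 0)


def drives_failing_together(power_on_hours, window_hours=24):
    """Return the drives that reach the limit within window_hours of the first one."""
    remaining = {bay: hours_remaining(h) for bay, h in power_on_hours.items()}
    soonest = min(remaining.values())
    return sorted(bay for bay, left in remaining.items() if left - soonest <= window_hours)


# Hypothetical array: three drives installed together, one replaced much later.
array = {"bay1": 39_990, "bay2": 39_990, "bay3": 39_989, "bay4": 12_000}
print({bay: hours_remaining(h) for bay, h in array.items()})
print(drives_failing_together(array))
```

In this made-up example, three of the four drives would go within an hour of each other, which is more than typical RAID levels are built to survive.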
HPE says that “this issue is not unique to HPE and potentially affects other manufacturers and customers that purchased these drives.” They also say that “HPE was notified by a Solid State Drive (SSD) manufacturer of a firmware defect affecting certain SAS SSD models.” (Source: HPE) As a result, this could, in theory, impact the same SSD models sold under different brands. HPE is being forthcoming and has firmware updates and mitigations out 6-7 months beforehand, but the fact that we have not heard from other vendors is concerning.
HPE says that given the shipping dates of the drives, no drives should fail until October 2020 at the earliest so there is time to patch. HPE also has the HPD7 firmware and tools to detect and update drives in-place. It says most updates should be able to happen without requiring a reboot and will require minimal I/O. It certainly sounds like updating is a lot less scary than seeing an entire array fail.
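As a rough way to turn a first power-on date into a calendar deadline, the sketch below adds 40,000 hours to a start date. The start date is purely hypothetical and is not an HPE ship date; the `powered_fraction` parameter is our own addition to account for drives that are not powered around the clock.

```python
from datetime import datetime, timedelta

FAILURE_HOURS = 40_000


def projected_failure(first_power_on, powered_fraction=1.0):
    """Estimate when a drive reaches 40,000 power-on hours.

    powered_fraction is the share of wall-clock time the drive is powered
    (1.0 means it has run continuously since first_power_on).
    """
    return first_power_on + timedelta(hours=FAILURE_HOURS / powered_fraction)


# Hypothetical drive that has run continuously since March 10, 2016.
print(projected_failure(datetime(2016, 3, 10)))  # 2020-10-01 16:00:00
```

A drive running around the clock is the worst case; anything powered on less often gets more calendar time, but the safer plan is still to patch well before the estimate.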
HPE SSDs Failing at 32,768 Hours
Just to keep a complete list going, here are the drives that were impacted by the 32,768-hour firmware bug:
HPE Model Number | HPE SKU | HPE SKU Description | HPE Spare Part SKU | HPE Firmware Fix Date
VO0480JFDGT | 816562-B21 | HP 480GB 12Gb SAS 2.5 RI PLP SC SSD | 817047-001 | 11/22/2019
VO0960JFDGU | 816568-B21 | HP 960GB 12Gb SAS 2.5 RI PLP SC SSD | 817049-001 | 11/22/2019
VO1920JFDGV | 816572-B21 | HP 1.92TB 12Gb SAS 2.5 RI PLP SC SSD | 817051-001 | 11/22/2019
VO3840JFDHA | 816576-B21 | HP 3.84TB 12Gb SAS 2.5 RI PLP SC SSD | 817053-001 | 11/22/2019
MO0400JFFCF | 822555-B21 | HP 400GB 12Gb SAS 2.5 MU PLP SC SSD S2 | 822784-001 | 11/22/2019
MO0800JFFCH | 822559-B21 | HP 800GB 12Gb SAS 2.5 MU PLP SC SSD S2 | 822786-001 | 11/22/2019
MO1600JFFCK | 822563-B21 | HP 1.6TB 12Gb SAS 2.5 MU PLP SC SSD S2 | 822788-001 | 11/22/2019
MO3200JFFCL | 822567-B21 | HP 3.2TB 12Gb SAS 2.5 MU PLP SC SSD S2 | 822790-001 | 11/22/2019
VO000480JWDAR | 875311-B21 | HPE 480GB SAS SFF RI SC DS SSD | 875681-001 | 12/9/2019
VO000960JWDAT | 875313-B21 | HPE 960GB SAS SFF RI SC DS SSD | 875682-001 | 12/9/2019
VO001920JWDAU | 875326-B21 | HPE 1.92TB SAS RI SFF SC DS SSD | 875684-001 | 12/9/2019
VO003840JWDAV | 875330-B21 | HPE 3.84TB SAS RI SFF SC DS SSD | 875686-001 | 12/9/2019
VO007680JWCNK | 870144-B21 | HPE 7.68TB SAS 12G RI SFF SC DS SSD | 870460-001 | 12/9/2019
VO015300JWCNL | 870148-B21 | HPE 15.3TB SAS 12G RI SFF SC DS SSD | 870462-001 | 12/9/2019
VK000960JWSSQ | P06584-B21 | HPE 960GB SAS RI SFF SC DS SSD | P08608-001 | 12/9/2019
VK001920JWSSR | P06586-B21 | HPE 1.92TB SAS RI SFF SC DS SSD | P08609-001 | 12/9/2019
VK003840JWSST | P06588-B21 | HPE 3.84TB SAS RI SFF SC DS SSD | P08610-001 | 12/9/2019
VK003840JWSST | P11329-B21 | HPE 3.84TB SAS RI LFF SCC DS SPL SSD | P11360-001 | 12/9/2019
VK007680JWSSU | P06590-B21 | HPE 7.68TB SAS RI SFF SC DS SSD | P08611-001 | 12/9/2019
VO015300JWSSV | P06592-B21 | HPE 15.3TB SAS RI SFF SC DS SSD | P08612-001 | 12/9/2019
Although these types of firmware issues are very dangerous, we have to give credit to HPE for proactively getting the word and fixes out to minimize impacts.
Dell had the exact same issues. Those are Sandisk SSDs.
Are these Sandisk too?
We got affected after missing a few servers we thought had several weeks left. Three of them died on a Friday :-(
So, what’s the next expiration date for those drives? Also I’m wondering whether or not one could reset the counter by just successively pulling the drives? Once is never, twice is always. I wouldn’t trust those disks going forward.
@weust,
your suspicion is correct, at least for the SSD in the photo. The supplier part number SDLTOCKM-… visible on the SSD label tells us in no uncertain terms that it is a Sandisk/WD product. (From the horse’s mouth: https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/collateral/cert/rohs/SDLTOCKM-XXXX%20SanDisk%20RoHS%20DoC.pdf)
I love to use good quality desktop SSD parts in servers to avoid these scenarios. I can’t remember the number of times a “certified” part has given me more headaches than a regular, decent Joe Sixpack piece of hardware. In this case, not a single Intel SSD 400 series drive on small production servers has failed me, and the same goes for sandforce2 based SATA SSDs (most of them already phased out).