Ah yes, disk "S.M.A.R.T"

# smartctl -x /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-32-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
...
Local Time is:    Mon Dec  8 15:24:45 2025 GMT
...

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

=====
Left it for a bit, and
=====

# smartctl -x /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-32-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
...
Local Time is:    Mon Dec  8 23:38:57 2025 GMT
...

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
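For anyone wanting to catch this flip-flop automatically, here is a minimal Python sketch. It assumes only that the health line looks exactly like the one in the two outputs above; `parse_health` and `recovered` are hypothetical helper names, not part of smartmontools, and you feed it captured `smartctl` output as text.

```python
import re

# Matches the health line as it appears in the smartctl output quoted above.
# Assumption: the verdict is the literal word PASSED or FAILED.
HEALTH_RE = re.compile(
    r"^SMART overall-health self-assessment test result:\s*(PASSED|FAILED)",
    re.MULTILINE,
)


def parse_health(smartctl_output):
    """Return "PASSED", "FAILED", or None if no health line is found."""
    match = HEALTH_RE.search(smartctl_output)
    return match.group(1) if match else None


def recovered(earlier_output, later_output):
    """True if the drive said FAILED in the earlier run and PASSED in the later one."""
    return (
        parse_health(earlier_output) == "FAILED"
        and parse_health(later_output) == "PASSED"
    )
```

Comparing two saved runs with `recovered(first, second)` flags a drive that has "changed its mind"; whether to trust it afterwards is a separate question.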

Hard drives have this magic (and very annoying) ability to go from

"I am doomed, dead, gonzo, curtains for me, rip, F in the chat"

to

"actually it turns out I had some spare sectors behind the sofa I forgot about, nvm, I am good now"

For this reason, every spinning disk I own is 100% FDE'd (full-disk encrypted), because I _know_ that the moment I take it out of the machine it will somehow magically fix itself.

At least when SSDs go they normally go out with some "supernova event" of either:

hanging,
sending weird ATA replies,
replying with random pages of memory rather than data, or
falling off the bus and coming back as a new vendor with a capacity of 4MB

but at least at that point they don't come back to life!

@benjojo ... may I suggest buying more Samsung SSDs and fewer KingDian, Transcend, etc.? 😹

@manawyrm I have seen all of the above failures on even the fancy enterprise drives! The cheap ones do end up doing the "have some of my RAM instead" trick more often, but the rest is seemingly all up for grabs as far as failure modes go

The old Intel enterprise SSDs loved to do the "I am now a 4MB drive" trick

@benjojo Don't forget the latest trick up their sleeve: "if you just let me turn off one of my heads, I lose 1/20th of my capacity but I am fine again" :D

@evey it's a good party trick as long as you are cool with the idea of rebuilding everything on the drive (I assume this is only really useful for ceph and similar workloads)

@benjojo I saw a USB stick that forgot its USB ID from lying unused in a warm place for a few years

but after a few plugs and unplugs it came back to life....

@wolf480pl to be fair the average USB NAND flash is so bottom of the barrel that anything is seemingly possible.

I have/had a USB drive on my desk for a couple of years, only to find that when I tried to read back an mp4 video from it, the video was totally stuffed... "cool".

@benjojo @manawyrm
when they offer you their RAM instead, does that allow you to overwrite their firmware? 🤔

@wolf480pl @manawyrm I think that was what the 4MB "drive" was for: it had forgotten its own firmware and was asking for it back, and I guess when you are a drive everything looks like an ATA interface

@benjojo right, but if these can come back, I wouldn't rule out SSDs coming back too

@benjojo @manawyrm is the 4MB drive actually usable for storage?

@noisytoot @manawyrm not really, I think you need to do some magic ATA writes to actually put the firmware back on. I suspect it just reports a size to make HBAs happy

@benjojo I've seen SMART health tests fail so rarely I immediately assume imminent and catastrophic failure, whether it changes its mind or not. It is one step down from a printer making a weird noise and we all know what we do when that happens.

@_aD It's a slightly different set of problems when losing a disk is fine because ceph will just shuffle stuff around for you in no time. So I am inclined to let it fail, or just fix itself
