Ah yes, disk "S.M.A.R.T"
=====
# smartctl -x /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-32-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
...
Local Time is: Mon Dec 8 15:24:45 2025 GMT
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.
Left it for a bit, and
=====
# smartctl -x /dev/sdd
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.1.0-32-amd64] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
...
Local Time is: Mon Dec 8 23:38:57 2025 GMT
...
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
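A minimal sketch of how a flip like the one above could be spotted automatically, assuming the text layout shown in the two reports (the `Local Time is:` and `SMART overall-health self-assessment test result:` lines). The names `parse_health` and `verdict_flipped` are hypothetical helpers, not part of smartmontools, and the patterns only cover the ATA-style wording seen here.

```python
import re

# Assumption: the overall-health line looks like the one in the reports above,
# e.g. "SMART overall-health self-assessment test result: FAILED!".
HEALTH_RE = re.compile(
    r"^SMART overall-health self-assessment test result:\s*(\w+)", re.MULTILINE
)
# Assumption: the timestamp line starts with "Local Time is:".
TIME_RE = re.compile(r"^Local Time is:\s*(.+)$", re.MULTILINE)


def parse_health(report: str):
    """Return (local_time, verdict) from one smartctl text report.

    Either element is None if the corresponding line is missing.
    """
    health = HEALTH_RE.search(report)
    when = TIME_RE.search(report)
    return (
        when.group(1).strip() if when else None,
        health.group(1) if health else None,
    )


def verdict_flipped(earlier: str, later: str) -> bool:
    """True when both reports carry a verdict and the two verdicts differ."""
    _, before = parse_health(earlier)
    _, after = parse_health(later)
    return before is not None and after is not None and before != after
```

Fed the two reports above as strings, `parse_health` would pick out the 15:24:45 `FAILED` and the 23:38:57 `PASSED`, and `verdict_flipped` would flag the change. Whether a flip back to `PASSED` means the disk is trustworthy again is a separate question.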
_aD@hachyderm.io
replied 09 Dec 2025 03:17 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Sc7G9F6n94C5kVWVK7
@benjojo I've seen SMART health tests fail so rarely I immediately assume imminent and catastrophic failure, whether it changes its mind or not. It is one step down from a printer making a weird noise and we all know what we do when that happens.
benjojo
replied 09 Dec 2025 12:26 +0000
in reply to: https://hachyderm.io/users/_aD/statuses/115687438792672890
@_aD It's a slightly different set of problems when losing a disk is fine because ceph will just shuffle stuff around for you in no time. So I am inclined to let it fail, or just fix itself
benjojo
replied 08 Dec 2025 23:51 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Sc7G9F6n94C5kVWVK7
Hard drives have this magic (and very annoying) ability to go from "I am doomed, dead, gonzo, curtains for me, rip, F in the chat" to "actually it turns out I had some spare sectors behind the sofa I forgot about, nvm, I am good now". For this reason, there is no spinning disk that isn't entirely 100% FDE'd because I _know_ the moment that I take it out of the machine it will somehow magically fix itself.
At least when SSDs go they normally go out with some "supernova event" of either:
hanging,
sending weird ATA replies,
replying with random pages of memory rather than data,
falling off the bus and returning back as a new vendor with a capacity of 4MB
but at least at that point they don't come back to life!
manawyrm@chaos.socia..
replied 08 Dec 2025 23:55 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/1HDZN5Fk512r9XXJ3v
benjojo
replied 08 Dec 2025 23:59 +0000
in reply to: https://chaos.social/users/manawyrm/statuses/115686646369781322
@manawyrm I have seen all of the above failures on even the fancy enterprise drives! The cheap ones do end up doing the "have some of my RAM instead" trick more often, but the rest is seemingly all up for grabs as far as failure modes go. The old Intel enterprise SSDs loved to do the "I am now a 4MB drive" trick
wolf480pl@mstdn.io
replied 09 Dec 2025 00:02 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/3G71Hk78fg19jHb9c4
benjojo
replied 09 Dec 2025 00:03 +0000
in reply to: https://mstdn.io/users/wolf480pl/statuses/115686672183722807
@wolf480pl @manawyrm I think that was what the 4MB "drive" was for, it had forgotten its own firmware and was asking for it back, and I guess when you are a drive everything looks like an ATA interface
noisytoot@berkeley.e..
replied 09 Dec 2025 02:42 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/3G71Hk78fg19jHb9c4
benjojo
replied 09 Dec 2025 12:24 +0000
in reply to: https://berkeley.edu.pl/objects/230282f0-b31d-43a5-bd01-397bda670946
@noisytoot @manawyrm not really, I think you need some magic ATA writes to actually write the firmware back on. I suspect it just reports a size to make HBAs happy
evey@chaos.social
replied 08 Dec 2025 23:56 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/1HDZN5Fk512r9XXJ3v
@benjojo Don't forget the latest trick up their sleeve: "if you just let me turn off one of my heads, I lose 1/20th capacity but I am fine again" :D
benjojo
replied 09 Dec 2025 00:00 +0000
in reply to: https://chaos.social/users/evey/statuses/115686649148039351
@evey it's a good party trick as long as you are cool with the idea of rebuilding everything on the drive (I assume this is only really useful for ceph and similar workloads)
wolf480pl@mstdn.io
replied 08 Dec 2025 23:58 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/1HDZN5Fk512r9XXJ3v
@benjojo I saw a USB stick that forgot its USB ID from lying unused in a warm place for a few years, but after a few plugs and unplugs it came back to life....
benjojo
replied 09 Dec 2025 00:02 +0000
in reply to: https://mstdn.io/users/wolf480pl/statuses/115686659813343838
@wolf480pl to be fair the average USB NAND flash is so bottom of the barrel that anything is seemingly possible. I have/had a USB drive on my desk for a couple of years, only to find that when I tried to read back an mp4 video on it, the video was totally stuffed... "cool".
wupatz@berlin.social
replied 09 Dec 2025 16:03 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/1HDZN5Fk512r9XXJ3v
@benjojo I do frugal eBay golfing on SMART alerts that I know have a likelihood to recover. Hey, free TBs! as in transient bytes :)