
benjojo posted 29 Sep 2024 14:53 +0000

In general, I feel like I have made pretty good hardware purchasing decisions while running the bgp.tools business over time. I would, however, make one exception to that.

If at any point you are buying SSDs with the intention of using them for more than bare idle workloads, for the love of god spend the 2x price multiplier on enterprise drives.

I'm officially over replacing Crucial MX500s in production. I finally pulled the trigger on replacing all the remaining ones, because keeping up with them slowly burning out at slightly different rates consumes so much time, even though I decided not to buy any more over 9 months ago.

Life is too short to stand around in a data hall waiting for an mdraid to resync

0x47df@duckpon.de replied 29 Sep 2024 16:37 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Y9fF3M74gN4Jhxzy5j

@benjojo i found this out the hard way with my gotosocial instance yesterday. apparently there are something in the region of 220tb written to the used SSD inside it. it can write well for about a week before a reboot to flush the write queue has to happen. somehow no SMART failure, nor reallocated blocks. but wow it has become absurdly slow.

jakob@mastodon.chaos.. replied 29 Sep 2024 16:55 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Y9fF3M74gN4Jhxzy5j

@benjojo I've had good experiences so far by looking for drives with TLC or lower and a DRAM cache. Apacer and Samsung drives with quite a lot of drive writes and a few years of operation seem to hold up well. But yes, the moment you have the money - especially as a business - enterprise drives are worth it. Just the power loss protection alone can make the difference between just booting again and having to restore a backup.

albonycal@fosstodon... replied 29 Sep 2024 18:18 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Y9fF3M74gN4Jhxzy5j

@benjojo I have Crucial BX500 running on my server. They have their own custom S.M.A.R.T attribute for this
```
202 Percent_Lifetime_Remain 50
```

> It is a measure of how much of the drive’s projected lifetime is remaining at any point in time. When SSD is new it will report “100”, and when its specified lifetime has been reached “0,”
it does not mean that the drive is going to fail when that counter reaches zero, only that your SSD may need to be replaced soon.

Mine is at 50% right now
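
For anyone who wants to track that attribute over time, here is a minimal sketch that pulls it out of the attribute table printed by `smartctl -A`. The `SAMPLE` text is made up for illustration (only the `202 Percent_Lifetime_Remain` name and the value 50 come from the post above), and it assumes the normalized VALUE column carries the remaining percentage, which can differ between vendors.

```python
def parse_smart_attributes(text):
    """Parse the attribute table printed by `smartctl -A` into a dict keyed by ID."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),   # normalized VALUE column
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "raw": " ".join(fields[9:]),
        }
    return attrs


# Hypothetical sample output, not captured from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       12345
202 Percent_Lifetime_Remain 0x0030   050   050   001    Old_age   Offline      -       50
"""

attrs = parse_smart_attributes(SAMPLE)
remaining = attrs[202]["value"]
print(f"{attrs[202]['name']}: {remaining}% remaining")
```

On a real machine the text would come from running `smartctl -A` against the device; the parsing stays the same.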

benjojo replied 29 Sep 2024 18:40 +0000
in reply to: https://noxon.cc/users/jeff/statuses/113222288396514996

@jeff I bought a load of Samsung "MZ7L3960HCJR" (also known as the Samsung PM893) to replace them. They have been faster and more consistent, and the hardest-hit one has only just hit its first % of lifetime after 100+ days
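
As a rough back-of-the-envelope check on what "first % of lifetime after 100+ days" implies, here is a small sketch. It assumes wear is linear in time (a constant write rate), which is a simplification, and the function name is made up for illustration.

```python
def projected_lifetime_days(percent_used, days_elapsed):
    """Linearly extrapolate how many days until 100% of rated endurance is used.

    Assumes a constant write rate; real workloads and drives will vary.
    """
    if percent_used <= 0:
        raise ValueError("percent_used must be positive to extrapolate")
    return days_elapsed * 100 / percent_used


# 1% of rated lifetime used after 100 days, as in the post above.
days = projected_lifetime_days(percent_used=1, days_elapsed=100)
print(f"{days:.0f} days, roughly {days / 365:.1f} years")
```

This only estimates when the rated endurance would be reached. It says nothing about other failure modes.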

The only annoying thing about weaning off the consumer drives onto enterprise drives is that the enterprise ones tend to be 480G/960G, where the consumer ones are 500G/1T, meaning if you want to easily do a RAID swap upgrade you have to double your sizes
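
The size mismatch can be sketched like this: a replacement RAID member has to be at least as large as the one it replaces, so the next enterprise size up from a 500G consumer drive is 960G, not 480G. The capacities are nominal marketing sizes in GB and the candidate list is an assumption for illustration. Real usable sizes, and the partition size the array actually uses, can differ.

```python
def smallest_fitting(existing_gb, candidates_gb):
    """Return the smallest candidate capacity that is >= the existing member size."""
    fitting = [c for c in sorted(candidates_gb) if c >= existing_gb]
    if not fitting:
        raise ValueError(f"no candidate is large enough for {existing_gb} GB")
    return fitting[0]


# Assumed enterprise size ladder (nominal GB), for illustration only.
ENTERPRISE_SIZES_GB = [480, 960, 1920]

for consumer_gb in (500, 1000):
    replacement = smallest_fitting(consumer_gb, ENTERPRISE_SIZES_GB)
    print(f"{consumer_gb} GB consumer drive -> {replacement} GB enterprise drive")
```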

wrmsr@peering.social replied 30 Sep 2024 14:21 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/Y9fF3M74gN4Jhxzy5j

@benjojo I can relate to that, as I have multiple projects causing high write volumes to my SSDs coming from my databases.

They are mostly Samsung consumer M.2 SSDs (everything from the 960 to the 980), but I am at the point of having to replace one every 6-8 months because they all reach their TBW limit.

I've now started replacing everything with 4TB PM9A3s and, oh boy, was this ever a good decision. Seeing those power loss protection caps also feels good (despite me running ZFS).