
benjojo posted 13 Apr 2024 19:18 +0000

Who would win?

15 working DDR4 DIMMs or 1 single DDR4 DIMM that threw ECC errors so hard that the system decided it was not even worth getting far enough into startup to tell me which DIMM had gone bad

benjojo replied 14 Apr 2024 15:07 +0000
in reply to: https://mastodon.gamedev.place/users/MissAemilia/statuses/112265705142783377

@MissAemilia Yeah, the issue was twofold. One, this was a blade that had not yet had its IPMI reset, so I needed to boot it in order to see those messages. Two, the serial console/VGA console could not initialize before the bad DIMM took the system down.

The second issue was that the chassis/firmware/whatever had a limit on how many ECC correctable errors can happen in a short time. This DIMM seemed to have completed DDR4 training just fine, but it instantly blew past that limit, to the point where the CPU CATERR'd.
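The thread doesn't say how the firmware actually counts these errors, so here is a minimal sketch assuming a simple sliding-window rule: trip once more than `max_errors` correctable errors land within `window_s` seconds. The class name, parameters, and numbers are hypothetical illustration values, not taken from any real BMC or BIOS. What a platform does once the limit trips is firmware-specific; in this case it ended in a CATERR.

```python
from collections import deque


class CorrectableErrorLimiter:
    """Hypothetical sliding-window limit on ECC correctable errors.

    max_errors and window_s are made-up illustration values,
    not read from any real firmware.
    """

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self._times = deque()  # timestamps (seconds) of recent errors

    def record(self, t: float) -> bool:
        """Record one correctable error at time t; return True if the limit is exceeded."""
        self._times.append(t)
        # Forget errors that have fallen out of the window.
        while self._times and t - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) > self.max_errors


# A DIMM erroring in a rapid burst trips the limit on the 4th error...
burst = CorrectableErrorLimiter(max_errors=3, window_s=1.0)
print([burst.record(t) for t in (0.0, 0.1, 0.2, 0.3)])

# ...while the same number of errors spread out over time never does.
slow = CorrectableErrorLimiter(max_errors=3, window_s=1.0)
print([slow.record(t) for t in (0.0, 2.0, 4.0, 6.0)])
```

The point of the sketch is that a rate limit, not a total count, is what a freshly trained but flaky DIMM can blow through in its first moments of use.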

cks@mastodon.social replied 13 Apr 2024 21:08 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/861YGNyGc4t6rR9614

@benjojo Apparently modern memory systems have to be 'trained' on system boot to establish their specific characteristics and timing and so on. I'd assume that this is separate for each DIMM so that a bad DIMM can't contaminate this process for the rest via some shared line, but now I'm wondering if I was too optimistic there.

(Modern memory is kind of scary if I think about it too much, but this applies to basically all elements of a modern system. My storage runs OSes!)

benjojo replied 14 Apr 2024 15:09 +0000
in reply to: https://mastodon.social/users/cks/statuses/112265954062793153

@cks I think the DIMM trains just fine (at least the BMC seems to imply so); it's just that when the DIMM then "enters the ring", it triggers so many correctable errors so quickly that the CPU just CATERRs out.

The whole memory system is magic, but I'm kind of surprised that the system is not smart enough to "kick out" a DIMM that is only partially bad (trains fine, but can't reliably remember things).
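As a rough illustration of that "kick out" idea, here is a hypothetical policy that tracks correctable errors per DIMM and disables only the DIMM that exceeds its own limit, instead of escalating to a whole-system failure. Nothing in the thread says real firmware offers this; the class, the DIMM labels, and the thresholds are all made up for the sketch.

```python
from collections import defaultdict, deque


class PerDimmPolicy:
    """Hypothetical policy: count correctable errors per DIMM in a sliding
    window and disable ("kick out") only the DIMM that exceeds the limit."""

    def __init__(self, max_errors: int, window_s: float) -> None:
        self.max_errors = max_errors
        self.window_s = window_s
        self._times = defaultdict(deque)  # DIMM label -> recent error timestamps
        self.disabled = set()  # DIMMs that have been mapped out

    def record(self, dimm: str, t: float) -> None:
        if dimm in self.disabled:
            return  # already kicked out; ignore further errors from it
        times = self._times[dimm]
        times.append(t)
        # Forget this DIMM's errors that have fallen out of the window.
        while times and t - times[0] > self.window_s:
            times.popleft()
        if len(times) > self.max_errors:
            self.disabled.add(dimm)


policy = PerDimmPolicy(max_errors=3, window_s=1.0)
for i in range(5):
    policy.record("B3", i * 0.01)  # hypothetical bad DIMM: rapid burst
policy.record("A1", 0.5)  # hypothetical healthy DIMM: a single error
print(policy.disabled)
```

The design choice is simply to keep the error budget per DIMM, so one bad module exhausts its own budget rather than the whole system's.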