New friend acquired
stanford@iceshrimp.s..
replied 12 Feb 2024 16:43 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/4JyW99sKWS62vP3hK9
jeroen@secluded.ch
replied 12 Feb 2024 19:28 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/4JyW99sKWS62vP3hK9
@benjojo "acquired"..... so who lost connectivity? ;) But good to see that I am not the only one transporting network/servers in a backpack.... or did you know how versatile and waterproof those blue IKEA bags are? ;)
electronic_eel@socia..
replied 12 Feb 2024 20:43 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/4JyW99sKWS62vP3hK9
benjojo
replied 12 Feb 2024 23:30 +0000
in reply to: https://social.treehouse.systems/users/electronic_eel/statuses/111920453283227428
benjojo
replied 22 Feb 2024 12:19 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/4JyW99sKWS62vP3hK9
Update: the insides of this thing are very cute and compact!
karppinen@mastodon.o..
replied 22 Feb 2024 14:58 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/467Rj2M8KMYH4wtF1T
@benjojo looks so spacious and well designed compared to the (Trident3) Dell S5212F. Additional boards galore, twice the fans due to the silly fan placement, etc.
benjojo
replied 22 Feb 2024 12:21 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/467Rj2M8KMYH4wtF1T
There is also a mystery QSFP port connector sitting on the side of the switch... with no connector on the other side. Wonder what it is used for. There seem to be plenty of other programming pins on the board, so I doubt it's a factory programming connector
electronic_eel@socia..
replied 22 Feb 2024 12:27 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/2HbVv8D5DV6Yk62W3v
@benjojo this looks like a half-width switch to me. Is it one of the Mellanox ones you posted in a photo some time ago? If so, it could be that this is for an option to "stack" two of these units next to each other.
benjojo
replied 22 Feb 2024 12:28 +0000
in reply to: https://social.treehouse.systems/users/electronic_eel/statuses/111975126970221854
@electronic_eel Why would they put that port on the side of the board? They would surely have to have the opposite switch upside down for a side connector to be at all practical
erikk@chaos.social
replied 22 Feb 2024 12:44 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/2HbVv8D5DV6Yk62W3v
@benjojo I would guess it's just for testing the board after it rolled out of the factory, or at least for testing the ASIC itself, given it seems the board is used for both the SN2100 and SN2010. But I would need to pull one open at work to check
flangey@chaos.social
replied 22 Feb 2024 13:21 +0000
in reply to: https://chaos.social/users/erikk/statuses/111975194454253462
benjojo
replied 22 Feb 2024 13:31 +0000
in reply to: https://chaos.social/users/flangey/statuses/111975337891477372
equinox@chaos.social
replied 22 Feb 2024 15:20 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/2HbVv8D5DV6Yk62W3v
@benjojo random test pins wouldn't be able to carry high-speed signaling… and there is at least one high-speed thing you might want access to for factory programming: PCIe
erikk@chaos.social
replied 22 Feb 2024 15:24 +0000
in reply to: https://chaos.social/users/equinox/statuses/111975807728525716
equinox@chaos.social
replied 22 Feb 2024 15:30 +0000
in reply to: https://chaos.social/users/erikk/statuses/111975821876611543
@erikk @benjojo yeah, could be anything between ×1 and ×4 if it is PCIe (funnily enough, for a "full" PCIe ×1, you'd still need a QSFP connector instead of SFP; there aren't enough pins on SFP to carry the PCIe clock and auxiliary signals. It can be made to work without those, but depending on details that can be annoying…)
equinox@chaos.social
replied 22 Feb 2024 15:22 +0000
in reply to: https://chaos.social/users/equinox/statuses/111975807728525716
@benjojo also, although the position is a relatively strong argument against it, this could be a breakout from the SoM
benjojo
replied 22 Feb 2024 13:09 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/2HbVv8D5DV6Yk62W3v
benjojo
replied 22 Feb 2024 15:51 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/qsjm5YQ9lqGP7G13hp
Well that was painful (firmware versions mismatching and not upgrading etc) but we got there! A working 25G/100G switch running boring debian
# ip l | grep BROADCAST | wc -l
23
# sensors
mlxsw-pci-0100
Adapter: PCI adapter
fan1: 7004 RPM
fan2: 7437 RPM
fan3: 7234 RPM
fan4: 7079 RPM
temp1: +32.0°C (highest = +32.0°C)
front panel 001: +0.0°C (crit = +0.0°C, emerg = +0.0°C)
...
front panel 022: +0.0°C (crit = +0.0°C, emerg = +0.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +18.0°C (high = +98.0°C, crit = +98.0°C)
Core 1: +18.0°C (high = +98.0°C, crit = +98.0°C)
Core 2: +19.0°C (high = +98.0°C, crit = +98.0°C)
Core 3: +19.0°C (high = +98.0°C, crit = +98.0°C)
# uname -a
Linux bgptools-switch 6.1.78-2fast2benjojo-2 #1 SMP PREEMPT_DYNAMIC Thu Feb 22 13:43:28 UTC 2024 x86_64 GNU/Linux
benjojo
replied 22 Feb 2024 15:57 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/tJ3St7xHljMydc22wx
I am mostly appreciating that I don't have to hear this anymore now that the software has figured out how to chill the fans out
cks@mastodon.social
replied 22 Feb 2024 21:22 +0000
in reply to: https://benjojo.co.uk/u/benjojo/h/x38nK7XLyj454J881r
@benjojo I sort of understand why so many things panic-ramp the fans to 100% when they start powering up, but boy do I wish they didn't do it. Especially the things with BMCs. Dear BMC, you already have a ton of sensors, can't you tell that the system is not going to overheat just because you started the CPU?
drscriptt@oldbytes.s..
replied 22 Feb 2024 21:49 +0000
in reply to: https://mastodon.social/users/cks/statuses/111977230760054856
zev@honk.bewilderbee..
replied 22 Feb 2024 22:00 +0000
in reply to: https://mastodon.social/users/cks/statuses/111977230760054856
@cks @benjojo There's unfortunately often a delay between when the BMC powers on the host processor and when the interfaces by which the BMC can read the temperature of that processor (e.g. PECI on Intel platforms or SB-TSI for AMD) actually come fully online. The fans are usually on the same 12V power rail as the host and hence turn on when it does, and lacking a valid temperature reading from the host CPU, going into failsafe mode is the...well, safe option. Logic like "if we just turned it on right now it's probably not very hot" runs into problems if it had recently been on and is still holding a lot of residual heat...you could potentially get into tracking more history to disambiguate that in turn, but then you're suddenly a lot more stateful than you were, which gets messy and fragile (especially considering that the BMC and the host can both reboot independently of each other), and it's ultimately just a lot simpler and less error-prone to make it (relatively) stateless and err on the side of not cooking things. And of course since most servers end up situated in places where there usually aren't people around to hear them, acoustic noise optimization is typically pretty low on the list of priorities. (Yes, I work on BMC firmware.)
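A minimal sketch of the stateless failsafe behaviour described in the post above. This is not real BMC firmware: the function name, the duty-cycle values, and the temperature thresholds are all made up for illustration. The only point is that the decision uses the current reading alone, and a missing reading (for example, PECI or SB-TSI not yet online) maps straight to full speed.

```python
from typing import Optional

# Hypothetical values, chosen only for illustration.
FAILSAFE_DUTY = 100   # percent PWM used when no valid reading exists
MIN_DUTY = 20         # percent PWM at or below T_LOW
T_LOW = 40.0          # degrees C where the ramp starts
T_HIGH = 80.0         # degrees C where the ramp reaches 100%


def fan_duty(cpu_temp_c: Optional[float]) -> int:
    """Return a fan PWM duty (percent) from the current reading only.

    No history is kept, so a BMC reboot or a host reboot cannot leave
    this logic with stale assumptions about how hot the CPU might be.
    """
    if cpu_temp_c is None:
        # Sensor interface not up yet (or read failed): assume the worst.
        return FAILSAFE_DUTY
    if cpu_temp_c <= T_LOW:
        return MIN_DUTY
    if cpu_temp_c >= T_HIGH:
        return FAILSAFE_DUTY
    # Linear ramp between the two thresholds.
    frac = (cpu_temp_c - T_LOW) / (T_HIGH - T_LOW)
    return round(MIN_DUTY + frac * (FAILSAFE_DUTY - MIN_DUTY))


if __name__ == "__main__":
    for reading in (None, 30.0, 60.0, 90.0):
        print(reading, "->", fan_duty(reading), "%")
```

The noisy power-on phase falls out of the `None` branch: until the first valid reading arrives, the fans sit at the failsafe duty, and they drop back as soon as a real temperature is available.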
oclsc@mstdn.ca
replied 22 Feb 2024 22:15 +0000
in reply to: https://mstdn.ca/users/oclsc/statuses/111977431446863796