I'm wrapping up the setup for a Pure Storage FlashArray used in a GPU cluster and need some guidance on the Linux networking. I have two Cisco Nexus 9336C-FX2 switches configured with vPCs and VLAN 77 assigned to the storage network. The cabling is done, and now I'm turning my attention to my RHEL 8.10 servers, which are a mix of Dell R750xa servers and a DGX-1.

Each server has 2 to 4 InfiniBand interfaces (ConnectX-6), and my goal is to create named LACP bonds (for example, ps_bond0) and to add VLAN 77 with an MTU of 9216. I'm using nmcli to set this up, but although I've created the bond and added the InfiniBand interfaces, I'm having trouble getting the VLAN interface to come up.

Are there any common issues I should be aware of? I'd appreciate any advice and can share my current configuration if that helps!
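Edit: here is roughly what I've run so far. The interface names and the IP address are placeholders; the real ports are the ConnectX-6 interfaces on each host.

    # Bond in 802.3ad (LACP) mode, jumbo MTU to match the switch side
    nmcli connection add type bond con-name ps_bond0 ifname ps_bond0 \
        bond.options "mode=802.3ad,miimon=100,lacp_rate=fast" \
        802-3-ethernet.mtu 9216 ipv4.method disabled ipv6.method disabled

    # Enslave the ConnectX-6 ports (device names are examples; they show up as ib* here)
    nmcli connection add type infiniband con-name ps_bond0-port0 ifname ibp59s0 \
        master ps_bond0 slave-type bond
    nmcli connection add type infiniband con-name ps_bond0-port1 ifname ibp59s1 \
        master ps_bond0 slave-type bond

    # VLAN 77 on top of the bond, jumbo MTU, storage address (placeholder IP)
    nmcli connection add type vlan con-name ps_bond0.77 ifname ps_bond0.77 \
        dev ps_bond0 id 77 802-3-ethernet.mtu 9216 \
        ipv4.method manual ipv4.addresses 192.168.77.10/24 ipv6.method disabled

    nmcli connection up ps_bond0
    nmcli connection up ps_bond0.77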
4 Answers
It's worth noting that the Cisco Nexus 9336C-FX2 is a 100 GbE switch and does not support InfiniBand at all; the two are fundamentally different link layers. Check whether your ConnectX-6 ports can be switched to Ethernet mode (the VPI cards can). I found this guide that may be helpful: [link]. Just be cautious, as I'm not sure it applies directly to your NIC model.
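If the ports do turn out to be in InfiniBand mode, they can usually be flipped with Mellanox's mlxconfig (from the MFT tools, or mstconfig from the mstflint package). A rough sketch; the /dev/mst device path below is just an example, use whatever mst status reports for your cards, and the change only takes effect after a reboot or firmware reset:

    # Start the Mellanox tools service and list the devices
    mst start
    mst status

    # Query the current port type: LINK_TYPE_P1/P2 = 1 (InfiniBand), 2 (Ethernet)
    mlxconfig -d /dev/mst/mt4123_pciconf0 query | grep LINK_TYPE

    # Set both ports to Ethernet mode, then reboot for it to take effect
    mlxconfig -d /dev/mst/mt4123_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2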
Make sure the physical links are actually up before digging into LACP at all. Are the cables seated and the switch ports enabled? If nmcli device status shows the ports down or unavailable, that points back to the previous suggestion about switching the ports to Ethernet mode.
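A few quick checks along these lines (interface names are examples) will tell you whether the carrier is there at all before you start chasing LACP:

    # NetworkManager's view of every device and its state
    nmcli device status

    # Kernel view: the physical ports should show state UP / LOWER_UP
    ip -br link show

    # Per-port details: "Link detected", negotiated speed, and port type
    ethtool ens3f0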
Just a heads-up: for block storage connections (iSCSI or NVMe-oF to the FlashArray), the usual recommendation is not to bond the storage ports at all. Multipath I/O handles redundancy and load distribution at the storage layer rather than the networking one: each physical port gets its own IP and its own set of paths to the array, so an LACP bond adds complexity without buying you much.
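If you do go the multipath route, the RHEL side is just dm-multipath. A minimal sketch, assuming the defaults that ship with device-mapper-multipath are acceptable for your array (double-check Pure's Linux recommended-settings guide for your Purity version):

    # Install and enable dm-multipath with the built-in device defaults
    dnf install -y device-mapper-multipath
    mpathconf --enable --with_multipathd y
    systemctl enable --now multipathd

    # After connecting to the array, each LUN should list multiple active paths
    multipath -ll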
You can check the status of your bond via /proc/net/bonding/<bond name>. It shows the MII status of each slave and the 802.3ad aggregator details, which usually tells you why the bond isn't coming up. If the interfaces aren't listed there at all, they never actually got enslaved, and that's the first thing to chase down.
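For example, with the bond named ps_bond0 as in the question:

    # Kernel's view of the bond: per-slave MII status plus the 802.3ad section
    # with aggregator IDs and the partner (switch) system MAC
    cat /proc/net/bonding/ps_bond0

    # In a healthy LACP bond you should see, roughly:
    #   Bonding Mode: IEEE 802.3ad Dynamic link aggregation
    #   MII Status: up                    (for the bond and for every slave)
    #   Partner Mac Address: <switch MAC> (all zeros means no LACPDUs are arriving)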