EPYC Gen 4 Processors: Struggling with Low Bandwidth Performance

Asked By TechieNerd42 On

Hey everyone, I'm dealing with a frustrating situation with our EPYC Gen 4 processors. We've got three Dell PE 7625 servers, each with dual AMD 9374F processors (32 cores) and 512GB RAM. Unfortunately, we're experiencing some pretty terrible bandwidth issues between VMs and from VMs to the host node on the same server. Specifically, we're only seeing around 13 Gbps from host to VM and about 8 Gbps VM to VM, despite having a 50 Gbps bridge setup with two 25 Gbps ports bonded using LACP and no other traffic going through.

I've tried several measures to fix this but haven't had any luck:
1. Enabled multiqueue in Proxmox with 8 queues, but it didn't help.
2. Adjusted the BIOS NPS (NUMA nodes per socket) setting, trying both NPS=4 and NPS=2, but again, no change.
3. I ran an iperf test between VMs on an older Intel cluster, and it achieved around 30 Gbps, while my AMD setup only managed 8 Gbps in comparison, which is pretty disappointing given the specs.
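For anyone wanting to compare runs like these side by side, here is a minimal sketch that pulls the received throughput out of an iperf3 JSON report. It assumes iperf3 (not iperf2) run with the `-J` flag and a TCP test; the function name `received_gbps` and the sample report are made up for illustration, with the sample value simply mirroring the 8 Gbps VM-to-VM figure quoted above.

```python
import json


def received_gbps(iperf3_json_text):
    """Return receiver-side throughput in Gbit/s from an `iperf3 -J` TCP report."""
    report = json.loads(iperf3_json_text)
    # For TCP tests, iperf3 summarizes the receiver side under end.sum_received.
    return report["end"]["sum_received"]["bits_per_second"] / 1e9


if __name__ == "__main__":
    # Hypothetical, heavily trimmed report; a real one has many more fields.
    sample = '{"end": {"sum_received": {"bits_per_second": 8.0e9}}}'
    print(f"{received_gbps(sample):.1f} Gbit/s")
```

Saving one report per host/VM pair and feeding each file through the same function keeps the comparison consistent across the AMD and Intel boxes.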

I went ahead and tested the same Proxmox version on another server with an Intel Xeon 5410, and it performed at 68 Gbps under identical conditions. I'm not sure why there's such a difference, and I'm looking for any insight on how to boost the inter-core/inter-process bandwidth to maximize throughput. If this keeps up, I'm worried about recommending AMD processors for virtualization to future buyers.

Any suggestions or insights would be really appreciated! I've even tried different operating systems like Red Hat and Debian, but performance hasn't improved. Only with Ubuntu 22 did I briefly see 50 Gbps, and that dropped again after an upgrade. Thanks!

2 Answers

Answered By BandwidthBuster76 On

I think your issue might be due to the network stack being involved even though you're not actually using the network cards for the test. To get a better sense of the true bandwidth between cores, you should try using Intel's Memory Latency Checker. It could give you clearer insights into the performance across your cores.
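As a quick sanity check before reaching for a dedicated tool, a rough in-memory copy test takes the network stack out of the picture entirely. The sketch below is not a substitute for Memory Latency Checker: it only times single-threaded buffer copies in Python, and the function name `copy_bandwidth_gbps` and its defaults are made up for illustration.

```python
import time


def copy_bandwidth_gbps(size_mib=256, repeats=5):
    """Time repeated in-memory copies and return the best rate in Gbit/s.

    Each copy reads and writes the whole buffer, so actual memory traffic
    is roughly twice the figure reported here.
    """
    src = bytearray(size_mib * 1024 * 1024)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full copy of the buffer
        elapsed = time.perf_counter() - start
        assert len(dst) == len(src)
        best = max(best, len(src) * 8 / elapsed / 1e9)
    return best


if __name__ == "__main__":
    print(f"best copy rate: {copy_bandwidth_gbps():.1f} Gbit/s")
```

Running it under something like `numactl --cpunodebind=0 --membind=0` and then again with the memory bound to a different node should give a feel for how much crossing NUMA nodes costs on that box.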

TechieNerd42 -

That’s a good point. I’ll check out the Intel MLC, but it’s puzzling why my AMD setup is lagging behind an entry-level Intel chip. Any ideas?

Answered By ServerGuru99 On

It sounds like you might be dealing with a NUMA (Non-Uniform Memory Access) issue. Check to ensure that the core and RAM assignments align properly with your server’s architecture. Sometimes mismatches can lead to performance drops like what you’re experiencing.
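To see how cores map onto NUMA nodes on the host, something like the following sketch can help. It assumes a Linux host that exposes the usual `/sys/devices/system/node/nodeN/cpulist` files; the helper names `parse_cpulist` and `numa_nodes` are made up for illustration.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-3,8-11' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # memory-only nodes have an empty cpulist
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Return {node_id: [cpu, ...]} read from sysfs; empty dict if unavailable."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if not match:
            continue
        try:
            with open(os.path.join(path, "cpulist")) as handle:
                nodes[int(match.group(1))] = parse_cpulist(handle.read())
        except OSError:
            continue
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: {len(cpus)} CPUs -> {cpus}")
```

If the vCPUs of the two VMs under test turn out to sit on different nodes (or sockets) from their memory, that mismatch is worth fixing before looking anywhere else.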
