I'm currently grappling with some tricky network problems, like sudden latency spikes and brief outages that are really hard to pin down. These issues aren't consistent, and the usual logging and monitoring tools just aren't cutting it. I'm reaching out to see if anyone here has experienced similar issues and could share their insights.
1. Are there any specific vendors or consulting services that you'd recommend for network validation or testing, especially for those troublesome, hard-to-reproduce problems?
2. What's the average cost for such services—are they generally for one-time diagnostics or ongoing monitoring?
3. Are there software solutions or hardware appliances that you'd suggest for effectively capturing and analyzing these network issues (self-hosted would be a plus, but cloud options can work too)?
4. What tools or strategies have you personally found useful in tackling these kinds of problems?
Right now, I'm mostly guessing and trying to catch these issues in action, so I'd appreciate any leads or personal experiences you could share!
5 Answers
You might also find tools like AppDynamics (more focused on applications), NetBrain, or ThousandEyes valuable for your needs. They can provide additional insight into app performance and network behavior.
If you’re also considering a lightweight option on Windows, check out pktmon with multi-file logging. You can open the resulting ETL logs in Network Monitor or convert them to pcapng for Wireshark. That can help pinpoint issues when they arise!
For another angle, check your disk read/write queue depths on your file servers with perfmon; ideally, they should be near zero. Also, if you keep a Wireshark capture running so that it covers the problem, you can see what was on the wire right before the issue occurs.
Also, keep in mind that sometimes, tweaking send/receive buffers or ensuring your MTU settings are correct at routing points can make a huge difference. Don't forget to check for any network loops either!
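To make the MTU check concrete, here's a rough Python sketch of the arithmetic behind a "don't fragment" ping test. It assumes plain IPv4 or IPv6 headers with no options or extension headers, and the function name is just something I made up for illustration.

```python
def max_ping_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest ICMP echo payload that fits in one packet of the given MTU.

    Assumes a 20-byte IPv4 header (no options) or a 40-byte IPv6 header
    (no extension headers), plus the 8-byte ICMP echo header.
    """
    ip_header = 40 if ipv6 else 20
    icmp_header = 8
    payload = mtu - ip_header - icmp_header
    if payload < 0:
        raise ValueError(f"MTU {mtu} is too small to carry an ICMP echo")
    return payload


if __name__ == "__main__":
    for mtu in (1500, 1492, 1400):
        print(f"MTU {mtu}: largest IPv4 ping payload is {max_ping_payload(mtu)} bytes")
```

You'd then ping with that payload size and fragmentation disallowed (for example `ping -f -l 1472 <host>` on Windows or `ping -M do -s 1472 <host>` on Linux). If 1472 fails but a smaller size gets through, something on the path has an MTU below 1500.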
I personally use EMCO Ping Monitor to keep tabs on several locations with P2P networks. It tracks up/down times, latency, and jitter, which has been a lifesaver for troubleshooting VOIP issues. Plus, you can customize the alert thresholds, which is super handy! It also displays nicely on our NOC screen.
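If you want a feel for what a ping monitor is computing before buying one, here's a minimal Python sketch. To be clear, this is not how EMCO does it; it just assumes you already have a list of round-trip times in milliseconds (with `None` for lost probes) and treats jitter as the mean absolute difference between consecutive answered probes. The function name and the 100 ms spike threshold are my own placeholders.

```python
from statistics import mean


def summarize_latency(samples_ms, spike_threshold_ms=100.0):
    """Summarize ping samples; None marks a probe that got no reply."""
    answered = [s for s in samples_ms if s is not None]
    lost = len(samples_ms) - len(answered)
    # Jitter here = mean absolute difference between consecutive answered probes.
    deltas = [abs(b - a) for a, b in zip(answered, answered[1:])]
    return {
        "avg_ms": mean(answered) if answered else None,
        "max_ms": max(answered) if answered else None,
        "jitter_ms": mean(deltas) if deltas else 0.0,
        "loss_pct": 100.0 * lost / len(samples_ms) if samples_ms else 0.0,
        "spikes": sum(1 for s in answered if s > spike_threshold_ms),
    }


if __name__ == "__main__":
    print(summarize_latency([20.0, 22.0, None, 21.0, 180.0]))
```

Averages hide short spikes, so it's worth alerting on max latency, jitter, and loss per interval rather than on the average alone.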
That sounds like a solid solution! I might have to look into that.
You might want to check out LiveAction; it offers a ton of performance data about your network and the traffic that moves through it. Just a heads up, though: you'll need a solid grasp of how your network gear functions to really get the most out of it. And be prepared to spend quite a bit: expect it to cost well over $10,000.
Start focusing on packet discards instead of just utilization graphs, because that can uncover potential congestion or misconfigurations. Also, keep an eye out for interfaces sending pause frames, which could hint at flow control issues.
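For anyone who wants to trend discards themselves, here's a rough Python sketch of turning two polls of an interface's cumulative counters into a discard ratio. It assumes you already collect the counters somehow (SNMP, CLI scrape, whatever), that the `packets` counter does not include the discarded packets, and that the counters are 32-bit and wrap at most once between polls. The dict keys and function name are hypothetical.

```python
def discard_ratio(prev, curr, counter_bits=32):
    """Fraction of packets discarded between two polls of cumulative counters.

    prev and curr are dicts with 'discards' and 'packets' keys.
    Assumes 'packets' excludes discarded packets and that each counter
    wraps at most once between polls.
    """
    modulus = 1 << counter_bits
    # Modulo arithmetic gives the right delta even if a counter wrapped.
    d_discards = (curr["discards"] - prev["discards"]) % modulus
    d_packets = (curr["packets"] - prev["packets"]) % modulus
    total = d_packets + d_discards
    return d_discards / total if total else 0.0


if __name__ == "__main__":
    before = {"discards": 4294967290, "packets": 1000}
    after = {"discards": 4, "packets": 1990}  # discard counter wrapped
    print(f"{discard_ratio(before, after):.2%} of packets discarded")
```

Polling at short intervals matters here: a burst of discards that lines up with a latency spike is easy to see at 10-second granularity and invisible in a 5-minute average.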
Great pointers! It's so easy to overlook those finer details when troubleshooting. I'll definitely start looking at the discards more closely.
Awesome ideas! I’ve run into MTU issues before, so I’ll definitely double-check that.