I'm working in a university datacenter with around 200 devices. We currently use Zabbix for metrics monitoring, and it's been fantastic. However, we have no log aggregation at all, which makes troubleshooting a bit of a nightmare. I'm currently testing log solutions on a single node to see what fits best before rolling out to the entire environment. I'm looking for an open-source stack that offers complete observability: correlation, aggregation, filtering, visualization, and alerting.
I'm torn between two options:
1. **Keep Zabbix and Add Wazuh:** This would mean continuing to use Zabbix for metrics, which works great, and adding Wazuh for logs. It's a low-risk approach but means managing two separate systems.
2. **Switch to a Unified Stack (OpenSearch/ELK/OpenObserve/Checkmk):** This would consolidate logs and metrics in one environment from the get-go.
I'm hesitant because we're just starting out with one host deployed. Is that "unified view" really beneficial, or should I stick with specialized tools, using Zabbix for metrics and Wazuh for logs? Also, if anyone has experience with OpenSearch, ELK, OpenObserve, or Checkmk for infrastructure monitoring, I'd love to hear how they perform, especially for CPU, RAM, and disk metrics. Zabbix is great for metrics, but if one of these alternatives can handle everything seamlessly, maybe that's the way to go? Since we're a small team (2-3 people), I want to make the best choice before scaling up to all 200 devices. Any insights?
I manage a similar infrastructure, and I’ve found that splitting the systems is the way to go: keep Zabbix as your metrics tool and add Graylog for logs. I’m running Graylog in Docker, and it’s been a lifesaver for troubleshooting both Windows and Linux servers!
Honestly, I’m not sure where you see the consolidation in Option 2. Each of those tools serves a different purpose, and you probably don't want to run several of them side by side unless you know well how each one works. Given your small team, you'd be better off sticking with Zabbix and Wazuh. Can you achieve your goals with the other options? Sure, but without a solid understanding of how they work, they may add unnecessary complexity.
Your point about the team size is valid. For a team of 2-3, sticking to Zabbix and adding Wazuh seems more practical.
You might want to hold off for Zabbix 8.0. According to their roadmap, they’re adding more log management features. If you can wait, it might be worth testing that version before deciding on additional tools.
Good call on Zabbix 8.0! I’m currently on 7.0 LTS. I plan to add Wazuh now for log aggregation and security monitoring, since we really need it, and then check out 8.0 when it’s released. But will 8.0 offer full-text search across all devices, or would Wazuh still be necessary for that?
I’d suggest going with your first option: stick with Zabbix and add Wazuh. Zabbix is fantastic for monitoring metrics like network and storage health, while Wazuh excels at security and log management. For devices like switches and firewalls, consider adding Graylog for log aggregation; it provides great visibility and integrates well with both Zabbix and Wazuh. Plus, both Zabbix and Wazuh have reliable agents for Windows and Linux, which makes setup easier.
I've had success using Vector.dev for log transport to Graylog. It filters logs before sending them and handles buffering well!
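Vector itself is configured declaratively (TOML/YAML plus its VRL language), so the following is not Vector configuration. It is a minimal Python sketch of the same filter-then-forward idea, assuming a Graylog GELF UDP input listening on port 12201. The function names, the event dictionary shape, and the host `graylog.example.edu` are hypothetical. It drops low-severity noise before shipping and encodes each remaining event as a zlib-compressed GELF 1.1 payload.

```python
import json
import socket
import time
import zlib

# Syslog severities, which GELF's "level" field uses (0 = emergency ... 7 = debug).
SEVERITY = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
            "warning": 4, "notice": 5, "info": 6, "debug": 7}


def to_gelf(host, message, severity, **fields):
    """Build a zlib-compressed GELF 1.1 payload; extra fields get a '_' prefix."""
    payload = {
        "version": "1.1",
        "host": host,
        "short_message": message,
        "timestamp": time.time(),
        "level": SEVERITY[severity],
    }
    for key, value in fields.items():
        payload["_" + key] = value
    return zlib.compress(json.dumps(payload).encode("utf-8"))


def filter_events(events, max_level="info"):
    """Keep only events at least as severe as max_level (drops debug noise by default)."""
    threshold = SEVERITY[max_level]
    return [e for e in events if SEVERITY[e["severity"]] <= threshold]


def ship(events, graylog_host="graylog.example.edu", port=12201):
    """Send filtered events as single UDP datagrams (hypothetical host; no chunking)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for e in filter_events(events):
            sock.sendto(to_gelf(e["host"], e["message"], e["severity"]),
                        (graylog_host, port))
```

This sketch skips GELF chunking, so payloads larger than a single UDP datagram would need extra handling. It also has no buffering or retry logic, which is exactly what a dedicated shipper like Vector provides.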
LibreNMS can also serve as a syslog server; just keep an eye on the load, since it can slow down under heavy log volume.

Thanks for clarifying the differences! I wasn't planning to use all of those together; I was just wondering whether any single one of them would handle both metrics and logs better than sticking with Zabbix and adding Wazuh. Would something like OpenObserve really simplify management for our small team?