Hey folks,
I'm looking for some advice regarding a DR test I conducted in Azure about two months ago for a client. Everything went smoothly, following my run plan perfectly. After I completed my checks and started everything back up, all seemed normal. However, we recently received reports on Monday that the jobs in our SAP system, which uses HANA, are running slower than usual.
I've confirmed that the disk caching settings follow Azure's documentation. The HANA database is running on an M128s, and the application servers are on D64s VMs.
I've reviewed the performance metrics several times without finding any indicators of issues: CPU, memory, network, and disk all look good. The only concern is brief latency spikes on the HANA instance's data disks, lasting about 10 minutes and occasionally hitting 600ms. Still, that seems minor considering the average response time over a 24-hour period is manageable at around 100ms. I understand that Azure disks can show latency under load, and I observed similar spikes before the DR test as well. Overall, the metrics look remarkably consistent before and after the test.
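To illustrate why a 24-hour average can hide the spikes described above, here is a minimal sketch. It assumes one latency sample per minute exported from the monitoring tool; the function names, the 200ms threshold, and the sample values are made up for illustration and are not taken from the actual system.

```python
from statistics import fmean


def find_spike_runs(samples_ms, threshold_ms):
    """Return (start_index, length) for each consecutive run above the threshold."""
    runs = []
    start = None
    for i, value in enumerate(samples_ms):
        if value > threshold_ms:
            if start is None:
                start = i  # a new run of high-latency samples begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        # the series ended while still above the threshold
        runs.append((start, len(samples_ms) - start))
    return runs


def summarize_latency(samples_ms, threshold_ms=200.0):
    """Summarize a latency series: mean, max, and sustained runs above a threshold."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    runs = find_spike_runs(samples_ms, threshold_ms)
    return {
        "mean_ms": fmean(samples_ms),
        "max_ms": max(samples_ms),
        "spike_runs": runs,
        "longest_run": max((length for _, length in runs), default=0),
    }


if __name__ == "__main__":
    # Hypothetical day: 1440 one-minute samples with a single 10-minute spike.
    day = [5.0] * 700 + [600.0] * 10 + [5.0] * 730
    print(summarize_latency(day))
```

A job that happens to run during one of those runs experiences the spike latency, not the daily mean, so comparing the timestamps of the spike runs against the start and end times of the slow jobs may be more telling than the 24-hour figure.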
I'm stuck on what else to check. What could have changed from a VM perspective during the failover and failback? Has anyone dealt with a similar situation? I'm also beginning to examine the OS for potential clues, but once again, the metrics don't suggest heavy usage.
Just to clarify, the system was performing well before the DR test, and we have that documented. Immediately after the DR test it appeared fine, yet some SAP jobs are now taking twice as long, and others have slowed down to a lesser degree.
I'm starting to wonder if new data was introduced during the DR that could be causing this. Any insights would be greatly appreciated! Feel free to ask for more details if you need them, as it's a lot to fit into one post.
3 Answers
Have you checked for any potential network hops or other integrated systems that might not be in the same Azure region as your DR setup? Also, verify the type of disks you’re using. Are they Standard or Premium? That might affect performance too.
The systems are all in the same VNet, and while they do connect outbound occasionally, the main payroll jobs don't depend on those connections and are also running slower. Just to add, we're primarily using South Africa North, with South Africa West as the DR region, and the disks were Premium SSD both before and after the test.