I'm seeking advice on a Kubernetes issue. Specifically, how can I reliably capture and store full memory crash dumps (over 100GB) from a Windows pod in Azure Kubernetes Service (AKS) after it crashes? It's crucial that these dumps are saved without corruption and are accessible for download or inspection later. Here's some extra context: my cluster runs on AKS, and I've tried writing the dumps to a premium Azure managed disk, but that hasn't proven reliable for this use case. I'm also considering options like emptyDir, but I haven't tested that yet. Any insights would be really appreciated!
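For reference, here's the sort of emptyDir setup I'm considering (an untested sketch; the pod/container names and image are placeholders, and the mount path would need to match wherever the application is configured to write its dumps, e.g. the Windows Error Reporting LocalDumps `DumpFolder` registry value):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: crashdump-test        # placeholder name
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: app                 # placeholder
    image: mcr.microsoft.com/windows/servercore:ltsc2022
    volumeMounts:
    - name: dumps
      mountPath: "C:\\dumps"  # must match where the app actually writes dumps
  volumes:
  - name: dumps
    emptyDir:
      sizeLimit: 150Gi        # headroom above the ~100GB dump size
```

My worry with this is that emptyDir lives on the node and is deleted when the pod is removed, so it would survive a container restart but not pod deletion or node loss.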
4 Answers
Can you share what you're trying to debug? Knowing more about the application may help others provide tailored advice. Also, Windows containers can be tricky; which specific issues have you run into with them?
If these large memory dumps are happening frequently, that may point to an underlying issue worth addressing first. Rather than only collecting the dumps, consider profiling the application to reduce its memory usage or prevent the crashes altogether.
It sounds like you're dealing with quite the challenge! For those massive crash dumps, I'd suggest looking into a dedicated storage solution instead of just relying on a disk. Maybe Azure Blob Storage could be an option? It might help avoid corruption, and you could set up a process to handle retries if the uploads fail. Also, ensure the pod has proper permissions to write to the storage resource.
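To sketch the retry idea in generic terms (this is not Azure-specific; in practice you'd replace the `upload` callable with an `azure-storage-blob` SDK call or an `azcopy` invocation — the function names here are made up for illustration):

```python
import time


def upload_with_retries(upload, path, max_attempts=5, base_delay=2.0):
    """Call `upload(path)` with exponential backoff; re-raise after max_attempts.

    `upload` is any callable that raises on a transient failure; in a real
    setup you'd catch the storage SDK's transient exceptions instead of OSError.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return upload(path)
        except OSError as exc:
            if attempt == max_attempts:
                raise  # give up: surface the last error to the caller
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)
```

For files in the 100GB range you'd also want a chunked (block-based) upload, so a failed attempt can resume from the last committed chunk instead of restarting from zero.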
I agree, using Blob Storage seems like a safer bet for handling those large files. Just make sure to keep an eye on access rights!
If you're having a lot of issues with the Windows containers, maybe explore lightweight alternatives in your architecture. Sometimes, the hassle isn't worth the payoff!
Good point about Azure Blob Storage! It's super scalable, which would be ideal for those big dumps. Plus, it could save you some headaches over time.