Hey everyone! I could use some advice on a locking issue we hit during our CI/CD deployments. We have a Java application running on Tomcat that is operational 24/7. The app frequently accesses certain `Metadata` tables and often holds locks on those rows while it updates them. When our deployment script tries to update the metadata, it gets blocked by those locks and times out. Currently, we have to shut down all application nodes just to run the update SQL, which means complete downtime. I'm looking for a way to architect zero-downtime deployments. Is there a DevOps approach to this that doesn't involve significant code changes or heavy involvement from the Java development team?
2 Answers
You might be dealing with a fundamental issue in the application's design. Instead of having the app hold locks on the metadata rows, consider reading the configuration once at initialization and refreshing it at intervals. You could also expose a dedicated admin endpoint that external systems, such as your deployment pipeline, can call to trigger a refresh. Be careful with hacks that work around the locking without fixing the design, though: they could lead to more significant issues later.
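To make the idea concrete, here is a minimal sketch of the "load once, refresh at intervals, allow a manual refresh" pattern. It is written in Python only to keep it short; the real change would live in the Java app. The class name `MetadataCache`, the `loader` callback, and the 60-second default are all hypothetical, not taken from your codebase.

```python
import threading
import time
from typing import Callable, Dict


class MetadataCache:
    """Keeps an in-memory snapshot of the metadata instead of holding row locks."""

    def __init__(
        self,
        loader: Callable[[], Dict[str, str]],   # hypothetical: runs a short SELECT
        refresh_seconds: float = 60.0,          # assumed interval, tune as needed
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._lock = threading.Lock()
        # Initial load at startup.
        self._snapshot = dict(loader())
        self._loaded_at = clock()

    def refresh(self) -> None:
        """Reload the snapshot now; an admin endpoint could call this."""
        fresh = dict(self._loader())  # copy, so later source changes don't leak in
        with self._lock:
            self._snapshot = fresh
            self._loaded_at = self._clock()

    def get(self, key: str) -> str:
        """Return a cached value, reloading first if the snapshot is too old."""
        with self._lock:
            stale = self._clock() - self._loaded_at >= self._refresh_seconds
        if stale:
            self.refresh()
        with self._lock:
            return self._snapshot[key]
```

With something like this in place, the deployment script updates the table while the app keeps serving from its snapshot, then calls the refresh hook once the update has committed.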
One way to address this without changing your application is to treat the update as a lock orchestration step. Set a short lock timeout on the migration session so the update fails fast instead of hanging. If it does hit a lock, have the script detect which sessions are blocking it, terminate only those sessions, and retry. While the update runs, momentarily block app access to that table, then restore access afterwards; keep that window short so user requests aren't disrupted. Running this as a canary deployment first can help confirm it works smoothly before you roll it out to all nodes.
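For what it's worth, here is a rough sketch of that retry loop in Python. Everything in it is an assumption on my side: the thread doesn't say which database you run, so the three callbacks (`run_update`, `find_blockers`, `terminate`) and the `LockTimeout` exception are hypothetical stand-ins for whatever your driver exposes. On PostgreSQL, for example, they would roughly map to running the update after `SET lock_timeout`, looking up other sessions that hold locks on the table in `pg_locks`, and calling `pg_terminate_backend(pid)`.

```python
import time
from typing import Callable, Iterable


class LockTimeout(Exception):
    """Hypothetical: raised by run_update when the DB reports a lock timeout."""


def migrate_with_lock_orchestration(
    run_update: Callable[[], None],             # sets a short lock timeout, runs the SQL
    find_blockers: Callable[[], Iterable[int]], # returns session ids holding the locks
    terminate: Callable[[int], None],           # kills one blocking session
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> int:
    """Try the update; on a lock timeout, kill the blockers and retry.

    Returns the number of attempts used, or re-raises LockTimeout
    if the last attempt still times out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            run_update()
            return attempt
        except LockTimeout:
            if attempt == max_attempts:
                raise  # give up and let the pipeline fail loudly
            # Terminate only the sessions reported as blocking, nothing else.
            for pid in find_blockers():
                terminate(pid)
            # Small linear backoff before the next attempt.
            time.sleep(backoff_seconds * attempt)
    raise ValueError("max_attempts must be at least 1")
```

Keep in mind that a terminated session gets its open transaction rolled back, so the app has to be able to reconnect and retry on its own. That is a good thing to verify on the canary before trusting it everywhere.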
Awesome insight! When you mention setting the lock timeout, should that be set for the migration session specifically? Also, do I need a script that finds the blocking PIDs itself, or does the timeout take care of that automatically?

I see your point, but as a DevOps engineer, I really need to find an immediate workaround. If I can't make it work without involving the developers, I'll have to point out that their application isn't well suited for CI/CD.