I recently made some changes to our RDS setup: I upgraded the primary instance of our Aurora PostgreSQL cluster to a larger, read-optimized instance class. Since the upgrade, I've been seeing bizarre performance issues, especially with our RDS Proxy. One of my service endpoints used to respond in about 0.5 seconds; it now takes 12.8 seconds for the exact same queries when they are routed through the proxy, compared with connecting directly to the cluster's writer endpoint. Has anyone else run into latency like this after upgrading their instances? We've relied on RDS Proxy without any problems until now, so I'm really puzzled. I even created a new proxy to see if that would fix anything, but the latency persists.
Have you checked the explain plans for your slower queries? It's possible that the table statistics were lost or became stale during the upgrade, in which case the query optimizer could be choosing poor plans; running ANALYZE on the affected tables would rebuild them. If you have parallel query enabled, the additional vCPUs could also be changing your plans through a different number of parallel workers.
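For context on the parallelism point, here is a simplified Python model of how the PostgreSQL planner picks a worker count for a parallel sequential scan, based on `compute_parallel_worker` in the PostgreSQL source. It is a sketch, not Aurora-specific: it assumes 8 kB pages and the stock defaults `min_parallel_table_scan_size = 8MB` (1024 pages) and `max_parallel_workers_per_gather = 2`, and it ignores the per-table `parallel_workers` option and index scans. The function name is hypothetical. Note that the vCPU count does not appear in the heuristic itself; the planned worker count changes only if the table size or settings such as `max_parallel_workers_per_gather` differ between the old and new setup (and at run time fewer workers may actually launch if `max_parallel_workers` is exhausted).

```python
def planned_parallel_workers(
    heap_pages: int,
    min_parallel_table_scan_pages: int = 1024,  # 8MB / 8kB pages (PostgreSQL default)
    max_parallel_workers_per_gather: int = 2,  # PostgreSQL default
) -> int:
    """Approximate worker count the planner requests for a parallel seq scan.

    Simplified model of PostgreSQL's compute_parallel_worker(): it ignores the
    per-table parallel_workers option, index scans and inheritance children.
    """
    # Tables smaller than the threshold get no parallel workers at all.
    if heap_pages < min_parallel_table_scan_pages:
        return 0
    workers = 1
    threshold = max(min_parallel_table_scan_pages, 1)
    # Add one worker each time the table is at least 3x the current threshold.
    while heap_pages >= threshold * 3:
        workers += 1
        threshold *= 3
    # Never exceed the per-Gather cap.
    return min(workers, max_parallel_workers_per_gather)


# Compare the default cap with a hypothetical higher cap of 8.
for pages in (500, 5_000, 50_000, 500_000):
    print(
        pages,
        planned_parallel_workers(pages),
        planned_parallel_workers(pages, max_parallel_workers_per_gather=8),
    )
```

With the default cap of 2, larger tables all plateau at 2 workers; only a higher cap lets the logarithmic growth show through, which is why a parameter difference matters more than the vCPU count on its own.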
I did run EXPLAIN ANALYZE on the query, and the new instance actually does better than the old one in both the planning and execution phases, although I occasionally see unusually long planning times on the new instance.
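One way to put the EXPLAIN ANALYZE numbers next to the end-to-end latency is to parse the `Planning Time` and `Execution Time` lines and subtract them from what the client measured. A large remainder would suggest the time is spent outside planning and execution, for example in the connection path through the proxy or in network and result transfer (EXPLAIN ANALYZE runs the query but does not send the result rows to the client). This is a minimal sketch: the function names are hypothetical, the sample plan and its timings are made up, and it assumes you time the same single query from the client, once through the proxy and once directly.

```python
import re

# Matches the summary lines EXPLAIN ANALYZE prints, e.g. "Planning Time: 0.250 ms".
# IGNORECASE also accepts the lowercase "Planning time:" used by older PostgreSQL versions.
_TIMING_RE = re.compile(
    r"^\s*(planning|execution) time:\s*([0-9.]+)\s*ms", re.IGNORECASE | re.MULTILINE
)


def parse_explain_timings(plan_text: str) -> dict[str, float]:
    """Return {'planning': ms, 'execution': ms} from EXPLAIN ANALYZE text output."""
    return {label.lower(): float(value) for label, value in _TIMING_RE.findall(plan_text)}


def time_outside_server(plan_text: str, end_to_end_ms: float) -> float:
    """Client-observed latency minus server-side planning + execution time."""
    timings = parse_explain_timings(plan_text)
    server_ms = timings.get("planning", 0.0) + timings.get("execution", 0.0)
    return end_to_end_ms - server_ms


# Hypothetical EXPLAIN ANALYZE output; the numbers are made up for illustration.
sample_plan = """\
Index Scan using orders_pkey on orders  (cost=0.29..8.31 rows=1 width=64) (actual time=0.020..0.021 rows=1 loops=1)
  Index Cond: (id = 42)
Planning Time: 0.250 ms
Execution Time: 0.050 ms
"""

# 12800 ms is the proxy-routed latency from the question; treating it as a
# single-query measurement is an assumption.
print(round(time_outside_server(sample_plan, 12_800.0), 1))
```

Running the same comparison for the direct writer-endpoint connection would show whether the remainder, rather than the plan, accounts for the gap.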