How do you decide when to make changes to a risky production service?

Asked By CuriousCoder42 On

I'm interested in hearing how different teams handle the decision to alter a production service that may be overprovisioned or too costly. It currently works but is not very stable and is customer-facing, so there is hesitance to make any adjustments. How do you determine whether to leave it as is or attempt to improve it? Is there a structured process for making this choice, or does it mostly rely on individual experience and risk tolerance? I'd love to get insight into how you practically manage this dilemma.

5 Answers

Answered By CultureChanger On

This question highlights both cultural and technical aspects. Culturally, fear is often what prevents teams from fixing known issues. Establishing core technical standards and engaging leadership on the costs of not fixing problems is vital. If there's an agreement on the risks, addressing tech debt shouldn't feel tribal; there should be a transparent process for managing it.

Answered By SafetyFirst48 On

It really depends on capacity and on how this ranks against other priorities. "Costs more than it should" might not look so bad if fixing it takes resources away from work that matters more. Opportunity cost always has to be part of the decision.
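One way to make that opportunity-cost comparison concrete is a rough expected-value estimate. The sketch below is illustrative only: the `ChangeProposal` fields, the formula, and every number are hypothetical assumptions, not a method anyone in this thread described. Real estimates of incident probability and cost will be fuzzy at best.

```python
from dataclasses import dataclass


@dataclass
class ChangeProposal:
    """Hypothetical inputs for a back-of-the-envelope estimate."""
    monthly_savings: float       # expected cost reduction per month
    horizon_months: int          # how many months of savings to count
    engineering_cost: float      # cost of the people doing the change
    incident_probability: float  # chance the change causes an outage (0..1)
    incident_cost: float         # estimated cost of one such outage
    opportunity_cost: float      # value of the work this change displaces


def expected_net_benefit(p: ChangeProposal) -> float:
    """Expected savings minus direct cost, expected incident cost, and opportunity cost."""
    savings = p.monthly_savings * p.horizon_months
    expected_incident = p.incident_probability * p.incident_cost
    return savings - p.engineering_cost - expected_incident - p.opportunity_cost


# Made-up numbers, for illustration only.
proposal = ChangeProposal(
    monthly_savings=2000.0,
    horizon_months=12,
    engineering_cost=15000.0,
    incident_probability=0.1,
    incident_cost=50000.0,
    opportunity_cost=3000.0,
)
net = expected_net_benefit(proposal)
print("worth doing" if net > 0 else "leave it for now", round(net, 2))
```

A positive result only says the change looks worthwhile under those guesses. Lowering `incident_probability` first, for example by rehearsing the change somewhere safe, can move the answer more than trimming the engineering cost.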

Answered By HandsOnHannah On

When I face a service like this, I tend to get my hands dirty while staying mindful of the risks. I look at testing changes on a lower-spec replica first, or at least at understanding the service better before touching it. If management accepts the risks, I document any snags I hit so that future changes start from a better understanding of the service.

ReplicaRanger -

I like your approach! Testing for potential issues in a controlled way makes sense. Plus, being well-informed helps everyone manage risks better.

Answered By TechWizard77 On

When I hear that a service is both critical and fragile, I immediately feel a push to take action. It's important to address the root issues before considering any scaling or other changes. A service like this needs focused attention; ignoring it only allows problems to escalate over time.

RiskyBusinessMan -

Absolutely! A fragile service is a red flag that something needs to be fixed. If it's not maintained properly, the risks are just going to grow.

Answered By CodeCruncher On

I believe nothing should be off-limits when it comes to making improvements. I used to work with a critical encryption service that everyone was afraid to touch. When it finally failed, it turned out we had never replicated the keys needed to decrypt the data. The fear of touching it led to a 100% loss, which I learned the hard way. We need to challenge the notion that some services are too sacred to modify.
