We're reaching a point where our customers want to access the raw data directly instead of going through our BI tools. I'm open to this idea since it seems fairly common, but I'm concerned about how to do it safely, especially since we're using MS SQL. I've read that a popular method involves implementing CDC (Change Data Capture) with connectors or using AWS DMS to stream data into Kafka, which can then be sent to a cloud sink like Azure Data Streams. However, I haven't tested how schema changes impact what customers experience. Are there safer alternatives to provide this access?
3 Answers
With the complexities involved, it's probably not wise to take random suggestions from online forums. You might want to consider hiring or consulting a knowledgeable enterprise architect who can grasp your business case and tech stack. That being said, there are various ways to mirror data. If you're using MS SQL, setting up a read-only Always On Availability Group secondary replica could be an option.
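If you go the read-only secondary route, clients usually reach the replica by connecting through the availability group listener with read-only application intent. Here is a rough sketch of building such a connection string for the Microsoft ODBC driver. The listener name, database, and login are made-up placeholders, and the function name is just for illustration. It also assumes read-only routing has already been configured on the availability group.

```python
def build_readonly_conn_str(listener, database, user, password):
    """Build an ODBC connection string that requests a readable secondary.

    All argument values are placeholders supplied by the caller. Read-only
    routing only takes effect when connecting via the AG listener and
    naming a database that belongs to the availability group.
    """
    parts = {
        "Driver": "{ODBC Driver 18 for SQL Server}",
        "Server": listener,
        "Database": database,
        "UID": user,
        "PWD": password,  # values containing ';' would need brace-escaping
        "ApplicationIntent": "ReadOnly",  # ask to be routed to a secondary
        "MultiSubnetFailover": "Yes",     # faster reconnect after failover
        "Encrypt": "yes",
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


# Hypothetical values, for illustration only.
conn_str = build_readonly_conn_str(
    "ag-listener.example.internal", "Sales", "customer_ro", "secret"
)
```

The resulting string could be passed to an ODBC client such as pyodbc. Pair it with a login that only has SELECT on the objects you intend to expose.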
Implementing CDC brings its own set of challenges. I’ve run several Debezium servers, and in my experience CDC tends to amplify existing problems when users are already overloading the database.
Honestly, it might be risky to give direct access to your backend systems. There's a lot that can go wrong there. Instead, you might want to set up a flat file delivery agreement with your business. This way, you can control what data goes out without opening up your database directly.
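To make the "control what data goes out" part concrete, here is a minimal sketch of a flat file export that only writes an agreed list of columns. The column names and sample row are invented, and in practice the rows would come from a query against your reporting copy rather than a hard-coded list.

```python
import csv
import io


def write_extract(rows, allowed_columns, out):
    """Write rows (dicts) as CSV, keeping only the agreed columns.

    Any key not in allowed_columns is silently dropped, so a new
    sensitive column added upstream does not leak into the file.
    """
    writer = csv.DictWriter(out, fieldnames=allowed_columns, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Invented sample data; "ssn" is not in the agreed column list.
rows = [{"id": 1, "email": "a@example.com", "ssn": "000-00-0000"}]
buf = io.StringIO()
write_extract(rows, ["id", "email"], buf)
```

Because the column list is explicit, adding a column to the source table changes nothing for the customer until you deliberately add it to the agreement.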
You can use CDC without giving customers direct database access. Tools like Airbyte might work for you. Just collaborate with your client on setting up the ingestion destination on their side and configure CDC to send data there.
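On the schema change worry from the question: whatever tool pushes the data, one cheap safeguard is to diff the current table schema against the one the customer was last told about before a change ships. This is only a sketch, assuming schemas are represented as plain column-name-to-type mappings. The function name and the rule that removals and type changes count as breaking are my own assumptions, not something any particular CDC tool provides.

```python
def diff_schema(published, current):
    """Compare two {column: type} mappings and classify the changes.

    Removed or retyped columns are treated as breaking for downstream
    consumers; added columns are treated as non-breaking.
    """
    shared = set(published) & set(current)
    removed = sorted(set(published) - set(current))
    added = sorted(set(current) - set(published))
    retyped = sorted(col for col in shared if published[col] != current[col])
    return {
        "breaking": bool(removed or retyped),
        "removed": removed,
        "added": added,
        "retyped": retyped,
    }


# Invented example schemas.
published = {"id": "int", "email": "nvarchar(255)", "created": "datetime2"}
current = {"id": "bigint", "email": "nvarchar(255)", "region": "char(2)"}
report = diff_schema(published, current)
```

Running a check like this in a deployment pipeline lets you warn customers about breaking changes before their ingestion fails.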
I get that. I usually take online advice with a grain of salt as well. I’m just trying to gather insights beyond what a quick Google search offers. One of our databases is already part of an Always On group, but we'd have to think about security if customers need VPN access to production, which seems risky.