We're reaching a point where our customers want to access the raw data directly instead of going through our BI tools. I'm open to this since it seems fairly common, but I'm concerned about how to do it safely, especially since we're on MS SQL. I've read that a popular approach is to implement CDC (Change Data Capture), either with connectors or with AWS DMS, to stream data into Kafka, which can then be sent to a cloud sink like Azure Data Streams. However, I haven't tested how schema changes would affect what customers see. Are there safer alternatives for providing this access?
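One way to check how a schema change would reach customers before it ships is to diff the column contract you have published against the table's current columns. Below is a minimal Python sketch. It assumes you already have both schemas as plain dicts mapping column name to type string (for example, pulled from `INFORMATION_SCHEMA.COLUMNS`). `breaking_changes` is a hypothetical helper, and treating added columns as safe is an assumption that only holds if consumers ignore columns they don't know about.

```python
from typing import Dict, List

# Hypothetical representation: column name -> SQL type name as a string.
Schema = Dict[str, str]


def breaking_changes(published: Schema, current: Schema) -> List[str]:
    """Describe changes that would break consumers of the published schema.

    Dropped/renamed columns and type changes are reported; newly added
    columns are assumed to be non-breaking.
    """
    problems: List[str] = []
    for column, old_type in published.items():
        if column not in current:
            problems.append(f"column dropped or renamed: {column}")
        elif current[column] != old_type:
            problems.append(f"type changed for {column}: {old_type} -> {current[column]}")
    return problems


if __name__ == "__main__":
    published = {"order_id": "int", "amount": "decimal(10,2)", "status": "varchar(20)"}
    current = {"order_id": "bigint", "amount": "decimal(10,2)", "created_at": "datetime2"}
    for problem in breaking_changes(published, current):
        print(problem)
    # type changed for order_id: int -> bigint
    # column dropped or renamed: status
```

Running a check like this in the deployment pipeline means a breaking change gets flagged before any CDC stream carries it to customers.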
CDC comes with its own set of challenges. I've run several Debezium servers, and in my experience CDC tends to amplify existing problems when database users are already overloading the system.
Honestly, giving customers direct access to your backend systems is risky, and a lot can go wrong there. Instead, consider setting up a flat-file delivery agreement with your business. That way you control exactly what data goes out without opening up your database directly.
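A flat-file delivery can be as simple as exporting an agreed list of columns plus a small manifest the recipient can verify. Below is a minimal Python sketch. It assumes rows arrive as dicts from whatever query you already run. `export_flat_file`, the example column names, and the manifest fields are hypothetical, not any standard format.

```python
import csv
import hashlib
import io
import json
from typing import Dict, Iterable, List, Tuple


def export_flat_file(
    rows: Iterable[Dict[str, object]], allowed_columns: List[str]
) -> Tuple[str, Dict[str, object]]:
    """Write only the allow-listed columns to CSV text and build a manifest."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops any key not in the allowlist, so a new
    # (possibly sensitive) column added upstream is not exported by accident.
    writer = csv.DictWriter(buffer, fieldnames=allowed_columns, extrasaction="ignore")
    writer.writeheader()
    row_count = 0
    for row in rows:
        writer.writerow(row)
        row_count += 1
    content = buffer.getvalue()
    # The manifest lets the recipient check completeness and integrity.
    manifest = {
        "columns": allowed_columns,
        "row_count": row_count,
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
    }
    return content, manifest


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "amount": "19.99", "customer_email": "a@example.com"},
        {"order_id": 2, "amount": "5.00", "customer_email": "b@example.com"},
    ]
    content, manifest = export_flat_file(rows, ["order_id", "amount"])
    print(content)
    print(json.dumps(manifest))
```

The allowlist is the important design choice here. The file's shape changes only when you deliberately edit the column list, not whenever the source table changes.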
You can use CDC without giving customers direct database access. A tool like Airbyte might work for you: work with your client to set up the ingestion destination on their side, then configure CDC to send data there.
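Whichever tool carries the changes, the destination ends up replaying a stream of inserts, updates and deletes against its own copy of each table. The sketch below assumes a Debezium-style envelope: an `op` of `c` (create), `r` (snapshot read), `u` (update) or `d` (delete), with `before`/`after` row images. Other tools, Airbyte included, may emit a different record format. Treat `apply_change` and the in-memory dict as hypothetical stand-ins for the customer's destination table.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change(table: Dict[Any, Row], event: Dict[str, Any], key: str) -> None:
    """Apply one Debezium-style change event to a table keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads and updates all upsert the new row image.
        after: Row = event["after"]
        table[after[key]] = after
    elif op == "d":
        # Deletes carry the old row image; remove it if present.
        before: Row = event["before"]
        table.pop(before[key], None)
    else:
        raise ValueError(f"unsupported op: {op!r}")


if __name__ == "__main__":
    replica: Dict[Any, Row] = {}
    events = [
        {"op": "r", "before": None, "after": {"id": 1, "status": "new"}},
        {"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}},
        {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
        {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
    ]
    for event in events:
        apply_change(replica, event, key="id")
    print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

This only produces a correct copy if events for the same key are applied in commit order. That ordering guarantee is worth confirming with whatever CDC tool you pick.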