Hey everyone! I'm newly responsible for managing Azure Data Factory (ADF) pipelines at my job. We currently use it to extract data, transform it with SQL scripts, and load it into an Azure SQL database, but I have a series of questions to improve our setup.
1. The current run time is about an hour daily, which feels long to me. I've spotted some slow activities in the monitoring tab, but what strategies can I use to improve performance in ADF?
2. The SQL scripts are a bit cumbersome to work with since they're hidden in a small text field. Is there a way to use VS Code to interact more easily with the SQL code in a pipeline?
3. Can I maintain, update, and test the ADF pipelines using VS Code? What are the best practices for this?
4. Since the pipeline is already in production, how can I set up a testing environment safely?
5. With Microsoft pushing Fabric a lot, is ADF becoming obsolete? Should I stick with ADF or consider making a move?
6. Can ADF handle compiling emails, Excel files, and reports like Alteryx, or is it strictly for ELT/ETL tasks?
I know I could find answers online or through AI, but I value insights from the community. Thanks!
3 Answers
I think a testing environment is essential, especially for a production pipeline. Set up your ADF configurations in Git and create a separate dev environment connected to that repo. After testing, you can set up a deployment pipeline to push updates to the production environment. Just remember to adjust your data source parameters as needed. And about Fabric, it's really not ready for production environments—I've heard a lot of complaints about it, so proceed with caution.
Honestly, I find ADF pretty frustrating at times. It only seems manageable for simple ingestion tasks; once you start adding more pipelines or need them to interconnect, it gets messy. For performance, optimize your storage formats and SQL queries, and switch to incremental loads wherever a full reload isn't actually necessary. An hour for multiple pipelines doesn't sound terrible, but consider looking into alternatives like Databricks or SQLMesh for transformations instead of relying solely on ADF.
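To make the incremental-load point concrete, here's a minimal sketch of the watermark pattern. It assumes your source table has a last-modified column and you keep a small watermark table in the target database; all table and column names here (`dbo.Sales`, `LastModified`, `dbo.PipelineWatermark`) are illustrative, not from the original setup:

```sql
-- Hypothetical watermark pattern: only pull rows changed since the last run,
-- instead of reloading the full table every day.
DECLARE @LastWatermark datetime2 =
    (SELECT WatermarkValue
     FROM dbo.PipelineWatermark
     WHERE TableName = 'Sales');

-- This query becomes the source of the copy/transform activity.
SELECT *
FROM dbo.Sales
WHERE LastModified > @LastWatermark;

-- After a successful run, advance the watermark so the next run
-- picks up only newer changes.
UPDATE dbo.PipelineWatermark
SET WatermarkValue = (SELECT MAX(LastModified) FROM dbo.Sales)
WHERE TableName = 'Sales';
```

If the daily runtime is dominated by full-table copies, this alone can cut it dramatically; the trade-off is you need a reliable change-tracking column on the source side.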
To interact with SQL code, I recommend using views to simplify things. You can create a database project in VS Code, which makes handling the SQL less of a headache than that small text area ADF provides. As for maintaining and testing pipelines, while it’s technically possible through VS Code, ADF is really more of a low-code solution, so you might want to try tools like Airflow for that purpose.
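As a sketch of what the view approach looks like: the transformation SQL lives in a view inside a source-controlled database project, and the ADF activity shrinks to a one-line select. Object names below (`dbo.Sales`, `dbo.vw_DailySalesSummary`) are made up for illustration:

```sql
-- Illustrative: keep the transformation logic in a view in a database
-- project (editable in VS Code, under source control) rather than pasting
-- the full query into ADF's small script text box.
CREATE VIEW dbo.vw_DailySalesSummary
AS
SELECT
    CAST(OrderDate AS date) AS OrderDay,
    CustomerId,
    SUM(Amount)             AS TotalAmount,
    COUNT(*)                AS OrderCount
FROM dbo.Sales
GROUP BY CAST(OrderDate AS date), CustomerId;
GO

-- The ADF script or copy activity then reduces to:
-- SELECT * FROM dbo.vw_DailySalesSummary;
```

This also makes the SQL testable on its own: you can run and diff the view's output without touching the pipeline at all.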
Exactly! I wouldn't recommend using Fabric yet either—it's not even close to what ADF and Databricks can do together.