Hey everyone! I'm curious about the latest technologies and libraries that you've found useful for ETL and ELT pipelines. I've recently been experimenting with ConnectorX and DuckDB, and I think they're amazing! Also, I've started using a logging library in Python which has really enhanced my ability to track my pipelines. What other cool tools or methods are you all using?
4 Answers
What logging library are you using? I'd love to check it out!
In my opinion, Prefect and DuckDB make an excellent combination for an ETL stack. If you're working with vector embeddings, consider exporting your models to ONNX and running them with ONNX Runtime instead of using the heftier PyTorch models; inference is usually faster and the dependency footprint is smaller.
I've been really impressed with Ploomber; it's a solid Python DAG framework that treats nodes as Python functions. It supports various configurations and has nice integrations with Jupyter, Docker, and Kubernetes. Plus, the built-in caching and logging features are super helpful! Also, Ibis is great if you want to work with dataframes across multiple compute backends seamlessly.
Don't forget about Polars! It's an amazing tool for handling dataframes efficiently in Python.
I use Loguru. It has really streamlined my logging process.