Hey everyone! I'm trying to get a sense of the current options for data quality frameworks, especially open-source ones (though I'm open to all suggestions). I used Great Expectations a while back, but I'm curious whether it's still a top choice or whether there are better alternatives now. I'm particularly interested in any frameworks that use LLMs for these quality checks. Any recommendations?
5 Answers
For real projects, I've had good experiences with dbt and SQLMesh. Great Expectations is decent, but it can get pretty messy as projects scale. Just a heads-up: relying on LLMs for quality checks may not be practical in high-stakes scenarios, since their output isn't deterministic.
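To make the determinism point concrete: dbt's built-in `unique` and `not_null` tests compile to SQL queries that select the offending rows, and a test passes when its query returns no rows. Here's a rough stdlib sketch of that idea using `sqlite3`. The `orders` table, its data, and the `run_check` helper are made up for illustration, and the SQL only approximates what dbt actually generates.

```python
import sqlite3

# Hypothetical sample table with one duplicate key and one NULL key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 5.5), (2, 7.0), (None, 3.0)],
)

# Each check is a query returning the failing rows (approximating dbt-style tests).
CHECKS = {
    "not_null_order_id": "SELECT * FROM orders WHERE order_id IS NULL",
    "unique_order_id": (
        "SELECT order_id, COUNT(*) AS n_records FROM orders "
        "WHERE order_id IS NOT NULL GROUP BY order_id HAVING COUNT(*) > 1"
    ),
}


def run_check(conn, sql):
    """Return the number of failing rows; 0 means the check passes."""
    return len(conn.execute(sql).fetchall())


results = {name: run_check(conn, sql) for name, sql in CHECKS.items()}
for name, failures in results.items():
    status = "PASS" if failures == 0 else f"FAIL ({failures} failing rows)"
    print(f"{name}: {status}")

# Rerunning against the same data gives identical results: no sampling, no model randomness.
assert results == {name: run_check(conn, sql) for name, sql in CHECKS.items()}
```

The same data always produces the same verdict, which is exactly the property you lose if an LLM is making the pass/fail call.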
Just to throw it out there, I don’t think relying solely on LLMs is the way to go.
You might want to try Pandera for dataframe validation. It's worked well for my validation needs.
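The core Pandera pattern is a schema that maps column names to an expected type plus a set of checks. You validate data against it, optionally collecting every failure instead of stopping at the first one. With Pandera itself you'd build a `DataFrameSchema` from `Column` and `Check` objects and call `schema.validate(df)`, passing `lazy=True` to collect all errors. The snippet below is a dependency-free illustration of that pattern over a list of dicts, not Pandera's API. The `Column`, `Schema`, and `validate` names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Column:
    """Hypothetical column spec: expected type plus named value-level checks."""
    dtype: type
    checks: dict[str, Callable[[Any], bool]] = field(default_factory=dict)
    nullable: bool = False


@dataclass
class Schema:
    columns: dict[str, Column]

    def validate(self, rows: list[dict[str, Any]], lazy: bool = True) -> list[str]:
        """Collect all failure messages (lazy) or raise ValueError on the first one."""
        failures = []
        for i, row in enumerate(rows):
            for name, col in self.columns.items():
                value = row.get(name)
                if value is None:
                    if not col.nullable:
                        failures.append(f"row {i}: {name} is null")
                elif not isinstance(value, col.dtype):
                    failures.append(f"row {i}: {name} is not {col.dtype.__name__}")
                else:
                    for check_name, check in col.checks.items():
                        if not check(value):
                            failures.append(f"row {i}: {name} failed {check_name}")
                if failures and not lazy:
                    raise ValueError(failures[0])
        return failures


schema = Schema({
    "amount": Column(float, {"non_negative": lambda v: v >= 0}),
    "status": Column(str, {"known_status": lambda v: v in {"open", "shipped"}}),
})
rows = [
    {"amount": 12.5, "status": "open"},
    {"amount": -3.0, "status": "lost"},
    {"amount": None, "status": "shipped"},
]

for error in schema.validate(rows):
    print(error)
```

Collecting everything in one pass (the lazy mode) is usually what you want in a pipeline report, while failing fast suits a guard at the start of a job.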
Have you considered Frictionless? It's worth checking out for managing data quality.
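Frictionless is built around the Table Schema spec: a JSON descriptor listing each field's `name`, `type`, and `constraints` (such as `required`, `unique`, `minimum`, `enum`), which tabular data is then validated against. The real library and its `frictionless validate` CLI handle this for you. The sketch below only illustrates the idea with the stdlib `csv` module. The descriptor, sample data, and `validate_csv` helper are hypothetical, and only a few types and constraints are covered.

```python
import csv
import io

# Hypothetical descriptor following the Table Schema layout (fields + constraints).
SCHEMA = {
    "fields": [
        {"name": "id", "type": "integer", "constraints": {"required": True, "unique": True}},
        {"name": "price", "type": "number", "constraints": {"minimum": 0}},
        {"name": "currency", "type": "string", "constraints": {"enum": ["EUR", "USD"]}},
    ]
}

# Only a handful of Table Schema types are handled in this sketch.
CASTERS = {"integer": int, "number": float, "string": str}


def validate_csv(text, schema):
    """Return a list of (row_number, field_name, message) errors."""
    errors = []
    seen = {f["name"]: set() for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(text))
    for row_num, row in enumerate(reader, start=2):  # row 1 is the header
        for f in schema["fields"]:
            name, cons = f["name"], f.get("constraints", {})
            raw = row.get(name) or ""
            if raw == "":
                if cons.get("required"):
                    errors.append((row_num, name, "missing required value"))
                continue
            try:
                value = CASTERS[f["type"]](raw)
            except ValueError:
                errors.append((row_num, name, f"not a valid {f['type']}"))
                continue
            if "minimum" in cons and value < cons["minimum"]:
                errors.append((row_num, name, f"below minimum {cons['minimum']}"))
            if "enum" in cons and value not in cons["enum"]:
                errors.append((row_num, name, "not in enum"))
            if cons.get("unique"):
                if value in seen[name]:
                    errors.append((row_num, name, "duplicate value"))
                seen[name].add(value)
    return errors


DATA = """id,price,currency
1,9.99,EUR
2,-1,USD
2,5,GBP
,3,EUR
x,4,USD
"""

for error in validate_csv(DATA, SCHEMA):
    print(error)
```

Keeping the schema as plain data (rather than code) is what makes this style easy to share alongside the dataset itself.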
You mentioned you're looking for data quality tools—what specific types of data are you working with? If it's tabular, there are some other options worth exploring!