What’s the Best Way to Persist Data Across Pipeline Runs?

Asked By TechWhiz123 On

I'm looking for a reliable way to save and manage key-value outputs from one pipeline run to the next. To be clear, I want to persist data beyond just passing values between jobs in the same pipeline. Currently I've been storing the data as JSON or YAML in S3 and reading it back in future runs, but that feels too manual for something that seems common in workflows. I'd love to hear about more effective or maintainable solutions you've found in real-world scenarios. Any best practices or potential pitfalls to watch out for?

For context, I'm running a list of client names through a stepwise migration process where new clients are flagged and old ones are removed. If a step fails, that client doesn't get removed until the step succeeds; the migration steps are all idempotent. Thanks!
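To make it concrete, here's roughly the shape of what I have today with boto3; the bucket and key names below are placeholders, not my real ones:

    import json
    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-pipeline-state"    # placeholder bucket name
    KEY = "migration/clients.json"  # placeholder object key

    def load_state():
        # First run: no object exists yet, so start from an empty mapping.
        try:
            body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
            return json.loads(body)
        except s3.exceptions.NoSuchKey:
            return {}

    def save_state(state):
        # S3 has no partial update, so the whole blob gets rewritten each time.
        s3.put_object(Bucket=BUCKET, Key=KEY, Body=json.dumps(state).encode())

That full read-modify-write cycle on every run is exactly the part that feels manual.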

5 Answers

Answered By PipelineGuru77 On

Have you considered using a build matrix to fan out over your data? Instead of the usual axes like platform or version, you could key the matrix on client IDs or names for better organization. Not sure if it's a perfect fit, but worth exploring!

Answered By CloudSavant11 On

You might also like the idea of using a lightweight key-value store with a RESTful interface, like Kinto. It lets you read and write individual records rather than overwriting an entire blob. This could work well as a sidecar service!
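A rough sketch of what a per-record update could look like against Kinto's REST record endpoints; the server URL, credentials, and bucket/collection names here are all made up:

    import requests

    BASE = "https://kinto.example.com/v1"   # hypothetical Kinto server
    AUTH = ("pipeline", "s3cr3t")           # hypothetical credentials
    record = f"{BASE}/buckets/migrations/collections/clients/records/acme-corp"

    # PATCH touches only this client's record -- no whole-blob rewrite needed.
    resp = requests.patch(record, json={"data": {"status": "migrated"}}, auth=AUTH)
    resp.raise_for_status()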

TechWhiz123 -

Kinto sounds interesting! I couldn't find much info on it—do you have a link?

Answered By CodeCrusader88 On

For my similar use case, we decided to just go with MySQL. It's proven super handy, especially when we need to modify or add business logic. Sure, a static JSON file could work, but SQL gives us timestamping and auto-incrementing for free, plus it's great for backing a status dashboard!
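Roughly what the table and the per-client upsert look like from a job, sketched with mysql-connector-python; the host, credentials, and table layout are placeholders, not our production schema:

    import mysql.connector

    conn = mysql.connector.connect(
        host="db.example.com", user="pipeline",
        password="s3cr3t", database="migrations",
    )
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS clients (
            id INT AUTO_INCREMENT PRIMARY KEY,
            name VARCHAR(255) UNIQUE,
            status VARCHAR(32),
            updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
                       ON UPDATE CURRENT_TIMESTAMP
        )
    """)
    # The upsert keeps steps idempotent: re-running just rewrites the status.
    cur.execute(
        "INSERT INTO clients (name, status) VALUES (%s, %s) "
        "ON DUPLICATE KEY UPDATE status = VALUES(status)",
        ("acme-corp", "migrated"),
    )
    conn.commit()

The updated_at column gives you the timestamping for free, and the dashboard is just a SELECT over this table.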

DevDreamer42 -

How are you interacting with the database in your jobs? Just standard SQL queries?

Answered By QueryMasterX On

It’s tough to give specific suggestions without knowing your tools. If you're using something like Jenkins, you can archive artifacts to pull from in future runs. Alternatively, consider pushing your data to git with a meaningful commit message for traceability.
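The git route is simple enough to sketch, whatever your CI is; the repo URL and file name here are placeholders:

    import json
    import subprocess

    REPO = "git@example.com:team/pipeline-state.git"  # placeholder repo

    subprocess.run(["git", "clone", "--depth=1", REPO, "state"], check=True)

    # Flag one client as migrated in the tracked state file.
    with open("state/clients.json", "r+") as f:
        state = json.load(f)
        state["acme-corp"] = "migrated"
        f.seek(0)
        json.dump(state, f, indent=2)
        f.truncate()

    subprocess.run(["git", "-C", "state", "commit", "-am",
                    "Mark acme-corp migrated"], check=True)
    subprocess.run(["git", "-C", "state", "push"], check=True)

As a bonus, the commit history gives you the traceability mentioned above.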

TechWhiz123 -

I'm using GitLab CI with a custom Alpine image, so I can add any tools I need.

Answered By DataDiva99 On

S3 is solid, but remember to keep your object keys unique per run so one run doesn't clobber another's state. It's definitely doable, but I get the feeling there's a more streamlined method out there. Let me know if you hit any snags with this approach!
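Since you're on GitLab CI, one way to get unique keys is to fold the pipeline ID into the key path; CI_PIPELINE_ID is set automatically by GitLab, while the bucket name below is a placeholder:

    import os
    import boto3

    s3 = boto3.client("s3")
    run_id = os.environ["CI_PIPELINE_ID"]          # provided by GitLab CI
    key = f"migration/runs/{run_id}/clients.json"  # unique per pipeline run
    s3.put_object(Bucket="my-pipeline-state", Key=key,
                  Body=b'{"acme-corp": "migrated"}')

The trade-off is that the next run has to discover the latest key, e.g. by listing the prefix and taking the newest object.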

TechWhiz123 -

Yeah, I'm sticking with S3 for now, though I'm itching for an easier solution.
