Hey everyone! I'm looking for some real advice on how to write better code and speed up my workflow. I work as a data scientist, but I don't have a formal programming background. Over the years, I've picked up some tips for clean coding and reusability, but I'm still struggling quite a bit. Lately, I've been given a lot of responsibility for data generation at my job. While I can complete the tasks, I often run into issues, like leaving test code in place or checking my work against only a small sample of inputs. This leads to incomplete data and ultimately backtracking to fix it, which is super frustrating.
I know that programming is an iterative process, but my iterations often feel unproductive. I try to review my scripts end-to-end to catch any errors before submitting my PR, but new logic I introduce sometimes breaks during actual runs due to data issues I didn't catch during testing.
My team works really quickly, and the pressure to keep up is making me anxious. During scrum meetings, I frequently don't have substantial updates to provide. Has anyone dealt with similar challenges? What strategies can help me cut down on my turnaround time while still producing high-quality work?
One more thing: I mainly work with dataframes using pandas or PySpark, often applying functions row by row. I've also learned a bit about parallel processing with ThreadPoolExecutor and other methods to speed things up.
4 Answers
I noticed you're leaving test code in your scripts. A good practice is to write your test cases separately from your main logic. It also helps to structure the code so that things like the input source or sample size are parameters, which lets you customize a run without touching everything else. Think of it as a design improvement!
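Here's a rough sketch of what I mean. The function name, the `limit` parameter, and the test file name are all made up for illustration, and plain lists of dicts stand in for a dataframe. The point is that the sample size is an argument instead of a hard-coded test value, so there is nothing to remember to delete before a full run:

```python
from typing import Dict, Iterable, List, Optional


def generate_rows(ids: Iterable[int], limit: Optional[int] = None) -> List[Dict[str, int]]:
    """Build one output row per id (hypothetical stand-in for real generation logic).

    `limit` caps the number of rows for quick checks. The default of None
    means "process everything", so a full run needs no code changes.
    """
    rows = []
    for count, record_id in enumerate(ids):
        if limit is not None and count >= limit:
            break
        rows.append({"id": record_id, "value": record_id * 2})
    return rows


# In a real project these would live in a separate file, e.g. test_generate.py
# (the file name is an assumption), and be run by a test runner such as pytest.
def test_generate_rows_respects_limit():
    assert len(generate_rows(range(100), limit=5)) == 5


def test_generate_rows_default_is_full_run():
    assert len(generate_rows(range(100))) == 100
```

While developing you would call `generate_rows(ids, limit=5)`, and the real job simply calls `generate_rows(ids)`.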
Writing clean code can feel tricky, especially if you're new to programming. My advice would be to focus on separating unique tasks into their own classes or files, keeping the interfaces clean. This way, if you need to tweak something, you won't disrupt the whole system. Big functions that run for hundreds of lines are really tough to maintain. Little changes can save you tons of hassle down the line!
That's solid advice! I usually hold off on using classes until I feel the logic is stable. I need to start thinking about structuring my code with that in mind right from the beginning.
I’ve had similar roadblocks. I found that using a more procedural coding style, rather than strict object-oriented programming, helped me navigate complexities more easily. It allowed me to focus on manipulating data directly, simplifying a lot of tasks that felt complicated before. Sometimes, going back to basics can clear up a lot of the confusion!
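To make that concrete, here is a small sketch of the procedural style, with plain lists of dicts standing in for a dataframe. The step names and the `amount`, `fee`, and `total` fields are invented for the example. Each step is a small function that takes data in and returns new data, so you can inspect the result of any step on its own:

```python
def drop_missing(rows):
    """Keep only rows whose 'amount' is present and not None."""
    return [row for row in rows if row.get("amount") is not None]


def add_total(rows, fee):
    """Return new rows with a 'total' field; the input rows are not modified."""
    return [{**row, "total": row["amount"] + fee} for row in rows]


def run_pipeline(rows, fee=1):
    """Chain the steps; each intermediate result has a name you can inspect."""
    cleaned = drop_missing(rows)
    with_totals = add_total(cleaned, fee)
    return with_totals


raw = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}, {"id": 3}]
print(run_pipeline(raw, fee=2))
```

Because no step changes its input, you can rerun any single function on a small sample while debugging without rerunning the whole script.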
For the leftover test code, if you're using Python, consider a testing framework like doctest. It runs the example calls you write in a function's docstring and checks that the output matches, which works well for smaller projects and keeps your tests next to the function they describe instead of scattered through your main logic. Plus, it might reduce some of the friction you're feeling!
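A minimal example of what that looks like, using a made-up helper function. The `>>>` lines in the docstring are the tests, and `doctest.testmod()` from the standard library runs them:

```python
import doctest


def normalize_name(name: str) -> str:
    """Trim whitespace and lowercase a name (hypothetical helper).

    >>> normalize_name("  Alice ")
    'alice'
    >>> normalize_name("BOB")
    'bob'
    """
    return name.strip().lower()


if __name__ == "__main__":
    # testmod() runs every >>> example in this module's docstrings and
    # reports any whose actual output differs from the text shown.
    print(doctest.testmod())
```

You can also run the examples without the `__main__` block via `python -m doctest your_file.py`, where the file name is whatever your script is called.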

I get that! My hang-up is needing to see how every variable changes throughout my entire script. Maybe I should try breaking things out instead of relying on notebooks.