I've got a bit of a dilemma in my coding practice. I have two inputs: an output folder name and an input image. My goal is to generate an Excel file object containing data extracted from the images, while also building a dictionary with that same data for later use in the program. The catch is that I'm uncomfortable merging these tasks into one function; it feels like I'm violating the principle that a function should do only one thing. However, if I separate them, I'd have to loop through the same data twice, which raises concerns about computational efficiency. I could technically read the Excel file back to build the dictionary later, but that feels messy and slow. How would you handle this without compromising clarity or performance, especially since the code will be under scrutiny? What do you all think?
6 Answers
Looping through the same collection twice isn't necessarily a huge issue, as long as you're not doing redundant computations on the items. The "one job per function" guideline is more of a suggestion, and in practice, some compromises are often needed. If your function remains clear and logical, you're probably okay.
You might be overthinking the performance aspect here. If it's a small project, either approach is fine. But since you mentioned it’ll be scrutinized, I would suggest splitting up the functions for clarity and maintenance reasons. It's better to be cautious.
Time definitely matters when you're dealing with tens of thousands of files. Efficiency should be a key factor here.
Breaking the 'single job' rule is okay; it's more about practicality than strict adherence to guidelines. That said, you may not need to break it at all: consider a function that processes the data once, then calls separate functions to handle the Excel and dictionary tasks. That could make your code cleaner and more modular.
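A minimal sketch of that idea, under loud assumptions: `extract_records`, `build_lookup`, `write_table`, and `process` are hypothetical names, the "extraction" is a placeholder that just measures the file name, and the standard-library `csv` module stands in for whatever Excel writer the original code uses. The point is only the shape: the expensive step runs once, and each output gets its own small function.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def extract_records(image_names: Iterable[str]) -> List[Record]:
    """Hypothetical extraction step; the expensive work happens once, here."""
    # Placeholder "extraction": a real version would read each image.
    return [{"image": name, "name_length": str(len(name))} for name in image_names]


def build_lookup(records: List[Record]) -> Dict[str, Record]:
    """Build the dictionary for later use, keyed by image name."""
    return {rec["image"]: rec for rec in records}


def write_table(records: List[Record]) -> str:
    """Stand-in for the Excel writer: serializes the rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["image", "name_length"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def process(image_names: Iterable[str]) -> Tuple[str, Dict[str, Record]]:
    """Coordinator: extract once, then hand the records to each output function."""
    records = extract_records(image_names)
    return write_table(records), build_lookup(records)


table_text, lookup = process(["a.png", "bb.png"])
```

Each output function still loops over `records`, but those loops only touch data already in memory, so the extraction cost is paid once and each function can be tested on its own.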
Is having an extra loop really that bad? If it makes your code simpler and easier to follow, that's often worth a little extra compute time. Plus, how can we define 'one job'? Sometimes those lines blur, and focusing too much on the rule might not be productive.
You might be overcomplicating things. Since you're dealing with two distinct outputs (the Excel file and the dictionary), just write one function for each. This way your code stays clean, and a bug in the dictionary handling can't interfere with the Excel generation.
It's hard to give exact advice without seeing your code. Typically, if you're processing a lot of data, it's better to handle it all in a single loop when possible. That said, if mixing the Excel generation and dictionary creation makes the function too complex, keeping them separate could simplify debugging and future changes.
That's true, especially if you aren’t reinvoking complex calculations each time. But if you are in this case, maybe reconsider how to divide the tasks.
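If the per-item work does turn out to be expensive, one way to keep two separate loops without repeating it is to cache the results. The sketch below is only an illustration under assumptions: `extract` is a hypothetical stand-in for the real image processing, and `call_count` exists purely to show how often the expensive step actually runs.

```python
from functools import lru_cache

call_count = 0  # illustration only: counts real (uncached) extractions


@lru_cache(maxsize=None)
def extract(image_name: str) -> tuple:
    """Hypothetical expensive extraction; results are cached per image name."""
    global call_count
    call_count += 1
    return (image_name, len(image_name))


names = ["a.png", "bb.png"]

# First loop: rows destined for the spreadsheet. Each call does real work.
rows = [extract(name) for name in names]

# Second loop: the dictionary. Every call is answered from the cache.
lookup = {name: extract(name) for name in names}
```

With tens of thousands of files an unbounded cache holds every result in memory, so a bounded `maxsize`, or simply storing the extracted records in a list, may be the better trade-off.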