I've noticed some performance issues when using Python's `copy.deepcopy()` that seem to be causing bottlenecks in my applications. In fact, I found that using `deepcopy` can actually be slower than round-tripping objects through serialization libraries like pickle or json in certain situations. I wrote an article explaining why this is the case, highlighting issues like the recursive pure-Python implementation and the safety checks (such as the memo table for cycles and shared references) that add overhead. I'm also sharing some practical alternatives to `deepcopy`, such as shallow copies with manual handling of nested mutables, pickle round-trips, or restructuring code to minimize copying. Has anyone else experienced this, or found other performance pitfalls in commonly used Python functions?
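To make the first two alternatives concrete, here's a minimal sketch; the `config` dict and its fields are made-up examples, but any nested mutable structure shows the same pattern:

```python
import copy
import pickle

config = {"name": "run-1", "params": {"lr": 0.01, "epochs": 10}, "tags": ["a", "b"]}

# Alternative 1: shallow copy, then manually re-copy only the subtrees you mutate.
clone = copy.copy(config)
clone["params"] = dict(config["params"])
clone["tags"] = list(config["tags"])

# Alternative 2: pickle round-trip; pickle's C implementation often beats
# deepcopy's pure-Python recursion on large, picklable objects.
clone2 = pickle.loads(pickle.dumps(config, protocol=pickle.HIGHEST_PROTOCOL))
```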
5 Answers
Using deepcopy is like doubling down on inefficiency. One correction, though: deepcopy doesn't literally pickle and unpickle the object. It recursively walks the object graph in pure Python, keeping a memo table to handle cycles and shared references, and only falls back to the pickle protocol (`__reduce_ex__`) for objects it doesn't otherwise know how to copy. Since pickle itself is C-accelerated, a plain pickle round-trip can end up faster. If someone were to implement deepcopy in C, we'd likely see significant performance gains.
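A rough timing sketch of that gap; exact numbers will vary with object shape, size, and Python version, and the data here is arbitrary:

```python
import copy
import pickle
import timeit

# Arbitrary nested container of builtins; picklable by construction.
data = {i: list(range(50)) for i in range(1000)}

t_deep = timeit.timeit(lambda: copy.deepcopy(data), number=20)
t_pkl = timeit.timeit(
    lambda: pickle.loads(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)),
    number=20,
)
print(f"deepcopy: {t_deep:.3f}s   pickle round-trip: {t_pkl:.3f}s")
```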
Deepcopy often feels like a code smell to me. The last few times I've seen it, it was being applied to numpy arrays or pandas DataFrames; I removed it and everything worked fine, and those types have their own fast copy methods anyway. People may be over-engineering their solutions, reaching for deepcopy as a band-aid, which is frustrating.
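For those types specifically, the native copy methods are both the idiomatic and the fast option; a minimal sketch, where the array and frame are just placeholders:

```python
import numpy as np
import pandas as pd

arr = np.zeros((1000, 1000))
df = pd.DataFrame({"x": range(1000)})

arr_copy = arr.copy()         # a single buffer copy, no Python-level recursion
df_copy = df.copy(deep=True)  # pandas' own deep copy of the underlying data
```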
Totally agree, it's like they've learned the wrong lesson and think deepcopy is some sort of cure-all!
I've been in similar situations where it just seems unnecessary and slows things down.
I recently faced a situation where I had to quickly adjust a friend's script, and it was surprisingly faster to just serialize and deserialize the objects using orjson rather than using deepcopy. I know you mention a speedup in your post, but in my case, it was even more significant!
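For reference, the whole trick was just this (a hypothetical `record`; note it only works for JSON-compatible data, and tuples come back as lists):

```python
import orjson

record = {"id": 1, "values": [1.5, 2.5], "meta": {"source": "sensor"}}

# orjson.dumps returns bytes; loads parses them back into plain Python objects.
# Custom classes would need a default= hook, and non-JSON types are lossy.
clone = orjson.loads(orjson.dumps(record))

assert clone == record and clone is not record
```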
That’s a great example! It’s surprising how often people overlook such alternatives for speed.
I rarely use deepcopy myself. If I need to keep multiple versions of complex data, I prefer libraries like pyrsistent or immutables. They offer structural sharing, which can save memory and can be faster than deepcopy.
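In case anyone hasn't used them, this is roughly what that looks like with pyrsistent (toy data):

```python
from pyrsistent import pmap

state = pmap({"users": 10, "sessions": 3})

# "Updating" returns a new version; old and new share structure internally,
# so keeping many versions around doesn't require a full copy each time.
state_v2 = state.set("sessions", 4)

assert state["sessions"] == 3      # the old version is untouched
assert state_v2["sessions"] == 4
```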
Yeah, I get that! Those libraries let you maintain state without the overhead that comes with deepcopy.
I've seen colleagues use deepcopy without a second thought, even when the function just needs to read the object. It's frustrating! I think many do it out of fear of mutable state; they've had bad experiences and end up overusing it to be 'safe.'
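When the worry is a callee mutating shared state, a read-only view is a much cheaper guarantee than a defensive deep copy. A sketch, with the `settings` dict and `process` function invented for illustration:

```python
from types import MappingProxyType

settings = {"retries": 3, "timeout": 30}

def process(cfg):
    # Only reads cfg; it never needed its own copy.
    return cfg["retries"] * cfg["timeout"]

# MappingProxyType is a zero-copy, read-only view: any attempt to assign
# through it raises TypeError, while reads go straight to the dict.
process(MappingProxyType(settings))
```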
It's like a trust issue with the codebase. They might not fully trust the implementation, especially when it comes to libraries manipulating state.
Exactly! It seems like a lack of understanding of when it's genuinely necessary. That just leads to unnecessary complexity.
Exactly, the implementation really matters! Someone should take the time to create a faster version in C.