Should I Store Large DataFrames in Class Attributes?

0
2
Asked By DataWizard999 On

Hi everyone! I'm a data analyst with no formal software development training, and I've been working with code from colleagues. I've noticed many examples where they store large DataFrames as class attributes, like this:
```python
import pandas as pd
class MyClass:
def __init__(self, df: pd.DataFrame, ...):
self.df = df
# Other parameters initialized here

def do_something_using_df(self) -> float:
pass
```
I initially didn't think much of it, but I've come to realize that these DataFrames can consume a lot of memory—think millions of rows and hundreds of columns! By creating instances of such classes, we end up duplicating the DataFrame, which can lead to several GBs of memory usage since these objects often remain referenced and aren't garbage collected.

So, I'm wondering: is it bad practice to store large DataFrames inside class attributes? I couldn't find clear guidance on this, especially since methods like `do_something_using_df()` typically perform specific calculations. While I think it's okay for smaller DataFrames with just a couple of columns, the real issue seems to be my colleagues who routinely dump massive DataFrames into classes without proper cleanup. Would a different approach, perhaps separating data handling and processing, maintain the single responsibility principle? I'd love any perspectives on the pros and cons of storing DataFrames as class attributes, not just in Python but across programming languages!

1 Answer

Answered By CodeSavant42 On

Yes, storing large DataFrames in class attributes can be considered a bad practice, especially if it leads to unnecessary duplication and increased memory usage. Pandas DataFrames are already objects, so there’s typically no need to wrap them in another object unless you truly need to enhance their functionality. Instead, consider using the 'pipe-and-filter' pattern, where you have functions that handle the DataFrame as input and return it after processing. This keeps your design cleaner and avoids memory issues associated with creating multiple instances.

AnalystNinja -

That sounds interesting! I looked up the pipe-and-filter pattern, and it seems that could be really useful for our needs. Would that approach only work with functions, or could it be applied to classes as well? In our case, we often rely on inheritance for our data operations.

Related Questions

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.