Should I Store Large DataFrames in Class Attributes?

June 19, 2025

Asked By DataWizard999 On June 19, 2025

Hi everyone! I'm a data analyst with no formal software development training, and I've been working with code from colleagues. I've noticed many examples where they store large DataFrames as class attributes, like this:
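```python
import pandas as pd


class MyClass:
    # `**other_params` is a placeholder for the parameters elided in the original post
    def __init__(self, df: pd.DataFrame, **other_params):
        self.df = df
        # Other parameters initialized here

    def do_something_using_df(self) -> float:
        # Placeholder: a specific calculation on self.df would go here
        pass
```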
I initially didn't think much of it, but I've come to realize that these DataFrames can consume a lot of memory—think millions of rows and hundreds of columns! By creating instances of such classes, we end up duplicating the DataFrame, which can lead to several GBs of memory usage since these objects often remain referenced and aren't garbage collected.
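One thing worth verifying before assuming duplication: in Python, plain assignment such as `self.df = df` binds a reference to the existing DataFrame rather than copying it. Memory is duplicated only where the code makes a copy (for example with `.copy()`), although the original frame stays alive for as long as any instance still references it. A minimal sketch of how to check this, where the class name `Holder` and the synthetic data are hypothetical:

```python
import numpy as np
import pandas as pd


class Holder:
    """Hypothetical stand-in for a class that keeps a DataFrame as an attribute."""

    def __init__(self, df: pd.DataFrame):
        self.df = df  # binds a reference; no data is copied here


# Small synthetic frame; real workloads would be far larger.
df = pd.DataFrame(np.zeros((1_000, 10)), columns=[f"c{i}" for i in range(10)])

a = Holder(df)
b = Holder(df)
c = Holder(df.copy())  # an explicit copy does allocate new memory

same_object = a.df is df and b.df is df  # both instances share one frame
copied_object = c.df is not df           # the copy is a distinct object

# Approximate footprint of a single frame, in bytes
frame_bytes = int(df.memory_usage(deep=True).sum())

print(same_object, copied_object, frame_bytes)
```

If `same_object` is true in your colleagues' code paths, the instances share one frame, and the question becomes how long those instances stay referenced rather than how many of them exist.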

So, I'm wondering: is it bad practice to store large DataFrames inside class attributes? I couldn't find clear guidance on this, especially since methods like `do_something_using_df()` typically perform specific calculations. While I think it's okay for smaller DataFrames with just a couple of columns, the real issue seems to be my colleagues who routinely dump massive DataFrames into classes without proper cleanup. Would a different approach, perhaps separating data handling and processing, maintain the single responsibility principle? I'd love any perspectives on the pros and cons of storing DataFrames as class attributes, not just in Python but across programming languages!

1 Answer

Answered By CodeSavant42 On June 19, 2025

Yes, storing large DataFrames in class attributes can be bad practice, especially if it leads to unnecessary copies or keeps the data referenced longer than it is needed. Pandas DataFrames are already objects, so there’s typically no need to wrap them in another object unless you truly need to extend their functionality. Instead, consider the 'pipe-and-filter' pattern, in which functions take a DataFrame as input and return the processed result. This keeps your design cleaner, and because no long-lived instance holds on to the data, intermediate frames can be garbage collected once nothing references them.
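As a rough illustration of the pipe-and-filter idea, here is a minimal sketch using plain functions chained with `DataFrame.pipe`. The function names, column names, and sample data are made up for the example and are not from the original question:

```python
import pandas as pd


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Filter: remove rows that contain any missing value."""
    return df.dropna()


def add_total(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Filter: add a 'total' column holding the row-wise sum of `cols`."""
    return df.assign(total=df[cols].sum(axis=1))


def mean_total(df: pd.DataFrame) -> float:
    """Sink: reduce the frame to a single number."""
    return float(df["total"].mean())


raw = pd.DataFrame({"a": [1.0, 2.0, None], "b": [10.0, 20.0, 30.0]})

# Each stage receives a frame and returns its result; nothing stores the frame.
result = raw.pipe(drop_missing).pipe(add_total, cols=["a", "b"]).pipe(mean_total)
print(result)  # 16.5
```

Each stage here returns a new DataFrame, so this does not by itself lower peak memory. What it changes is that no object keeps the intermediate frames alive after the chain finishes.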

AnalystNinja - June 20, 2025

That sounds interesting! I looked up the pipe-and-filter pattern, and it seems that could be really useful for our needs. Would that approach only work with functions, or could it be applied to classes as well? In our case, we often rely on inheritance for our data operations.
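For what it's worth, `DataFrame.pipe` accepts any callable, so the same pattern can probably be combined with classes and inheritance as long as each class holds only its configuration and receives the DataFrame as an argument. A minimal sketch under that assumption, where `Step`, `DropMissing`, and `Scale` are hypothetical names:

```python
from abc import ABC, abstractmethod

import pandas as pd


class Step(ABC):
    """Hypothetical base class: a step holds configuration, never the DataFrame."""

    @abstractmethod
    def __call__(self, df: pd.DataFrame) -> pd.DataFrame:
        ...


class DropMissing(Step):
    def __call__(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.dropna()


class Scale(Step):
    def __init__(self, column: str, factor: float):
        self.column = column
        self.factor = factor

    def __call__(self, df: pd.DataFrame) -> pd.DataFrame:
        # Return a new frame with the column multiplied by the factor
        return df.assign(**{self.column: df[self.column] * self.factor})


raw = pd.DataFrame({"a": [1.0, None, 3.0]})
steps = [DropMissing(), Scale("a", 2.0)]

out = raw
for step in steps:
    out = out.pipe(step)  # instances are callable, so pipe can invoke them

print(out["a"].tolist())  # [2.0, 6.0]
```

Subclasses can still share behaviour through the base class; the difference from the original design is that instances stay small and can be reused across many DataFrames.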
