Hey everyone, I ran into something unusual with package installation and I'm hoping you can shed some light on it. I was using PySpark 3.5.5 without any issues. After I upgraded MLflow from 2.x to 3.x (specifically with the Databricks extra), PySpark started throwing errors, and they looked like errors from Spark 4 rather than 3.5.5.
To troubleshoot, I created a clean virtual environment and installed PySpark 3.5.5. At that point, site-packages contained only the expected PySpark files. When I then installed the Databricks Connect library, which is a transitive dependency of MLflow, I saw it modify the PySpark files in my site-packages directory directly. Instead of hooking into PySpark at runtime or extending it, it was literally overwriting the actual PySpark code.
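For anyone who wants to check this in their own environment, here is a minimal sketch using only the standard library. It assumes each installed distribution ships a RECORD file (which is what `importlib.metadata` reads to list files); the helper name `owners_of` is mine, not part of any library. If more than one distribution shows up for `pyspark`, both claim files in the same directory.

```python
from collections import defaultdict
from importlib.metadata import distributions


def owners_of(top_level: str) -> dict[str, int]:
    """Count, per installed distribution, the files it records under top_level/."""
    counts: dict[str, int] = defaultdict(int)
    for dist in distributions():
        # dist.files is None when the distribution has no file listing.
        for path in dist.files or []:
            if path.parts and path.parts[0] == top_level:
                name = dist.metadata.get("Name") or "<unknown>"
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__":
    for name, n in sorted(owners_of("pyspark").items()):
        print(f"{name}: {n} files recorded under pyspark/")
```

Running it once after installing PySpark and again after installing Databricks Connect should show whether a second distribution has started claiming files under `pyspark/`.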
I had assumed that packages normally either monkey-patch or add a separate extension layer rather than overwrite another package's files. So I'm wondering: is this standard practice in the Python community, or am I right to be surprised by it?
5 Answers
It seems this is a known issue. There are discussions from Databricks employees acknowledging this behavior. It's concerning because overwriting files can lead to all kinds of problems.
Definitely not normal behavior! This should probably be reported as a bug with the Databricks library. There’s no good reason for it to overwrite files from another package like that.
Are you sure it doesn’t require a newer version of PySpark? Sometimes pip does automatic upgrades that can mess things up a bit.
That’s pretty sketchy behavior, though it’s somewhat expected from proprietary SDKs; they often make these "magical" modifications so things work seamlessly. Still, overwriting another package's files instead of installing your own separately is generally not good practice.
If you check the package metadata for databricks-connect, you'll find it declares that it provides and obsoletes PySpark, meaning it’s meant to replace PySpark rather than coexist with it. That's problematic and certainly raises some red flags.
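If you want to look at that metadata yourself, here is a rough sketch using the standard library. `Requires-Dist`, `Provides-Dist`, and `Obsoletes-Dist` are standard core metadata fields; the helper name `declared_relations` is just something I made up, and I'm assuming the distribution is installed under the name `databricks-connect`. I haven't pasted any output because it depends on the version you have installed.

```python
from importlib.metadata import PackageNotFoundError, metadata

# Core metadata fields that describe how a distribution relates to others.
FIELDS = ("Requires-Dist", "Provides-Dist", "Obsoletes-Dist")


def declared_relations(dist_name: str) -> dict[str, list[str]]:
    """Return the relation fields a distribution declares, or {} if it is not installed."""
    try:
        meta = metadata(dist_name)
    except PackageNotFoundError:
        return {}
    # get_all returns None when a field is absent, so fall back to an empty list.
    return {field: meta.get_all(field) or [] for field in FIELDS}


if __name__ == "__main__":
    for field, values in declared_relations("databricks-connect").items():
        print(f"{field}: {values}")
```

Keep in mind that, as far as I know, pip does not check for file conflicts between distributions at install time, so two packages that ship the same paths will silently overwrite each other regardless of what these fields say.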
