I'm working at a regional bank right now and it's a bit of a mess. Currently, our data scientists and analysts each download and install Python on their own machines, so there's no consistent tooling, no dependency management, and everything runs on subpar hardware. I'm really curious what larger enterprises typically do in this situation. Do they have a central server that people SSH into? Do they maintain local environments with a shared toolset? Any stories or experiences would be really helpful! Also, I've heard Chase has an interesting setup called Athena, which integrates Jupyter notebooks.
5 Answers
I think a lot of it depends on your team's size and requirements. Platforms like Databricks could be a good fit, as others have mentioned. They connect seamlessly to data sources and provide a good environment for collaborative work. But yes, they come with a hefty price tag too. You might want to check out other options like DataRobot or Snowflake, especially if you're looking to cut costs.
Yeah, and don’t forget about compliance! It's often a big hurdle for enterprises to manage data responsibly.
For those who are nervous about cloud costs: we use on-premises Linux servers, with Python code deployed through Azure DevOps pipelines. It may not be the most flexible option, but it gives us control without the cloud worries.
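One thing that helped us with the "no consistent tooling" problem in a setup like this: a small check that runs in the pipeline and fails the deployment if the server's installed packages drift from the pinned versions. A minimal sketch (the lock-file name and format are placeholders; adapt to whatever your pipeline produces):

```python
# Sketch: verify the deployed environment against pinned requirements,
# so every server runs identical package versions. The lock file is
# assumed to contain simple "name==version" lines.
from importlib import metadata

def check_pins(lock_lines):
    """Return a list of (package, pinned, installed) mismatches."""
    mismatches = []
    for line in lock_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, _, pinned = line.partition("==")
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package missing entirely
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

In a CI step you'd read the lock file, call `check_pins`, and exit nonzero if the list is non-empty.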
On-premise setups have their own issues, especially when it comes to managing resources for data-heavy applications. Sometimes I feel like I spend more time troubleshooting than analyzing!
The local route definitely has its perks. Plus, you avoid the risk of a public cloud leaking sensitive data.
In many enterprises, the norm is to develop locally as you've described, but then deploy to a more robust setup, like VMs or Docker containers, especially when the analysis needs to run on production data regularly. From my experience, many organizations have specific industry regulations that influence how and where Python is used for analytics.
I’ve noticed an increase in enterprises using cloud VMs for Python development. For instance, Azure ML and Google Cloud Vertex AI Workbench offer flexibility while taking away the burden of managing toolchains on local machines. Plus, it meshes well with infrastructure as code!
Totally agree! But I wish AWS had more streamlined options like Azure's—SageMaker feels overly complex for simpler needs.
It would be nice to have a more uniform approach across cloud providers—every company seems to have their own flavor of handling these kinds of setups.
I personally use JupyterHub, which runs on a Kubernetes setup. It may be more than you need, but it offers a centralized environment that users can access through their browsers, eliminating some of the headaches of local installations. If you're interested, there's great documentation available for setting it up on Kubernetes or even just on a single server!
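To give a flavor of the single-server case: JupyterHub is configured with a Python file, and a bare-bones setup is only a few lines. A sketch (the usernames, port, and directory are placeholder values; the config trait names are standard JupyterHub ones):

```python
# Sketch of a minimal jupyterhub_config.py for one Linux server
# (no Kubernetes). Values are illustrative placeholders.
c.JupyterHub.bind_url = "http://:8000"            # where users connect
c.JupyterHub.authenticator_class = "pam"          # log in with OS accounts
c.Spawner.default_url = "/lab"                    # open JupyterLab by default
c.Spawner.notebook_dir = "~/notebooks"            # per-user working directory
c.Authenticator.allowed_users = {"alice", "bob"}  # placeholder usernames
```

With something like this, analysts just browse to the server and get identical environments, no local installs.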
I remember working with a JupyterHub setup too, but we had some issues with shared permissions. Maybe it's smoother now with the newer versions?
Nebari.dev is another option. It has more features than the usual Jupyter setups, though it can be a little unstable.
Exactly! We just shifted from Databricks to a new platform that will save us loads annually. It’s crucial to weigh the costs based on what you really need.