I'm considering using Polars for its multi-core capabilities, but I'm curious about how well it works with other Python libraries in the PyData ecosystem, like scikit-learn and XGBoost. Can anyone share insights on this?
5 Answers
It's pretty simple to switch between Polars, NumPy, and Pandas if you need to, so you could just whip up a quick prototype and see how it goes (something like the sketch below). Both scikit-learn and XGBoost are reported to work with Polars, so you're likely in good shape!
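A minimal prototype sketch; the data, column names, and model choice are made up for illustration, not anything specific to your setup:

```python
import polars as pl
from sklearn.linear_model import LinearRegression

# Toy data purely for illustration.
df = pl.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [0.5, 1.5, 2.5, 3.5],
    "y":  [2.0, 4.1, 6.2, 8.1],
})

# Polars -> NumPy is a one-liner, and scikit-learn accepts NumPy arrays directly.
X = df.select("x1", "x2").to_numpy()
y = df.get_column("y").to_numpy()

model = LinearRegression().fit(X, y)
print(model.coef_)
```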
A lot of libraries, like Altair, are starting to embrace `narwhals` for writing DataFrame-agnostic code. I've heard scikit-learn is working on that too, which is a good sign!
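For anyone who hasn't seen it, here's a rough sketch of the `narwhals` idea (a toy example of mine, not scikit-learn's actual internals): the same function accepts either a Polars or a Pandas frame and hands back the same type it was given.

```python
import narwhals as nw
import pandas as pd
import polars as pl

@nw.narwhalify
def add_total(df):
    # Inside the function, `df` exposes a Polars-like API regardless of backend.
    return df.with_columns((nw.col("a") + nw.col("b")).alias("total"))

data = {"a": [1, 2], "b": [10, 20]}
print(add_total(pl.DataFrame(data)))  # comes back as a Polars DataFrame
print(add_total(pd.DataFrame(data)))  # comes back as a Pandas DataFrame
```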
Awesome! Once XGBoost gets on board, we’ll be all set!
Check out this link for some info: XGBoost currently converts Polars DataFrames to PyArrow tables, which might be more efficient than converting to NumPy or Pandas, though it may not be zero-copy for all data types.
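From the user side it would look roughly like this. This is a hedged sketch that assumes an XGBoost version which accepts Polars input as described above; the feature names and parameters are placeholders.

```python
import polars as pl
import xgboost as xgb

df = pl.DataFrame({
    "f0": [0.1, 0.4, 0.35, 0.8],
    "f1": [1.0, 0.0, 1.0, 0.0],
    "label": [0, 0, 1, 1],
})

X = df.select("f0", "f1")
y = df.get_column("label").to_numpy()

# If the Polars DataFrame is accepted here, the Polars -> PyArrow conversion
# mentioned above happens inside DMatrix construction; otherwise fall back to
# X.to_arrow() or X.to_pandas().
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({"objective": "binary:logistic"}, dtrain, num_boost_round=10)
print(booster.predict(dtrain))
```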
Good to know! Plus, with Pandas 2.0 supporting Arrow as an optional backend, conversions between the two could get more efficient too.
I don't have all the details, but Polars does include a `to_pandas()` method. It might help if you run into compatibility issues.
If you need it, converting a Polars DataFrame to a Pandas DataFrame is straightforward (see the sketch below). Just keep in mind that the conversion takes time and can offset some of Polars' performance gains.
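Roughly, the round trip looks like this; whether the Arrow-backed variant avoids copies depends on the dtypes and on having pyarrow installed, so treat that part as a hedged assumption.

```python
import polars as pl

pl_df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Polars -> Pandas: by default this copies into NumPy-backed columns.
pd_df = pl_df.to_pandas()

# The Arrow-backed path keeps pyarrow extension arrays instead
# (needs pyarrow installed and a Pandas version that supports them).
pd_arrow_df = pl_df.to_pandas(use_pyarrow_extension_array=True)

# And back again once you're done with the Pandas-only library.
back_to_polars = pl.from_pandas(pd_df)
```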
Exactly! If the conversion is too slow, it could really diminish the advantages Polars offers.
Totally! Given how widely used Pandas is, converting back to it when necessary is a solid plan. But honestly, I was drawn to Polars mainly for the multi-core support.