Hey everyone! I'm looking for some tips on how to streamline my code when using Polars. Recently, I discovered that I can simplify some filters like this: `df.filter(pl.col("size") == 12)` can be reduced to `df.filter(size=12)`. Are there other ways to make filtering easier? Sometimes using `pl.col()` feels like too much work! For example, I also have `df.filter(pl.col("value").is_in(values))` and `df.filter(pl.col("value") >= 10)`. Any advice?
9 Answers
Are we sure that making the code shorter is worth it? If it gets more concise, will it still be readable a year later when you come back to it after a break?
I haven't actually used Polars, but could you replace the column function with a direct DataFrame index like `df["column"]`? That might simplify things. Or, you might get rid of the `pl.` by importing `col` directly!
You can definitely use `df.filter(pl.col.value.is_in(values))`, which is simpler and cleaner. If you feel like `pl.col()` is too lengthy, try using an import alias. Just do `from polars import col as c`, then you can write `df.filter(c.value.is_in(values))`.
I usually import `lit` too since it makes expressions cleaner.
Just stick with SQL, it might be more straightforward for filtering!
Hey, if all else fails, you could always just use pandas instead!
Not sure if it's a huge improvement, but you could do:
```python
import polars as pl

def filter_by(df, col_name, value):
    # Keep only the rows where `col_name` equals `value`.
    return df.filter(pl.col(col_name) == value)

filtered_df = filter_by(df, "value", 5)
```
You can also use attribute access on `pl.col`. These two lines refer to the same column:
```python
pl.col.size
pl.col("size")
```
A good trick is `from polars import col as c`, which makes it shorter to type. Just remember to use `c` afterwards!
There aren't a ton of shortcut options. `df.filter(size=12)` works nicely because of keyword arguments, but each keyword argument is an equality check, so it doesn't help with comparisons like `>=` or with `is_in`. Unfortunately, you can’t do something like `df.filter(size < 12)`, because `size` on its own is just an undefined Python name. One cool thing is that you can actually write SQL in Polars, which might be easier for some tasks!
However, `df["column"]` only returns that column as a Series. It doesn't filter any rows by itself.