Is There a Query and Eval Equivalent in Polars?

Asked By DataWizard42 On

I've been a pandas user for years, but I find typing out pl.col expressions tedious when slicing and dicing data. pandas' query and eval are great for data exploration, and I love the performance boost Polars offers, especially on large datasets, but I'm missing an equivalent to query and eval in Polars and would rather not type lengthy expressions using pl.col or pl.when inside nested conditions. For my own needs, I've tried building custom query and eval functions using Lark for grammar parsing, which turn string expressions into Polars queries. I've also made some enhancements, like a new with_column function and a Polars version of np.select. Has anyone else explored similar features or solutions?

3 Answers

Answered By SyntaxSavvy On

You might want to consider using the pipe method; it can clean up your code. But yeah, don't lean on when-then-otherwise too heavily, since overusing it can slow things down. You can also write comparisons with the gt and ge expression methods instead of the > and >= operators if you prefer how they chain. And don't forget about SQLContext—it's pretty nifty too!

DataWizard42 -

I see your point! Though ChatGPT told me that using gt and ge has no real performance gain over the standard operators. Pipe seems to be more about convenience for modular testing.

SyntaxSavvy -

That’s correct, but it's still neater in many cases. SQL is also an option, though complex operations can really bog you down.

Answered By FastFingers88 On

Honestly, if typing speed is the main bottleneck, just practice typing faster! You can also execute SQL queries against Polars frames, but those might not save you much typing either.

DataWizard42 -

True, but polars.sql runs into exactly what you said: SQL can get convoluted with long, complex statements.

Answered By QuickTyp3r On

I get what you mean! I wasn’t a fan of pandas' query and eval myself. But hey, if you're typing pl.col a lot, why not import and alias it as 'c'? Makes things quicker since you can use c('foo') instead of pl.col('foo'). Plus, there's an attribute access option where you can use pl.col.foo for column names that fit that format. It’s slightly less typing, at least!

LinguisticGamer -

Haha, I know that feeling! I keep a "from polars import col, lit" line in all my notebooks. A template could save time!

DataWizard42 -

Thanks! I’ll definitely try aliasing. I’m still not sold on pandas' eval; I’ve seen some precision issues in functions like power or log. Query is powerful for filtering, but does it have any performance downsides?
