I spent a bit of time trying out polars for reading the dataframes. Here are some things I learnt:
- The API resembles pandas but differs significantly in places, so there's a learning curve
- Not as well documented as pandas
- The single file read performance is 5x faster than with pandas
- The single-file read is fast because polars uses multiple cores. Hence when we read multiple files in parallel, there's no improvement
- No h5 reader is available. If there were one, polars' optimized queries might be useful.
Since our application has no CPU constraints and already reads files in parallel, polars is a poor fit. It might still be useful when we need to read files serially, and especially if we want to stream data.