Skip to content

Conversation

Mr0grog
Copy link
Contributor

@Mr0grog Mr0grog commented Feb 7, 2025

Sometimes an iterable's __len__ is incorrect, which can lead to bad output from partition_all. This change ensures partition_all checks for this case and raise an exception (when it previously output bad data) so the user can more easily find the invalid iterable and fix it.

Fixes #602. /cc @eriknw

Sometimes an iterable's `__len__` is incorrect, which can lead to bad output from `partition_all`. We now raise an exception (when we previously output bad data) in these cases so the user can more easily find the invalid iterable and fix it. Fixes pytoolz#602.
@Mr0grog
Copy link
Contributor Author

Mr0grog commented Sep 14, 2025

Hi, just checking back in on this! Is there anything that needs to happen to move it along?

@eriknw
Copy link
Member

eriknw commented Oct 16, 2025

LGTM, thanks @Mr0grog! This PR behaves as I expect.

I still think it's reasonable to expect __len__ to be accurate, so throwing an exception here is the correct behavior imho to help expose such mistakes (and I like the choice of LookupError over IndexError). Notably, I think it would be best if __len__ in tqdm were fixed (which I suspect is doable).

Apologies for the slow review, but I really appreciate the attention to detail you put into this @Mr0grog! Thanks for helping make toolz better ❤️

Merging.

@eriknw eriknw merged commit 5a7e078 into pytoolz:master Oct 16, 2025
1 check passed
@eriknw
Copy link
Member

eriknw commented Oct 16, 2025

Also, I find the branch name of this PR amusing: Mr0grog:602-errors-are-no-fun-but-they-are-more-fun-than-bad-data 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

partition_all outputs padding tokens on sequences with inaccurate __len__

2 participants