Titbits #3
thorwhalen started this conversation in General
Grouping by key and finding duplicates

The following function groups values by their key, then returns a dict of the accumulated value lists, keeping only the keys whose list satisfies a filter:

```python
from collections import defaultdict
import operator
import functools

def groupby_key(pairs, *, val_filt=lambda x: True):
    d = defaultdict(list)
    for key, value in pairs:
        d[key].append(value)
    return {k: v for k, v in d.items() if val_filt(v)}
```

With this, we can make a `values_of_duplicate_keys` function:

```python
values_of_duplicate_keys = functools.partial(groupby_key, val_filt=lambda x: len(x) > 1)
kv_pairs = [(1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (3, 6)]
assert values_of_duplicate_keys(kv_pairs) == {1: [2, 3], 3: [4, 5, 6]}
```

One annoying problem is that because we use a filter that is a lambda function, our function can't be pickled:

```python
import pickle
deserialized_func = pickle.loads(pickle.dumps(values_of_duplicate_keys))
# PicklingError: Can't pickle <function <lambda> at 0x12b371630>: attribute lookup <lambda> on __main__ failed
```
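
The error arises because pickle serializes a function as a reference to its module and qualified name rather than as code, and a lambda's qualified name is `<lambda>`, which cannot be looked up. A small check of that behaviour, as a sketch using only the standard library (the names `anonymous`, `add_one` and `lambda_pickles` are illustrative, not from the post):

```python
import functools
import operator
import pickle

# A lambda has no importable name, only the placeholder '<lambda>'.
anonymous = lambda x: len(x) > 1
assert "<lambda>" in anonymous.__qualname__

# Builtins are pickled by reference, so the round trip returns the same object.
restored_len = pickle.loads(pickle.dumps(len))
assert restored_len is len

# A partial pickles as long as the function and arguments it wraps do.
add_one = pickle.loads(pickle.dumps(functools.partial(operator.add, 1)))
assert add_one(2) == 3

# The lambda itself is rejected when pickle tries to look it up by name.
try:
    pickle.dumps(anonymous)
    lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    lambda_pickles = False
assert not lambda_pickles
```

So anything built only from importable functions and `functools.partial` should survive pickling, while a single lambda anywhere in the chain breaks it.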
To solve this (if necessary!) we can use a trick mentioned in my [pickle(pickle, pickle)](https://medium.com/@thorwhalen1/partial-partial-partial-f90396901362) Medium article:
```python
from dol import Pipe  # just a function that composes functions (easy to make your own)

# operator.lt(1, x) evaluates 1 < x, i.e. x > 1
greater_than_1 = functools.partial(operator.lt, 1)
# verify the function works
assert greater_than_1(2)
assert not greater_than_1(1)

length_greater_than_1 = Pipe(len, greater_than_1)
# note: the keyword must be val_filt, the name groupby_key declares
values_of_duplicate_keys = functools.partial(groupby_key, val_filt=length_greater_than_1)
values_of_duplicate_keys(kv_pairs)

# we can serialize this:
import pickle
deserialized_func = pickle.loads(pickle.dumps(values_of_duplicate_keys))
# and it works
assert deserialized_func(kv_pairs) == {1: [2, 3], 3: [4, 5, 6]}
```
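
For readers who would rather not depend on `dol`, the "easy to make your own" remark could be sketched roughly as follows. This is a minimal stand-in, assuming left-to-right composition is all that is needed; `SimplePipe` is a hypothetical name and not `dol`'s actual implementation.

```python
import functools
import operator
import pickle


class SimplePipe:
    """Hypothetical stand-in for a function composer: apply funcs left to right."""

    def __init__(self, *funcs):
        if not funcs:
            raise ValueError("SimplePipe needs at least one function")
        self.funcs = funcs

    def __call__(self, *args, **kwargs):
        first, *rest = self.funcs
        # Only the first function receives the original arguments.
        result = first(*args, **kwargs)
        for func in rest:
            result = func(result)
        return result


# Same filter as in the post: "length is greater than 1".
length_greater_than_1 = SimplePipe(len, functools.partial(operator.lt, 1))
assert length_greater_than_1([2, 3])
assert not length_greater_than_1([4])

if __name__ == "__main__":
    # Instances pickle because the class is defined at module level and
    # only holds picklable callables (a builtin and a partial).
    restored = pickle.loads(pickle.dumps(length_greater_than_1))
    assert restored([4, 5, 6]) and not restored([])
```

The design choice is to keep the composed functions as plain instance state, so the default pickling of the instance works without any custom `__reduce__`.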
This thread is for accumulating ideas for titbits before they make it into the actual package.