Recursion limit reached when threading 20 newsgroups dataset

When running jwzthreading on the 20 newsgroups dataset, hashing fails with a `RecursionError` because the maximum recursion depth is exceeded:
```pytb
Traceback (most recent call last):
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object
```

The script used to reproduce this on the [20_newsgroups.tar.gz](https://github.com/FreeDiscovery/jwzthreading/files/2215037/20_newsgroups.tar.gz) dataset is below:

<details>

```py
import sys
from glob import glob
from email.parser import Parser

from jwzthreading import (Message, thread, print_container,
                          sort_threads)
from tqdm import tqdm

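# Raise the recursion limit from the default (1000) to just above the
# number of messages in the dataset (~20000).
sys.setrecursionlimit(21000)

# Parse only the headers of each message; latin1 can decode any byte sequence.
msglist = []
for path in tqdm(glob('20_newsgroups/*/*')):
    with open(path, 'rt', encoding='latin1') as fh:
        msg = Parser().parsestr(fh.read(), headersonly=True)
        msglist.append(msg)


# Thread all messages, with grouping by subject turned off.
threads = thread([Message(el, message_idx=idx)
                  for idx, el in enumerate(msglist)],
                 group_by_subject=False)

threads = sort_threads(threads, key='subject', missing='Z')

# Print the first 20 threads.
for container in threads[:20]:
    print_container(container)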
```
</details>

The 20 newsgroups dataset has ~20000 messages, and the default recursion limit is 1000. Since `__hash__` includes `self.parent`, hashing one container recurses once per ancestor, so the most likely explanation is a thread whose parent chain is more than ~1000 messages deep.
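To illustrate why the depth of the parent chain matters, here is a minimal sketch. `RecursiveNode`, `build_chain` and `hash_fails` are hypothetical names made up for this example, not the library's classes; only the body of `__hash__` is copied from the traceback above.

```python
import sys


class RecursiveNode(dict):
    """Hypothetical stand-in for a container; not jwzthreading's class."""

    def __init__(self, parent=None, **fields):
        super().__init__(**fields)
        self.parent = parent

    def __hash__(self):
        # Same expression as line 71 in the traceback: hashing the tuple
        # hashes self.parent, which recurses one level up the chain.
        return hash(tuple(sorted(self.items())) + (self.parent,))


def build_chain(depth):
    """Build a linear chain of `depth` nodes and return the deepest one."""
    node = None
    for idx in range(depth):
        node = RecursiveNode(parent=node, idx=idx)
    return node


def hash_fails(depth):
    """Return True if hashing the deepest node raises RecursionError."""
    try:
        hash(build_chain(depth))
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    # A short chain hashes fine; one deeper than the limit does not.
    print(hash_fails(50))
    print(hash_fails(sys.getrecursionlimit() + 100))
```

In this sketch the failure depends only on how long the chain of `parent` links is, not on the total number of messages.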

Looking for a fix.
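One possible direction, sketched here under assumptions and not necessarily the fix the project will adopt, is to compute the hash by walking the parent chain in a loop instead of recursing. `ChainNode`, its `fields` attribute and `build_chain` are hypothetical names for this example, and the sketch assumes the parent chain contains no cycles.

```python
import sys


class ChainNode:
    """Hypothetical container; not jwzthreading's actual class."""

    def __init__(self, parent=None, **fields):
        self.fields = dict(fields)
        self.parent = parent

    def _chain_key(self):
        # Collect each ancestor's own fields iteratively, so the depth of
        # the chain no longer consumes interpreter stack frames.
        # Assumption: the parent chain has no cycles (else this never ends).
        keys = []
        node = self
        while node is not None:
            keys.append(tuple(sorted(node.fields.items())))
            node = node.parent
        return tuple(keys)

    def __eq__(self, other):
        if not isinstance(other, ChainNode):
            return NotImplemented
        return self._chain_key() == other._chain_key()

    def __hash__(self):
        # Consistent with __eq__: equal chains produce equal hashes.
        return hash(self._chain_key())


def build_chain(depth):
    """Build a linear chain of `depth` nodes and return the deepest one."""
    node = None
    for idx in range(depth):
        node = ChainNode(parent=node, idx=idx)
    return node


if __name__ == "__main__":
    deep = build_chain(sys.getrecursionlimit() + 100)
    print(isinstance(hash(deep), int))
```

The trade-off is that every hash call still walks the whole chain, so hashing all containers in one very deep thread is quadratic in its depth. Hashing on a stable identifier instead (for example a message index) would avoid that, but changes which containers compare as equal.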
