
Recursion limit reached when threading 20 newsgroups dataset #11

@rth


When applying jwzthreading to the 20 Newsgroups dataset, we get a RecursionError (maximum recursion depth exceeded) raised from `__hash__`:

Traceback (most recent call last):
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  File "/home/rth/src/jwzthreading/jwzthreading/jwzthreading.py", line 71, in __hash__
    return hash(tuple(sorted(self.items())) + (self.parent,))
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded while calling a Python object
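The failing line hashes a tuple that contains `self.parent`, so hashing a container calls `__hash__` on its parent, then on the grandparent, and so on up to the root. The standalone sketch below reproduces that pattern. `Node` is a hypothetical stand-in, not jwzthreading's own container class; it only assumes a mapping with a `parent` attribute, as line 71 of the traceback suggests. Hashing overflows once the chain is deeper than the recursion limit.

```python
import sys


class Node(dict):
    """Hypothetical stand-in for a jwzthreading container: a mapping plus a parent link."""

    def __init__(self, parent=None, **fields):
        super().__init__(**fields)
        self.parent = parent

    def __hash__(self):
        # Same shape as line 71 in the traceback: hashing the tuple calls
        # hash(self.parent), which hashes *its* parent, and so on to the root.
        return hash(tuple(sorted(self.items())) + (self.parent,))


def build_chain(depth):
    """Return the deepest node of a linear reply chain that is `depth` nodes long."""
    node = Node(subject="root")
    for i in range(1, depth):
        node = Node(parent=node, subject=f"Re: {i}")
    return node


def hash_overflows(depth):
    """True if hashing the deepest node of such a chain raises RecursionError."""
    try:
        hash(build_chain(depth))
    except RecursionError:
        return True
    return False


shallow_ok = not hash_overflows(100)
deep_fails = hash_overflows(sys.getrecursionlimit() + 100)
print(shallow_ok, deep_fails)
```

One `__hash__` frame is pushed per ancestor, so what matters is how deep a single chain is, not how many messages the dataset contains.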

The script used to reproduce this on the 20_newsgroups.tar.gz dataset is below:

import sys
from glob import glob
from email.parser import Parser

from jwzthreading import (Message, thread, print_container,
                          sort_threads)
from tqdm import tqdm

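# Raise Python's recursion limit above its default of 1000
sys.setrecursionlimit(21000)

# Parse only the headers of every message in the extracted archive
msglist = []
for path in tqdm(glob('20_newsgroups/*/*')):
    with open(path, 'rt', encoding='latin1') as fh:
        msg = Parser().parsestr(fh.read(), headersonly=True)
        msglist.append(msg)

# Thread the messages without the subject-based grouping step
threads = thread([Message(el, message_idx=idx)
                  for idx, el in enumerate(msglist)],
                 group_by_subject=False)

# Sort the threads by subject and print the first 20
threads = sort_threads(threads, key='subject', missing='Z')

for container in threads[:20]:
    print_container(container)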

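The 20 Newsgroups dataset has ~20,000 messages, and Python's default recursion limit is 1000. Since `__hash__` hashes `self.parent`, which in turn hashes its own parent, the recursion depth equals the number of ancestors of the container being hashed. The most likely explanation is therefore a thread whose reply chain is more than ~1000 messages deep (a cycle in the parent links would produce the same error).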
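One way to test this explanation is to measure how deep the parent chains actually are, without recursing. The helper below is hypothetical, not part of jwzthreading: it only assumes each container exposes a `parent` attribute that is `None` at the root (as `self.parent` in the traceback suggests). It walks the chain in a loop and also reports whether the links loop back on themselves.

```python
from types import SimpleNamespace


def parent_chain_depth(node):
    """Return (number_of_ancestors, has_cycle) by following `.parent` in a loop.

    A loop cannot hit the recursion limit, and remembering visited ids lets
    the walk stop if the parent links form a cycle.
    """
    seen = {id(node)}
    depth = 0
    current = node.parent
    while current is not None:
        if id(current) in seen:
            return depth, True
        seen.add(id(current))
        depth += 1
        current = current.parent
    return depth, False


# A linear reply chain of 1500 messages: the leaf has 1499 ancestors.
leaf = SimpleNamespace(parent=None)
for _ in range(1499):
    leaf = SimpleNamespace(parent=leaf)

# Two containers that point at each other form a cycle.
a = SimpleNamespace(parent=None)
b = SimpleNamespace(parent=a)
a.parent = b

print(parent_chain_depth(leaf))  # (1499, False)
print(parent_chain_depth(a))     # (1, True)
```

Taking the maximum over all containers built by `thread()` would show whether any chain really exceeds ~1000 ancestors; how to enumerate those containers depends on the library's internals and is not assumed here.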

Looking for a fix.
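One possible direction, offered as a sketch under assumptions and not as the project's actual fix: keep the same idea (a node's hash covers its own fields plus those of its ancestors) but compute it with a loop instead of recursion. `Node` is a hypothetical stand-in for the container class (a mapping with a `parent` attribute). The resulting hash values are not guaranteed to match those of the current recursive `__hash__`; only the recursion is removed.

```python
class Node(dict):
    """Hypothetical container: a mapping of message fields plus a parent link."""

    def __init__(self, parent=None, **fields):
        super().__init__(**fields)
        self.parent = parent

    def _own_key(self):
        # This node's own fields in a canonical (sorted) order.
        return tuple(sorted(self.items()))

    def __hash__(self):
        # Collect the ancestors iteratively instead of recursing through
        # hash(self.parent); `seen` guards against a cycle in the links.
        chain, seen = [], set()
        node = self
        while node is not None:
            if id(node) in seen:
                raise ValueError("cycle in parent links")
            seen.add(id(node))
            chain.append(node)
            node = node.parent
        # Fold from the root down: each level mixes its own fields with the
        # already-computed hash of its parent (None for the root).
        h = None
        for node in reversed(chain):
            h = hash(node._own_key() + (h,))
        return h


def build_chain(depth):
    """Return the deepest node of a linear reply chain that is `depth` nodes long."""
    node = Node(subject="root")
    for i in range(1, depth):
        node = Node(parent=node, subject=f"Re: {i}")
    return node


# A 5000-deep chain hashes without touching the recursion limit,
# and an identical chain gives the same value.
deep_hash = hash(build_chain(5000))
same_hash = deep_hash == hash(build_chain(5000))

# A cycle is reported instead of looping forever.
a = Node(subject="a")
b = Node(parent=a, subject="b")
a.parent = b
try:
    hash(a)
    cycle_detected = False
except ValueError:
    cycle_detected = True

print(same_hash, cycle_detected)
```

Each call still walks the whole ancestor chain, so hashing every container of a very deep thread costs time proportional to the depth squared, just as the recursive version does.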
