ingest.py FAULT #17

@batot1

it@ai:~/python/private_chat_with_docs$ source /home/it/python/private_chat_with_docs/bin/activate
(private_chat_with_docs) it@ai:~/python/private_chat_with_docs$ python3 ingest.py 
/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Creating new vectorstore
Loading documents from source_documents
Loading new documents:   0%|                              | 0/6 [00:00<?, ?it/s]
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/ingest.py", line 84, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/langchain/document_loaders/pdf.py", line 311, in load
    return parser.parse(blob)
           ^^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/langchain/document_loaders/base.py", line 95, in parse
    return list(self.lazy_parse(blob))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/langchain/document_loaders/parsers/pdf.py", line 62, in lazy_parse
    yield from [
               ^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/langchain/document_loaders/parsers/pdf.py", line 62, in <listcomp>
    yield from [
               ^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/fitz/fitz.py", line 5781, in __getitem__
    return self.load_page(i)
           ^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/lib/python3.11/site-packages/fitz/fitz.py", line 4166, in load_page
    val = _fitz.Document_load_page(self, page_id)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: cycle in page tree
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/it/python/private_chat_with_docs/ingest.py", line 161, in <module>
    main()
  File "/home/it/python/private_chat_with_docs/ingest.py", line 151, in main
    texts = process_documents()
            ^^^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/ingest.py", line 113, in process_documents
    documents = load_documents(source_directory, ignored_files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/it/python/private_chat_with_docs/ingest.py", line 102, in load_documents
    for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
  File "/usr/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
RuntimeError: cycle in page tree
(private_chat_with_docs) it@ai:~/python/private_chat_with_docs$ python3 requirements_check.py 
langchain is correctly installed at version 0.0.274
gpt4all is correctly installed at version 1.0.8
chromadb is correctly installed at version 0.4.7
llama-cpp-python is correctly installed at version 0.1.81
urllib3 is correctly installed at version 2.0.4
PyMuPDF is correctly installed at version 1.23.5
python-dotenv is correctly installed at version 1.0.0
unstructured is correctly installed at version 0.10.8
extract-msg is correctly installed at version 0.45.0
tabulate is correctly installed at version 0.9.0
pandoc is correctly installed at version 2.3
pypandoc is correctly installed at version 1.11
tqdm is correctly installed at version 4.66.1
sentence_transformers is correctly installed at version 2.2.2

(private_chat_with_docs) it@ai:~/python/private_chat_with_docs$ python --version
Python 3.11.2

$ ls -la source_documents/
total 46256
drwxr-xr-x 2 it it     4096 Jul 29 16:14  .
drwxr-xr-x 8 it it     4096 Jul 30 11:29  ..
-rw-r--r-- 1 it it 12814685 Jan 25  2024  linux_all-in-one.pdf
-rw-r--r-- 1 it it  2493430 Jan 25  2024 'Linux for Beginners by Jason Cannon .pdf'
-rw-r--r-- 1 it it  7022363 Jan 25  2024 'Linux Fundamentals-Paul Cobbaut.pdf'
-rw-r--r-- 1 it it  9346436 Jan 25  2024 'Linux - The Complete Reference.pdf'
-rw-r--r-- 1 it it  9346436 Jan 25  2024 'The Complete Reference Linux.pdf'
-rw-r--r-- 1 it it  6327685 Jan 25  2024  The.Linux.Command.Line.2nd.Edition.www.EBooksWorld.ir.pdf
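
The traceback does not show which of the six PDFs raises `RuntimeError: cycle in page tree`, because the exception is re-raised from a pool worker without the file name. A minimal sketch for narrowing it down, assuming `load_single_document` in ingest.py accepts a single file path (its signature is not shown above): the hypothetical helper `load_tolerant` below calls a loader on each path in turn and records failures instead of aborting. The names `load_tolerant`, `load_one`, and `fake_loader` are illustrative and not part of ingest.py.

```python
from typing import Callable, Iterable, List, Tuple


def load_tolerant(
    load_one: Callable[[str], list],
    paths: Iterable[str],
) -> Tuple[list, List[Tuple[str, str]]]:
    """Load each path sequentially; collect failures instead of aborting.

    load_one is assumed to take a file path and return a list of documents,
    as load_single_document appears to do in the traceback above.
    """
    loaded: list = []
    failures: List[Tuple[str, str]] = []
    for path in paths:
        try:
            loaded.extend(load_one(path))
        except Exception as exc:  # e.g. RuntimeError: cycle in page tree
            # Record which file failed and why, then continue with the rest.
            failures.append((path, f"{type(exc).__name__}: {exc}"))
    return loaded, failures


if __name__ == "__main__":
    # Stand-in loader for demonstration only; in ingest.py the real
    # load_single_document would be passed instead (assumption).
    def fake_loader(path: str) -> list:
        if path.endswith("bad.pdf"):
            raise RuntimeError("cycle in page tree")
        return [f"text of {path}"]

    docs, failed = load_tolerant(fake_loader, ["good.pdf", "bad.pdf"])
    for path, reason in failed:
        print(f"skipped {path}: {reason}")
```

Running the loader sequentially like this trades speed for a clear per-file report. Once the offending PDF is known, it could be removed from source_documents or re-saved with another tool before ingesting again.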
