Unable to ingest .docx files #15

@arehan

I've been following the steps in the README and the video tutorial, but I can't get a .docx file to ingest successfully. Ingestion works fine with .pdf files. Is there anything I should look into?
This is what I get when I run `python3 ingest.py`:

```
Creating new vectorstore
Loading documents from source_documents
Loading new documents: 0%| | 0/2 [00:02<?, ?it/s]
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 84, in load_single_document
    return loader.load()
           ^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/document_loaders/unstructured.py", line 86, in load
    elements = self._get_elements()
               ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/langchain/document_loaders/word_document.py", line 122, in _get_elements
    from unstructured.partition.docx import partition_docx
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/site-packages/unstructured/partition/docx.py", line 6, in <module>
    import docx
ModuleNotFoundError: No module named 'docx'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 161, in <module>
    main()
  File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 151, in main
    texts = process_documents()
            ^^^^^^^^^^^^^^^^^^^
  File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 113, in process_documents
    documents = load_documents(source_directory, ignored_files)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/rehan.arif/Documents/Chat with docs/Ollama/2-ollama-privateGPT-chat-with-docs/ingest.py", line 102, in load_documents
    for i, docs in enumerate(pool.imap_unordered(load_single_document, filtered_files)):
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
ModuleNotFoundError: No module named 'docx'
```
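
The last line of the traceback suggests that the interpreter running `ingest.py` cannot import `docx`, the module that the `python-docx` package on PyPI provides. The sketch below is one possible way to confirm that before changing anything. It assumes it is run with the same `python3` used for ingestion, and `missing_modules` is a hypothetical helper name, not part of the project.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "docx" is the import name; the PyPI package that provides it is python-docx.
    missing = missing_modules(["docx"])
    if missing:
        print(f"Not importable from {sys.executable}: {missing}")
        print("Possible fix: python3 -m pip install python-docx")
    else:
        print("docx is importable; the cause may lie elsewhere")
```

If `docx` is reported as missing, installing `python-docx` through `python3 -m pip` rather than a bare `pip` helps ensure the package lands in the interpreter shown in the traceback (the framework install of Python 3.11).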
