🦆 Duckling

What it does

Duckling extracts textual content and image descriptions from common document formats (PDF, images, CSV/XLSX) and returns them as standardized LangChain Document objects for downstream indexing or retrieval-augmented generation.

How it works

Detects the input file format and dispatches to a dedicated converter.
PDF converter uses Docling for parsing and OCR, extracting text and
Drawing PDFs are handled by a drawing-focused pipeline that prompts
Images are encoded and described via an LLM prompt; tables are
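
The format-based dispatch described above can be sketched roughly as follows. This is a minimal illustration, not Duckling's actual routing code: the converter functions, the `CONVERTERS` table, the `dispatch` helper, and the use of file extensions for detection are all assumptions made for this example.

```python
from pathlib import Path
from typing import Callable, Dict


# Hypothetical converter stubs; the real converters are not shown in this README.
def convert_pdf(path: Path) -> str:
    return f"pdf:{path.name}"


def convert_image(path: Path) -> str:
    return f"image:{path.name}"


def convert_table(path: Path) -> str:
    return f"table:{path.name}"


# Assumed mapping from file extension to converter (illustrative only).
CONVERTERS: Dict[str, Callable[[Path], str]] = {
    ".pdf": convert_pdf,
    ".png": convert_image,
    ".jpg": convert_image,
    ".jpeg": convert_image,
    ".csv": convert_table,
    ".xlsx": convert_table,
}


def dispatch(file_path: str) -> str:
    """Pick a converter from the file extension and run it."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    try:
        converter = CONVERTERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix or '(none)'}") from None
    return converter(path)
```

Keeping the mapping in a single table means a new format only needs one new entry and one new converter.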

Technologies

Docling / docling_core: PDF parsing, OCR and image artifact extraction.
LangChain / langchain_core: Standard Document model used as output.
LangGraph: Small state graph to route files by detected format.
OpenAI-compatible LLM (via langchain_openai.ChatOpenAI): image and drawing description prompts and refinement.
PyMuPDF (fitz) and OpenCV (cv2): page rendering and image handling.
Transformers (HuggingFace tokenizer): token-aware chunking for text.
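
As a rough illustration of token-aware chunking, the sketch below splits text into chunks that each hold at most a fixed number of tokens. It is not Duckling's implementation: `chunk_by_tokens` is a hypothetical helper, and plain whitespace splitting stands in for a HuggingFace tokenizer so the example runs without downloading a model.

```python
from typing import Callable, List


def chunk_by_tokens(
    text: str,
    max_tokens: int,
    tokenize: Callable[[str], List[str]] = str.split,
) -> List[str]:
    """Split text into chunks of at most max_tokens tokens each.

    The default tokenizer is whitespace splitting (an assumption for this
    sketch); a real tokenizer's tokenize function could be passed instead.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = tokenize(text)
    # Walk the token list in fixed-size windows and rejoin each window.
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]
```

Counting tokens rather than characters keeps each chunk within a model's context budget regardless of word length.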

Minimal example

Create and activate a virtual environment, then install dependencies (the activation command below is for Windows PowerShell):

python -m venv .venv
& .\.venv\Scripts\Activate.ps1
poetry install

Ensure LLM credentials and any environment settings are available (for example, place keys in a .env file read by the app).
Example usage (Python):

from duckling.graph import DucklingGraph

graph = DucklingGraph()
state = graph.run(r"C:\path\to\file.pdf", namespace="my-namespace")
documents = state.get("documents", [])
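
To inspect the returned documents, a small preview helper along these lines may be useful. It relies only on the `page_content` and `metadata` attributes of LangChain Document objects; `preview_documents` is a hypothetical helper, and the sample objects are stand-ins so the sketch runs without LangChain installed.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def preview_documents(documents: Iterable[Any], width: int = 60) -> List[str]:
    """Return one summary line per document: index, text snippet, metadata keys."""
    lines = []
    for index, doc in enumerate(documents):
        # Collapse newlines and repeated spaces before truncating.
        text = " ".join(doc.page_content.split())
        keys = ", ".join(sorted(doc.metadata))
        lines.append(f"[{index}] {text[:width]} (metadata keys: {keys})")
    return lines


# Stand-in objects with the same two attributes as a LangChain Document.
sample = [SimpleNamespace(page_content="Hello\nworld", metadata={"page": 1})]
for line in preview_documents(sample):
    print(line)
```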

