Add Nemotron 3 Nano Omni Document Intelligence cookbook by chiachihchen · Pull Request #169 · NVIDIA-NeMo/Nemotron

chiachihchen · 2026-04-28T03:05:51Z

Summary

Add a self-contained, no-GPU Document Intelligence cookbook for Nemotron 3 Nano Omni that pairs it with Nemotron Parse on the hosted catalog endpoint (https://integrate.api.nvidia.com/v1):

Layout extraction -- Parse turns each PDF page into a typed block tree (titles, sections, tables, picture bboxes).
Picture transcription -- Nano Omni in Instruct mode (temperature=0.2, top_k=1) classifies + transcribes each cropped figure.
Reading-order document -- text + transcriptions are stitched back into one Markdown bundle per PDF.
Multi-page reasoning -- Nano Omni in Thinking mode (enable_thinking=True, temperature=0.6, top_p=0.95) answers free-form questions over the assembled documents.

Runs end-to-end on a free NVIDIA_API_KEY from build.nvidia.com -- no GPU, no Docker, no local model weights required.

Files

Under usage-cookbook/Nemotron-3-Nano-Omni/doc-intelligence-with-parse/:

doc_intelligence_cookbook.ipynb -- 20-cell notebook with no-GPU intro callout, uv-aware install (with pip fallback), hard-fail key check, and chat_template_kwargs={"enable_thinking": True} for the QA call. Self-downloads the four demo PDFs from the public yubo2333/MMLongBench-Doc Hugging Face dataset on first run.
README.md -- model overview, requirements, uv-based quick start, NIM Deploy-tab links, troubleshooting.
.gitignore -- auto-downloaded PDFs, output artefacts, .env, .ipynb_checkpoints/.

Test plan

All 8 code cells execute end-to-end against the catalog endpoint (~40 s, 0 errors).
Figure parity: 4 PNG Parse layout overlays (cell 12), 5 PNG picture crops + 15 markdown Specialist transcripts (cell 14), 4 markdown QA results (cell 18), 1 inlined pipeline-flowchart PNG in markdown source.
Both Deploy URLs return HTTP 200 on build.nvidia.com: Parse and Nano Omni.
Cookbook subfolder is fully self-contained -- no external assets required at push time.

Self-contained notebook pairing Nemotron Parse with Nemotron 3 Nano Omni on the hosted catalog endpoint (https://integrate.api.nvidia.com/v1) for an end-to-end document AI pipeline: layout extraction, per-picture transcription, and multi-page reasoning. Runs on a free NVIDIA_API_KEY -- no GPU, no Docker, no local model weights required. Files added under usage-cookbook/Nemotron-3-Nano-Omni/doc-intelligence-with-parse/: - doc_intelligence_cookbook.ipynb (20 cells, end-to-end runnable) - README.md (model overview, requirements, uv-based quick start) - .gitignore (auto-downloaded PDFs and per-page artefacts) Signed-off-by: Chia-Chih Chen <chiachihc@nvidia.com>

chiachihchen force-pushed the chiachihc/doc-intelligence-with-parse branch from b84abbb to 81fd979 Compare April 28, 2026 03:13

chiachihchen requested a review from marcromeyn April 28, 2026 03:17

marcromeyn approved these changes Apr 28, 2026

View reviewed changes

marcromeyn merged commit a15a89a into NVIDIA-NeMo:main Apr 28, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Nemotron 3 Nano Omni Document Intelligence cookbook#169

Add Nemotron 3 Nano Omni Document Intelligence cookbook#169
marcromeyn merged 1 commit intoNVIDIA-NeMo:mainfrom
chiachihchen:chiachihc/doc-intelligence-with-parse

chiachihchen commented Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

chiachihchen commented Apr 28, 2026

Summary

Files

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants