Add Nemotron 3 Nano Omni Document Intelligence cookbook #169

Merged
marcromeyn merged 1 commit into NVIDIA-NeMo:main from chiachihchen:chiachihc/doc-intelligence-with-parse
Apr 28, 2026

@chiachihchen
Contributor

Summary

Add a self-contained, no-GPU Document Intelligence cookbook for Nemotron 3 Nano Omni that pairs it with Nemotron Parse on the hosted catalog endpoint (https://integrate.api.nvidia.com/v1):

  • Layout extraction -- Parse turns each PDF page into a typed block tree (titles, sections, tables, picture bboxes).
  • Picture transcription -- Nano Omni in Instruct mode (temperature=0.2, top_k=1) classifies and transcribes each cropped figure.
  • Reading-order document -- text + transcriptions are stitched back into one Markdown bundle per PDF.
  • Multi-page reasoning -- Nano Omni in Thinking mode (enable_thinking=True, temperature=0.6, top_p=0.95) answers free-form questions over the assembled documents.
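
The reading-order step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: the block dictionaries (`type`, `text`, `id`), the heading mapping, and the `stitch_markdown` helper are all hypothetical, and the real Parse response schema may look different.

```python
from typing import Dict, List

# Hypothetical block shape; the actual Parse output schema may differ.
Block = Dict[str, str]


def stitch_markdown(blocks: List[Block], transcripts: Dict[str, str]) -> str:
    """Join blocks in reading order, swapping pictures for their transcripts."""
    parts = []
    for block in blocks:
        if block["type"] == "picture":
            # Use a placeholder when a picture has no transcript.
            parts.append(transcripts.get(block["id"], "[picture: no transcript]"))
        elif block["type"] == "title":
            parts.append("# " + block["text"])
        elif block["type"] == "section":
            parts.append("## " + block["text"])
        else:
            # Tables and plain text are passed through unchanged.
            parts.append(block["text"])
    return "\n\n".join(parts)
```

Keeping the transcripts in a separate mapping keyed by picture id means the layout pass and the transcription pass can run independently before being merged.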

Runs end-to-end on a free NVIDIA_API_KEY from build.nvidia.com -- no GPU, no Docker, no local model weights required.
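
A hard-fail check for the key might look like the sketch below. The function name `require_api_key` and the error wording are assumptions for illustration; only the `NVIDIA_API_KEY` variable name and the build.nvidia.com source come from this description.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return NVIDIA_API_KEY, or raise immediately if it is missing or blank."""
    env = os.environ if env is None else env
    key = env.get("NVIDIA_API_KEY", "").strip()
    if not key:
        # Fail before any request is made, so the error is unambiguous.
        raise RuntimeError(
            "NVIDIA_API_KEY is not set; create a free key on build.nvidia.com"
        )
    return key
```

Accepting an optional mapping keeps the check testable without touching the real environment.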

Files

Under usage-cookbook/Nemotron-3-Nano-Omni/doc-intelligence-with-parse/:

  • doc_intelligence_cookbook.ipynb -- 20-cell notebook with no-GPU intro callout, uv-aware install (with pip fallback), hard-fail key check, and chat_template_kwargs={"enable_thinking": True} for the QA call. Self-downloads the four demo PDFs from the public yubo2333/MMLongBench-Doc Hugging Face dataset on first run.
  • README.md -- model overview, requirements, uv-based quick start, NIM Deploy-tab links, troubleshooting.
  • .gitignore -- auto-downloaded PDFs, output artefacts, .env, .ipynb_checkpoints/.
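
As a rough sketch, the two Nano Omni sampling configurations described in this PR could be expressed as plain request payloads. The sampling values and `chat_template_kwargs` come from the description above; the model id placeholder, the helper names, and the placement of `top_k` and `chat_template_kwargs` at the top level of the request body are assumptions, and the notebook may pass them differently.

```python
from typing import Any, Dict, List

BASE_URL = "https://integrate.api.nvidia.com/v1"  # endpoint named in this PR
OMNI_MODEL = "<nano-omni-model-id>"  # placeholder; the id is not stated here

Messages = List[Dict[str, Any]]


def instruct_payload(messages: Messages) -> Dict[str, Any]:
    """Instruct mode: near-deterministic settings for picture transcription."""
    return {
        "model": OMNI_MODEL,
        "messages": messages,
        "temperature": 0.2,
        "top_k": 1,  # assumed to be accepted as a top-level field
    }


def thinking_payload(messages: Messages) -> Dict[str, Any]:
    """Thinking mode: settings for multi-page question answering."""
    return {
        "model": OMNI_MODEL,
        "messages": messages,
        "temperature": 0.6,
        "top_p": 0.95,
        # Assumed placement; some clients send this via an extra-body option.
        "chat_template_kwargs": {"enable_thinking": True},
    }
```

Building the payloads as dictionaries keeps the sketch independent of any particular HTTP or SDK client.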

Test plan

  • All 8 code cells execute end-to-end against the catalog endpoint (~40 s, 0 errors).
  • Figure parity: 4 PNG Parse layout overlays (cell 12), 5 PNG picture crops + 15 markdown Specialist transcripts (cell 14), 4 markdown QA results (cell 18), 1 inlined pipeline-flowchart PNG in markdown source.
  • Both Deploy URLs return HTTP 200 on build.nvidia.com: Parse and Nano Omni.
  • Cookbook subfolder is fully self-contained -- no external assets required at push time.

Self-contained notebook pairing Nemotron Parse with Nemotron 3 Nano
Omni on the hosted catalog endpoint
(https://integrate.api.nvidia.com/v1) for an end-to-end document AI
pipeline: layout extraction, per-picture transcription, and multi-page
reasoning. Runs on a free NVIDIA_API_KEY -- no GPU, no Docker, no
local model weights required.

Files added under
usage-cookbook/Nemotron-3-Nano-Omni/doc-intelligence-with-parse/:

- doc_intelligence_cookbook.ipynb (20 cells, end-to-end runnable)
- README.md (model overview, requirements, uv-based quick start)
- .gitignore (auto-downloaded PDFs and per-page artefacts)

Signed-off-by: Chia-Chih Chen <chiachihc@nvidia.com>
chiachihchen force-pushed the chiachihc/doc-intelligence-with-parse branch from b84abbb to 81fd979 on April 28, 2026 03:13
chiachihchen requested a review from marcromeyn on April 28, 2026 03:17
marcromeyn merged commit a15a89a into NVIDIA-NeMo:main on Apr 28, 2026
4 checks passed