@phenomen phenomen commented Jan 22, 2026

Currently, the WebAssembly build does not support Markdown export. I made this function available in WAPI.

Summary by CodeRabbit

  • New Features

    • Added a Markdown export option to extract full document text as Markdown.
  • Bug Fixes

    • Empty extraction results are now treated as "no data" — empty outputs return no content and length is set to 0.

coderabbitai bot commented Jan 22, 2026

📝 Walkthrough

Adds a new exported Zig function zpdf_extract_all_markdown(handle: i32, out_len: *usize) ?[*]u8 that returns full document text in Markdown, and updates the C API wrapper to treat zero-length extraction results as null; manifest line-count metadata updated.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| New Markdown export (WASM API): `src/wapi.zig` | Added `export fn zpdf_extract_all_markdown(handle: i32, out_len: *usize) ?[*]u8`, which validates the handle, retrieves the document, calls `doc.extractAllMarkdown(wasm_allocator)`, sets `out_len`, and returns the buffer pointer; preserves existing error and ownership semantics; returns null for an invalid handle or a missing/empty buffer. |
| C API: empty-result handling: `src/capi.zig` | Updated wrapper to treat zero-length extraction as "no data": if the allocated slice length is 0, free the slice, set `out_len` to 0, and return null; otherwise return the data pointer and length. |
| Manifest updates: `.../manifest` | Reported lines changed: +22/-0 (manifest metadata adjusted). |

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant WASM as "WASM API\n(zpdf_extract_all_markdown)"
    participant DocStore as "Document Store\n(handle lookup)"
    participant Doc as "Document\n.extractAllMarkdown"
    participant Alloc as "WASM Allocator"

    Client->>WASM: call zpdf_extract_all_markdown(handle, out_len_ptr)
    WASM->>DocStore: validate handle & lookup document
    alt handle invalid or doc missing
        DocStore-->>WASM: null
        WASM-->>Client: return null (out_len = 0)
    else document found
        WASM->>Doc: call extractAllMarkdown(Alloc)
        Doc->>Alloc: request buffer
        Alloc-->>Doc: buffer ptr + length
        Doc-->>WASM: buffer ptr + length
        alt length == 0
            WASM->>Alloc: free buffer
            WASM-->>Client: return null (out_len = 0)
        else length > 0
            WASM-->>Client: return buffer ptr (out_len = length)
        end
    end
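
The flow in the diagram can be sketched as a small Zig program that runs under `zig test`. This is a minimal sketch under stated assumptions, not the code from the PR: `StubDocument`, the `documents` table, and the non-exported `extractAllMarkdown` wrapper (which takes the allocator as a parameter so the test allocator can be used) are hypothetical stand-ins. Only the signature shape and the null / `out_len` contract come from the walkthrough.

```zig
const std = @import("std");

// Hypothetical stand-in for the real document type. Only the method named in
// the walkthrough is modeled; its body is invented for illustration.
const StubDocument = struct {
    text: []const u8,

    fn extractAllMarkdown(self: *const StubDocument, allocator: std.mem.Allocator) ![]u8 {
        return allocator.dupe(u8, self.text);
    }
};

// Assumed handle table: slot 0 has text, slot 1 is an empty document,
// slot 2 is an unused handle.
const documents = [_]?StubDocument{
    StubDocument{ .text = "# Title\n" },
    StubDocument{ .text = "" },
    null,
};

// Same parameter shape as the export in the walkthrough, plus an allocator.
fn extractAllMarkdown(allocator: std.mem.Allocator, handle: i32, out_len: *usize) ?[*]u8 {
    out_len.* = 0;
    // Negative handles fail the cast; too-large handles fail the bounds check.
    const idx = std.math.cast(usize, handle) orelse return null;
    if (idx >= documents.len) return null;
    const doc = documents[idx] orelse return null;

    const result = doc.extractAllMarkdown(allocator) catch return null;
    if (result.len == 0) {
        // Treat an empty result as "no data"; release the slice first.
        allocator.free(result);
        return null;
    }
    out_len.* = result.len;
    return result.ptr;
}

test "null for invalid handles and empty results, pointer otherwise" {
    const allocator = std.testing.allocator;
    var len: usize = 99;

    const ptr = extractAllMarkdown(allocator, 0, &len) orelse return error.UnexpectedNull;
    // Capture the slice now: later calls overwrite `len`.
    const markdown = ptr[0..len];
    defer allocator.free(markdown);
    try std.testing.expectEqualStrings("# Title\n", markdown);

    try std.testing.expect(extractAllMarkdown(allocator, 1, &len) == null); // empty
    try std.testing.expectEqual(@as(usize, 0), len);
    try std.testing.expect(extractAllMarkdown(allocator, 2, &len) == null); // missing
    try std.testing.expect(extractAllMarkdown(allocator, -1, &len) == null); // invalid
}
```

Resetting `out_len` at the top keeps every null-returning path consistent with the `out_len = 0` annotations in the diagram.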

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nibble bytes and craft a sign,
I pull the markdown from each line,
Tiny hops through code and heap,
I bundle text and guard your sleep,
A rabbit's haul — concise and fine.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add Markdown export to WAPI' directly and clearly describes the main change: adding Markdown export functionality to the WebAssembly API, which aligns with the PR objectives and file modifications. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/wapi.zig`:
- Around line 125-135: zpdf_extract_all_markdown can return a non-null pointer
for an empty buffer, which breaks parity with zpdf_extract_all and
zpdf_extract_page. After calling doc.extractAllMarkdown(wasm_allocator) (in the
zpdf_extract_all_markdown branch that handles documents[idx]), check whether
result.len == 0. If so, set out_len.* = 0 and return null; otherwise set
out_len.* = result.len and return result.ptr. Preserve the existing guard for an
invalid handle and the final return null for missing documents.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/wapi.zig`:
- Around line 122-142: The comment in zpdf_extract_all_markdown incorrectly
references toOwnedSlice, while the function actually calls
doc.extractAllMarkdown. Update the comment to say that extractAllMarkdown
returns an allocated slice and that zero-length results are treated as "no data"
by setting out_len to 0 and returning null. Then make the C API implementation
(capi.zig) consistent with zpdf_extract_all_markdown by adding the same
empty-buffer check around extractAllMarkdown's result (check result.len == 0,
set out_len.* = 0, and return null), so that WAPI and the C API handle empty
slices identically.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/capi.zig`:
- Around line 194-200: The zero-length buffer returned by
doc.extractAllMarkdown(c_allocator) is allocated but dropped when the function
returns null, causing a leak. Before returning null, set out_len.* = 0 and free
the allocated slice with the same allocator (for example c_allocator.free(result),
since std.mem.Allocator.free takes the slice rather than result.ptr) so the
buffer is released. Ensure the behavior matches zpdf_extract_all_reading_order
and zpdf_extract_page_markdown (or, alternatively, return result.ptr even for
zero-length results, as those functions do), and retain the existing out_len
handling and the null return semantics used by zpdf_free_buffer.
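
One way to keep the wrappers consistent is to route every extraction result through a single helper that frees empty slices before returning null. The following is a minimal sketch, assuming a hypothetical helper named `ownedOrNull` that is not part of the PR; it uses only `std.mem.Allocator` calls from the Zig standard library and takes the allocator as a parameter so that either wrapper could pass its own.

```zig
const std = @import("std");

// Hypothetical shared helper (not in the PR): converts an owned slice into the
// pointer-plus-length form used by the exported functions.
fn ownedOrNull(allocator: std.mem.Allocator, result: []u8, out_len: *usize) ?[*]u8 {
    if (result.len == 0) {
        // Allocator.free takes the slice itself, not result.ptr.
        allocator.free(result);
        out_len.* = 0;
        return null;
    }
    out_len.* = result.len;
    return result.ptr;
}

test "ownedOrNull frees empty results and passes non-empty ones through" {
    const allocator = std.testing.allocator;
    var len: usize = 42;

    const empty = try allocator.alloc(u8, 0);
    try std.testing.expect(ownedOrNull(allocator, empty, &len) == null);
    try std.testing.expectEqual(@as(usize, 0), len);

    const text = try allocator.dupe(u8, "# Title");
    const ptr = ownedOrNull(allocator, text, &len) orelse return error.UnexpectedNull;
    // The caller rebuilds the slice from pointer and length before freeing.
    defer allocator.free(ptr[0..len]);
    try std.testing.expectEqual(@as(usize, 7), len);
}
```

With a helper like this, the empty-result policy lives in one place, so changing it (for instance, to return the pointer even for zero-length results) would affect every wrapper at once.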
