3675 commits
d2d1a35
feat: implement live task status rendering with thread-safe output ha…
myhloli Mar 26, 2026
181277f
feat: enhance concurrent request handling with configurable limits
myhloli Mar 26, 2026
61248e2
Merge pull request #4662 from Niujunbo2002/master
myhloli Mar 26, 2026
3dc3e06
feat: implement local API server and integrate API client for task su…
myhloli Mar 26, 2026
a37a6b6
feat: enhance request concurrency management with status updates and …
myhloli Mar 26, 2026
31e309f
feat: improve local queue handling and status updates in concurrency …
myhloli Mar 26, 2026
3ca042d
feat: add task status snapshot and enhance queue handling in API client
myhloli Mar 26, 2026
cd67423
feat: add queued_ahead attribute to task status and update handling i…
myhloli Mar 26, 2026
ce00207
feat: add queued_ahead attribute to task status and update handling i…
myhloli Mar 26, 2026
0e1ea31
feat: refactor demo script for async API integration and enhance inpu…
myhloli Mar 26, 2026
fffccef
feat: enhance async task management and cleanup in gradio_app
myhloli Mar 26, 2026
fde1bf5
feat: unify threading lock implementation for model singletons
myhloli Mar 26, 2026
12aebcf
feat: implement autoscroll functionality for status box in gradio_app
myhloli Mar 26, 2026
ad4bde7
feat: extend task result timeout to 3600 seconds in api_client
myhloli Mar 26, 2026
ed530b6
Merge pull request #4663 from myhloli/dev
myhloli Mar 26, 2026
05648a7
feat: remove unused functions and clean up client and fast_api modules
myhloli Mar 26, 2026
1de4a48
feat: improve progress bar handling and exclude idle time in analysis…
myhloli Mar 26, 2026
b6834bd
feat: simplify level adjustment logic in llm_aided.py
myhloli Mar 26, 2026
bd46b08
Merge pull request #4665 from myhloli/dev
myhloli Mar 26, 2026
e0e9f02
feat: add router module and integrate with API client for task manage…
myhloli Mar 27, 2026
ad555cd
feat: adjust batch ratio based on increased GPU memory thresholds
myhloli Mar 27, 2026
6cdd35c
feat: enhance model download functionality with temporary source hand…
myhloli Mar 27, 2026
6166491
feat: implement task stem normalization and improve filename handling…
myhloli Mar 27, 2026
dfcd9e1
feat: enhance error handling for max_concurrent_requests and normaliz…
myhloli Mar 28, 2026
7d7e8d8
Merge branch 'opendatalab:dev' into dev
myhloli Mar 28, 2026
032be3a
feat: improve JSON response handling and error reporting in router
myhloli Mar 28, 2026
07c5ecd
Merge remote-tracking branch 'origin/dev' into dev
myhloli Mar 28, 2026
ee5fc21
Merge pull request #4666 from myhloli/dev
myhloli Mar 28, 2026
1d76575
feat: increase layout base batch size and add VRAM cleanup function
myhloli Mar 28, 2026
d1369ea
feat: adjust layout base batch size and improve dynamic batch sizing …
myhloli Mar 28, 2026
a7e43b2
feat: rename batch_radio to batch_ratio for consistency across modules
myhloli Mar 28, 2026
08582e6
feat: remove unused span handling functions to streamline span proces…
myhloli Mar 28, 2026
708ff45
feat: add maximum UTF-8 byte length for task stems and remove unused …
myhloli Mar 28, 2026
7ff22ec
feat: enhance PDF image loading by adding pdf_bytes parameter and ref…
myhloli Mar 28, 2026
bbe4194
feat: remove unused PDF image processing functions to simplify codebase
myhloli Mar 28, 2026
c2136a1
Merge pull request #4667 from myhloli/dev
myhloli Mar 28, 2026
2ccbf89
feat: update accuracy and memory requirements in README files
myhloli Mar 28, 2026
18d2606
feat: update license from Apache 2.0 to AGPLv3 in LICENSE.md and READ…
myhloli Mar 28, 2026
a82ec6d
feat: refactor local API server initialization and enhance CLI argume…
myhloli Mar 28, 2026
208b2c7
feat: update CLI documentation and environment variables for mineru a…
myhloli Mar 28, 2026
5593088
feat: update CLI documentation to include mineru-router and command-l…
myhloli Mar 28, 2026
eac9f2f
feat: enhance output_files.md with new content structure and file gen…
myhloli Mar 28, 2026
42c78c0
feat: update quick_usage.md to enhance document parsing instructions …
myhloli Mar 28, 2026
d8ec431
feat: update index.md to enhance document parsing instructions and cl…
myhloli Mar 28, 2026
e5936d9
feat: update changelog.md to document new features and compatibility …
myhloli Mar 28, 2026
744e418
feat: update output_files.md to clarify content list versioning and f…
myhloli Mar 28, 2026
397902e
feat: update advanced_cli_parameters.md to improve clarity on GPU usa…
myhloli Mar 28, 2026
fb2049f
feat: update quick_usage.md and README files to include mineru-router…
myhloli Mar 28, 2026
b5e1b81
feat: update Dockerfiles to install mineru version 3.0.0
myhloli Mar 28, 2026
e81b45f
feat: update license in pyproject.toml from Apache-2.0 to AGPL-3.0
myhloli Mar 28, 2026
43c21ec
Merge pull request #4668 from myhloli/dev
myhloli Mar 28, 2026
07b7a4d
Merge pull request #4669 from opendatalab/master
myhloli Mar 28, 2026
57b1148
feat: remove data parallelism entries from multiple documentation files
myhloli Mar 28, 2026
a992b24
feat: refactor router task result handling for improved response mana…
myhloli Mar 28, 2026
2cce44d
Merge pull request #4671 from myhloli/dev
myhloli Mar 28, 2026
ad4e28c
Merge pull request #4672 from opendatalab/dev
myhloli Mar 28, 2026
ae16d1d
Merge pull request #4670 from opendatalab/release-3.0.0
myhloli Mar 28, 2026
33e4fbd
Update version.py with new version
myhloli Mar 28, 2026
e54c67d
feat: update minimum hardware requirements in index.md for clarity an…
myhloli Mar 28, 2026
72601b3
feat: update minimum hardware requirements in index.md for clarity an…
myhloli Mar 28, 2026
520c61f
Merge pull request #4673 from myhloli/dev
myhloli Mar 28, 2026
5e51ab2
Merge pull request #4674 from opendatalab/master
myhloli Mar 28, 2026
71b9e9f
feat: refactor OCR processing to improve span handling and reduce cod…
myhloli Mar 29, 2026
2d4fa2c
Merge pull request #4675 from myhloli/dev
myhloli Mar 29, 2026
1f82300
Merge pull request #4676 from opendatalab/dev
myhloli Mar 29, 2026
a6b6d30
Update version.py with new version
myhloli Mar 29, 2026
264c594
Merge pull request #4677 from opendatalab/master
myhloli Mar 29, 2026
f913df4
feat: remove commented-out data-parallel-size option from compose.yam…
myhloli Mar 30, 2026
de77fc3
feat: add mineru-router service to compose.yaml and update docker_dep…
myhloli Mar 30, 2026
62054c9
feat: implement stdin shutdown watcher and enhance local API shutdown…
myhloli Mar 30, 2026
d977881
feat: simplify parsing logic by removing unnecessary serialization an…
myhloli Mar 30, 2026
c635892
feat: adjust max concurrent requests based on macOS environment detec…
myhloli Mar 30, 2026
016169c
feat: add backend dependency checks for hybrid clients and update doc…
myhloli Mar 30, 2026
b85d18d
feat: update hybrid-http-client description to clarify local computin…
myhloli Mar 30, 2026
9063a41
Merge pull request #4684 from myhloli/dev
myhloli Mar 30, 2026
66df0cd
feat: enhance language guessing by normalizing text for surrogate pairs
myhloli Mar 30, 2026
1cfa71b
Merge pull request #4685 from myhloli/dev
myhloli Mar 30, 2026
6a5a2ef
Merge pull request #4686 from opendatalab/dev
myhloli Mar 30, 2026
d0d2bf9
Update version.py with new version
myhloli Mar 30, 2026
de8ea31
Merge pull request #4687 from opendatalab/master
myhloli Mar 30, 2026
84d1384
feat: improve task status polling with timeout handling and logging
myhloli Mar 30, 2026
980670a
Merge remote-tracking branch 'origin/dev' into dev
myhloli Mar 30, 2026
2bda248
feat: implement custom PDF render executor for improved multiprocessi…
myhloli Mar 30, 2026
a6f250b
feat: enhance PDF rendering with persistent executor and recycling logic
myhloli Mar 30, 2026
fa5f8d6
Merge pull request #4688 from myhloli/dev
myhloli Mar 30, 2026
be675a5
Merge pull request #4689 from opendatalab/dev
myhloli Mar 30, 2026
ff2caf3
Update version.py with new version
myhloli Mar 30, 2026
5ff2e16
Merge pull request #4690 from opendatalab/master
myhloli Mar 30, 2026
c3d4595
feat: add VLM model preload support in FastAPI and router configurations
myhloli Mar 30, 2026
5a9f426
feat: implement VLM model preload functionality with CLI argument sup…
myhloli Mar 30, 2026
da89e91
feat: add local API readiness check for Gradio startup with VLM prelo…
myhloli Mar 30, 2026
8422cf2
feat: add configurable local API startup timeout for Gradio integration
myhloli Mar 30, 2026
ce3575e
feat: add --enable-vlm-preload option to CLI for VLM model preloading…
myhloli Mar 30, 2026
d2478d1
docs: update model source switching methods to remove CLI flag suppor…
myhloli Mar 30, 2026
2cd0a0c
Merge pull request #4693 from myhloli/dev
myhloli Mar 30, 2026
96e2de3
Merge pull request #4694 from opendatalab/dev
myhloli Mar 30, 2026
5869af3
Update version.py with new version
myhloli Mar 30, 2026
b9485f1
Merge pull request #4695 from opendatalab/master
myhloli Mar 30, 2026
3d508ab
feat: add albumentations dependency to pyproject.toml
myhloli Mar 31, 2026
39b903f
fix: update sys_platform identifier for Windows in pyproject.toml
myhloli Mar 31, 2026
29f7670
feat: add custom JSON schema for file upload in Swagger UI
myhloli Mar 31, 2026
93d5251
feat: use Annotated for request form parameters in parse_request_form
myhloli Mar 31, 2026
1ca160f
refactor: replace PaddingSameAsPaddleMaxPool2d with torch's MaxPool2d…
myhloli Mar 31, 2026
b583702
fix: improve shutdown handling for FastAPI child process on Windows
myhloli Mar 31, 2026
11a9a94
Merge pull request #4703 from myhloli/dev
myhloli Mar 31, 2026
739c634
Merge pull request #4704 from opendatalab/dev
myhloli Mar 31, 2026
87a1404
feat: allow custom zip filename for response in FastAPI file handling
myhloli Mar 31, 2026
e976ca2
Merge pull request #4705 from myhloli/dev
myhloli Mar 31, 2026
2c65149
Merge pull request #4706 from opendatalab/dev
myhloli Mar 31, 2026
31f368a
Update version.py with new version
myhloli Mar 31, 2026
887758e
Merge pull request #4707 from opendatalab/master
myhloli Mar 31, 2026
69c39f9
feat: enhance paragraph text extraction to include inline content con…
myhloli Apr 1, 2026
40c5f10
feat: add xlsx parsing
Sidney233 Apr 1, 2026
00e5a93
fix: correct paragraph text extraction by removing unnecessary stripping
myhloli Apr 1, 2026
d2c5a29
feat: add underscore thematic break escaping to Markdown processing
myhloli Apr 1, 2026
5f4d6a0
Merge pull request #4712 from myhloli/dev
myhloli Apr 1, 2026
ee21899
fix: correct logical condition for handling same-level list items in …
myhloli Apr 1, 2026
4cfebeb
fix: add logging for unexpected DOCX list states in _add_list_item
myhloli Apr 1, 2026
e3f8fb1
Merge pull request #4713 from myhloli/dev
myhloli Apr 1, 2026
bd7118a
Merge pull request #4714 from opendatalab/dev
myhloli Apr 1, 2026
a3b6547
fix: correct formatting of usage instructions in quick_usage.md
myhloli Apr 1, 2026
a97753c
Merge pull request #4715 from myhloli/dev
myhloli Apr 1, 2026
d18b7df
Update version.py with new version
myhloli Apr 1, 2026
13465ff
Merge pull request #4716 from opendatalab/master
myhloli Apr 1, 2026
39b62cc
fix: strip newline characters from paragraph text in office_middle_js…
myhloli Apr 1, 2026
1b478c2
Merge pull request #4717 from myhloli/dev
myhloli Apr 1, 2026
54b68d4
Merge pull request #4718 from opendatalab/dev
myhloli Apr 1, 2026
ede8d95
Update version.py with new version
myhloli Apr 1, 2026
d7011f4
Merge pull request #4719 from opendatalab/master
myhloli Apr 1, 2026
a25798b
docs: add detailed description of MinerU capabilities and integration…
myhloli Apr 2, 2026
9043944
Merge pull request #4723 from myhloli/dev
myhloli Apr 2, 2026
d59a692
Merge pull request #4724 from opendatalab/dev
myhloli Apr 2, 2026
f24b3bf
feat: add xlsx parsing
Sidney233 Apr 2, 2026
194ff84
feat: add xlsx_analyze and pptx_analyze
Sidney233 Apr 2, 2026
719d1e0
Merge branch 'opendatalab:dev' into dev
Sidney233 Apr 2, 2026
ae314ad
feat: implement process management and shutdown mechanisms for MinerU
myhloli Apr 3, 2026
a2501cf
fix: improve shutdown handling for MinerU process management
myhloli Apr 3, 2026
a0dba76
Merge pull request #4731 from myhloli/dev
myhloli Apr 3, 2026
c6e0dbf
Merge pull request #4732 from opendatalab/dev
myhloli Apr 3, 2026
4be86ee
Update version.py with new version
myhloli Apr 3, 2026
82788a9
Merge pull request #4733 from opendatalab/master
myhloli Apr 3, 2026
c66848a
docs: update CLI tools documentation for mineru API usage
myhloli Apr 3, 2026
56f474d
Merge pull request #4734 from myhloli/dev
myhloli Apr 3, 2026
23f3bd5
feat: add function to identify disallowed control Unicode characters
myhloli Apr 7, 2026
576581d
Merge pull request #4743 from myhloli/dev
myhloli Apr 7, 2026
e83395a
feat: enhance table merging logic with improved row metrics and state…
myhloli Apr 7, 2026
0f43f17
feat: add aspect ratio checks and character count limits for PDF proc…
myhloli Apr 7, 2026
7f365ce
feat: optimize character processing in span_pre_proc.py for improved …
myhloli Apr 7, 2026
fede292
feat: adjust contrast threshold for OCR processing in span_pre_proc.py
myhloli Apr 7, 2026
f93e260
Merge pull request #4745 from myhloli/dev
myhloli Apr 7, 2026
3ad7e0b
Merge pull request #4746 from opendatalab/dev
myhloli Apr 7, 2026
e16b858
Update version.py with new version
myhloli Apr 7, 2026
c9f402a
Merge pull request #4747 from opendatalab/master
myhloli Apr 7, 2026
cdb1485
Merge branch 'opendatalab:dev' into dev
Sidney233 Apr 8, 2026
b23b849
Merge pull request #28 from myhloli/dev
myhloli Apr 8, 2026
20e798e
feat: add rich text parsing; add multi-level lists for pptx
Sidney233 Apr 9, 2026
21f7700
Merge remote-tracking branch 'origin/dev' into dev
Sidney233 Apr 9, 2026
59da364
feat: add rich text parsing; add multi-level lists for pptx
Sidney233 Apr 9, 2026
8cd60fc
feat: add rich text parsing; add multi-level lists for pptx
Sidney233 Apr 9, 2026
710bdb3
fix: fix missing rich text in plain pptx text boxes and the xlsx formula anchor issue
Sidney233 Apr 10, 2026
7bbbce0
feat: update model paths and dependencies for MinerU2.5-Pro
myhloli Apr 10, 2026
6e81472
feat: enhance visual block handling with new grouping and mapping logic
myhloli Apr 10, 2026
903f545
feat: refactor visual block handling by moving utility functions to v…
myhloli Apr 10, 2026
b873489
feat: enhance block handling by introducing sub_type logic for images…
myhloli Apr 10, 2026
5a6caa1
feat: add support for base64 image handling in table HTML and normali…
myhloli Apr 10, 2026
4a80732
feat: implement base64 image handling utilities for HTML and table co…
myhloli Apr 10, 2026
937a15f
feat: add metadata copying for raw text blocks in hybrid and vlm magi…
myhloli Apr 10, 2026
5e66b5c
feat: enhance paragraph block processing with new utilities for mergi…
myhloli Apr 10, 2026
8fb3908
feat: add OCR detection utilities and refactor title handling in hybr…
myhloli Apr 11, 2026
6c42f0b
feat: reorganize utility imports and add shared module for cross-page…
myhloli Apr 11, 2026
a319dbd
fix: update mineru-vl-utils dependency version to 0.2.2
myhloli Apr 13, 2026
8924d92
feat: add support for chart blocks in hybrid model output and update …
myhloli Apr 13, 2026
a8d7e1f
feat: refactor visual block processing and enhance markdown rendering…
myhloli Apr 13, 2026
48eccbb
feat: refactor visual block processing and enhance markdown rendering…
myhloli Apr 13, 2026
47e025a
Merge pull request #29 from myhloli/add_2.5pro
myhloli Apr 13, 2026
3d2e2a5
feat: add markdown utility for escaping special characters and refact…
myhloli Apr 13, 2026
89363bb
feat: move utility functions to runtime_utils and update imports acco…
myhloli Apr 13, 2026
5098f7a
docs: add MinerU2.5-Pro reference to README
wangbinDL Apr 14, 2026
1a105c5
Merge pull request #4780 from wangbinDL/master
myhloli Apr 14, 2026
a20c3a6
feat: enhance numbering support in docx_converter with caching and st…
myhloli Apr 14, 2026
f7d49b5
feat: update mineru-vl-utils dependency version to 0.2.3
myhloli Apr 14, 2026
e148afa
feat: update license from AGPL-3.0 to Apache-2.0 in LICENSE.md and py…
myhloli Apr 14, 2026
9ff71ef
Merge pull request #4783 from opendatalab/master
myhloli Apr 14, 2026
e91bd9f
Merge branch 'opendatalab:dev' into dev
myhloli Apr 14, 2026
5f6e9bc
feat: update license from AGPLv3 to Apache-2.0 in README files
myhloli Apr 14, 2026
a56c0ad
Merge pull request #4784 from myhloli/dev
myhloli Apr 14, 2026
0725a2b
Merge pull request #4725 from Sidney233/dev
myhloli Apr 14, 2026
ef9f9f1
feat: update suffix handling to include office file types in gradio_a…
myhloli Apr 14, 2026
1065efb
feat: update suffix handling to include office file types in gradio_a…
myhloli Apr 14, 2026
2e9bb52
feat: enhance office file processing to support pptx and xlsx formats
myhloli Apr 14, 2026
e0b7062
feat: add support for including hidden sheets in xlsx conversion
myhloli Apr 14, 2026
5d29664
Merge pull request #4785 from myhloli/add_pptx_xlsx
myhloli Apr 14, 2026
89006a9
feat: remove commented code for clarity in model_output_to_middle_jso…
myhloli Apr 14, 2026
a2085a8
feat: enhance office file analysis and conversion to support image ex…
myhloli Apr 14, 2026
95ce346
feat: add vector image handling for office file conversions
myhloli Apr 14, 2026
b227426
Merge pull request #4787 from myhloli/add_pptx_xlsx
myhloli Apr 14, 2026
9db57d2
Merge pull request #4786 from opendatalab/add_pptx_xlsx
myhloli Apr 14, 2026
16124c3
Merge branch 'opendatalab:dev' into dev
myhloli Apr 14, 2026
e96faf2
feat: update documentation to include support for PPTX and XLSX file …
myhloli Apr 14, 2026
a8c55b2
feat: update header for MinerU 3 and enhance project description to i…
myhloli Apr 14, 2026
b10510b
feat: add support for page footnotes in PPTX conversion and enhance n…
myhloli Apr 15, 2026
925ed6a
feat: update links in header for MinerU2.5-Pro and arXiv references
myhloli Apr 15, 2026
9395138
feat: improve hyperlink handling and text formatting in PPTX conversion
myhloli Apr 15, 2026
fdf454a
feat: enhance picture handling in PPTX conversion with size and backg…
myhloli Apr 15, 2026
4aa69ad
feat: implement sorting for shape entries in PPTX conversion and enha…
myhloli Apr 15, 2026
ea1bc58
feat: normalize text handling and improve rich text segment processin…
myhloli Apr 15, 2026
625c23e
feat: add SVG support and enhance image handling in PPTX conversion
myhloli Apr 15, 2026
f8e2309
feat: enhance Markdown escaping for Office content in PPTX conversion
myhloli Apr 15, 2026
8bef400
Update mineru/resources/header.html
myhloli Apr 15, 2026
81daff0
feat: introduce shape transformation and flattening for improved PPTX…
myhloli Apr 15, 2026
c10d0ce
Merge remote-tracking branch 'origin/dev' into dev
myhloli Apr 15, 2026
3f62f6f
feat: add text block prefix escaping for improved Markdown handling
myhloli Apr 15, 2026
0f4f460
Merge pull request #4797 from myhloli/dev
myhloli Apr 15, 2026
125922e
feat: update PPTX_XYCUT_BETA for improved picture handling
myhloli Apr 15, 2026
63b1738
feat: enhance text block handling and promote titles in PPTX conversion
myhloli Apr 15, 2026
381823f
feat: add copyright notice and enhance XY-Cut++ algorithm documentation
myhloli Apr 15, 2026
e52d40b
feat: add copyright notice to multiple files
myhloli Apr 15, 2026
cb54ef0
feat: enhance font source retrieval in PPTX conversion
myhloli Apr 15, 2026
5c208f5
Merge pull request #4799 from myhloli/dev
myhloli Apr 15, 2026
36c6ee6
feat: implement public HTTP client policy and validation in FastAPI a…
myhloli Apr 16, 2026
96d3e27
feat: refine gap tolerance handling in Excel table extraction
myhloli Apr 16, 2026
bfc6acd
feat: refine gap tolerance handling in Excel table extraction
myhloli Apr 16, 2026
b622274
feat: improve gap tolerance selection by introducing preference margin
myhloli Apr 16, 2026
3a66a46
feat: enhance math content handling and LaTeX conversion in PPTX proc…
myhloli Apr 16, 2026
51354e4
feat: enhance Excel table extraction by improving chart handling and …
myhloli Apr 16, 2026
39765d2
feat: add semantic position filtering for Excel tables to enhance dat…
myhloli Apr 16, 2026
c8ca212
Fix PR #4805 review issues in XLSX, PPTX, and CLI policy code
myhloli Apr 16, 2026
efb74fd
Merge pull request #4805 from myhloli/dev
myhloli Apr 16, 2026
b55faab
feat: add cell merge handling and cross-page table merge support in t…
myhloli Apr 17, 2026
a94f6c3
feat: enhance cell merging logic by adding visual column mapping for …
myhloli Apr 17, 2026
2de3411
feat: update license information to include MinerU Open Source Licens…
myhloli Apr 17, 2026
730f1a7
feat: add license information section with MinerU Open Source License…
myhloli Apr 17, 2026
8044d95
feat: improve table merging logic by enhancing visual column mapping …
myhloli Apr 17, 2026
fafa251
Merge pull request #4808 from myhloli/dev
myhloli Apr 17, 2026
c1ed5ea
feat: update mineru-vl-utils dependency to version 0.2.4
myhloli Apr 17, 2026
0d031f8
feat: add release notes for version 3.1.0 highlighting licensing chan…
myhloli Apr 17, 2026
f51ea7a
feat: enhance table merging logic with semantic content checks and ro…
myhloli Apr 17, 2026
ea1b72d
feat: implement visual column mapping and segment calculation for tab…
myhloli Apr 17, 2026
3d6d66b
feat: update project metadata and dependencies in pyproject.toml
myhloli Apr 17, 2026
6ba8a24
chore: update release date for version 3.1.0 in README files
myhloli Apr 17, 2026
7559782
Merge pull request #4809 from myhloli/dev
myhloli Apr 17, 2026
44ca2de
feat: add badge for MinerU2.5 Pro technical report in documentation
myhloli Apr 17, 2026
1707640
Merge pull request #4811 from myhloli/dev
myhloli Apr 17, 2026
71bc034
chore: remove unused PUBLIC_HTTP_CLIENT_DISABLED_DETAIL import from m…
myhloli Apr 17, 2026
d348461
Merge pull request #4812 from myhloli/dev
myhloli Apr 17, 2026
69071d4
Merge pull request #4810 from opendatalab/release-3.1.0
myhloli Apr 17, 2026
2f078fc
feat: update LICENSE.md to clarify commercial use terms and attributi…
myhloli Apr 17, 2026
d9cd58a
Merge pull request #4813 from myhloli/dev
myhloli Apr 17, 2026
8067faa
Update version.py with new version
myhloli Apr 17, 2026
507941d
Merge pull request #4814 from opendatalab/master
myhloli Apr 17, 2026
5 changes: 5 additions & 0 deletions .gitattributes
@@ -0,0 +1,5 @@
*.js linguist-vendored
*.mjs linguist-vendored
*.html linguist-documentation
*.css linguist-vendored
*.scss linguist-vendored
88 changes: 73 additions & 15 deletions .github/ISSUE_TEMPLATE/bug_report.yml
@@ -1,19 +1,46 @@
name: Bug Report | 反馈 Bug
name: 🐛 Bug Report
description: Create a bug report for MinerU | MinerU 的 Bug 反馈
labels: bug

# We omit `title: "..."` so that the field defaults to blank. If we set it to
# empty string, Github seems to reject this .yml file.

body:
- type: markdown
attributes:
value: |
Thank you for submitting a MinerU 🐛 Bug Report! | 感谢您提交 MinerU 🐛 Bug 反馈!

- type: checkboxes
attributes:
label: 🔎 Search before asking | 提交之前请先搜索
description: >
Please search the MinerU [Readme](https://github.com/opendatalab/MinerU), [Issues](https://github.com/opendatalab/MinerU/issues) and [Discussions](https://github.com/opendatalab/MinerU/discussions) to see if a similar bug report already exists.
options:
- label: I have searched the MinerU [Readme](https://github.com/opendatalab/MinerU) and found no similar bug report.
required: true
- label: I have searched the MinerU [Issues](https://github.com/opendatalab/MinerU/issues) and found no similar bug report.
required: true
- label: I have searched the MinerU [Discussions](https://github.com/opendatalab/MinerU/discussions) and found no similar bug report.
required: true

- type: checkboxes
attributes:
label: 🤖 Consult the online AI assistant for assistance | 在线 AI 助手咨询
description: >
This [online AI assistant](https://deepwiki.com/opendatalab/MinerU) is specifically trained to help with MinerU and related topics! It's available 24/7 and ready to provide insights.
options:
- label: I have consulted the [online AI assistant](https://deepwiki.com/opendatalab/MinerU) but was unable to obtain a solution to the issue.
required: true

- type: textarea
id: description
attributes:
label: Description of the bug | 错误描述
description: |
A clear and concise description of the bug. | 简单描述遇到的问题

Provide console output with error messages and/or screenshots of the bug. | 请提供详细报错信息或者截图
placeholder: |
💡 ProTip! Include as much information as possible (screenshots, logs, tracebacks etc.) to receive the most helpful response.
validations:
required: true

@@ -24,11 +51,12 @@ body:

# Should not word-wrap this description here.
description: |
* Explain the steps required to reproduce the bug. | 说明复现此错误所需的步骤。
* Include required code snippets, example files, etc. | 包含必要的代码片段、示例文件等。
* Describe what you expected to happen (if not obvious). | 描述你期望发生的情况。
* If applicable, add screenshots to help explain the problem. | 添加截图以帮助解释问题。
* Include any other information that could be relevant, for example information about the Python environment. | 包括任何其他可能相关的信息。
If you have questions about the parsing results or encounter errors during execution: | 如对解析结果有疑问或在运行中出现报错等异常:
* Provide a minimal reproducible example. | 请提供一个最小可复现的demo。
* The demo should include the complete steps, code, and the PDF file to be parsed. | demo需要包含完整的操作步骤,代码,以及需要解析的PDF文件。
* When reporting parsing result anomalies and runtime errors, reproducible PDF files are essential. If the document is too large or confidential, you can print the problematic page(s) via the browser and submit the corresponding example file.
* 在反馈解析结果异常和运行时报错时,可复现的PDF文件是必不可少的,如文档过大或涉密,您可通过浏览器打印出出现问题的某一页或某几页再提交相应的示例文件。


For problems when building or installing MinerU: | 在构建或安装 MinerU 时遇到的问题:
* Give the **exact** build/install commands that were run. | 提供**确切**的构建/安装命令。
@@ -44,9 +72,9 @@ body:


- type: dropdown
id: os_name
id: os_mode
attributes:
label: Operating system | 操作系统
label: Operating System Mode | 操作系统类型
#multiple: true
options:
-
@@ -56,6 +84,22 @@ body:
validations:
required: true

- type: textarea
id: os_name_version
attributes:
label: Operating System Version| 操作系统版本
#multiple: true
description: |
* 如果您使用的是Linux系统,请提供Linux系统的**发行版名称**和**版本号**来帮助开发人员排查问题。
* If you are using a Linux system, please provide the Linux distribution and version number to help developers troubleshoot the issue.
* 如果您使用的是Windows或MacOS系统,请提供操作系统的**版本号**来帮助开发人员排查问题。
* If you are using a Windows or MacOS system, please provide the version number of the operating system to help developers troubleshoot the issue.
* 例如:Ubuntu 22.04, CentOS 7.9, MacOS 15.1, Windows 11
* For example: Ubuntu 22.04, CentOS 7.9, MacOS 15.1, Windows 11.

validations:
required: true

- type: dropdown
id: python_version
attributes:
@@ -64,23 +108,35 @@ body:
# Need quotes around `3.10` otherwise it is treated as a number and shows as `3.1`.
options:
-
- "3.13"
- "3.12"
- "3.11"
- "3.10"
- "3.9"
validations:
required: true

- type: dropdown
id: software_version
attributes:
label: Software version | 软件版本 (magic-pdf --version)
label: Software version | 软件版本 (mineru --version)
#multiple: false
options:
-
- "`<2.2.0`"
- "`2.2.x`"
- "`>=2.5`"
validations:
required: true

- type: dropdown
id: backend_name
attributes:
label: Backend name | 解析后端
#multiple: false
options:
-
- "0.6.x"
- "0.7.x"
- "0.8.x"
- "vlm"
- "pipeline"
validations:
required: true

@@ -93,5 +149,7 @@ body:
-
- cpu
- cuda
- mps
- npu
validations:
required: true
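Reviewer note on the quoting comment in the Python-version dropdown: the template says an unquoted `3.10` is treated as a number and shows as `3.1`. The sketch below illustrates that effect under a stated assumption: it uses Python's own `float` conversion as a stand-in for how a YAML loader resolves an unquoted numeric scalar. `render_option` is a hypothetical helper for illustration, not part of MinerU or of GitHub's issue-form handling.

```python
def render_option(raw: str, quoted: bool) -> str:
    """Return the text a dropdown option would display.

    Hypothetical helper: a quoted scalar stays a string, while an unquoted
    scalar that looks like a number is assumed to be parsed as a float and
    printed back, which drops trailing zeros.
    """
    if quoted:
        return raw
    try:
        return str(float(raw))
    except ValueError:
        # Non-numeric scalars such as "vlm" are unaffected by quoting.
        return raw


if __name__ == "__main__":
    for version in ("3.13", "3.12", "3.11", "3.10"):
        print(version, "->", render_option(version, quoted=False))
```

Only `3.10` changes in this sketch, which matches why the template quotes every version string rather than just the one that breaks.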
11 changes: 11 additions & 0 deletions .github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,11 @@
blank_issues_enabled: false
contact_links:
- name: 🙏 Q&A
url: https://github.com/opendatalab/MinerU/discussions/categories/q-a
about: Ask the community for help
- name: 💡 Feature requests and ideas
url: https://github.com/opendatalab/MinerU/discussions/categories/ideas
about: Share ideas for new features
- name: 🙌 Show and tell
url: https://github.com/opendatalab/MinerU/discussions/categories/show-and-tell
about: Show off something you've made
28 changes: 0 additions & 28 deletions .github/ISSUE_TEMPLATE/feature_request.md

This file was deleted.

8 changes: 4 additions & 4 deletions .github/workflows/cla.yml
@@ -18,18 +18,18 @@ jobs:
steps:
- name: "CLA Assistant"
if: (github.event.comment.body == 'recheck' || github.event.comment.body == 'I have read the CLA Document and I hereby sign the CLA') || github.event_name == 'pull_request_target'
uses: contributor-assistant/github-action@v2.5.0
uses: contributor-assistant/github-action@v2.6.1
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
# the below token should have repo scope and must be manually added by you in the repository's secret
# This token is required only if you have configured to store the signatures in a remote repository/organization
PERSONAL_ACCESS_TOKEN: ${{ secrets.RELEASE_TOKEN }}
with:
path-to-signatures: 'signatures/version1/cla.json'
path-to-document: 'https://github.com/opendatalab/MinerU/blob/master/MinerU_CLA.md' # e.g. a CLA or a DCO document
# branch should not be protected
branch: 'master'
allowlist: myhloli,dt-yy,Focusshang,renpengli01,icecraft,drunkpig,wangbinDL,qiangqiang199,GDDGCZ518,papayalove,conghui,quyuan,LollipopsAndWine
branch: 'cla'
allowlist: myhloli,dt-yy,Focusshang,renpengli01,icecraft,drunkpig,wangbinDL,qiangqiang199,GDDGCZ518,papayalove,conghui,quyuan,LollipopsAndWine,Sidney233

# the followings are the optional inputs - If the optional inputs are not given, then default values will be taken
#remote-organization-name: enter the remote organization name where the signatures should be stored (Default is storing the signatures in the same repository)
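Reviewer note on the "CLA Assistant" step's `if:` expression in this hunk: the step runs when a comment body is `recheck` or the signing phrase, or when the event is `pull_request_target`. A minimal sketch of that boolean logic follows. `should_run_cla_step` is a hypothetical name used only for illustration. The sketch compares strings exactly, whereas GitHub's expression syntax is documented as ignoring case when comparing strings, so it is an approximation rather than a faithful model.

```python
from typing import Optional

# Phrase copied from the workflow's `if:` expression.
SIGN_PHRASE = "I have read the CLA Document and I hereby sign the CLA"


def should_run_cla_step(event_name: str, comment_body: Optional[str] = None) -> bool:
    """Hypothetical helper mirroring the step's `if:` condition."""
    comment_triggers = comment_body == "recheck" or comment_body == SIGN_PHRASE
    return comment_triggers or event_name == "pull_request_target"


if __name__ == "__main__":
    print(should_run_cla_step("issue_comment", "recheck"))
    print(should_run_cla_step("issue_comment", "looks good"))
    print(should_run_cla_step("pull_request_target"))
```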
74 changes: 27 additions & 47 deletions .github/workflows/cli.yml
@@ -1,7 +1,7 @@
# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
# For more information see: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python

name: mineru
name: mineru-cli-test
on:
push:
branches:
@@ -10,59 +10,39 @@ on:
paths-ignore:
- "cmds/**"
- "**.md"
pull_request:
branches:
- "master"
- "dev"
paths-ignore:
- "cmds/**"
- "**.md"
workflow_dispatch:
jobs:
cli-test:
runs-on: pdf
if: github.repository == 'opendatalab/MinerU'
runs-on: ubuntu-latest
timeout-minutes: 240
strategy:
fail-fast: true

steps:
- name: PDF cli
uses: actions/checkout@v3
with:
fetch-depth: 2
- name: PDF cli
uses: actions/checkout@v6
with:
ref: dev
fetch-depth: 2

- name: install&test
run: |
source activate mineru
conda env list
pip show coverage
# cd $GITHUB_WORKSPACE && sh tests/retry_env.sh
cd $GITHUB_WORKSPACE && python tests/clean_coverage.py
cd $GITHUB_WORKSPACE && coverage run -m pytest tests/unittest/ --cov=magic_pdf/ --cov-report html --cov-report term-missing
cd $GITHUB_WORKSPACE && python tests/get_coverage.py
cd $GITHUB_WORKSPACE && pytest -m P0 -s -v tests/test_cli/test_cli_sdk.py
- name: install uv
uses: astral-sh/setup-uv@v7

notify_to_feishu:
if: ${{ always() && !cancelled() && contains(needs.*.result, 'failure') && (github.ref_name == 'master') }}
needs: cli-test
runs-on: pdf
steps:
- name: get_actor
run: |
metion_list="dt-yy"
echo $GITHUB_ACTOR
if [[ $GITHUB_ACTOR == "drunkpig" ]]; then
metion_list="xuchao"
elif [[ $GITHUB_ACTOR == "myhloli" ]]; then
metion_list="zhaoxiaomeng"
elif [[ $GITHUB_ACTOR == "icecraft" ]]; then
metion_list="xurui1"
fi
echo $metion_list
echo "METIONS=$metion_list" >> "$GITHUB_ENV"
echo ${{ env.METIONS }}
- name: install&test
run: |
uv --version
uv venv --python 3.12
source .venv/bin/activate
uv pip install .[test]
cd $GITHUB_WORKSPACE && python tests/clean_coverage.py
cd $GITHUB_WORKSPACE && coverage run
cd $GITHUB_WORKSPACE && python tests/get_coverage.py

- name: notify
run: |
echo ${{ secrets.USER_ID }}
curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"post","content":{"post":{"zh_cn":{"title":"'${{ github.repository }}' GitHubAction Failed","content":[[{"tag":"text","text":""},{"tag":"a","text":"Please click here for details ","href":"https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"},{"tag":"at","user_id":"'${{ secrets.USER_ID }}'"}]]}}}}' ${{ secrets.WEBHOOK_URL }}
# notify_to_feishu:
# if: ${{ always() && !cancelled() && contains(needs.*.result, 'failure')}}
# needs: cli-test
# runs-on: ubuntu-latest
# steps:
# - name: notify
# run: |
# curl -X POST -H "Content-Type: application/json" -d '{"msg_type":"post","content":{"post":{"zh_cn":{"title":"'${{ github.repository }}' GitHubAction Failed","content":[[{"tag":"text","text":""},{"tag":"a","text":"Please click here for details ","href":"https://github.com/'${{ github.repository }}'/actions/runs/'${GITHUB_RUN_ID}'"}]]}}}}' ${{ secrets.FEISHU_WEBHOOK_URL }}
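Reviewer note on the commented-out `notify` step: the JSON body is assembled by shell string concatenation, which is hard to read and easy to mis-quote. The sketch below builds what appears to be the same payload structure with Python's `json` module. The field names are taken from the `curl` command above; `build_failure_payload` and its parameters are hypothetical, and nothing here is confirmed against the webhook's actual API.

```python
import json


def build_failure_payload(repository: str, run_id: str) -> str:
    """Hypothetical helper: serialize the failure notification shown in the diff."""
    run_url = f"https://github.com/{repository}/actions/runs/{run_id}"
    payload = {
        "msg_type": "post",
        "content": {
            "post": {
                "zh_cn": {
                    "title": f"{repository} GitHubAction Failed",
                    "content": [
                        [
                            {"tag": "text", "text": ""},
                            {
                                "tag": "a",
                                "text": "Please click here for details ",
                                "href": run_url,
                            },
                        ]
                    ],
                }
            }
        },
    }
    # json.dumps handles quoting and escaping that the shell version does by hand.
    return json.dumps(payload, ensure_ascii=False)


if __name__ == "__main__":
    print(build_failure_payload("opendatalab/MinerU", "123456"))
```

Building the body as a dictionary keeps the nesting visible and avoids interleaving single and double quotes around each interpolated value.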
55 changes: 0 additions & 55 deletions .github/workflows/daily.yml

This file was deleted.
