Modernize Sphinx docs toolchain, add RTD build, and expand API docs#1089
Open
adcroft wants to merge 10 commits intoNOAA-GFDL:dev/gfdlfrom
Open
Modernize Sphinx docs toolchain, add RTD build, and expand API docs#1089adcroft wants to merge 10 commits intoNOAA-GFDL:dev/gfdlfrom
adcroft wants to merge 10 commits intoNOAA-GFDL:dev/gfdlfrom
Conversation
This is piece 1 of the sphinx toolchain upgrade. The docs build
previously depended on four forks which haven't been updated since 2020.
The chain pinned the entire build to Sphinx 3.2.1, requiring a growing
list of transitive version ceilings (jinja2<3.1, sphinxcontrib_*<1.0.x,
alabaster<0.7.14, setuptools<82).
This commit handles the sphinxcontrib-autodoc_doxygen dependency. The
upstream (rmcgibbo) has been dormant since June 2021; the fork was
effectively a rewrite (~90% of xmlutils.py changed, both documenters
replaced, the whole domain switched from cpp to f). Monkey-patching
the upstream package was rejected because there was nothing stable to
patch against. Instead the fork's code is vendored in-tree at
docs/_ext/autodoc_doxygen/ so it can be debugged, blamed, and edited
like any other project source.
Changes vs the fork at tag 0.7.13:
- Renamed doxynamespace.rst template to doxymodule.rst and dropped
the unused C++ doxyclass.rst (DoxygenClassDocumenter was already
unregistered in the fork).
- Removed ~80 lines of dead code: commented-out pdb.set_trace lines,
the unregistered DoxygenClassDocumenter class, the dead
visit_ref_angus alternative, _import_by_name_original, and the
unused try-import of flint.
- Dropped Python 2 compatibility cruft: from __future__ imports,
from six import itervalues (replaced with dict.values()).
- Ported four Sphinx 3 -> 8 API changes:
* DoxygenAutosummary.get_items now passes self.bridge (the
DocumenterBridge) to documenter constructors instead of self,
so Documenter.__init__ finds directive.genopt.
* self.warn / self.directive.warn calls (removed in Sphinx 8)
replaced with sphinx.util.logging.getLogger(...).warning.
* Wrapped env.doc2path in os.fspath to silence the
RemovedInSphinx90Warning about str paths.
docs/conf.py: prepend _ext to sys.path and rename the extension in
the extensions list from sphinxcontrib.autodoc_doxygen to
autodoc_doxygen.
docs/conf.py also carries a monkey-patch for a parallel-build bug in
upstream VACUMM/sphinx-fortran's FortranDomain.merge_domaindata. The
upstream method references an undefined `outNames` (typo for
`ourNames`) and also unpacks the wrong tuple shape from modules and
objects dicts. With -j > 1 the merge silently fails and the f domain
ends up empty, losing every Fortran cross-reference. The patch is
heavily commented with a TODO to submit an upstream PR and drop the
workaround during piece 2 of the upgrade.
docs/requirements.txt: drop the sphinxcontrib-autodoc_doxygen git
dependency. The other three forks remain pending pieces 2, 3, 4.
Validation: built against a new venv.sph8 (Python 3.11 + Sphinx 8.2.3 +
sphinx-rtd-theme 3.1.0 + sphinxcontrib-bibtex 2.6.5 + upstream
VACUMM/sphinx-fortran master). `make html` produces 220 warnings vs
224 on the Sphinx 3 baseline. The f domain indexes 48 modules and
538 objects. Equation pages are effectively byte-identical (off only
by Sphinx 8's section-id hoist optimization and the pilcrow-as-CSS
theme change). Module pages are slightly larger because more
cross-references now resolve than in the baseline build.
Piece 2 of the sphinx toolchain upgrade (see Notes-sphinx-upgrade.md). The jr3cermak/sphinx-fortran@1.2.2 fork existed to add Sphinx 3 API compatibility in 2020. Upstream VACUMM/sphinx-fortran has continued evolving since then (Sphinx logging API migration, parallel read support, modern setuptools, removal of the future dependency) and now works with Sphinx 8 directly. Upstream has not cut a PyPI release past 1.1.1 so we pin to a specific master commit for reproducibility. The pinned commit ships a broken FortranDomain.merge_domaindata that loses every f-domain object in parallel builds; the monkey-patch in conf.py from the previous commit handles that. We will revisit that patch and the pin together once an upstream PR is merged (see the TODO(piece-2) marker in conf.py setup). Validation: rebuilt docs with the pinned commit installed into venv.sph8. `make html` produces 220 warnings (identical to the pre-swap sph8 build), the f domain indexes 48 modules and 538 objects, mom.html has 36 internal cross-reference links, and f-modindex.html is generated. No behavior change from the sph8 build validated in the previous commit.
Piece 3 of the docs toolchain upgrade plan in Notes-sphinx-upgrade.md. This commit removes the last need for the patched sphinx fork and collapses the long list of transitive version ceilings it forced. Changes: - requirements.txt: drop git+https://github.com/jr3cermak/sphinx.git @v3.2.1mom6.4. Pin sphinx>=8,<9 from PyPI. Drop the eight ceiling lines (jinja2<3.1, sphinxcontrib_*<1.0.x, alabaster<0.7.14, setuptools<82.0.0) and the workaround comment around sphinx-fortran's broken requirements.txt; those existed only because Sphinx 3.2.1 dragged them in. Add lxml as an explicit dep (it is required by the vendored _ext/autodoc_doxygen extension to parse Doxygen XML). Keep six because the pinned sphinx-fortran commit imports it at module load time. - conf.py: add a monkey-patch for sphinx.util.math.wrap_displaymath that replicates the functional changes in the jr3cermak fork. The patch detects parts containing `\begin{equation}`, `\begin{eqnarray}`, or `\begin{align}` and emits them verbatim (no outer `\begin{equation}\begin{split}...\end{split} \end{equation}` wrapping). Without this, MOM6's math-heavy sources produce nested LaTeX environments that pdflatex chokes on. The patch is heavily commented with the rationale and a TODO marker for a possible upstream contribution. Verified by unit-testing the function directly (plain math still wrapped, explicit-environment math passed through verbatim) and by inspecting the generated _build/latex/ MOM6.tex: 0 double-wrapped equation/eqnarray/align blocks, 100 + 18 + 17 explicit environments preserved as top-level blocks. - conf.py: tighten exclude_patterns. Sphinx walks the entire source tree by default, which previously caused it to descend into local virtualenvs (`venv.sph3/`, `venv.sph8/`, `venv-3.11/`, etc.) and pick up LICENSE.rst, README.rst, and stray autosummary template files from inside site-packages. Each of those generated a "document isn't included in any toctree" warning and a corresponding stray .html file in the build output. The new exclusions cover venv*, venv-*, _build.*, and _ext/*/templates. Final build: 90 HTML files (down from 148, the lost 58 were all venv junk and template scaffolding) and 122 warnings (down from 220, with the same project- internal warnings as before plus the venv noise removed). - Makefile: add the equation post-processing hook to the html target, gated on UPDATEHTMLEQS=Y. The jr3cermak/sphinx fork carried this as a patch to sphinx/cmd/build.py so it ran after every sphinx-build invocation. Now invoked from the Makefile so it works against stock upstream Sphinx. The existing nortd target already had the same hook with different arguments (-p APIs -b doxygen vs -p html -b sphinx). The forked sphinx-fortran fix and the upstream PR for the math wrapping change are deferred to a follow-up. Validation: built from a fresh venv (venv.sph8.fresh) created by `python3.11 -m venv` and `pip install -r requirements.txt`. The install resolves cleanly with no manual intervention. `make html` succeeds with 122 warnings, indexes 48 modules and 538 f-domain objects, generates f-modindex.html, and produces 36 internal cross- reference links on mom.html. `make html UPDATEHTMLEQS=Y` runs the post-processing script successfully. `make latex` produces 0 double-wrapped math environments.
Piece 4 of the docs toolchain upgrade plan in Notes-sphinx-upgrade.md. With this commit the dependencies in docs/requirements.txt are only on maintained upstream sources. flint (marshallward/flint, vendored as jr3cermak/flint at 0.0.1) was historically pulled into the docs build with the intent of patching up Doxygen's incomplete parsing of Fortran functions with `result()` clauses. The fork's sphinxcontrib-autodoc_doxygen imported it at module load time but never actually called it; the import was a no-op behind a `try: import flint; except: pass`. We removed that import as part of piece 1's dead-code cleanup, so the vendored extension no longer touches flint at all. Verified empirically: uninstalled flint from venv.sph8.fresh and rebuilt. The build is byte-identical to the previous one with flint installed (122 warnings, 48 f-modules, 538 f-objects, 36 internal cross-reference links on mom.html, f-modindex.html present, 90 html files). Nothing in the current pipeline depends on flint. The two surviving "flint" mentions in docs/_ext/autodoc_doxygen/ are both in comments describing the original intent, not live code; they are left in place as historical context. If a future enhancement wants to actually fix Doxygen's result() clause parsing, that should be a separate piece of work and would not necessarily resurrect flint specifically. docs/README.md is updated to drop the flint bullet from the requirements list and the flint row from the credits table. Other stale entries in that section (sphinxcontrib_autodox-doxygen, the sphinx and sphinx-fortran fork rows) remain and will need a broader README sweep in a follow-up.
Consolidates four small fixes that followed the Sphinx 8 upgrade: - Switch all Makefile targets from `sphinx-build -b` to `-M` for consistent parallel build support. - Update docs/README.md to match the modernized toolchain (remove references to the four sphinx* forks, update requirements and credits sections). - Fix crash in autodoc_doxygen visit_image on empty <image> elements (node.text is None when doxygen produces <image/> without caption text). - Add __pycache__ and *.pyc to docs/.gitignore so the vendored extension's bytecode does not appear in git status.
Consolidates four fixes for the Read the Docs build environment: - Override RTD's default sphinx runner with a build.jobs.build.html entry in .readthedocs.yml that runs sphinx-build -M html -j auto, since RTD's high-level sphinx: key has no parallelism option. - Make doxygen_xml path resolution robust to cwd changes: resolve relative paths against app.confdir rather than the ambient cwd, which differs between local builds and RTD. - Switch Makefile html target from -j 4 to -j auto to match RTD. - Enable html_static_path = ['_static'] in conf.py so custom CSS files (autodoxysource.css) are copied into the build output. Previously commented out, which meant app.add_css_file() added the <link> tag but the file was never deployed.
Two instances of the same class of bug in the vendored
autodoc_doxygen extension, both scanning the entire merged doxygen
XML tree (1230 files, 109 MB with programlisting) on every call:
1. scanNode used `node.xpath('//latexonly')` etc. In XPath, `//`
starts from the document root, not from `node`. Since the
extension merges all XML into one tree, every call scanned
the whole tree. Fix: `//` -> `.//` (descendant-of-self).
Recovery: 6m33s -> 48.8s wall clock at full input scale.
2. visit_ref used `get_doxygen_root().findall('.//*[@id=X]')` to
resolve each prose <ref> by linearly scanning every element in
the merged tree. With XML_PROGRAMLISTING=YES the tree tripled
in size, making this the dominant cost (250s of 911s serial).
Fix: lazy {id: element} dict built once on first use, O(1)
lookup thereafter. Recovery: 911s -> 676s serial.
Generate one HTML page per Fortran source file with syntax highlighting and clickable cross-references. Clicking an identifier in the source jumps to its API documentation page; each API entry gets a [source] link back to the source listing. Implementation: - Enable XML_PROGRAMLISTING in Doxyfile_rtd (standalone commit originally, now folded in). - New autodoxysource directive in _ext/autodoc_doxygen/ that walks doxygen's <programlisting> XML, emitting per-line anchors, CSS-classed highlight spans, and pending_xref nodes for identifiers. - Stub generator produces 329 :orphan: source pages under api/generated/source/. - [source] links added to DoxygenMethodDocumenter, DoxygenTypeDocumenter, and DoxygenModuleDocumenter. - CSS styled to match the sphinx_rtd_theme's code blocks (font stack, size, colors, line gutter). - Node count optimization: coalesce text runs in _walk_highlight (10-30 nodes/line -> 3-5), set support_smartquotes=False on the source container to skip the smartquotes transform. Combined with the visit_ref fix, total build time with the source browser is 115s parallel (vs 360s before optimization).
Consolidates five improvements to the rendered API pages: - Show Fortran type declarations for function/subroutine parameters and derived-type members. Types are rendered as inline code (e.g. ``real, intent(in), optional``) by looking up the <param> elements on the parent <memberdef>. - Move [source] links from the top of each entry to the bottom, so the description and parameters appear first. - Fix functions (as opposed to subroutines) not being registered in the sphinx-fortran domain. format_name() was prepending the return type (e.g. "real") which sphinx-fortran's f_sig_re regex cannot parse, leaving function entries without anchors. - Add Functions and Source Files index pages to the API Reference, with cross-reference links to module pages and source browser pages respectively. - Exclude upstream CVMix sources from doxygen input to avoid warnings from undocumented third-party code.
Multi-line math content (e.g. \begin{align}...\end{align}) from
doxygen XML was emitted as:
.. math:: \begin{align}
\mathbf{v}
= \mathbf{u} + ...
\end{align}
RST requires directive body content to be indented. The
continuation lines at column 0 were parsed as regular text by
docutils, producing garbled LaTeX output with "Runaway argument"
and "Missing $" errors — 164 LaTeX errors on the Notation page
alone.
Fix: in visit_formula, detect multi-line math text and emit it
as a properly indented block:
.. math::
\begin{align}
\mathbf{v}
= \mathbf{u} + ...
\end{align}
Single-line math is unchanged (stays on the .. math:: line).
After fix: `make latexpdf` LaTeX errors drop from 164 to 10.
The remaining 10 are unrelated (9 Unicode Greek characters in
source comments that pdflatex cannot render, 1 duplicate label).
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## dev/gfdl #1089 +/- ##
============================================
- Coverage 38.76% 37.31% -1.45%
============================================
Files 274 263 -11
Lines 92265 91065 -1200
Branches 17823 17795 -28
============================================
- Hits 35765 33984 -1781
- Misses 49796 50357 +561
- Partials 6704 6724 +20 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Modernizes the docs build and adds new capabilities.
(sphinxcontrib-autodoc_doxygen, sphinx, sphinx-fortran, flint) that had pinned the build to Sphinx 3.2.1 and a growing
list of transitive version ceilings. End state: stock Sphinx 8 from PyPI, upstream VACUMM sphinx-fortran at a pinned
commit, autodoc_doxygen vendored in-tree at docs/_ext/ (expanding the original additions from jr3cermak). Two small
compatibility gaps are handled by commented monkey-patches in conf.py with TODO markers for upstream PRs: i) a math-wrap behavior from the sphinx fork and ii) a parallel-merge bug in upstream sphinx-fortran.
sphinx: key has no parallelism option); html_static_path is enabled so custom CSS actually deploys; doxygen_xml paths
are resolved against app.confdir so they work under RTD's cwd.
derived-type members; [source] links moved below the description; a bug that dropped functions (vs subroutines) from
the sphinx-fortran domain is fixed; new Functions and Source Files index pages; upstream CVMix sources excluded from
doxygen input.
bi-directionally linked with the API pages. Includes a multi-line LaTeX math indentation fix that dropped LaTeX errors
from 164 to 10 on the Notation page.
scanNode used // where it should have used .// (6m33s → 48.8s); visit_ref linearly scanned all elements to resolve
each , replaced with a lazy id→element dict (911s → 676s serial). End-to-end build with the new source browser is
115s parallel.
Commits are structured so each piece stands alone but this PR can be squashed. I have the necessary commits to create PRs for sphinx-fortran and sphinx which will later let us remove the patches in conf.py.
Testing:
mom.html)
navigate back
There's no way I could have fixed this in a short time: this PR was mostly debugged and implemented by AI.
Todo: