Drop docopt, Werkzeug, and obsolete deps #779

Merged
skearnes merged 21 commits into main from cleanup/docopt-and-dependencies
Apr 14, 2026

Conversation

@skearnes
Member

@skearnes skearnes commented Apr 4, 2026

Summary

  • Remove docopt (unmaintained since 2019): migrate all CLI scripts + tests to argparse
  • Remove Werkzeug: its only use was security.safe_join in id_filename(); replaced with posixpath.join plus an isalnum() shard check
  • Fix deprecated setuptools.distutils.strtobool in rdkit_mappers.py (replaced with a plain env-var check)
  • Remove obsolete deps: glob2 (the standard-library glob module has handled recursive ** patterns since Python 3.5), .style.yapf (project uses ruff)
  • Inline _COMPOUND_STRUCTURAL_IDENTIFIERS in resolvers.py; drop the duplicate from updates.py (unused)
  • _run_updates in process_dataset.py takes explicit kwargs instead of the whole argparse Namespace
  • Drop stale # pylint: directives in parse_uspto.py (pylint was removed when .pylintrc was deleted)
  • Update copyright year in docs/conf.py to 2020–2026 and bump .readthedocs.yml to ubuntu-22.04
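
For readers unfamiliar with the docopt-to-argparse migration in the first bullet, here is a minimal sketch of the general shape. The flag names (`--input`, `--output`) and the `parse_args`/`main` split are hypothetical and are not copied from the actual scripts; the point is only that accepting an explicit `argv` list makes the parser easy to exercise from tests.

```python
import argparse


def parse_args(argv=None):
    """Builds the parser that a docopt usage string used to describe.

    The flags here are illustrative assumptions, not the real script options.
    """
    parser = argparse.ArgumentParser(description="Validate a dataset file.")
    parser.add_argument("--input", required=True, help="Input dataset path")
    parser.add_argument("--output", default=None, help="Optional output path")
    # With argv=None argparse falls back to sys.argv[1:]; tests can pass a
    # list instead of patching sys.argv.
    return parser.parse_args(argv)


def main(args):
    """Placeholder body; a real script would do its work here."""
    return f"validating {args.input}"
```

A test can then call `main(parse_args(["--input", "a.pb"]))` directly, which is roughly what migrating the `*_test.py` files alongside the scripts would involve.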

Test plan

  • pytest -vv --cov=ord_schema passes (especially *_test.py for migrated scripts)
  • python -m ord_schema.scripts.validate_dataset --help works
  • CI green

🤖 Generated with Claude Code

skearnes and others added 11 commits April 4, 2026 00:10
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Check dative bond types directly instead of comparing SMILES strings,
which vary across RDKit versions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace deprecated setuptools.distutils.strtobool with string comparison
- Migrate all CLI scripts from unmaintained docopt to argparse
- Remove obsolete glob2 dependency (Python 3.10+ has this built-in)
- Remove tensorflow from examples extras (restored per user request)
- Update ReadTheDocs from EOL ubuntu-20.04 to ubuntu-22.04
- Update Node.js from EOL 16 to 22 in CI
- Bump actions/setup-python v4->v5, setup-node v3->v4, codecov v1->v4, checkout v3->v4
- Update copyright year in docs to 2020-2026
- Remove unused .style.yapf (project uses black)
- Remove accidental platform-specific binary artifact (.tar.gz)
- Deduplicate _COMPOUND_STRUCTURAL_IDENTIFIERS into ord_schema.__init__

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
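
The first item in the commit message above (replacing strtobool with a string comparison) could look roughly like the sketch below. The helper name `env_flag` and the environment variable used in the usage note are hypothetical; the set of truthy spellings mirrors what `distutils.util.strtobool` accepted, but the actual change in rdkit_mappers.py may be simpler than this.

```python
import os


def env_flag(name, default=False):
    """Returns True when the environment variable holds a truthy string.

    Hypothetical helper. Unlike strtobool, unrecognized values are treated
    as False instead of raising ValueError.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "t", "yes", "y", "on"}
```

Usage would be something like `if env_flag("SOME_FEATURE_FLAG"): ...`, where the variable name is whatever the module already reads.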
Use ord_schema.COMPOUND_STRUCTURAL_IDENTIFIERS directly at the call site
in resolvers.py. The constant was never used in updates.py at all, so
remove the unused import ord_schema there too.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Base automatically changed from migrate-pyproject-toml to main April 9, 2026 00:50
skearnes and others added 2 commits April 8, 2026 20:59
Resolve docopt vs argparse conflicts by keeping argparse CLI; bump version to 0.3.100; drop duplicate test_dataset_missing_name; black-format merged scripts.

Co-authored-by: Cursor <noreply@cursor.com>
…pendencies

# Conflicts:
#	.github/workflows/publish.yml
#	.github/workflows/run_tests.yml
#	ord_schema/orm/rdkit_mappers.py
#	ord_schema/orm/scripts/add_datasets.py
#	ord_schema/resolvers.py
#	ord_schema/scripts/check_pb.py
#	ord_schema/scripts/parse_uspto.py
#	ord_schema/scripts/process_dataset.py
#	ord_schema/scripts/validate_dataset.py
@skearnes skearnes changed the title from "Dependency and tooling cleanup" to "Drop docopt and obsolete deps" Apr 12, 2026
@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 76.15894% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.78%. Comparing base (303f943) to head (03f0a9f).
⚠️ Report is 3 commits behind head on main.

Files with missing lines                   Patch %   Lines
ord_schema/scripts/parse_uspto.py          0.00%     19 Missing ⚠️
ord_schema/scripts/process_dataset.py      82.97%    4 Missing and 4 partials ⚠️
ord_schema/orm/scripts/add_datasets.py     86.36%    1 Missing and 2 partials ⚠️
ord_schema/message_helpers.py              75.00%    1 Missing and 1 partial ⚠️
ord_schema/scripts/build_dataset.py        93.33%    1 Missing ⚠️
ord_schema/scripts/check_pb.py             90.00%    1 Missing ⚠️
ord_schema/scripts/enumerate_dataset.py    93.75%    1 Missing ⚠️
ord_schema/scripts/validate_dataset.py     92.30%    1 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##             main     #779      +/-   ##
==========================================
+ Coverage   71.31%   71.78%   +0.47%     
==========================================
  Files          23       23              
  Lines        2426     2492      +66     
  Branches      565      567       +2     
==========================================
+ Hits         1730     1789      +59     
- Misses        583      590       +7     
  Partials      113      113              
Files with missing lines                   Coverage Δ
ord_schema/orm/rdkit_mappers.py            93.75% <100.00%> (-0.08%) ⬇️
ord_schema/updates.py                      95.00% <ø> (-0.13%) ⬇️
ord_schema/scripts/build_dataset.py        92.30% <93.33%> (+3.41%) ⬆️
ord_schema/scripts/check_pb.py             90.47% <90.00%> (+2.97%) ⬆️
ord_schema/scripts/enumerate_dataset.py    91.30% <93.75%> (+5.59%) ⬆️
ord_schema/scripts/validate_dataset.py     89.58% <92.30%> (+1.48%) ⬆️
ord_schema/message_helpers.py              88.05% <75.00%> (+0.11%) ⬆️
ord_schema/orm/scripts/add_datasets.py     72.16% <86.36%> (+4.30%) ⬆️
ord_schema/scripts/process_dataset.py      80.00% <82.97%> (+2.30%) ⬆️
ord_schema/scripts/parse_uspto.py          0.00% <0.00%> (ø)

Only one caller remains after updates.py dropped its copy, so the
public constant in ord_schema.__init__ is no longer justified.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@skearnes skearnes marked this pull request as ready for review April 12, 2026 01:28
skearnes and others added 4 commits April 11, 2026 21:33
- process_dataset: drop redundant dest= on --no-validate; _run_updates
  now takes explicit kwargs instead of the whole argparse Namespace
- parse_uspto: move parse_args above main for consistency
- resolvers: restore original constant order

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
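
The two process_dataset points in the commit message above can be illustrated with a small sketch. Everything except the `--no-validate` flag is an assumption: the `--update` and `--cleanup` flags, the `run_updates` stand-in, and its return value are hypothetical and only show the pattern of passing explicit keyword arguments instead of the whole Namespace.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # argparse derives the attribute name from the long option by stripping
    # the leading dashes and replacing "-" with "_", so the value lands on
    # args.no_validate and an explicit dest="no_validate" adds nothing.
    parser.add_argument("--no-validate", action="store_true")
    parser.add_argument("--update", action="store_true")   # hypothetical flag
    parser.add_argument("--cleanup", action="store_true")  # hypothetical flag
    return parser.parse_args(argv)


def run_updates(*, update, cleanup):
    """Hypothetical stand-in for _run_updates.

    Keyword-only parameters make the inputs visible in the signature, so
    the function can be called from tests without building a Namespace.
    """
    steps = []
    if update:
        steps.append("update")
    if cleanup:
        steps.append("cleanup")
    return steps


def main(argv=None):
    args = parse_args(argv)
    return run_updates(update=args.update, cleanup=args.cleanup)
```

The design benefit is that the helper no longer depends on which attributes happen to exist on the parsed arguments object.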
pylint was removed when .pylintrc was deleted; these disable comments
have been dead weight since. Also simplify one no-op comprehension to
list() while touching the file.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The only use was werkzeug.security.safe_join in id_filename() to build
a shard path like data/{shard}/{basename}. Replaced with posixpath.join
plus an isalnum() check on the shard, which is stricter than safe_join
and matches our actual (hex-like) dataset ID shape.

Werkzeug remains available transitively via tensorboard for the
examples extra, but ord-schema no longer pulls in a web framework as
a required runtime dependency.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
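
A minimal sketch of the replacement described in the commit message above, assuming the shard is the first two characters after the "ord-" prefix (which matches the shard values quoted elsewhere in this PR). The error messages and exact parsing are assumptions; the real id_filename() in message_helpers.py may differ in detail.

```python
import os
import posixpath


def id_filename(filename):
    """Maps "ord-<id>.pb" to "data/<shard>/ord-<id>.pb" (illustrative sketch)."""
    # basename drops any directory components the caller supplied.
    basename = os.path.basename(filename)
    prefix, sep, suffix = basename.partition("-")
    if not sep or not prefix.startswith("ord"):
        raise ValueError(f"basename does not start with 'ord': {basename!r}")
    # Assumption: the shard is the first two characters of the ID.
    shard = suffix[:2]
    # isalnum() is False for ".", "/", and the empty string, so shards such
    # as ".." cannot be used to climb out of the data/ directory.
    if not shard.isalnum():
        raise ValueError(f"shard is not alphanumeric: {shard!r}")
    return posixpath.join("data", shard, basename)
```

Note that `str.isalnum()` also accepts non-ASCII letters and digits, so it is broader than a strict hex check while still excluding path separators and dots.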
…cies

Fold the Werkzeug removal into the dependency cleanup PR; it's a tiny
change and in the same spirit as dropping docopt + glob2.

# Conflicts:
#	pyproject.toml
@skearnes skearnes mentioned this pull request Apr 12, 2026
2 tasks
@skearnes skearnes changed the title from "Drop docopt and obsolete deps" to "Drop docopt, Werkzeug, and obsolete deps" Apr 12, 2026
skearnes and others added 2 commits April 11, 2026 21:59
Mirrors what werkzeug.security.safe_join used to enforce: verify that
the constructed path is normalized and still inside the data/ root.
Defense in depth: today the shard isalnum check is the only thing that
stops a traversal string which slips past os.path.basename + the "ord"
prefix rule, but this guards against future regressions if any of those
rules gets loosened.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
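
The containment check described in the commit message above might be sketched as follows. The helper name `ensure_inside_root` and the `DATA_ROOT` constant are hypothetical; the sketch only shows one way to verify that a joined path is already normalized and still under data/, not the code that was merged.

```python
import posixpath

DATA_ROOT = "data"  # assumed root directory name, per the commit message


def ensure_inside_root(path, root=DATA_ROOT):
    """Raises ValueError unless path is normalized and located under root."""
    normalized = posixpath.normpath(path)
    # normpath collapses "..", ".", and duplicate slashes; any difference
    # means the input contained components that could change its target.
    if normalized != path:
        raise ValueError(f"path is not normalized: {path!r}")
    # Require the root followed by a separator so that a sibling such as
    # "database/x" is not mistaken for a child of "data".
    if not normalized.startswith(root + "/"):
        raise ValueError(f"path escapes {root!r}: {path!r}")
    return path
```

For example, `ensure_inside_root("data/../etc/passwd")` raises because normalization changes the string, while `ensure_inside_root("data/ab/ord-ab12.pb")` returns the path unchanged.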
After replacing werkzeug.security.safe_join with an explicit prefix +
isalnum shard guard, lock in the rejection behavior for the shapes the
guard is meant to catch:

- "notord-..." — wrong "ord" prefix.
- "ord-..foo" — shard becomes "..", the only known traversal vector
  that survives os.path.basename.
- "ord-.foo"  — shard becomes ".f", non-alphanumeric.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
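
The three rejected shapes listed above can be checked in a table-driven way. The sketch below uses a hypothetical predicate, `has_valid_shard`, that stands in for the guard (prefix rule plus a two-character alphanumeric shard); the real tests presumably call id_filename() and assert that it raises.

```python
def has_valid_shard(basename):
    """Hypothetical stand-in for the prefix + isalnum shard guard."""
    prefix, sep, suffix = basename.partition("-")
    return bool(sep) and prefix.startswith("ord") and suffix[:2].isalnum()


# Maps each input shape to whether the guard should accept it.
CASES = {
    "ord-0123abcd": True,      # well-formed ID, shard "01"
    "notord-0123abcd": False,  # wrong prefix
    "ord-..foo": False,        # shard "..", a traversal component
    "ord-.foo": False,         # shard ".f", not alphanumeric
}

for name, expected in CASES.items():
    assert has_valid_shard(name) is expected, name
```

Keeping the cases in one mapping makes it cheap to add a new shape if the guard is ever loosened or extended.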
@skearnes skearnes merged commit b288e3e into main Apr 14, 2026
18 of 19 checks passed
@skearnes skearnes deleted the cleanup/docopt-and-dependencies branch April 14, 2026 23:12