refactor(web): extract scitex.web into standalone scitex-web package #245

Open
ywatanabe1989 wants to merge 1 commit into develop from feat/extract-scitex-web

@ywatanabe1989
Owner

Summary

Extracts `scitex.web` into scitex-web v0.1.0.

  • Web scraping, PubMed search, URL summarization preserved.
  • `[web]` extra collapsed to `scitex-web[readability]>=0.1.0` (drops a long list of unrelated deps that had accumulated: matplotlib, scikit-learn, joblib, seaborn, anthropic, openai, groq, etc.).
  • Decoupling: scitex.logging→stdlib logging; scitex.str.printc→inline ANSI; scitex.ai.GenAI→deferred import.
  • 14/23 tests pass (7 pre-existing upstream bs4-mocking failures unrelated to extraction; 2 skipped).

🤖 Generated with Claude Code

Web scraping (get_urls, get_image_urls, download_images), PubMed search, and
URL summarization now live in the standalone scitex-web package
(https://github.com/ywatanabe1989/scitex-web).

scitex/web/__init__.py becomes a sys.modules alias for the standalone package.
The [web] extra is collapsed to depend on scitex-web[readability]>=0.1.0
(transitively pulls requests/aiohttp/bs4/tqdm + readability-lxml).
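
For reviewers unfamiliar with the pattern, here is a minimal sketch of what a sys.modules alias does, assuming the shim simply rebinds the old import path to the standalone module object. The names `scitex_demo`, `scitex_web_demo`, and `alias_module` are hypothetical stand-ins so the snippet runs without either package installed; the actual shim in this PR may differ.

```python
import importlib
import sys
import types


def alias_module(alias: str, target: types.ModuleType) -> None:
    """Make `import <alias>` resolve to an already-loaded module object."""
    sys.modules[alias] = target


# Hypothetical stand-in for the standalone package.
standalone = types.ModuleType("scitex_web_demo")
standalone.get_urls = lambda html: []  # placeholder, not the real get_urls

# Hypothetical stand-in for the umbrella package, registered so the
# dotted name below has a parent entry.
sys.modules.setdefault("scitex_demo", types.ModuleType("scitex_demo"))

alias_module("scitex_demo.web", standalone)

# The import system consults sys.modules first, so the old dotted path
# now returns the very same module object.
legacy = importlib.import_module("scitex_demo.web")
print(legacy is standalone)
```

In a real shim the target would presumably be the imported standalone module and the alias would be the shim's own `__name__`, so existing `scitex.web` imports keep working unchanged.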

Decoupling notes (handled in scitex-web's initial commit):
- scitex.logging.getLogger → stdlib logging.getLogger
- scitex.str.printc → tiny inline ANSI helper (NO_COLOR + TTY-aware)
- scitex.ai.GenAI (used by summarize_url) → deferred import that raises
  a clear ImportError if the umbrella scitex isn't installed.
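
The decoupling steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code shipped in scitex-web: the color table, the `colorize` and `load_genai` helpers, the `printc` signature, and the error wording are all hypothetical. Only `logging.getLogger`, the `NO_COLOR` environment variable convention, and the `scitex.ai.GenAI` name come from the notes above.

```python
import importlib
import logging
import os
import sys

# Stdlib logger in place of scitex.logging.getLogger.
logger = logging.getLogger(__name__)

# Assumed color table; the real helper may support other names.
_ANSI = {"red": "\033[31m", "green": "\033[32m", "yellow": "\033[33m"}
_RESET = "\033[0m"


def _use_color(stream) -> bool:
    """Color only when NO_COLOR is unset/empty and the stream is a TTY."""
    if os.environ.get("NO_COLOR"):
        return False
    isatty = getattr(stream, "isatty", None)
    return bool(isatty and isatty())


def colorize(text: str, color: str = "green", stream=None) -> str:
    """Wrap text in ANSI codes when color is appropriate for the stream."""
    stream = sys.stdout if stream is None else stream
    code = _ANSI.get(color)
    if code is None or not _use_color(stream):
        return text
    return f"{code}{text}{_RESET}"


def printc(text: str, color: str = "green", stream=None) -> None:
    """Hypothetical inline replacement for scitex.str.printc."""
    stream = sys.stdout if stream is None else stream
    print(colorize(text, color, stream), file=stream)


def load_genai(module_name: str = "scitex.ai", attr: str = "GenAI"):
    """Import GenAI only when needed, with an actionable error message."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            "summarize_url requires the umbrella scitex package "
            f"(could not import {module_name!r}); install scitex to use it."
        ) from exc
    return getattr(module, attr)
```

Deferring the import this way keeps scitex-web importable on its own, while callers of summarize_url get an error that names the missing dependency instead of a bare ModuleNotFoundError.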

14/23 tests pass (7 pre-existing upstream bs4-mocking failures, unrelated to
extraction; 2 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>