Add scripts/test-downstream.sh for pre-release downstream smoke checks #35

Closed
arch-colony wants to merge 1 commit into main from feature/downstream-test-script

@arch-colony
Collaborator

Why

The v1.7.0 → v1.7.1 fiasco happened because the SDK's own unit tests passed but every downstream framework integration broke under strict mypy. The unit tests use raw dict access and never tripped the dict | Model union; downstream tools.py files do, because they process responses as dicts on behalf of LLMs.

This script catches that class of regression by actually running each downstream's test suite against a wheel built from the local SDK source, in an isolated venv per repo. It would have caught the v1.7.0 regression locally before tagging.

Usage

```bash
./scripts/test-downstream.sh # all known repos
./scripts/test-downstream.sh langchain-colony # one repo
COLONY_DOWNSTREAM_DIR=~/code ./scripts/test-downstream.sh
```

Repos are auto-discovered in $COLONY_DOWNSTREAM_DIR, ../<repo>, and /tmp/<repo>, in that order. Missing repos are skipped with a clear message.
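
The lookup order could be sketched roughly as below. This is a minimal illustration, not the script's actual code: the function name `find_repo` is hypothetical, and only the three search locations come from the description above.

```bash
# find_repo REPO: print the first existing checkout of REPO, or return 1.
# Hypothetical helper; the real script may structure this differently.
find_repo() {
    repo="$1"
    # Search order: $COLONY_DOWNSTREAM_DIR, then ../<repo>, then /tmp/<repo>.
    for base in "${COLONY_DOWNSTREAM_DIR:-}" ".." "/tmp"; do
        # Skip the first slot when COLONY_DOWNSTREAM_DIR is unset or empty.
        [ -n "$base" ] || continue
        if [ -d "$base/$repo" ]; then
            printf '%s\n' "$base/$repo"
            return 0
        fi
    done
    return 1
}

# Usage: a missing repo is reported and skipped rather than treated as an error.
if dir=$(find_repo langchain-colony); then
    echo "found langchain-colony at $dir"
else
    echo "skip langchain-colony: no checkout found"
fi
```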

Behaviour

  • Builds the wheel once via python -m build
  • Per repo: creates a fresh venv (uv if available, else python -m venv), installs the downstream's [dev] extras, force-reinstalls the local SDK wheel on top, runs pytest (excluding integration tests), and runs mypy if installed
  • pytest failures are release blockers
  • mypy errors are advisory — downstream packages have their own type-stub noise from upstream deps (smolagents/langgraph etc.) that lack py.typed
  • Reports a pass/fail/skip summary and exits non-zero on any pytest failure
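
As a rough sketch of the per-repo steps listed above (not the script itself): the names `run`, `test_repo`, and the `DRY_RUN` switch are hypothetical, the `integration` pytest marker and the `mypy .` target are assumptions about how the downstream repos are laid out, and the venv path is assumed to be absolute.

```bash
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# test_repo REPO_DIR WHEEL VENV_DIR
# Returns 0 when pytest passes, 1 on a setup or pytest failure.
test_repo() {
    repo_dir="$1"; wheel="$2"; venv="$3"

    # Fresh venv: prefer uv when present, else the stdlib venv module.
    if command -v uv >/dev/null 2>&1; then
        run uv venv --seed "$venv" || return 1   # --seed installs pip
    else
        run python -m venv "$venv" || return 1
    fi
    py="$venv/bin/python"

    # Downstream [dev] extras first, then the local SDK wheel on top.
    run "$py" -m pip install -e "${repo_dir}[dev]" || return 1
    run "$py" -m pip install --force-reinstall "$wheel" || return 1

    # Assumption: integration tests carry an "integration" pytest marker.
    (cd "$repo_dir" && run "$py" -m pytest -m "not integration") || return 1

    # mypy is advisory: run it only if installed, and never fail the repo on it.
    if "$py" -m mypy --version >/dev/null 2>&1; then
        (cd "$repo_dir" && run "$py" -m mypy .) \
            || echo "mypy: advisory errors in $repo_dir"
    fi
    return 0
}

# Usage (the wheel is built once, outside any per-repo loop):
#   python -m build --wheel --outdir dist
#   test_repo "$PWD/../langchain-colony" "$PWD"/dist/*.whl "$(mktemp -d)/venv"
```

Setting `DRY_RUN=1` prints the commands instead of running them, which is a cheap way to review the sequence before spending time on real installs.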

Verified locally

| Repo | Result |
|---|---|
| smolagents-colony | 68 tests pass against local SDK 1.7.1 |
| langchain-colony | 377 tests pass against local SDK 1.7.1 |

Also updates RELEASING.md

Added as step 4 in the pre-release checklist, between integration tests (step 3) and version bump (step 5). Renumbered the rest of the steps. Step text explicitly references the v1.7.0 fiasco as the motivating example.

Test plan

  • Script is executable and shellcheck-syntax-valid
  • Builds wheel successfully
  • Creates isolated venvs per repo
  • Installs local wheel via force-reinstall
  • Runs pytest from each repo's directory
  • Reports failures clearly with red/green output (or plain when not a TTY)
  • Skips missing repos cleanly
  • Verified against smolagents-colony and langchain-colony
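
The reporting items in this checklist could be implemented along the following lines. This is a sketch under assumptions: the names `report` and `summary` and the exact output wording are hypothetical, and only the TTY-dependent coloring, the pass/fail/skip counts, and the non-zero exit on failure come from the description.

```bash
# Use ANSI colors only when stdout is a terminal; stay plain in pipes and CI logs.
if [ -t 1 ]; then
    RED=$(printf '\033[31m'); GREEN=$(printf '\033[32m')
    YELLOW=$(printf '\033[33m'); RESET=$(printf '\033[0m')
else
    RED=''; GREEN=''; YELLOW=''; RESET=''
fi

passed=0; failed=0; skipped=0

# report STATUS REPO: print one result line and bump the matching counter.
report() {
    case "$1" in
        pass) passed=$((passed + 1));   printf '%sPASS%s %s\n' "$GREEN" "$RESET" "$2" ;;
        fail) failed=$((failed + 1));   printf '%sFAIL%s %s\n' "$RED" "$RESET" "$2" ;;
        skip) skipped=$((skipped + 1)); printf '%sSKIP%s %s\n' "$YELLOW" "$RESET" "$2" ;;
    esac
}

# summary: print totals and return non-zero if any repo failed.
summary() {
    printf '%d passed, %d failed, %d skipped\n' "$passed" "$failed" "$skipped"
    [ "$failed" -eq 0 ]
}
```

Ending the script with `summary` as its last command makes the script's exit status follow the failure count, so a release checklist or CI step can gate on it directly.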

🤖 Generated with Claude Code

The v1.7.0 → v1.7.1 fiasco happened because the SDK's own unit tests
passed but every downstream framework integration broke under strict
mypy. The unit tests use raw dict access and never trip the
``dict | Model`` union; the downstream tools.py files do, because
they're written for users who pass a ColonyClient and process raw
responses.

This script catches that class of regression by actually running each
downstream's test suite against a wheel built from the local SDK
source, in an isolated venv per repo.

Usage:
  ./scripts/test-downstream.sh                       # all known repos
  ./scripts/test-downstream.sh langchain-colony      # one repo
  COLONY_DOWNSTREAM_DIR=~/code ./scripts/test-downstream.sh

Repos auto-discovered in $COLONY_DOWNSTREAM_DIR, ../<repo>, /tmp/<repo>
in that order. Missing repos are skipped with a clear message.

Behaviour:
- Builds the wheel once via `python -m build`
- Per repo: creates a fresh venv (uv if available, else python -m venv),
  installs the downstream's [dev] extras, force-reinstalls the local
  SDK wheel on top, runs pytest (excluding integration tests), and
  runs mypy if installed.
- pytest failures are release blockers. mypy errors are advisory
  (downstream packages have their own type-stub noise from upstream
  deps like smolagents/langgraph that lack py.typed).
- Reports a summary with pass/fail/skip counts and exits non-zero on
  any pytest failure.

Verified locally against smolagents-colony (68 tests) and
langchain-colony (377 tests) — both pass against this branch's SDK
source after the v1.7.1 type-annotation revert.

Also added as step 4 in RELEASING.md as a required pre-release check,
sandwiched between integration tests (step 3) and version bump (step 5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov

codecov bot commented Apr 12, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@ColonistOne
Collaborator

Closing — opened from the wrong account. Reopening as ColonistOne.

@jackparnell jackparnell deleted the feature/downstream-test-script branch April 12, 2026 11:05