Add scripts/test-downstream.sh for pre-release downstream smoke checks #35
Closed
arch-colony wants to merge 1 commit into main from
The v1.7.0 → v1.7.1 fiasco happened because the SDK's own unit tests passed but every downstream framework integration broke under strict mypy. The unit tests use raw dict access and never trip the ``dict | Model`` union; the downstream tools.py files do, because they're written for users who pass a ColonyClient and process raw responses.

This script catches that class of regression by actually running each downstream's test suite against a wheel built from the local SDK source, in an isolated venv per repo.

Usage:

    ./scripts/test-downstream.sh                    # all known repos
    ./scripts/test-downstream.sh langchain-colony   # one repo
    COLONY_DOWNSTREAM_DIR=~/code ./scripts/test-downstream.sh

Repos are auto-discovered in $COLONY_DOWNSTREAM_DIR, ../<repo>, /tmp/<repo>, in that order. Missing repos are skipped with a clear message.

Behaviour:

- Builds the wheel once via `python -m build`
- Per repo: creates a fresh venv (uv if available, else python -m venv), installs the downstream's [dev] extras, force-reinstalls the local SDK wheel on top, runs pytest (excluding integration tests), and runs mypy if installed.
- pytest failures are release blockers. mypy errors are advisory (downstream packages have their own type-stub noise from upstream deps like smolagents/langgraph that lack py.typed).
- Reports a summary with pass/fail/skip counts and exits non-zero on any pytest failure.

Verified locally against smolagents-colony (68 tests) and langchain-colony (377 tests); both pass against this branch's SDK source after the v1.7.1 type-annotation revert.

Also added as step 4 in RELEASING.md as a required pre-release check, sandwiched between integration tests (step 3) and version bump (step 5).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Collaborator
Closing: opened from the wrong account. Reopening as ColonistOne.
Why
The v1.7.0 → v1.7.1 fiasco happened because the SDK's own unit tests passed but every downstream framework integration broke under strict mypy. The unit tests use raw dict access and never tripped the `dict | Model` union; downstream tools.py files do, because they process responses as dicts on behalf of LLMs.

This script catches that class of regression by actually running each downstream's test suite against a wheel built from the local SDK source, in an isolated venv per repo. It would have caught the v1.7.0 regression locally before tagging.
Usage
```bash
./scripts/test-downstream.sh # all known repos
./scripts/test-downstream.sh langchain-colony # one repo
COLONY_DOWNSTREAM_DIR=~/code ./scripts/test-downstream.sh
```
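Repos are auto-discovered in `$COLONY_DOWNSTREAM_DIR`, `../<repo>`, `/tmp/<repo>`, in that order. Missing repos are skipped with clear messages.
Behaviour
- Builds the wheel once via `python -m build`
- Per repo: creates a fresh venv (`uv` if available, else `python -m venv`), installs the downstream's `[dev]` extras, force-reinstalls the local SDK wheel on top, runs pytest (excluding integration tests), and runs mypy if installed
- pytest failures are release blockers; mypy errors are advisory (downstream packages have their own type-stub noise from upstream deps like smolagents/langgraph that lack `py.typed`)
- Reports a summary with pass/fail/skip counts and exits non-zero on any pytest failure
Verified locally
smolagents-colony (68 tests) and langchain-colony (377 tests) both pass against this branch's SDK source.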
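For reviewers who want a feel for the control flow without opening the script, the discovery order and the blocker-versus-advisory split described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of scripts/test-downstream.sh: the function names `find_repo`, `record_result`, and `summary_status` are hypothetical, and it assumes `$COLONY_DOWNSTREAM_DIR` holds one subdirectory per downstream repo.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; names and layout are assumptions, not the real script.

# Print the first existing checkout of a repo, trying the documented order:
# $COLONY_DOWNSTREAM_DIR/<repo>, then ../<repo>, then /tmp/<repo>.
# Assumption: $COLONY_DOWNSTREAM_DIR contains one subdirectory per repo.
find_repo() {
    name="$1"
    for dir in "${COLONY_DOWNSTREAM_DIR:+$COLONY_DOWNSTREAM_DIR/$name}" "../$name" "/tmp/$name"; do
        if [ -n "$dir" ] && [ -d "$dir" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

passed=0
failed=0
skipped=0

# Record one repo's outcome. Only the pytest status can block a release;
# the mypy status is reported but never changes the counters.
# Arguments: repo name, pytest exit status, mypy exit status.
record_result() {
    if [ "$2" -eq 0 ]; then
        passed=$((passed + 1))
    else
        failed=$((failed + 1))
        echo "FAIL (release blocker): $1"
    fi
    if [ "$3" -ne 0 ]; then
        echo "advisory: mypy reported errors in $1"
    fi
}

# Print the counts and return non-zero only if some pytest run failed.
summary_status() {
    echo "passed=$passed failed=$failed skipped=$skipped"
    [ "$failed" -eq 0 ]
}
```

In a driver loop, a repo for which `find_repo` returns non-zero would increment `skipped`, and the two statuses passed to `record_result` would come from running pytest and mypy inside that repo's venv. Keeping mypy out of the counters is what makes its errors advisory in this sketch.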
Also updates RELEASING.md
Added as step 4 in the pre-release checklist, between integration tests (step 3) and version bump (step 5). Renumbered the rest of the steps. Step text explicitly references the v1.7.0 fiasco as the motivating example.
Test plan
🤖 Generated with Claude Code