split testing libraries from run libraries in dependencies.yaml #5447

Open
jayavenkatesh19 wants to merge 2 commits into rapidsai:main from jayavenkatesh19:split-test-libraries

Conversation

@jayavenkatesh19
Contributor

@jayavenkatesh19 jayavenkatesh19 commented Feb 27, 2026

Towards rapidsai/docker#835

Splits test_python_common dependency group into runtime and testing parts:

  • run_python_common: pandas and scipy
  • test_python_common: pytest and its dependents

Adds run_python_common to all 5 file-keys that contain test_python_common, to keep the output of rapids-dependency-file-generator unchanged.

Adds a new run_notebooks file-key that includes run_python_common but NOT test_python_common, which gives rapidsai/docker a clean set of runtime-only notebook dependencies without the testing libraries.
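The split described above might look roughly like this in dependencies.yaml (a hypothetical, abbreviated sketch following the rapids-dependency-file-generator schema; the package lists come from this description, everything else is illustrative):

```yaml
# Abbreviated, hypothetical sketch -- not the literal diff from this PR.
files:
  run_notebooks:
    output: none
    includes:
      - run_python_common    # runtime deps only
      # test_python_common deliberately NOT included
dependencies:
  run_python_common:
    common:
      - output_types: [conda, requirements]
        packages:
          - pandas
          - scipy
  test_python_common:
    common:
      - output_types: [conda, requirements]
        packages:
          - pytest
          # ...plus the other pytest dependents
```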

@jayavenkatesh19 jayavenkatesh19 self-assigned this Feb 27, 2026
@jayavenkatesh19 jayavenkatesh19 requested a review from a team as a code owner February 27, 2026 03:19
@jayavenkatesh19 jayavenkatesh19 added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Feb 27, 2026
Member

@jameslamb jameslamb left a comment

Thanks for looking into this, @jayavenkatesh19, and I'm so sorry it took so long to review!

I think the run_python_common name is a bit confusing, and I'm worried it'll be hard for contributors to understand which lists to update when (and that that'll lead to testing dependencies getting back into the images accidentally).

I put up an alternative proposal, happy to pair on it if you have questions.

It'd also be helpful for this PR to show the exact contents of the environment. You can get that by running the same rapids-dependency-file-generator call that the docker repo does (code link), like this:

rapids-dependency-file-generator \
    --file-key run_notebooks \
    --matrix "cuda=13.1;arch=x86_64;py=3.13" \
    --output conda

On the current state of this branch, that produces:

# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- rapidsai-nightly
- rapidsai
- conda-forge
dependencies:
- certifi
- cuda-version=13.1
- cugraph==26.4.*,>=0.0.0a0
- ipython
- libcugraph==26.4.*,>=0.0.0a0
- networkx>=2.5.1
- notebook>=0.5.0
- numpy>=1.23,<3.0
- packaging
- pandas
- pylibcugraph==26.4.*,>=0.0.0a0
- python-louvain
- python=3.13
- scikit-learn>=0.23.1
- scipy
name: run_notebooks_cuda-131_arch-x86_64_py-313

I can see some outputs there that we don't need. scikit-learn isn't used at all in this project and packaging isn't imported in the notebooks, for example.

- test_python_cugraph
- depends_on_libcugraph
- depends_on_pylibcugraph
- depends_on_cugraph
Member

I do like creating a new top-level list called just run_notebooks, but think it's confusing to see anything named test_* showing up in this list, especially to see something called test_notebook showing up in run_notebooks when there is also a test_notebooks.

I have an alternative proposal that I think might make this a bit easier to maintain and understand, let me know what you think:

  1. leave test_python_common completely unchanged
  2. create a new list under dependencies: called run_notebooks. Add to that list only packages that are needed for the notebooks and aren't already runtime dependencies of cugraph, pylibcugraph, and libcugraph. Use YAML anchors to reference other things already in dependencies.yaml. Add a code comment above it explaining its purpose.
  3. make this top-level run_notebooks list here contain cuda_version, py_version, run_notebooks, and the depends_on_*cugraph groups
  4. pass --file-key test_notebooks --file-key run_notebooks here https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L34 (multiple --file-key are allowed)

That way, we could fine-tune this to exactly the libraries that are needed and nothing else, and the purpose will be clear.
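For step 2, the YAML-anchor pattern mentioned there might look something like this (hypothetical group and package names; the anchor definitions would live wherever those packages are already declared in dependencies.yaml):

```yaml
# Hypothetical sketch of step 2 using YAML anchors and aliases.
dependencies:
  python_run_cugraph:
    common:
      - output_types: [conda]
        packages:
          # the existing declaration becomes the anchor definition
          - &networkx networkx>=2.5.1
  # Runtime dependencies needed by the cugraph notebooks that aren't
  # already runtime dependencies of cugraph/pylibcugraph/libcugraph.
  run_notebooks:
    common:
      - output_types: [conda]
        packages:
          - *networkx          # alias refers back to the anchor above
          - ipython
          - notebook>=0.5.0
```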

Contributor Author

@jayavenkatesh19 jayavenkatesh19 Mar 24, 2026

@jameslamb I'm pretty clear on everything until step 3.

In step 4, however, we would need an if-elif statement so that only the run_notebooks file-key is used for cugraph, right?
If the test_notebooks key is also passed, the dependency lists from both keys would be merged, bringing unwanted libraries like pytest back in.

Something like:

    if [ -f "$REPO/dependencies.yaml" ]; then
        # Prefer run_notebooks (runtime-only deps) over test_notebooks
        # (which may include testing libraries like pytest).
        if yq -e '.files.run_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
            FILE_KEY="run_notebooks"
        elif yq -e '.files.test_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
            FILE_KEY="test_notebooks"
        else
            continue
        fi
        echo "Running dfg on $REPO (file-key: $FILE_KEY)"
        rapids-dependency-file-generator \
            --config "$REPO/dependencies.yaml" \
            --file-key "$FILE_KEY" \
            --matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}" \
            --output conda >"/dependencies/${REPO}_notebooks_tests_dependencies.yaml"
    fi

Member

You're right, I wasn't thinking about the fact that test_notebooks is still the key in the other repos.

Yes we'll need different keys for different repos, but let's make the code much stricter than it is today and just hard-code that expectation directly.

Something like:

DFG_ARGS=(
  --matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}"
  --output conda
)
for repo in "${REPOS[@]}"; do
    if [[ "${repo}" == "cugraph" ]]; then
        file_key="run_notebooks"
    else
        file_key="test_notebooks"
    fi
    rapids-dependency-file-generator \
        --config "${repo}/dependencies.yaml" \
        --file-key "${file_key}" \
        "${DFG_ARGS[@]}"
done

That'll fail with a big loud error (instead of silently passing) if any of our expectations aren't met, like:

  • repo doesn't have a dependencies.yaml (<-- this would also catch being in the wrong working directory)
  • repo doesn't have the exact keys we expect

We're only doing this testing for 3 repos there, so let's go with specific and strict over generic and permissive, to give ourselves a better chance of catching bugs at the source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, that sounds good!

I've made the changes on both this PR and the rapidsai/docker#853 PR on Docker.

Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
@jameslamb
Member

/ok to test f2ad392

@jameslamb
Member

I noticed CI hadn't started. If your initial commit(s) were not signed, I think that might mean we need to /ok to test for the rest of this PR, even though you force-pushed to rewrite the history?

@jayavenkatesh19
Contributor Author

The PR run triggered by the manual /ok to test was immediately cancelled too. Should I push an empty commit and retry?

Member

@jameslamb jameslamb left a comment

This is looking pretty good! Left one more set of comments on a few things to check.

Once those are addressed, let's try testing this in docker CI. On rapidsai/docker#853, change the cloning logic (https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L17) to use this PR branch from your fork.

If we see CI pass there and things like pytest no longer showing up in logs, we could have high confidence that this is working.

Comment on lines +517 to +519
# Runtime dependencies for included cugraph notebooks, used in the
# rapidsai/notebooks Docker image. Only lists packages not already
# provided by cugraph, pylibcugraph, or libcugraph.
Member

Suggested change
# Runtime dependencies for included cugraph notebooks, used in the
# rapidsai/notebooks Docker image. Only lists packages not already
# provided by cugraph, pylibcugraph, or libcugraph.
# Runtime dependencies for cuGraph notebooks tested in CI.
# Also used in the rapidsai/notebooks Docker image.
# Only lists packages that are directly imported in notebooks.

I might not have been clear in my previous comments, sorry. This should be anything that's directly imported in the notebooks and isn't part of the standard library.
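One hedged way to enumerate candidates for that list is to reduce the grep output to top-level module names. The sed pattern below is illustrative, not part of this PR; in the repo you would feed it `git grep -h -E '^(import|from) ' -- notebooks/`, but sample lines are used here so the pipeline is self-contained:

```shell
# Illustrative helper: reduce import statements to top-level module names.
# Sample input stands in for the real `git grep` output over notebooks/.
printf 'import cudf\nfrom cugraph import Graph\nimport networkx as nx\n' \
  | sed -E 's/^(import|from) ([A-Za-z0-9_]+).*/\2/' \
  | sort -u
```

Anything in the resulting list that isn't in the standard library, and isn't already a runtime dependency of the cugraph packages, is a candidate for run_notebooks.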

- *ipython
- *networkx
- *notebook
- *scipy
Member

Very nice! This is exactly what I was envisioning.

Looking at the output of this:

git grep 'import' notebooks/

And the subset of notebooks actually tested in rapidsai/docker CI:

cugraph/algorithms/centrality/Betweenness.ipynb
cugraph/algorithms/centrality/Centrality.ipynb
cugraph/algorithms/centrality/Degree.ipynb
cugraph/algorithms/centrality/Eigenvector.ipynb
cugraph/algorithms/centrality/Katz.ipynb
cugraph/algorithms/community/Community-Clustering.ipynb
cugraph/algorithms/community/ECG.ipynb
cugraph/algorithms/community/Induced-Subgraph.ipynb
cugraph/algorithms/community/Louvain.ipynb
cugraph/algorithms/community/Spectral-Clustering.ipynb
cugraph/algorithms/community/Triangle-Counting.ipynb
cugraph/algorithms/community/ktruss.ipynb
cugraph/algorithms/components/ConnectedComponents.ipynb
cugraph/algorithms/cores/core-number.ipynb
cugraph/algorithms/cores/kcore.ipynb
cugraph/algorithms/link_analysis/HITS.ipynb
cugraph/algorithms/link_analysis/Pagerank.ipynb
cugraph/algorithms/link_prediction/Jaccard-Similarity.ipynb
cugraph/algorithms/link_prediction/Overlap-Similarity.ipynb
cugraph/algorithms/link_prediction/Sorensen_coefficient.ipynb
cugraph/algorithms/link_prediction/similarity_combined.ipynb
cugraph/algorithms/sampling/RandomWalk.ipynb
cugraph/algorithms/structure/Renumber-2.ipynb
cugraph/algorithms/structure/Renumber.ipynb
cugraph/algorithms/structure/Symmetrize.ipynb
cugraph/algorithms/traversal/BFS.ipynb
cugraph/algorithms/traversal/SSSP.ipynb

(build link)

I think we need a few more entries here:

Note that notebooks/algorithms/layout/Force-Atlas2.ipynb, which requires some plotting libraries like cuxfilter and holoviews, is not tested there and so its dependencies can be skipped.

- cuda_version
- py_version
- run_notebooks
- depends_on_cugraph
Member

Suggested change
- depends_on_cugraph
- depends_on_cudf
- depends_on_cugraph

There are some direct imports of cudf in these notebooks. I think cudf should be added to this list.
