split testing libraries from run libraries in dependencies.yaml#5447
split testing libraries from run libraries in dependencies.yaml#5447jayavenkatesh19 wants to merge 2 commits intorapidsai:mainfrom
Conversation
There was a problem hiding this comment.
Thanks for looking into this @jayavenkatesh19 and I'm so sorry it took so long to review!
I think the run_python_common name is a bit confusing, and I'm worried it'll be hard for contributors to understand which lists to update when (and that that'll lead to testing dependencies getting back into the images accidentally).
I put up an alternative proposal, happy to pair on it if you have questions.
It'd also be helpful for this PR to show the exact contents of the environment. You can get that by running the same rapids-dependency-file-generator call that the docker repo does (code link), like this:
rapids-dependency-file-generator \
--file-key run_notebooks \
--matrix "cuda=13.1;arch=x86_64;py=3.13" \
--output condaOn the current state of this branch, that produces:
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- rapidsai-nightly
- rapidsai
- conda-forge
dependencies:
- certifi
- cuda-version=13.1
- cugraph==26.4.*,>=0.0.0a0
- ipython
- libcugraph==26.4.*,>=0.0.0a0
- networkx>=2.5.1
- notebook>=0.5.0
- numpy>=1.23,<3.0
- packaging
- pandas
- pylibcugraph==26.4.*,>=0.0.0a0
- python-louvain
- python=3.13
- scikit-learn>=0.23.1
- scipy
name: run_notebooks_cuda-131_arch-x86_64_py-313
I can see there some outputs we don't need. scikit-learn isn't used at all in this project and packaging isn't imported in the notebooks, for example.
| - test_python_cugraph | ||
| - depends_on_libcugraph | ||
| - depends_on_pylibcugraph | ||
| - depends_on_cugraph |
There was a problem hiding this comment.
I do like creating a new top-level list called just run_notebooks, but think it's confusing to see anything named test_* showing up in this list, especially to see something called test_notebook showing up in run_notebooks when there is also a test_notebooks.
I have an alternative proposal that I think might make this a bit easier to maintain and understand, let me know what you think:
- leave
test_python_commoncompletely unchanged - create a new list under
dependencies:calledrun_notebooks. Add to that list only packages that are needed for the notebooks and aren't already runtime dependencies ofcugraph,pylibcugraph, andlibcugraph. Use YAML anchors to reference other things already independencies.yaml. Add a code comment above it explaining its purpose. - make this top-level
run_notebookslist here containcuda_version,py_version,run_notebooks, and thedepends_on_*cugraphgroups - pass
--file-key test_notebooks --file-key run_notebookshere https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L34 (multiple--file-keyare allowed)
That way, we could fine-tune this to exactly the libraries that are needed and nothing else, and the purpose will be clear.
There was a problem hiding this comment.
@jameslamb I'm pretty clear on everything until step 3.
In step 4 however, we would need a if-elif statement to use only the run_notebooks file-key for cugraph right?
If the test-notebooks key is added, then dependency lists from both keys would be merged, bringing in unwanted libraries like pytest again.
Something like:
if [ -f "$REPO/dependencies.yaml" ]; then
# Prefer run_notebooks (runtime-only deps) over test_notebooks
# (which may include testing libraries like pytest).
if yq -e '.files.run_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
FILE_KEY="run_notebooks"
elif yq -e '.files.test_notebooks' "$REPO/dependencies.yaml" >/dev/null 2>&1; then
FILE_KEY="test_notebooks"
else
continue
fi
echo "Running dfg on $REPO (file-key: $FILE_KEY)"
rapids-dependency-file-generator \
--config "$REPO/dependencies.yaml" \
--file-key "$FILE_KEY" \
--matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}" \
--output conda >"/dependencies/${REPO}_notebooks_tests_dependencies.yaml"
fiThere was a problem hiding this comment.
You're right, I wasn't thinking about the fact that test_notebooks is still the key in the other repos.
Yes we'll need different keys for different repos, but let's make the code much stricter than it is today and just hard-code that expectation directly.
Something like:
DFG_ARGS=(
--config "./dependencies.yaml"
--matrix "cuda=${CUDA_VER%.*};arch=$(arch);py=${PYTHON_VER}"
--output conda
)
for repo in repos; do
if [[ "${repo}" == "cugraph" ]]; then
DFG_ARGS+=(--file-key "run_notebooks")
else
DFG_ARGS+=(--file-key "test_notebooks")
fi
done
That'll fail with a big loud error (instead of silently passing) if any of our expectations aren't met, like:
- repo doesn't have a
dependencies.yaml(<-- this would also catch being in the wrong working directory) - repo doesn't have the exact keys we expect
We're only doing this testing for 3 repos there, let's go with specific and strict over generic and permissive, to give ourselves a better chance of catching bugs at the source.
There was a problem hiding this comment.
Sure, that sounds good!
I've made the changes on both this PR and the rapidsai/docker#853 PR on Docker.
8dea84c to
15e04b3
Compare
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
|
/ok to test f2ad392 |
|
I noticed CI hadn't started. If your initial commit(s) were not signed, I think that might mean that for the rest of this PR we need to |
|
The PR run triggered by the manual |
jameslamb
left a comment
There was a problem hiding this comment.
This is looking pretty good! Left one more set of comments on a few things to check.
Once those are addressed, let's try testing this in docker CI. On rapidsai/docker#853, change the cloning logic (https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L17) to use this PR branch from your fork.
If we see CI pass there and things like pytest no longer showing up in logs, we could have high confidence that this is working.
| # Runtime dependencies for included cugraph notebooks, used in the | ||
| # rapidsai/notebooks Docker image. Only lists packages not already | ||
| # provided by cugraph, pylibcugraph, or libcugraph. |
There was a problem hiding this comment.
| # Runtime dependencies for included cugraph notebooks, used in the | |
| # rapidsai/notebooks Docker image. Only lists packages not already | |
| # provided by cugraph, pylibcugraph, or libcugraph. | |
| # Runtime dependencies for cuGraph notebooks tested in CI. | |
| # Also used in the rapidsai/notebooks Docker image. | |
| # Only lists packages that are directly imported in notebooks. |
I might not have been clear in my previous comments, sorry. This should be anything that's directly imported in the notebooks and isn't part of the standard library.
| - *ipython | ||
| - *networkx | ||
| - *notebook | ||
| - *scipy |
There was a problem hiding this comment.
Very nice! This is exactly what I was envisioning.
Looking at the output of this:
git grep 'import' notebooks/And the subset of notebooks actually tested in rapidsai/docker CI:
cugraph/algorithms/centrality/Betweenness.ipynb
cugraph/algorithms/centrality/Centrality.ipynb
cugraph/algorithms/centrality/Degree.ipynb
cugraph/algorithms/centrality/Eigenvector.ipynb
cugraph/algorithms/centrality/Katz.ipynb
cugraph/algorithms/community/Community-Clustering.ipynb
cugraph/algorithms/community/ECG.ipynb
cugraph/algorithms/community/Induced-Subgraph.ipynb
cugraph/algorithms/community/Louvain.ipynb
cugraph/algorithms/community/Spectral-Clustering.ipynb
cugraph/algorithms/community/Triangle-Counting.ipynb
cugraph/algorithms/community/ktruss.ipynb
cugraph/algorithms/components/ConnectedComponents.ipynb
cugraph/algorithms/cores/core-number.ipynb
cugraph/algorithms/cores/kcore.ipynb
cugraph/algorithms/link_analysis/HITS.ipynb
cugraph/algorithms/link_analysis/Pagerank.ipynb
cugraph/algorithms/link_prediction/Jaccard-Similarity.ipynb
cugraph/algorithms/link_prediction/Overlap-Similarity.ipynb
cugraph/algorithms/link_prediction/Sorensen_coefficient.ipynb
cugraph/algorithms/link_prediction/similarity_combined.ipynb
cugraph/algorithms/sampling/RandomWalk.ipynb
cugraph/algorithms/structure/Renumber-2.ipynb
cugraph/algorithms/structure/Renumber.ipynb
cugraph/algorithms/structure/Symmetrize.ipynb
cugraph/algorithms/traversal/BFS.ipynb
cugraph/algorithms/traversal/SSSP.ipynb
I think we need a few more entries here:
pandaspython-louvain- import name is
community - may eventually be phased out in Drop dependence on python-louvain in testing #5384 but for now needs to be included
- import name is
Note that notebooks/algorithms/layout/Force-Atlas2.ipynb, which requires some plotting libraries like cuxfilter and holoviews, is not tested there and so its dependencies can be skipped.
| - cuda_version | ||
| - py_version | ||
| - run_notebooks | ||
| - depends_on_cugraph |
There was a problem hiding this comment.
| - depends_on_cugraph | |
| - depends_on_cudf | |
| - depends_on_cugraph |
There are some direct imports of cudf in these notebooks. I think cudf should be added to this list.
Towards rapidsai/docker#835
Splits
test_python_commondependency group into runtime and testing parts:run_python_common: pandas and scipytest_python_common: pytest and its dependentsAdds
run_python_commonto all 5 file-keys that containtest_python_commonto keep output ofdependency-file-generatorunchanged.Adds a new
run_notebooksfile-key that includesrun_python_commonbut NOTtest_python_commonwhich givesrapidsai/dockera clean set of runtime-only notebook dependencies without the testing libraries.