I ran across a few typos and link formatting issues (`.rst` vs. `.md` syntax) while reading through the docs recently. This PR contains fixes for those and a few other typos that Claude found.

Authors:
- James Bourbeau (https://github.com/jrbourbeau)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1973
FAILURE - Unable to forward-merge due to an error; a manual merge is necessary. Do not use the …

IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the …
Merge after #1880.

This PR adds support for streaming, out-of-core (dataset on host) k-means clustering. The idea is simple: batched accumulation of centroid updates. Data is processed in batches, and batch-wise means and cluster counts are accumulated until all batches, i.e. the full dataset pass, have completed. This PR adds a batch-size parameter that controls how much of the dataset is loaded at a time to compute cluster assignments and (weighted) centroid adjustments. A single k-means iteration (the final centroid update) completes only when the accumulated sums are averaged at the end of the full dataset pass.

Authors:
- Tarang Jain (https://github.com/tarang-jain)

Approvers:
- Victor Lafargue (https://github.com/viclafargue)
- Anupam (https://github.com/aamijar)
- Micka (https://github.com/lowener)
- Jinsol Park (https://github.com/jinsolp)
- Ben Frederickson (https://github.com/benfred)

URL: #1886
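The batched accumulation described in #1886 can be sketched on the host in C++. This is a minimal illustration under stated assumptions (row-major `double` data, squared Euclidean distance, one weight per row, empty clusters keep their previous centroid); `batched_kmeans_step` and its parameters are hypothetical names, not the library's API. It shows the key property: during the pass, rows are only added into per-cluster accumulators, and centroids are divided out once at the end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical host-side sketch of ONE batched k-means (Lloyd) iteration.
// `data` is row-major with n_rows x dim values; `weights` holds one weight per row.
// Rows are visited `batch_size` at a time; per-cluster weighted sums and total
// weights are accumulated across batches, and centroids are only recomputed
// after the full pass over the dataset.
std::vector<double> batched_kmeans_step(const std::vector<double>& data,
                                        const std::vector<double>& weights,
                                        std::size_t dim,
                                        const std::vector<double>& centroids,
                                        std::size_t batch_size) {
  assert(dim > 0 && batch_size > 0);
  assert(data.size() % dim == 0 && centroids.size() % dim == 0);
  const std::size_t n_rows = data.size() / dim;
  const std::size_t k = centroids.size() / dim;
  assert(k > 0 && weights.size() == n_rows);

  std::vector<double> sums(k * dim, 0.0);  // weighted coordinate sums per cluster
  std::vector<double> totals(k, 0.0);      // summed sample weights per cluster

  for (std::size_t start = 0; start < n_rows; start += batch_size) {
    const std::size_t end = std::min(start + batch_size, n_rows);
    // Out of core, only rows [start, end) would be resident on the device here.
    for (std::size_t i = start; i < end; ++i) {
      const double* x = &data[i * dim];
      // Assign the row to its nearest centroid (squared Euclidean distance).
      std::size_t best = 0;
      double best_dist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < k; ++c) {
        double dist = 0.0;
        for (std::size_t j = 0; j < dim; ++j) {
          const double diff = x[j] - centroids[c * dim + j];
          dist += diff * diff;
        }
        if (dist < best_dist) {
          best_dist = dist;
          best = c;
        }
      }
      // Accumulate this row's weighted contribution; no centroid moves yet.
      for (std::size_t j = 0; j < dim; ++j) sums[best * dim + j] += weights[i] * x[j];
      totals[best] += weights[i];
    }
  }

  // Finalize after the whole pass: divide accumulated sums by accumulated weights.
  // Empty clusters keep their previous centroid in this sketch.
  std::vector<double> updated = centroids;
  for (std::size_t c = 0; c < k; ++c) {
    if (totals[c] > 0.0) {
      for (std::size_t j = 0; j < dim; ++j) updated[c * dim + j] = sums[c * dim + j] / totals[c];
    }
  }
  return updated;
}
```

For example, with one-dimensional points {0, 2, 10, 12, 1}, unit weights, and initial centroids {0, 10}, calling it with a `batch_size` of 2 or 5 yields {1, 11} in both cases, because batching changes only how rows are chunked, not which sums are formed. A GPU implementation that reduces in a different order may differ in the last bits of floating-point precision.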
This is to ensure the config does not accidentally get corrupted or start with garbage values. Furthermore, the CUDA [docs](https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__EXECUTION.html#group__CUDART__EXECUTION_1ge236ecdbbaf7cf47a806bba71c1d03c4) recommend setting `config.attrs = nullptr` even when `config.numAttrs = 0`. The need for this update comes from rapidsai/cuml#7906, where cuML nightly wheel tests intermittently hit CUDA context corruption from the JIT path. While I am not sure whether this PR will resolve those failures, it is still a step in the right direction.

Authors:
- Divye Gala (https://github.com/divyegala)

Approvers:
- Dante Gama Dessavre (https://github.com/dantegd)

URL: #1974
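Avoiding garbage values in an aggregate config struct is typically done with C++ value-initialization; assuming that is the approach taken here, the sketch below illustrates it. It uses hypothetical stand-in types (`LaunchConfigLike`, `LaunchAttrLike`) and a hypothetical helper `make_launch_config`, shaped loosely after `cudaLaunchConfig_t` (with simplified field types) so it compiles without the CUDA toolkit; it is not the code changed in this PR. With the real type, the same pattern would be `cudaLaunchConfig_t config{};`, then setting `gridDim`, `blockDim`, and `stream`, plus `config.attrs = nullptr;` and `config.numAttrs = 0;` before calling `cudaLaunchKernelEx`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins that mimic the shape of CUDA's cudaLaunchAttribute and
// cudaLaunchConfig_t (simplified: plain unsigned instead of dim3, void* stream),
// so this sketch compiles without the CUDA toolkit.
struct LaunchAttrLike {
  int id;
  int value;
};

struct LaunchConfigLike {
  unsigned grid_dim;
  unsigned block_dim;
  std::size_t dynamic_smem_bytes;
  void* stream;
  LaunchAttrLike* attrs;
  unsigned num_attrs;
};

LaunchConfigLike make_launch_config(unsigned grid, unsigned block) {
  assert(grid > 0 && block > 0);
  // `{}` value-initializes the aggregate: every member not assigned below is
  // zeroed (pointers become nullptr). Writing `LaunchConfigLike cfg;` inside a
  // function would instead leave all members with indeterminate values.
  LaunchConfigLike cfg{};
  cfg.grid_dim = grid;
  cfg.block_dim = block;
  // Explicitly pass no launch attributes: a null pointer together with a zero count.
  cfg.attrs = nullptr;
  cfg.num_attrs = 0;
  return cfg;
}
```

The explicit `attrs`/`num_attrs` assignments are redundant after `{}`, but they document intent and keep the config correct if the initialization style is later changed.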
- Add `sync_stream` for D2H copies of the graph
- Use a strided copy to copy the strided dataset

Authors:
- Tarang Jain (https://github.com/tarang-jain)

Approvers:
- Ben Karsin (https://github.com/bkarsin)
- Jinsol Park (https://github.com/jinsolp)
- Corey J. Nolet (https://github.com/cjnolet)

URL: #1966
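The strided copy in #1966 can be illustrated on the host. The sketch below assumes a row-major dataset whose rows are padded to a `stride` larger than `dim`, and mirrors the pitch/width/height arguments of the CUDA runtime's `cudaMemcpy2D`; `strided_copy` and `pack_rows` are hypothetical names, not the code in this PR. On a device, the analogous call would be `cudaMemcpy2DAsync(..., cudaMemcpyDeviceToHost, stream)`, and the stream must be synchronized before the host reads the destination buffer, since an asynchronous copy into pinned host memory may not have completed when the call returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Host-side sketch of a 2D strided copy with the same argument layout as
// cudaMemcpy2D: copy `height` rows of `width_bytes` each, where consecutive
// source rows are `src_pitch` bytes apart and destination rows `dst_pitch` apart.
void strided_copy(void* dst, std::size_t dst_pitch,
                  const void* src, std::size_t src_pitch,
                  std::size_t width_bytes, std::size_t height) {
  assert(width_bytes <= dst_pitch && width_bytes <= src_pitch);
  auto* d = static_cast<unsigned char*>(dst);
  const auto* s = static_cast<const unsigned char*>(src);
  for (std::size_t row = 0; row < height; ++row) {
    std::memcpy(d + row * dst_pitch, s + row * src_pitch, width_bytes);
  }
}

// Hypothetical helper: drop per-row padding from a dataset stored with a row
// stride (in elements) larger than its logical dimension.
std::vector<float> pack_rows(const std::vector<float>& padded, std::size_t n_rows,
                             std::size_t dim, std::size_t stride) {
  assert(stride >= dim);
  assert(n_rows == 0 || padded.size() >= (n_rows - 1) * stride + dim);
  std::vector<float> packed(n_rows * dim);
  strided_copy(packed.data(), dim * sizeof(float), padded.data(),
               stride * sizeof(float), dim * sizeof(float), n_rows);
  return packed;
}
```

For instance, two rows of dimension 3 stored with a stride of 4 (`{1, 2, 3, pad, 4, 5, 6, pad}`) pack into `{1, 2, 3, 4, 5, 6}`. A single pitched copy avoids issuing one copy per row.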
Forward-merge triggered by push to release/26.04 that creates a PR to keep main up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.