See https://github.com/SqueezeAILab/SqueezeLLM/issues/67, which claims to have a KMeans library that can speed up clustering a model from 2 hours to 6 minutes.