Open
Description
DeepSpeed is a library for training large neural networks in parallel across multiple GPUs. One of its features is pipeline parallelism, which takes a model expressed as a sequence of layers (for example a PyTorch nn.Sequential object) and splits those layers across different GPUs, so that each GPU holds and evaluates only its own contiguous slice of the model. Since Caskade is built around nested modules that are evaluated in sequence, I suspect we could modify it to support this kind of staged evaluation. That would open the door to models too large for a single GPU and to more complex nested module structures.
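To make the idea concrete, here is a minimal pure-Python sketch of what "splitting sequential layers into stages" means. It does not use the DeepSpeed or Caskade APIs: `partition_layers`, `run_stage`, and `pipeline_forward` are hypothetical names made up for illustration, the "layers" are plain functions on numbers, and the "GPUs" are just list indices. It only shows the partitioning and the staggered order in which micro-batches would pass through the stages.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "layer" is any callable mapping a value to a value.
Layer = Callable[[float], float]


def partition_layers(layers: Sequence[Layer], num_stages: int) -> List[List[Layer]]:
    """Split layers into num_stages contiguous groups of near-equal size."""
    if not 1 <= num_stages <= len(layers):
        raise ValueError("num_stages must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_stages)
    stages: List[List[Layer]] = []
    start = 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)  # earlier stages absorb the remainder
        stages.append(list(layers[start:start + size]))
        start += size
    return stages


def run_stage(stage: Sequence[Layer], x: float) -> float:
    """Evaluate one stage (one device's slice of the model) in order."""
    for layer in stage:
        x = layer(x)
    return x


def pipeline_forward(
    stages: Sequence[Sequence[Layer]], micro_batches: Sequence[float]
) -> Tuple[List[float], List[List[Tuple[int, int]]]]:
    """Simulate a forward-only pipeline schedule.

    At clock tick t, stage s works on micro-batch t - s, so several stages
    are busy at once. Returns the outputs and, per tick, the list of
    (stage, micro_batch) pairs that were active.
    """
    num_stages = len(stages)
    n = len(micro_batches)
    values = list(micro_batches)
    schedule: List[List[Tuple[int, int]]] = []
    for tick in range(n + num_stages - 1):
        active = []
        for s in range(num_stages):
            m = tick - s
            if 0 <= m < n:
                values[m] = run_stage(stages[s], values[m])
                active.append((s, m))
        schedule.append(active)
    return values, schedule


# Hypothetical five-layer "model" split across two stages.
layers = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
    lambda x: x + 10,
]
stages = partition_layers(layers, 2)
outputs, schedule = pipeline_forward(stages, [1.0, 2.0])
```

In this toy schedule the second tick has both stages active at once (stage 0 on the second micro-batch, stage 1 on the first), which is the overlap that pipeline parallelism relies on. For Caskade, the open question would be whether its nested modules can be flattened or grouped into such contiguous stages.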