Thank you very much for making the paper public. I believe dFlash is an outstanding and forward-looking piece of work.
I'm writing to ask whether there are plans to release the training code in the future.
My primary reasons for asking are as follows:
1. Strengthening the Ecosystem: Once the training code is available, the community can adapt dFlash to more models and datasets, which will further strengthen the dFlash ecosystem.
2. Exploring 3-Layer Configurations: I am particularly interested in testing the performance of a 3-layer setup. As I mentioned previously, a 5-layer configuration can impact TTFT (time to first token) and throughput under high concurrency. I would like to train a model with fewer layers myself to optimize for these metrics.
3. Integration with SpecForge: Once the official training code is released, we can quickly integrate it into the SpecForge project (part of sgl-project). This would enable dFlash to support training for larger-scale models and make it much easier for users to get started with dFlash training.
Thank you again for your contribution!