Rushil Bhat
Posts
May 25, 2025
How To Implement Tensor Parallel Cross Entropy Loss
Jan 27, 2025
Inside FSDP: A Look at the Flat-Parameter Design
Dec 18, 2024
Backpropagating through GPT-2