Rushil Bhat

Posts

  • May 25, 2025

    How To Implement Tensor Parallel Cross Entropy Loss

  • Jan 27, 2025

    Inside FSDP: A Look at the Flat-Parameter Design

  • Dec 18, 2024

    Backpropagating through GPT-2

Contact

  • Email
  • GitHub
  • X