Muon Optimizer
A structured walk-through of the Muon optimizer: from vector versus matrix steepest descent, through spectral-norm geometry and Stiefel manifolds, to learning-rate/batch-size relations and practical deployment.