Commit Graph

2 Commits

Author | SHA1 | Message | Date
Jacob-Junqi Tian | 6fd75b4340 | Implemented automated broadcasting in weight rescale when number of model shards is less than number of experts. | 2024-03-21 17:23:45 -04:00
Igor Babuschkin | be76c959fa | Add initial code | 2024-03-17 11:11:31 -07:00