Compare commits

...

2 Commits

Author: Andrew Kean Gao
SHA1: 72d3a0dd94
Message: Merge 1f4350143d into d6d9447e2d
Date: 2024-03-19 00:13:46 +05:30

Author: Andrew Kean Gao
SHA1: 1f4350143d
Message: Update README.md to specify the active parameters
This is important because users may think they need to have the full 314B at one time, but only 86B which is much more manageable!
Date: 2024-03-18 10:41:38 -07:00

README.md

@@ -22,7 +22,7 @@ The implementation of the MoE layer in this repository is not efficient. The imp
 Grok-1 is currently designed with the following specifications:
-- **Parameters:** 314B
+- **Parameters:** 314B (86B active)
 - **Architecture:** Mixture of 8 Experts (MoE)
 - **Experts Utilization:** 2 experts used per token
 - **Layers:** 64
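
As a rough back-of-the-envelope check on the "86B active" figure: in a mixture-of-experts model, only the routed experts' feed-forward weights participate for a given token, while attention, embedding, and router weights are always active. The Python sketch below back-solves an assumed split between shared and expert parameters that reproduces roughly 86B active from 314B total with 2 of 8 experts per token; the ~97% expert share and the helper function are illustrative assumptions, not figures stated in this diff or published by xAI.

```python
def moe_active_params_b(total_b, expert_share, n_experts, experts_per_token):
    """Estimate per-token active parameters (in billions) for a MoE model.

    total_b           -- total parameter count in billions (314 for Grok-1)
    expert_share      -- ASSUMED fraction of parameters held in expert FFN weights
    n_experts         -- experts per MoE layer (8 for Grok-1)
    experts_per_token -- experts routed to per token (2 for Grok-1)
    """
    expert_b = total_b * expert_share   # parameters spread across all experts
    shared_b = total_b - expert_b       # attention, embeddings, router, norms
    return shared_b + expert_b * experts_per_token / n_experts

# The ~97% expert share is back-solved so the result matches the README's figure;
# the true per-component breakdown is not stated in this diff.
print(round(moe_active_params_b(314, 0.97, 8, 2)))  # -> 86
```

Either way, the point of the commit stands: per-token compute scales with the roughly 86B active parameters, even though loading the checkpoint still requires storing all 314B weights.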