From 4e2e30bd6fa425fdebf3062d6ee3da24d36cffde Mon Sep 17 00:00:00 2001
From: Andrew Kean Gao
Date: Mon, 18 Mar 2024 10:43:55 -0700
Subject: [PATCH] Update README.md to have context length higher up

In my original summary of the model specifications, I had put the context
length near the bottom, but on reflection it is probably one of the most
relevant details to end users, so it should appear higher up. Also,
"Additional Features" should be the final bullet point for editorial
reasons.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f4d9b61..4c0a000 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ Grok-1 is currently designed with the following specifications:
 - **Parameters:** 314B
 - **Architecture:** Mixture of 8 Experts (MoE)
 - **Experts Utilization:** 2 experts used per token
+- **Maximum Sequence Length (context):** 8,192 tokens
 - **Layers:** 64
 - **Attention Heads:** 48 for queries, 8 for keys/values
 - **Embedding Size:** 6,144
@@ -32,7 +33,6 @@ Grok-1 is currently designed with the following specifications:
 - **Additional Features:**
   - Rotary embeddings (RoPE)
   - Supports activation sharding and 8-bit quantization
-- **Maximum Sequence Length (context):** 8,192 tokens
 
 # Downloading the weights