From 4e2e30bd6fa425fdebf3062d6ee3da24d36cffde Mon Sep 17 00:00:00 2001
From: Andrew Kean Gao
Date: Mon, 18 Mar 2024 10:43:55 -0700
Subject: [PATCH] Update README.md to have context length higher up

In my original summary of the model specifications, I had put the context
length near the bottom, but on reflection it is probably one of the most
relevant details to end users, so it should appear higher up. Also,
"Additional Features" should be the final bullet point for editorial
reasons.
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index f4d9b61..4c0a000 100644
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ Grok-1 is currently designed with the following specifications:
 - **Parameters:** 314B
 - **Architecture:** Mixture of 8 Experts (MoE)
 - **Experts Utilization:** 2 experts used per token
+- **Maximum Sequence Length (context):** 8,192 tokens
 - **Layers:** 64
 - **Attention Heads:** 48 for queries, 8 for keys/values
 - **Embedding Size:** 6,144
@@ -32,7 +33,6 @@ Grok-1 is currently designed with the following specifications:
 - **Additional Features:**
   - Rotary embeddings (RoPE)
   - Supports activation sharding and 8-bit quantization
-- **Maximum Sequence Length (context):** 8,192 tokens
 
 # Downloading the weights