Mirror of https://github.com/xai-org/grok-1.git (synced 2025-04-03 18:00:10 +03:00)
Update README.md
Commit 91f135ff7e (parent 7050ed204b)
README.md (31 changed lines)
@@ -2,36 +2,7 @@
This repository contains JAX example code for loading and running the Grok-1 open-weights model.
Make sure to download the checkpoint and place the `ckpt-0` directory in `checkpoints`; see [Downloading the weights](#downloading-the-weights).
Then, run
```shell
pip install -r requirements.txt
python run.py
```
to test the code.
The script loads the checkpoint and samples from the model on a test input.
Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code.
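For a rough sense of scale, the 314B parameters alone occupy about 628 GB at 16-bit precision (2 bytes per parameter), or roughly 314 GB with the supported 8-bit quantization, before accounting for activations.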
The MoE layer implementation in this repository is not efficient; it was chosen to avoid the need for custom kernels while validating the correctness of the model.
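As a concrete illustration of that trade-off, here is a minimal, self-contained JAX sketch of a validation-oriented top-2 MoE layer; it is not the repository's actual implementation. Running every expert on every token and mixing the results with the gate weights avoids custom gather/scatter kernels, at the cost of `num_experts` times the compute:

```python
import jax
import jax.numpy as jnp

def naive_top2_moe(x, router_w, expert_w, k=2):
    # Illustrative sketch only, not the repository's actual MoE layer.
    # x:        [tokens, d_model]
    # router_w: [d_model, num_experts]
    # expert_w: [num_experts, d_model, d_model] (one matrix per expert)
    logits = x @ router_w                         # [tokens, num_experts]
    top_vals, top_idx = jax.lax.top_k(logits, k)  # choose k experts per token
    gates = jax.nn.softmax(top_vals, axis=-1)     # normalized gate weights

    # Dense formulation: apply every expert to every token, then mask.
    all_out = jnp.einsum("td,edf->tef", x, expert_w)     # [tokens, experts, d_model]
    onehot = jax.nn.one_hot(top_idx, expert_w.shape[0])  # [tokens, k, experts]
    weights = jnp.einsum("tk,tke->te", gates, onehot)    # [tokens, experts]
    return jnp.einsum("te,tef->tf", weights, all_out)    # [tokens, d_model]

# Smoke test with Grok-1's expert count (8) and top-2 routing.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (4, 16))
router_w = jax.random.normal(key, (16, 8))
expert_w = jax.random.normal(key, (8, 16, 16))
print(naive_top2_moe(x, router_w, expert_w).shape)  # (4, 16)
```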
# Model Specifications
Grok-1 is currently designed with the following specifications (collected into an illustrative code sketch after the list):
- **Parameters:** 314B
- **Architecture:** Mixture of 8 Experts (MoE)
- **Experts Utilization:** 2 experts used per token
- **Layers:** 64
- **Attention Heads:** 48 for queries, 8 for keys/values
- **Embedding Size:** 6,144
- **Tokenization:** SentencePiece tokenizer with 131,072 tokens
- **Additional Features:**
  - Rotary embeddings (RoPE)
  - Supports activation sharding and 8-bit quantization
- **Maximum Sequence Length (context):** 8,192 tokens
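For reference in code, the published numbers above can be gathered into a single record; the class below is an illustrative summary, not the repository's actual configuration object:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grok1Spec:
    # Illustrative summary of the published specs, not the repo's config class.
    n_params: int = 314_000_000_000  # 314B total parameters
    num_experts: int = 8             # mixture-of-experts width
    experts_per_token: int = 2       # top-2 expert routing
    num_layers: int = 64
    num_q_heads: int = 48            # attention heads for queries
    num_kv_heads: int = 8            # attention heads for keys/values
    embedding_size: int = 6_144
    vocab_size: int = 131_072        # SentencePiece tokenizer
    max_seq_len: int = 8_192         # maximum context length
    rope: bool = True                # rotary position embeddings
    int8_quantization: bool = True   # 8-bit quantization support

print(Grok1Spec())
```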
# Downloading the weights