diff --git a/README.md b/README.md
index f501a07..73d3ea7 100644
--- a/README.md
+++ b/README.md
@@ -2,36 +2,7 @@
 
 This repository contains JAX example code for loading and running the Grok-1 open-weights model.
 
-Make sure to download the checkpoint and place the `ckpt-0` directory in `checkpoints` - see [Downloading the weights](#downloading-the-weights)
-
-Then, run
-
-```shell
-pip install -r requirements.txt
-python run.py
-```
-
-to test the code.
-
-The script loads the checkpoint and samples from the model on a test input.
-
-Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code.
-The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model.
-
-# Model Specifications
-
-Grok-1 is currently designed with the following specifications:
-
-- **Parameters:** 314B
-- **Architecture:** Mixture of 8 Experts (MoE)
-- **Experts Utilization:** 2 experts used per token
-- **Layers:** 64
-- **Attention Heads:** 48 for queries, 8 for keys/values
-- **Embedding Size:** 6,144
-- **Tokenization:** SentencePiece tokenizer with 131,072 tokens
-- **Additional Features:**
-  - Rotary embeddings (RoPE)
-  - Supports activation sharding and 8-bit quantization
+- **Maximum Sequence Length (context):** 8,192 tokens
 
 # Downloading the weights
 