From 073f74eaf67a10a128c0698bdf1930a61c57cc44 Mon Sep 17 00:00:00 2001
From: BIEMAX
Date: Mon, 18 Mar 2024 21:29:52 -0300
Subject: [PATCH 1/3] Update readme with new instructions

---
 README.md | 46 ++++++++++++++++++++++++++++++----------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index eaded5c..4b35f7c 100644
--- a/README.md
+++ b/README.md
@@ -2,28 +2,42 @@ This repository contains JAX example code for loading and running the Grok-1 open-weights model.
-Make sure to download the checkpoint and place `ckpt-0` directory in `checkpoint`.
-Then, run
+Make sure to download the checkpoint and place `ckpt-0` directory in `checkpoints` before run the project.
 
-```shell
-pip install -r requirements.txt
-python run.py
-```
+## 1. Downloading the weights
 
-to test the code.
-
-The script loads the checkpoint and samples from the model on a test input.
-
-Due to the large size of the model (314B parameters), a machine with enough GPU memory is required to test the model with the example code.
-The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model.
-
-# Downloading the weights
-
-You can download the weights using a torrent client and this magnet link:
+You can download the weights using a torrent client in the following magnet link:
 ```
 magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
 ```
+
+## 2. Installation
+
+1. Install the project dependencies
+
+```bash
+pip install -r requirements.txt
+```
+
+2. Run the project
+
+```bash
+python run.py
+```
+
+The script loads the checkpoint and samples from the model on a test input.
+
+Due to the large size of the model (314B/Billion parameters), a machine with enough GPU memory is required to test the model with the example code.
+
+The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model.
+
+## 3. Requirements
+
+Make sure to attend the following requirements before run the project:
+
+ - Needs either either a TPU or GPU (NVIDIA/AMD supported only)
+ - They have to be 8 devices (in the context of TPUs (Tensor Processing Units) or GPUs (Graphics Processing Units), they are typically talking about having access to a total of 8 individual processing units)
+
 # License
 The code and associated Grok-1 weights in this release are licensed under the

From adacffe69f6568c1f135bfb9dc2ea67edc0dba32 Mon Sep 17 00:00:00 2001
From: BIEMAX
Date: Mon, 18 Mar 2024 21:30:43 -0300
Subject: [PATCH 2/3] Update wrong dependency in requirements.txt

---
 requirements.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/requirements.txt b/requirements.txt
index f6d124e..09e9e15 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,4 +1,4 @@
-dm_haiku==0.0.12
-jax[cuda12_pip]==0.4.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+dm-haiku==0.0.12
+jax[cuda12-pip]==0.4.25 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
 numpy==1.26.4
 sentencepiece==0.2.0

From 7fead60864c73806fa249b0b35e2646cf149cff4 Mon Sep 17 00:00:00 2001
From: BIEMAX
Date: Mon, 18 Mar 2024 21:39:40 -0300
Subject: [PATCH 3/3] fix conflicts in README.md

---
 README.md | 46 ++++++++++++++++++++++++++++++--------------
 1 file changed, 32 insertions(+), 14 deletions(-)

diff --git a/README.md b/README.md
index 4b35f7c..f9a770c 100644
--- a/README.md
+++ b/README.md
@@ -2,16 +2,9 @@ This repository contains JAX example code for loading and running the Grok-1 open-weights model.
-Make sure to download the checkpoint and place `ckpt-0` directory in `checkpoints` before run the project.
+Make sure to download the checkpoint and place the `ckpt-0` directory in `checkpoints`; see the "Downloading the weights" section below.
 
-## 1. Downloading the weights
-
-You can download the weights using a torrent client in the following magnet link:
-```
-magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
-```
-
-## 2. Installation
+## 1. Installation
 
 1. Install the project dependencies
 
@@ -27,16 +20,41 @@ python run.py
 The script loads the checkpoint and samples from the model on a test input.
 
-Due to the large size of the model (314B/Billion parameters), a machine with enough GPU memory is required to test the model with the example code.
+Due to the large size of the model (314 billion parameters), a machine with enough GPU memory is required to test the model with the example code.
 
 The implementation of the MoE layer in this repository is not efficient. The implementation was chosen to avoid the need for custom kernels to validate the correctness of the model.
 
-## 3. Requirements
+## 2. Model Specifications
 
-Make sure to attend the following requirements before run the project:
+Grok-1 is currently designed with the following specifications:
 
- - Needs either either a TPU or GPU (NVIDIA/AMD supported only)
- - They have to be 8 devices (in the context of TPUs (Tensor Processing Units) or GPUs (Graphics Processing Units), they are typically talking about having access to a total of 8 individual processing units)
+- **Parameters:** 314B
+- **Architecture:** Mixture of 8 Experts (MoE)
+- **Expert Utilization:** 2 experts used per token
+- **Layers:** 64
+- **Attention Heads:** 48 for queries, 8 for keys/values
+- **Embedding Size:** 6,144
+- **Tokenization:** SentencePiece tokenizer with 131,072 tokens
+- **Additional Features:**
+  - Rotary embeddings (RoPE)
+  - Supports activation sharding and 8-bit quantization
+- **Maximum Sequence Length (context):** 8,192 tokens
+- **TPU/GPU:** NVIDIA/AMD supported only
+
+## 3. Downloading the weights
+
+You can download the weights using a torrent client and this magnet link:
+
+```
+magnet:?xt=urn:btih:5f96d43576e3d386c9ba65b883210a393b68210e&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=udp%3A%2F%2Ftracker.coppersurfer.tk%3A6969&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce
+```
+
+or download them directly from the [HuggingFace 🤗 Hub](https://huggingface.co/xai-org/grok-1):
+```
+git clone https://github.com/xai-org/grok-1.git && cd grok-1
+pip install huggingface_hub[hf_transfer]
+huggingface-cli download xai-org/grok-1 --repo-type model --include ckpt-0/* --local-dir checkpoints --local-dir-use-symlinks False
+```
 
 # License
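
Editor's note on the specification list added in PATCH 3/3: the figures it states (314B total parameters, 8 experts with 2 active per token, 8-bit quantization) allow a quick back-of-envelope check of why the README demands a multi-device machine. The sketch below is a rough estimate only — it assumes 1 byte per parameter from the 8-bit quantization mentioned in the list, and it ignores shared (non-expert) layers, which always run, so the "active parameters" figure is an order-of-magnitude guide, not an exact count:

```python
# Back-of-envelope estimates from the Grok-1 spec list in PATCH 3/3.
# Constants come from the patch; results are rough approximations.
TOTAL_PARAMS = 314e9     # 314B total parameters
NUM_EXPERTS = 8          # Mixture of 8 Experts (MoE)
ACTIVE_EXPERTS = 2       # 2 experts routed per token
BYTES_PER_PARAM = 1      # assumes the 8-bit quantization named in the spec list

# Fraction of expert weights touched per token; shared layers such as
# attention always run, so treat this as an order of magnitude only.
active_fraction = ACTIVE_EXPERTS / NUM_EXPERTS
approx_active_params_b = TOTAL_PARAMS * active_fraction / 1e9

# Rough weight-memory footprint at 8-bit precision.
approx_weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9

print(f"~{approx_active_params_b:.1f}B expert parameters active per token")
print(f"~{approx_weights_gb:.0f} GB of weight memory at 8-bit precision")
```

At roughly 314 GB of weights alone, sharding across 8 devices still leaves on the order of 40 GB per device before activations, which is why the patched README requires 8 TPUs or GPUs.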