# How to Run Distributed Llama on 🍓 Raspberry Pi

This article describes how to run Distributed Llama on 4 Raspberry Pi devices, but you can also run it on 1, 2, 4, 8... devices. Please adjust the commands and topology according to your configuration.

````
[🔀 SWITCH OR ROUTER]
      | | | |
      | | | |_______ 🔸 raspberrypi1 (ROOT)     10.0.0.1
      | | |_________ 🔹 raspberrypi2 (WORKER 1) 10.0.0.2:9999
      | |___________ 🔹 raspberrypi3 (WORKER 2) 10.0.0.3:9999
      |_____________ 🔹 raspberrypi4 (WORKER 3) 10.0.0.4:9999
````

1. Install `Raspberry Pi OS Lite (64 bit)` on **🔸🔹 ALL** your Raspberry Pi devices. This OS doesn't have a desktop environment, but you can easily connect to it via SSH to manage it.
2. Connect **🔸🔹 ALL** devices to your **🔀 SWITCH OR ROUTER** via Ethernet cable. If you're using only two devices, it's better to connect them directly without a switch.
3. Connect to all devices via SSH from your computer.

```
ssh user@raspberrypi1.local
ssh user@raspberrypi2.local
ssh user@raspberrypi3.local
ssh user@raspberrypi4.local
```

4. Install Git on **🔸🔹 ALL** devices:

```sh
sudo apt install git
```

5. Clone this repository and compile Distributed Llama on **🔸🔹 ALL** devices:

```sh
git clone https://github.com/b4rtaz/distributed-llama.git
cd distributed-llama
make dllama
make dllama-api
```

6. Download the model to the **🔸 ROOT** device using the `launch.py` script. You don't need to download the model on worker devices.

```sh
python3 launch.py                          # Prints a list of available models
python3 launch.py llama3_2_3b_instruct_q40 # Downloads the model to the root device
```

7. Assign static IP addresses on **🔸🔹 ALL** devices. Each device must have a unique IP address in the same subnet. These addresses don't survive a reboot; see the note at the end of this guide on making them persistent.

```sh
sudo ip addr add 10.0.0.1/24 dev eth0 # 🔸 ROOT
sudo ip addr add 10.0.0.2/24 dev eth0 # 🔹 WORKER 1
sudo ip addr add 10.0.0.3/24 dev eth0 # 🔹 WORKER 2
sudo ip addr add 10.0.0.4/24 dev eth0 # 🔹 WORKER 3
```

8. Start workers on all **🔹 WORKER** devices:

```sh
sudo nice -n -20 ./dllama worker --port 9999 --nthreads 4
```

9. Run inference on the **🔸 ROOT** device to test that everything works:

```sh
sudo nice -n -20 ./dllama inference \
  --prompt "Hello world" \
  --steps 32 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

10. To run the API server, start it on the **🔸 ROOT** device:

```sh
sudo nice -n -20 ./dllama-api \
  --port 9999 \
  --model models/llama3_2_3b_instruct_q40/dllama_model_llama3_2_3b_instruct_q40.m \
  --tokenizer models/llama3_2_3b_instruct_q40/dllama_tokenizer_llama3_2_3b_instruct_q40.t \
  --buffer-float-type q80 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

Now you can connect to the API server from your computer:

```
http://raspberrypi1.local:9999/v1/models
```

11. When the API server is running, you can open the web chat in your browser: open [llama-ui.js.org](https://llama-ui.js.org/), go to the settings, and set the base URL to `http://raspberrypi1.local:9999`. Press the "save" button and start chatting! If you prefer the command line, see the `curl` sketch at the end of this guide.
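
A note on step 7: addresses added with `ip addr add` are lost on reboot. The sketch below is one way to make them persistent on current Raspberry Pi OS releases, which use NetworkManager; the connection name `"Wired connection 1"` is an assumption (check yours with `nmcli connection show`), and older releases that use `dhcpcd` need a different approach (editing `/etc/dhcpcd.conf`).

```sh
# Run on the 🔸 ROOT device; repeat on each 🔹 WORKER with its own address.
# List the wired connection name first -- "Wired connection 1" below is an assumption.
nmcli connection show

# Add the static address on top of DHCP (ipv4.method stays "auto"), then re-apply.
sudo nmcli connection modify "Wired connection 1" ipv4.addresses 10.0.0.1/24
sudo nmcli connection up "Wired connection 1"
```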
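
Besides the web chat in step 11, you can exercise the API directly from your computer. The sketch below is only a rough check: it assumes the `dllama-api` server follows the OpenAI-style API, so the `/v1/chat/completions` path, the JSON body, and the `model` label are assumptions to verify against your build; `/v1/models` is the URL already shown in step 10.

```sh
# List the models reported by the server (same URL as in step 10).
curl http://raspberrypi1.local:9999/v1/models

# Minimal chat request. The /v1/chat/completions path and the OpenAI-style
# body are assumptions -- adjust them if your build of dllama-api differs.
curl http://raspberrypi1.local:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3_2_3b_instruct_q40",
    "messages": [{"role": "user", "content": "Hello world"}],
    "max_tokens": 64
  }'
```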