# Distributed Llama Docker Setup for Raspberry Pi

This directory contains Docker configurations to run Distributed Llama on Raspberry Pi devices using containers. There are two variants:

1. **Controller** (`Dockerfile.controller`) - Downloads models and runs the API server
2. **Worker** (`Dockerfile.worker`) - Runs worker nodes that connect to the controller

## Quick Start with Docker Compose

### 1. Download a Model

First, download a model using the controller container:

```bash
# Create a models directory
mkdir -p models

# Download a model (this will take some time)
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```

### 2. Start the Distributed Setup

```bash
# Start all services (1 controller + 3 workers)
docker-compose up
```

The API will be available at `http://localhost:9999`.

### 3. Test the API

```bash
# List available models
curl http://localhost:9999/v1/models

# Send a chat completion request
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
```

## Manual Docker Usage

### Building the Images

```bash
# Build controller image
docker build -f Dockerfile.controller -t distributed-llama-controller .

# Build worker image
docker build -f Dockerfile.worker -t distributed-llama-worker .
```

### Running the Controller

```bash
# Download a model first
docker run -v ./models:/app/models distributed-llama-controller --download llama3_2_3b_instruct_q40

# Run the API server (standalone mode, no workers)
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40

# Run the API server with workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

### Running Workers

```bash
# Run a worker on the default port 9999
docker run -p 9999:9999 distributed-llama-worker

# Run a worker with custom settings
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 2
```

## Available Models

You can download any of these models:

- `llama3_1_8b_instruct_q40`
- `llama3_1_405b_instruct_q40` (very large, 56 parts)
- `llama3_2_1b_instruct_q40`
- `llama3_2_3b_instruct_q40`
- `llama3_3_70b_instruct_q40`
- `deepseek_r1_distill_llama_8b_q40`
- `qwen3_0.6b_q40`
- `qwen3_1.7b_q40`
- `qwen3_8b_q40`
- `qwen3_14b_q40`
- `qwen3_30b_a3b_q40`

## Configuration Options

### Controller Options

- `--model <name>`: Model name to use (required)
- `--port <port>`: API server port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
- `--max-seq-len <n>`: Maximum sequence length (default: 4096)
- `--buffer-float-type <type>`: Buffer float type (default: q80)
- `--workers <addr> [<addr> ...]`: Space-separated worker addresses
- `--download <name>`: Download a model and exit

### Worker Options

- `--port <port>`: Worker port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)

## Environment Variables (Docker Compose)

You can customize the setup using environment variables:

```bash
# Set model and thread counts
MODEL_NAME=llama3_2_1b_instruct_q40 \
CONTROLLER_NTHREADS=2 \
WORKER_NTHREADS=2 \
docker-compose up
```

Available variables:

- `MODEL_NAME`: Model to use (default: llama3_2_3b_instruct_q40)
- `CONTROLLER_NTHREADS`: Controller threads (default: 4)
- `WORKER_NTHREADS`: Worker threads (default: 4)
- `MAX_SEQ_LEN`: Maximum sequence length (default: 4096)
- `BUFFER_FLOAT_TYPE`: Buffer float type (default: q80)
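### Example Compose File (Sketch)

For orientation, here is a minimal sketch of what a `docker-compose.yml` for the one-controller-plus-three-workers topology above could look like. The service names `worker2`/`worker3`, the `dllama` network name, the static IP assignments, and the exact `command` arguments are assumptions (only `worker1`, the `172.20.0.x` addresses, and the environment variables appear elsewhere in this README); the `docker-compose.yml` shipped in this directory is the authoritative configuration.

```yaml
# Illustrative sketch only -- service names, network, and commands are assumptions.
version: "3.8"

services:
  controller:
    build:
      context: .
      dockerfile: Dockerfile.controller
    ports:
      - "9999:9999"
    volumes:
      - ./models:/app/models
    command: >
      --model ${MODEL_NAME:-llama3_2_3b_instruct_q40}
      --nthreads ${CONTROLLER_NTHREADS:-4}
      --max-seq-len ${MAX_SEQ_LEN:-4096}
      --buffer-float-type ${BUFFER_FLOAT_TYPE:-q80}
      --workers 172.20.0.11:9999 172.20.0.12:9999 172.20.0.13:9999
    networks:
      dllama:
        ipv4_address: 172.20.0.10

  worker1:
    build:
      context: .
      dockerfile: Dockerfile.worker
    command: --nthreads ${WORKER_NTHREADS:-4}
    networks:
      dllama:
        ipv4_address: 172.20.0.11

  worker2:
    build:
      context: .
      dockerfile: Dockerfile.worker
    command: --nthreads ${WORKER_NTHREADS:-4}
    networks:
      dllama:
        ipv4_address: 172.20.0.12

  worker3:
    build:
      context: .
      dockerfile: Dockerfile.worker
    command: --nthreads ${WORKER_NTHREADS:-4}
    networks:
      dllama:
        ipv4_address: 172.20.0.13

networks:
  dllama:
    ipam:
      config:
        - subnet: 172.20.0.0/24
```

With static addresses like these, the controller can reach the workers directly over the internal compose network, so the worker ports do not need to be published to the host.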
## Multi-Device Setup

To run across multiple Raspberry Pi devices:

### Device 1 (Controller)

```bash
# Run the controller
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```

### Devices 2-4 (Workers)

```bash
# Run a worker on each device
docker run -p 9999:9999 distributed-llama-worker --nthreads 4
```

## Performance Tips

1. **Thread count**: Set `--nthreads` to the number of CPU cores on each device.
2. **Memory**: Larger models require more RAM; monitor usage with `docker stats`.
3. **Network**: Use wired Ethernet connections between devices for better performance.
4. **Storage**: Use fast SD cards (Class 10 or better) or USB 3.0 storage for model files.

## Troubleshooting

### Model Download Issues

```bash
# Check whether the model files exist
ls -la models/llama3_2_3b_instruct_q40/

# Re-download if corrupted
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```

### Worker Connection Issues

```bash
# Check worker logs
docker-compose logs worker1

# Test network connectivity from inside a container
docker exec -it <container-name> ping 172.20.0.11
```

For a multi-device setup, see also the verification script sketch at the end of this document.

### Resource Issues

```bash
# Monitor resource usage
docker stats

# Reduce thread counts if CPU usage is too high
CONTROLLER_NTHREADS=2 WORKER_NTHREADS=2 docker-compose up
```

## Web Interface

You can use the web chat interface at [llama-ui.js.org](https://llama-ui.js.org/):

1. Open the website
2. Go to settings
3. Set the base URL to `http://your-pi-ip:9999`
4. Save and start chatting

## License

This Docker setup follows the same license as the main Distributed Llama project.
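## Verifying a Multi-Device Setup (Sketch)

Combining the troubleshooting steps above, the following shell-script sketch (not part of this repository) can be run from the controller device to check that each worker is reachable and that the API answers. The worker IPs below are the example addresses from the multi-device section and must be replaced with the addresses of your own devices.

```bash
#!/usr/bin/env bash
# verify-cluster.sh -- illustrative sketch; adjust WORKER_IPS to your own devices.
set -e

WORKER_IPS=("10.0.0.2" "10.0.0.3" "10.0.0.4")  # example addresses from the multi-device section
API_URL="http://localhost:9999"

# 1. Check that every worker device answers on the network
for ip in "${WORKER_IPS[@]}"; do
    if ping -c 1 -W 2 "$ip" > /dev/null; then
        echo "worker $ip: reachable"
    else
        echo "worker $ip: NOT reachable -- check cabling, IP address, or firewall"
    fi
done

# 2. Check that the controller's API responds
if curl -sf "$API_URL/v1/models" > /dev/null; then
    echo "API at $API_URL: OK"
else
    echo "API at $API_URL: not responding -- check the controller logs with 'docker-compose logs'"
fi
```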