# Distributed Llama Docker Setup for Raspberry Pi

This directory contains Docker configurations to run Distributed Llama on Raspberry Pi devices using containers. There are two variants:

1. **Controller** (`Dockerfile.controller`) - Downloads models and runs the API server
2. **Worker** (`Dockerfile.worker`) - Runs worker nodes that connect to the controller
## Quick Start with Docker Compose

### 1. Download a Model

First, download a model using the controller container:

```bash
# Create a models directory
mkdir -p models

# Download a model (this will take some time)
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
### 2. Start the Distributed Setup

```bash
# Start all services (1 controller + 3 workers)
docker-compose up
```

The API will be available at `http://localhost:9999`.
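
If you would rather get your terminal back, the usual Compose flags apply here as well; a small sketch, assuming the `docker-compose.yml` bundled with this setup and its `controller` service name:

```bash
# Start everything in the background
docker-compose up -d

# Check that the controller and workers are running
docker-compose ps

# Follow the controller logs
docker-compose logs -f controller
```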
### 3. Test the API

```bash
# List available models
curl http://localhost:9999/v1/models

# Send a chat completion request
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
```
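If the server returns the usual OpenAI-style completion object (an assumption worth checking against the actual output), `jq` can pull out just the reply text:

```bash
# Extract only the assistant's reply (assumes an OpenAI-compatible response shape and that jq is installed)
curl -s -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'
```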
## Manual Docker Usage

### Building the Images

```bash
# Build controller image
docker build -f Dockerfile.controller -t distributed-llama-controller .

# Build worker image
docker build -f Dockerfile.worker -t distributed-llama-worker .
```
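To confirm the build step worked, both tags should show up in the local image list:

```bash
# List the freshly built images
docker images | grep distributed-llama
```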
### Running the Controller

```bash
# Download a model first
docker run -v ./models:/app/models distributed-llama-controller --download llama3_2_3b_instruct_q40

# Run API server (standalone mode, no workers)
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40

# Run API server with workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Running Workers

```bash
# Run a worker on default port 9999
docker run -p 9999:9999 distributed-llama-worker

# Run a worker with custom settings
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 2
```
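On a Pi dedicated to worker duty, it is often convenient to keep the container in the background and have it restart automatically after reboots; a sketch using standard Docker flags (the container name is just an example):

```bash
# Run the worker detached and bring it back automatically after reboots
docker run -d --restart unless-stopped --name dllama-worker \
  -p 9999:9999 distributed-llama-worker
```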
## Available Models

You can download any of these models:

- `llama3_1_8b_instruct_q40`
- `llama3_1_405b_instruct_q40` (very large, 56 parts)
- `llama3_2_1b_instruct_q40`
- `llama3_2_3b_instruct_q40`
- `llama3_3_70b_instruct_q40`
- `deepseek_r1_distill_llama_8b_q40`
- `qwen3_0.6b_q40`
- `qwen3_1.7b_q40`
- `qwen3_8b_q40`
- `qwen3_14b_q40`
- `qwen3_30b_a3b_q40`
## Configuration Options

### Controller Options

- `--model <name>`: Model name to use (required)
- `--port <port>`: API server port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
- `--max-seq-len <n>`: Maximum sequence length (default: 4096)
- `--buffer-float-type <type>`: Buffer float type (default: q80)
- `--workers <addresses>`: Space-separated worker addresses
- `--download <model>`: Download a model and exit
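
Putting several of these options together (the values shown are just the documented defaults, written out explicitly):

```bash
# Controller invocation with the main options spelled out
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --port 9999 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --buffer-float-type q80
```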
### Worker Options

- `--port <port>`: Worker port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
## Environment Variables (Docker Compose)

You can customize the setup using environment variables:

```bash
# Set model and thread counts
MODEL_NAME=llama3_2_1b_instruct_q40 \
CONTROLLER_NTHREADS=2 \
WORKER_NTHREADS=2 \
docker-compose up
```
Available variables:

- `MODEL_NAME`: Model to use (default: llama3_2_3b_instruct_q40)
- `CONTROLLER_NTHREADS`: Controller threads (default: 4)
- `WORKER_NTHREADS`: Worker threads (default: 4)
- `MAX_SEQ_LEN`: Maximum sequence length (default: 4096)
- `BUFFER_FLOAT_TYPE`: Buffer float type (default: q80)
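
Docker Compose also reads a `.env` file from the project directory, so the same values can live in one place instead of being typed on every command; a minimal sketch:

```bash
# .env file, picked up automatically by docker-compose in this directory
MODEL_NAME=llama3_2_3b_instruct_q40
CONTROLLER_NTHREADS=4
WORKER_NTHREADS=4
MAX_SEQ_LEN=4096
BUFFER_FLOAT_TYPE=q80
```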
## Multi-Device Setup

To run across multiple Raspberry Pi devices:

### Device 1 (Controller)

```bash
# Run controller
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Devices 2-4 (Workers)

```bash
# Run worker on each device
docker run -p 9999:9999 distributed-llama-worker --nthreads 4
```
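Before launching the controller on Device 1, it can save a failed start to confirm that each worker port is reachable; a quick check, assuming the worker addresses above and that netcat (`nc`) is installed:

```bash
# From Device 1: verify each worker is listening on port 9999
for ip in 10.0.0.2 10.0.0.3 10.0.0.4; do
  nc -zv "$ip" 9999
done
```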
## Performance Tips

1. **Thread Count**: Set `--nthreads` to the number of CPU cores on each device (see the sketch after this list)
2. **Memory**: Larger models require more RAM. Monitor usage with `docker stats`
3. **Network**: Use wired Ethernet connections for better performance between devices
4. **Storage**: Use fast SD cards (Class 10 or better) or USB 3.0 storage for model files
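
A minimal sketch of the first tip, letting the device report its own core count (`nproc` is available on Raspberry Pi OS):

```bash
# Match the thread count to the number of CPU cores on this device
docker run -p 9999:9999 distributed-llama-worker --nthreads "$(nproc)"
```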
## Troubleshooting

### Model Download Issues

```bash
# Check if model files exist
ls -la models/llama3_2_3b_instruct_q40/

# Re-download if corrupted
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
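Repeated download failures on a Pi are often just a full SD card; checking free space before re-downloading is cheap (paths assume the `models/` directory used above):

```bash
# Free space on the filesystem holding the models
df -h models/

# Size of each downloaded model
du -sh models/*
```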
### Worker Connection Issues

```bash
# Check worker logs
docker-compose logs worker1

# Test network connectivity
docker exec -it <controller_container> ping 172.20.0.11
```
### Resource Issues

```bash
# Monitor resource usage
docker stats

# Reduce thread count if CPU usage is too high
CONTROLLER_NTHREADS=2 WORKER_NTHREADS=2 docker-compose up
```
## Web Interface

You can use the web chat interface at [llama-ui.js.org](https://llama-ui.js.org/):

1. Open the website
2. Go to settings
3. Set base URL to: `http://your-pi-ip:9999` (see the sketch after this list for finding the address)
4. Save and start chatting
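
If you are unsure what to use for `your-pi-ip`, the Pi can report its own addresses (standard on Raspberry Pi OS):

```bash
# Print this device's IP addresses; use the LAN one (e.g. 192.168.x.x)
hostname -I
```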
## License

This Docker setup follows the same license as the main Distributed Llama project.