# Distributed Llama Docker Setup for Raspberry Pi

This directory contains Docker configurations to run Distributed Llama on Raspberry Pi devices using containers. There are two variants:

1. **Controller** (`Dockerfile.controller`) - Downloads models and runs the API server
2. **Worker** (`Dockerfile.worker`) - Runs worker nodes that connect to the controller
## Quick Start with Docker Compose

### 1. Download a Model

First, download a model using the controller container:

```bash
# Create a models directory
mkdir -p models

# Download a model (this will take some time)
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
### 2. Start the Distributed Setup

```bash
# Start all services (1 controller + 3 workers)
docker-compose up
```

The API will be available at `http://localhost:9999`.
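
If you would rather get your terminal back, the usual Compose flags apply here as well; a small sketch, assuming the `docker-compose.yml` bundled with this setup and its `controller` service name:

```bash
# Start everything in the background
docker-compose up -d

# Check that the controller and workers are running
docker-compose ps

# Follow the controller logs
docker-compose logs -f controller
```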
### 3. Test the API

```bash
# List available models
curl http://localhost:9999/v1/models

# Send a chat completion request
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
```
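If the server returns the usual OpenAI-style completion object (an assumption worth checking against the actual output), `jq` can pull out just the reply text:

```bash
# Extract only the assistant's reply (assumes an OpenAI-compatible response shape and that jq is installed)
curl -s -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'
```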
## Manual Docker Usage

### Building the Images

```bash
# Build controller image
docker build -f Dockerfile.controller -t distributed-llama-controller .

# Build worker image
docker build -f Dockerfile.worker -t distributed-llama-worker .
```
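To confirm the build step worked, both tags should show up in the local image list:

```bash
# List the freshly built images
docker images | grep distributed-llama
```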
### Running the Controller

```bash
# Download a model first
docker run -v ./models:/app/models distributed-llama-controller --download llama3_2_3b_instruct_q40

# Run API server (standalone mode, no workers)
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40

# Run API server with workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Running Workers

```bash
# Run a worker on default port 9999
docker run -p 9999:9999 distributed-llama-worker

# Run a worker with custom settings
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 2
```
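On a Pi dedicated to worker duty, it is often convenient to keep the container in the background and have it restart automatically after reboots; a sketch using standard Docker flags (the container name is just an example):

```bash
# Run the worker detached and bring it back automatically after reboots
docker run -d --restart unless-stopped --name dllama-worker \
  -p 9999:9999 distributed-llama-worker
```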
## Available Models

You can download any of these models:

- `llama3_1_8b_instruct_q40`
- `llama3_1_405b_instruct_q40` (very large, 56 parts)
- `llama3_2_1b_instruct_q40`
- `llama3_2_3b_instruct_q40`
- `llama3_3_70b_instruct_q40`
- `deepseek_r1_distill_llama_8b_q40`
- `qwen3_0.6b_q40`
- `qwen3_1.7b_q40`
- `qwen3_8b_q40`
- `qwen3_14b_q40`
- `qwen3_30b_a3b_q40`
## Configuration Options

### Controller Options

- `--model <name>`: Model name to use (required)
- `--port <port>`: API server port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
- `--max-seq-len <n>`: Maximum sequence length (default: 4096)
- `--buffer-float-type <type>`: Buffer float type (default: q80)
- `--workers <addresses>`: Space-separated worker addresses
- `--download <model>`: Download a model and exit
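
Putting several of these options together (the values shown are just the documented defaults, written out explicitly):

```bash
# Controller invocation with the main options spelled out
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --port 9999 \
  --nthreads 4 \
  --max-seq-len 4096 \
  --buffer-float-type q80
```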
### Worker Options

- `--port <port>`: Worker port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
## Environment Variables (Docker Compose)

You can customize the setup using environment variables:

```bash
# Set model and thread counts
MODEL_NAME=llama3_2_1b_instruct_q40 \
CONTROLLER_NTHREADS=2 \
WORKER_NTHREADS=2 \
docker-compose up
```
Available variables:

- `MODEL_NAME`: Model to use (default: llama3_2_3b_instruct_q40)
- `CONTROLLER_NTHREADS`: Controller threads (default: 4)
- `WORKER_NTHREADS`: Worker threads (default: 4)
- `MAX_SEQ_LEN`: Maximum sequence length (default: 4096)
- `BUFFER_FLOAT_TYPE`: Buffer float type (default: q80)
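
Docker Compose also reads a `.env` file from the project directory, so the same values can live in one place instead of being typed on every command; a minimal sketch:

```bash
# .env file, picked up automatically by docker-compose in this directory
MODEL_NAME=llama3_2_3b_instruct_q40
CONTROLLER_NTHREADS=4
WORKER_NTHREADS=4
MAX_SEQ_LEN=4096
BUFFER_FLOAT_TYPE=q80
```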
## Multi-Device Setup

To run across multiple Raspberry Pi devices:

### Device 1 (Controller)

```bash
# Run controller
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Devices 2-4 (Workers)

```bash
# Run worker on each device
docker run -p 9999:9999 distributed-llama-worker --nthreads 4
```
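Before launching the controller on Device 1, it can save a failed start to confirm that each worker port is reachable; a quick check, assuming the worker addresses above and that netcat (`nc`) is installed:

```bash
# From Device 1: verify each worker is listening on port 9999
for ip in 10.0.0.2 10.0.0.3 10.0.0.4; do
  nc -zv "$ip" 9999
done
```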
## Performance Tips

1. **Thread Count**: Set `--nthreads` to the number of CPU cores on each device (see the sketch after this list)
2. **Memory**: Larger models require more RAM. Monitor usage with `docker stats`
3. **Network**: Use wired Ethernet connections for better performance between devices
4. **Storage**: Use fast SD cards (Class 10 or better) or USB 3.0 storage for model files
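
A minimal sketch of the first tip, letting the device report its own core count (`nproc` is available on Raspberry Pi OS):

```bash
# Match the thread count to the number of CPU cores on this device
docker run -p 9999:9999 distributed-llama-worker --nthreads "$(nproc)"
```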
## Troubleshooting

### Model Download Issues

```bash
# Check if model files exist
ls -la models/llama3_2_3b_instruct_q40/

# Re-download if corrupted
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
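Repeated download failures on a Pi are often just a full SD card; checking free space before re-downloading is cheap (paths assume the `models/` directory used above):

```bash
# Free space on the filesystem holding the models
df -h models/

# Size of each downloaded model
du -sh models/*
```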
### Worker Connection Issues

```bash
# Check worker logs
docker-compose logs worker1

# Test network connectivity
docker exec -it <controller_container> ping 172.20.0.11
```
### Resource Issues

```bash
# Monitor resource usage
docker stats

# Reduce thread count if CPU usage is too high
CONTROLLER_NTHREADS=2 WORKER_NTHREADS=2 docker-compose up
```
## Web Interface

You can use the web chat interface at [llama-ui.js.org](https://llama-ui.js.org/):

1. Open the website
2. Go to settings
3. Set base URL to: `http://your-pi-ip:9999` (see the sketch after this list for finding the address)
4. Save and start chatting
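
If you are unsure what to use for `your-pi-ip`, the Pi can report its own addresses (standard on Raspberry Pi OS):

```bash
# Print this device's IP addresses; use the LAN one (e.g. 192.168.x.x)
hostname -I
```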
## License

This Docker setup follows the same license as the main Distributed Llama project.