# Distributed Llama Docker Setup for Raspberry Pi
This directory contains Docker configurations to run Distributed Llama on Raspberry Pi devices using containers. There are two variants:

- **Controller** (`Dockerfile.controller`) - Downloads models and runs the API server
- **Worker** (`Dockerfile.worker`) - Runs worker nodes that connect to the controller
## Quick Start with Docker Compose

### 1. Download a Model
First, download a model using the controller container:
```bash
# Create a models directory
mkdir -p models

# Download a model (this will take some time)
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
### 2. Start the Distributed Setup
```bash
# Start all services (1 controller + 3 workers)
docker-compose up
```
The API will be available at http://localhost:9999
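If you script against the API, it can help to wait until the controller has finished loading the model before sending requests. A minimal sketch, assuming the `/v1/models` endpoint (used in the next step) only responds once the server is up, and that `curl` is installed on the host:

```bash
# Poll the models endpoint until the API server answers
until curl -sf http://localhost:9999/v1/models > /dev/null; do
  echo "Waiting for the controller to come up..."
  sleep 5
done
echo "API is ready"
```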
### 3. Test the API
```bash
# List available models
curl http://localhost:9999/v1/models

# Send a chat completion request
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
```
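The endpoint follows the OpenAI-style chat completion format, so you can pull out just the generated text with `jq`. A sketch, assuming `jq` is installed on the host and the reply is located at `choices[0].message.content` as in the OpenAI schema:

```bash
# Extract only the assistant's reply from the JSON response
curl -s -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }' | jq -r '.choices[0].message.content'
```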
## Manual Docker Usage

### Building the Images
```bash
# Build controller image
docker build -f Dockerfile.controller -t distributed-llama-controller .

# Build worker image
docker build -f Dockerfile.worker -t distributed-llama-worker .
```
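If you would rather build on a faster x86 machine than on the Pis themselves, `docker buildx` can cross-build for ARM64. A sketch, assuming buildx with QEMU emulation is already set up on the build host and you are targeting 64-bit Raspberry Pi OS:

```bash
# Cross-build both images for arm64 on an x86 host
docker buildx build --platform linux/arm64 -f Dockerfile.controller \
  -t distributed-llama-controller --load .
docker buildx build --platform linux/arm64 -f Dockerfile.worker \
  -t distributed-llama-worker --load .
```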
### Running the Controller
```bash
# Download a model first
docker run -v ./models:/app/models distributed-llama-controller --download llama3_2_3b_instruct_q40

# Run API server (standalone mode, no workers)
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40

# Run API server with workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Running Workers
```bash
# Run a worker on the default port 9999
docker run -p 9999:9999 distributed-llama-worker

# Run a worker with custom settings
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 2
```
## Available Models
You can download any of these models:
- `llama3_1_8b_instruct_q40`
- `llama3_1_405b_instruct_q40` (very large, 56 parts)
- `llama3_2_1b_instruct_q40`
- `llama3_2_3b_instruct_q40`
- `llama3_3_70b_instruct_q40`
- `deepseek_r1_distill_llama_8b_q40`
- `qwen3_0.6b_q40`
- `qwen3_1.7b_q40`
- `qwen3_8b_q40`
- `qwen3_14b_q40`
- `qwen3_30b_a3b_q40`
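For a first end-to-end test on devices with limited RAM, the smallest models download and load fastest. For example, using the same download pattern as above (pick whichever model fits your hardware):

```bash
# Download a small model for a quick test
docker-compose run --rm controller --download qwen3_0.6b_q40

# Start the stack against it via the MODEL_NAME variable
MODEL_NAME=qwen3_0.6b_q40 docker-compose up
```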
## Configuration Options

### Controller Options
- `--model <name>`: Model name to use (required)
- `--port <port>`: API server port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
- `--max-seq-len <n>`: Maximum sequence length (default: 4096)
- `--buffer-float-type <type>`: Buffer float type (default: q80)
- `--workers <addresses>`: Space-separated worker addresses
- `--download <model>`: Download a model and exit
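These options can be combined in a single `docker run` invocation. For example, a controller with an explicit thread count, a shorter context window, and the three worker addresses used elsewhere in this guide (the IPs are placeholders for your network):

```bash
# Controller with explicit threads, reduced context length, and three workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --nthreads 4 \
  --max-seq-len 2048 \
  --buffer-float-type q80 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```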
### Worker Options
- `--port <port>`: Worker port (default: 9999)
- `--nthreads <n>`: Number of threads (default: 4)
### Environment Variables (Docker Compose)
You can customize the setup using environment variables:
```bash
# Set model and thread counts
MODEL_NAME=llama3_2_1b_instruct_q40 \
CONTROLLER_NTHREADS=2 \
WORKER_NTHREADS=2 \
docker-compose up
```
Available variables:
- `MODEL_NAME`: Model to use (default: llama3_2_3b_instruct_q40)
- `CONTROLLER_NTHREADS`: Controller threads (default: 4)
- `WORKER_NTHREADS`: Worker threads (default: 4)
- `MAX_SEQ_LEN`: Maximum sequence length (default: 4096)
- `BUFFER_FLOAT_TYPE`: Buffer float type (default: q80)
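Instead of exporting these on the command line each time, you can put them in a `.env` file next to `docker-compose.yml`; Docker Compose reads that file automatically. A minimal sketch (values are examples only):

```bash
# .env (picked up by docker-compose from the same directory)
MODEL_NAME=llama3_2_1b_instruct_q40
CONTROLLER_NTHREADS=2
WORKER_NTHREADS=2
MAX_SEQ_LEN=2048
BUFFER_FLOAT_TYPE=q80
```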
## Multi-Device Setup
To run across multiple Raspberry Pi devices:
### Device 1 (Controller)
```bash
# Run controller
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999
```
### Devices 2-4 (Workers)
```bash
# Run worker on each device
docker run -p 9999:9999 distributed-llama-worker --nthreads 4
```
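Before starting the controller, it can be worth confirming from device 1 that every worker's port is reachable. A sketch, assuming `nc` (netcat) is installed on the controller device; adjust the IPs to match your network:

```bash
# Check that each worker device is listening on port 9999
for ip in 10.0.0.2 10.0.0.3 10.0.0.4; do
  nc -zv "$ip" 9999
done
```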
## Performance Tips
- **Thread Count**: Set `--nthreads` to the number of CPU cores on each device (see the sketch after this list)
- **Memory**: Larger models require more RAM. Monitor usage with `docker stats`
- **Network**: Use wired Ethernet connections for better performance between devices
- **Storage**: Use fast SD cards (Class 10 or better) or USB 3.0 storage for model files
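For the thread-count tip, a simple way to match `--nthreads` to the core count is to pass the output of `nproc`. A sketch, assuming the command runs on the device itself:

```bash
# Use one thread per CPU core on this device
docker run -p 9999:9999 distributed-llama-worker --nthreads "$(nproc)"
```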
## Troubleshooting

### Model Download Issues
```bash
# Check if model files exist
ls -la models/llama3_2_3b_instruct_q40/

# Re-download if corrupted
docker-compose run --rm controller --download llama3_2_3b_instruct_q40
```
### Worker Connection Issues
```bash
# Check worker logs
docker-compose logs worker1

# Test network connectivity
docker exec -it <controller_container> ping 172.20.0.11
```
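You can also confirm that all services are actually running and restart any that have exited. These are standard Compose commands; the service name follows the `docker-compose.yml` in this directory:

```bash
# Show the state of all services
docker-compose ps

# Restart a single worker if it has exited
docker-compose restart worker1
```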
### Resource Issues
```bash
# Monitor resource usage
docker stats

# Reduce thread count if CPU usage is too high
CONTROLLER_NTHREADS=2 WORKER_NTHREADS=2 docker-compose up
```
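If a model fails to load or a container gets killed, check that the device has enough free memory and storage for the model files. Run these directly on each Pi (standard Linux commands):

```bash
# Check available memory and swap on the device
free -h

# Check how much space the downloaded model files take up
du -sh models/*
```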
## Web Interface
You can use the web chat interface at llama-ui.js.org:

- Open the website
- Go to settings
- Set the base URL to `http://your-pi-ip:9999`
- Save and start chatting
## License
This Docker setup follows the same license as the main Distributed Llama project.