Distributed Llama Docker Setup for Raspberry Pi

This directory contains Docker configurations for running Distributed Llama on Raspberry Pi devices. There are two image variants (a sketch of how they fit together follows below):

  1. Controller (Dockerfile.controller) - Downloads models and runs the API server
  2. Worker (Dockerfile.worker) - Runs worker nodes that connect to the controller
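
The docker-compose.yml in this directory wires these variants together. The sketch below is only an illustration of that layout: the service names (controller, worker1-worker3) and worker count match what this README uses elsewhere, but the network details and exact command arguments are assumptions, so treat the shipped compose file as authoritative.

# Rough sketch of a compose layout for 1 controller + 3 workers (illustrative, not the shipped file)
services:
  controller:
    build:
      context: .
      dockerfile: Dockerfile.controller
    ports:
      - "9999:9999"              # API exposed on the host
    volumes:
      - ./models:/app/models     # downloaded models persist on the host
    command: --model ${MODEL_NAME:-llama3_2_3b_instruct_q40} --workers worker1:9999 worker2:9999 worker3:9999
  worker1:
    build:
      context: .
      dockerfile: Dockerfile.worker
  worker2:
    build:
      context: .
      dockerfile: Dockerfile.worker
  worker3:
    build:
      context: .
      dockerfile: Dockerfile.worker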

Quick Start with Docker Compose

1. Download a Model

First, download a model using the controller container:

# Create a models directory
mkdir -p models

# Download a model (this will take some time)
docker-compose run --rm controller --download llama3_2_3b_instruct_q40

2. Start the Distributed Setup

# Start all services (1 controller + 3 workers)
docker-compose up

The API will be available at http://localhost:9999.
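
If you prefer to keep the stack running in the background, the usual Compose workflow applies (service names follow those used elsewhere in this README):

# Start detached, follow the controller logs, and shut down when done
docker-compose up -d
docker-compose logs -f controller
docker-compose down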

3. Test the API

# List available models
curl http://localhost:9999/v1/models

# Send a chat completion request
curl -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama",
    "messages": [{"role": "user", "content": "Hello, how are you?"}],
    "max_tokens": 100
  }'
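
To pull just the reply text out of the JSON response, you can pipe through jq, assuming jq is installed and the response follows the usual OpenAI-style chat completion schema:

# Extract only the assistant's reply (response layout assumed to be OpenAI-compatible)
curl -s -X POST http://localhost:9999/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 50}' \
  | jq -r '.choices[0].message.content'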

Manual Docker Usage

Building the Images

# Build controller image
docker build -f Dockerfile.controller -t distributed-llama-controller .

# Build worker image  
docker build -f Dockerfile.worker -t distributed-llama-worker .

Running the Controller

# Download a model first
docker run -v ./models:/app/models distributed-llama-controller --download llama3_2_3b_instruct_q40

# Run API server (standalone mode, no workers)
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40

# Run API server with workers
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999

Running Workers

# Run a worker on default port 9999
docker run -p 9999:9999 distributed-llama-worker

# Run a worker with custom settings
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 2

Available Models

You can download any of these models:

  • llama3_1_8b_instruct_q40
  • llama3_1_405b_instruct_q40 (very large, 56 parts)
  • llama3_2_1b_instruct_q40
  • llama3_2_3b_instruct_q40
  • llama3_3_70b_instruct_q40
  • deepseek_r1_distill_llama_8b_q40
  • qwen3_0.6b_q40
  • qwen3_1.7b_q40
  • qwen3_8b_q40
  • qwen3_14b_q40
  • qwen3_30b_a3b_q40

Configuration Options

Controller Options

  • --model <name>: Model name to use (required)
  • --port <port>: API server port (default: 9999)
  • --nthreads <n>: Number of threads (default: 4)
  • --max-seq-len <n>: Maximum sequence length (default: 4096)
  • --buffer-float-type <type>: Buffer float type (default: q80)
  • --workers <addresses>: Space-separated worker addresses
  • --download <model>: Download a model and exit

Worker Options

  • --port <port>: Worker port (default: 9999)
  • --nthreads <n>: Number of threads (default: 4)
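
As a purely illustrative combination of these options (the values below are examples, not tuned recommendations):

# Controller with an explicit context length and buffer type, pointing at one remote worker
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --nthreads 4 \
  --max-seq-len 8192 \
  --buffer-float-type q80 \
  --workers 10.0.0.2:9998

# Matching worker listening on the non-default port
docker run -p 9998:9998 distributed-llama-worker --port 9998 --nthreads 4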

Environment Variables (Docker Compose)

You can customize the setup using environment variables:

# Set model and thread counts
MODEL_NAME=llama3_2_1b_instruct_q40 \
CONTROLLER_NTHREADS=2 \
WORKER_NTHREADS=2 \
docker-compose up

Available variables:

  • MODEL_NAME: Model to use (default: llama3_2_3b_instruct_q40)
  • CONTROLLER_NTHREADS: Controller threads (default: 4)
  • WORKER_NTHREADS: Worker threads (default: 4)
  • MAX_SEQ_LEN: Maximum sequence length (default: 4096)
  • BUFFER_FLOAT_TYPE: Buffer float type (default: q80)
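
These variables presumably reach the containers through standard Compose variable substitution with defaults. A sketch of what that wiring likely looks like (check the actual docker-compose.yml for the authoritative version):

# Sketch only: values fall back to the documented defaults when the variables are unset
services:
  controller:
    command: >
      --model ${MODEL_NAME:-llama3_2_3b_instruct_q40}
      --nthreads ${CONTROLLER_NTHREADS:-4}
      --max-seq-len ${MAX_SEQ_LEN:-4096}
      --buffer-float-type ${BUFFER_FLOAT_TYPE:-q80}
  worker1:
    command: --nthreads ${WORKER_NTHREADS:-4}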

Multi-Device Setup

To run across multiple Raspberry Pi devices:

Device 1 (Controller)

# Run controller
docker run -p 9999:9999 -v ./models:/app/models distributed-llama-controller \
  --model llama3_2_3b_instruct_q40 \
  --workers 10.0.0.2:9999 10.0.0.3:9999 10.0.0.4:9999

Devices 2-4 (Workers)

# Run worker on each device
docker run -p 9999:9999 distributed-llama-worker --nthreads 4
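
On dedicated devices you may want the worker to come back automatically after a reboot or crash; standard Docker flags cover this (a convenience sketch, not part of the shipped scripts):

# Run the worker detached and restart it automatically unless it was stopped manually
docker run -d --restart unless-stopped -p 9999:9999 distributed-llama-worker --nthreads 4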

Performance Tips

  1. Thread Count: Set --nthreads to the number of CPU cores on each device (see the sketch after this list)
  2. Memory: Larger models require more RAM. Monitor usage with docker stats
  3. Network: Use wired Ethernet connections for better performance between devices
  4. Storage: Use fast SD cards (Class 10 or better) or USB 3.0 storage for model files
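
For example, you can let the device report its own core count instead of hard-coding a number (a small convenience sketch):

# nproc prints the number of available CPU cores on the device
docker run -p 9999:9999 distributed-llama-worker --nthreads $(nproc)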

Troubleshooting

Model Download Issues

# Check if model files exist
ls -la models/llama3_2_3b_instruct_q40/

# Re-download if corrupted
docker-compose run --rm controller --download llama3_2_3b_instruct_q40

Worker Connection Issues

# Check worker logs
docker-compose logs worker1

# Test network connectivity
docker exec -it <controller_container> ping 172.20.0.11
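
When the workers run on separate devices rather than inside the Compose network, a quick reachability check from the controller host can help (assumes netcat is installed; replace the address with your worker's IP):

# Check that the worker's port is reachable from the controller device
nc -zv 10.0.0.2 9999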

Resource Issues

# Monitor resource usage
docker stats

# Reduce thread count if CPU usage is too high
CONTROLLER_NTHREADS=2 WORKER_NTHREADS=2 docker-compose up

Web Interface

You can use the web chat interface at llama-ui.js.org:

  1. Open the website
  2. Go to settings
  3. Set base URL to: http://your-pi-ip:9999
  4. Save and start chatting

License

This Docker setup follows the same license as the main Distributed Llama project.