# How to Run Distributed Llama on 🧠 GPU

Distributed Llama can run on GPU devices using the Vulkan API. This article describes how to build and run the project on a GPU.

Before you start here, please check how to build and run Distributed Llama on CPU.

To run on GPU, please follow these steps:

1. Install the Vulkan SDK for your platform. (To verify the installation and pick a device index, see the sketch after this list.)
1. Build Distributed Llama with GPU support:
   ```sh
   DLLAMA_VULKAN=1 make dllama
   DLLAMA_VULKAN=1 make dllama-api
   ```
1. Now the `dllama` and `dllama-api` binaries support an argument related to GPU usage:
   ```
   --gpu-index <index>   Use GPU device with given index (use `0` for first device)
   ```
1. You can run the root node or a worker node on a GPU by specifying the `--gpu-index` argument. The Vulkan backend requires a single thread, so you should also set `--nthreads 1`.
   ```sh
   ./dllama inference ... --nthreads 1 --gpu-index 0
   ./dllama chat      ... --nthreads 1 --gpu-index 0
   ./dllama worker    ... --nthreads 1 --gpu-index 0
   ./dllama-api       ... --nthreads 1 --gpu-index 0
   ```
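
If you are not sure which index your GPU has, the `vulkaninfo` tool can help. This is a sketch, not part of Distributed Llama: it assumes `vulkaninfo` is installed (it ships with the Vulkan SDK; on Debian/Ubuntu it is also available via the `vulkan-tools` package) and that your version supports the `--summary` flag. The value for `--gpu-index` presumably follows the same order in which the Vulkan loader enumerates the devices:

```sh
# Print a compact list of Vulkan-capable devices and their properties.
# The position of your GPU in this list is the candidate value
# to pass to --gpu-index (0 for the first device).
vulkaninfo --summary
```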
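
For context, a full invocation could look like the sketch below. The model and tokenizer paths are hypothetical placeholders, and the `--model`, `--tokenizer`, and `--port` arguments are the ones used in the CPU guide; substitute the files produced by your own model download or conversion:

```sh
# Hypothetical paths; replace them with your own converted model files.
MODEL=models/llama3_8b_q40/dllama_model_llama3_8b_q40.m
TOKENIZER=models/llama3_8b_q40/dllama_tokenizer_llama3_8b.t

# Chat on the root node, using the first GPU and a single CPU thread.
./dllama chat --model $MODEL --tokenizer $TOKENIZER --nthreads 1 --gpu-index 0

# A worker node accepts the same GPU arguments; the port is an example value.
./dllama worker --port 9998 --nthreads 1 --gpu-index 0
```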