# How to Run Distributed Llama on 🧠 GPU
Distributed Llama can run on GPU devices using the Vulkan API. This article describes how to build and run the project on a GPU.
Before you start, please check how to build and run Distributed Llama on the CPU:
* [🍓 How to Run on Raspberry Pi](./HOW_TO_RUN_RASPBERRYPI.md)
* [💻 How to Run on Linux, MacOS or Windows](./HOW_TO_RUN_LINUX_MACOS_WIN.md)
To run on GPU, please follow these steps:
1. Install the Vulkan SDK for your platform.
* Linux: please check [this article](https://vulkan.lunarg.com/doc/view/latest/linux/getting_started_ubuntu.html).
* MacOS: download SDK [here](https://vulkan.lunarg.com/sdk/home#mac).
2. Build Distributed Llama with GPU support:
```bash
DLLAMA_VULKAN=1 make dllama
DLLAMA_VULKAN=1 make dllama-api
```
3. The `dllama` and `dllama-api` binaries now support an argument related to GPU usage:
```
--gpu-index <index> Use GPU device with given index (use `0` for first device)
```
4. You can run the root node or a worker node on the GPU by specifying the `--gpu-index` argument. The Vulkan backend requires a single thread, so you should also set `--nthreads 1`.
```bash
./dllama inference ... --nthreads 1 --gpu-index 0
./dllama chat ... --nthreads 1 --gpu-index 0
./dllama worker ... --nthreads 1 --gpu-index 0
./dllama-api ... --nthreads 1 --gpu-index 0
```
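
Putting the steps together, here is a sketch of a two-machine setup with a GPU on each side. The IP address `10.0.0.2`, port `9999`, and the model and tokenizer file names are placeholders for illustration; `--port` and `--workers` are the same networking arguments used in the CPU guides linked above.

```bash
# On the worker machine: listen on port 9999 and compute on its first GPU.
./dllama worker --port 9999 --nthreads 1 --gpu-index 0

# On the root machine: run inference on its first GPU and delegate
# part of the work to the worker at 10.0.0.2:9999.
./dllama inference \
  --model dllama_model_llama3_8b_q40.m \
  --tokenizer dllama_tokenizer_llama3.t \
  --prompt "Hello world" --steps 64 \
  --nthreads 1 --gpu-index 0 \
  --workers 10.0.0.2:9999
```

The same `--nthreads 1 --gpu-index 0` pair applies to `dllama chat` and `dllama-api` as shown above.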