
Commit 29e8399: update RPC docs
Parent: e93d69c

1 file changed: docs/rpc.md (30 additions & 12 deletions)

@@ -1,12 +1,13 @@
 # Building and Using the RPC Server with `stable-diffusion.cpp`
 
-This guide covers how to build a version of the RPC server from `llama.cpp` that is compatible with your version of `stable-diffusion.cpp` to manage multi-backend setups. RPC allows you to offload specific model components to a remote server.
+This guide covers how to build a version of [the RPC server from `llama.cpp`](https://github.com/ggml-org/llama.cpp/blob/master/tools/rpc/README.md) that is compatible with your version of `stable-diffusion.cpp` to manage multi-backend setups. RPC allows you to offload specific model components to a remote server.
 
 > **Note on Model Location:** The model files (e.g., `.safetensors` or `.gguf`) remain on the **Client** machine. The client parses the file and transmits the necessary tensor data and computational graphs to the server. The server does not need to store the model files locally.
 
 ## 1. Building `stable-diffusion.cpp` with RPC client
 
 First, build the client application from source. It requires `GGML_RPC=ON` to include the RPC backend in your client.
+
 ```bash
 mkdir build
 cd build
@@ -16,7 +17,7 @@ cmake .. \
 cmake --build . --config Release -j $(nproc)
 ```
 
-> **Note:** Ensure you add the other flags you would normally use (e.g., `-DSD_VULKAN=ON`, `-DSD_CUDA=ON`, `-DSD_HIPBLAS=ON`, or `-DGGML_METAL=ON`). For more information about building `stable-diffusion.cpp` from source, please refer to the `build.md` documentation.
+> **Note:** Ensure you add the other flags you would normally use (e.g., `-DSD_VULKAN=ON`, `-DSD_CUDA=ON`, `-DSD_HIPBLAS=ON`, or `-DGGML_METAL=ON`). For more information about building `stable-diffusion.cpp` from source, please refer to the [build.md](build.md) documentation.
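
For reference, a full configure-and-build sequence for the client might look like the following. This is a sketch, not the exact elided command from the file: `-DSD_VULKAN=ON` stands in for whichever backend flags from the note above apply to your system.

```bash
# Sketch of a client build with the RPC backend enabled.
# -DSD_VULKAN=ON is an assumed example; use your usual backend flags instead.
mkdir build
cd build
cmake .. \
  -DGGML_RPC=ON \
  -DSD_VULKAN=ON
cmake --build . --config Release -j $(nproc)
```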
 
 ## 2. Ensure `llama.cpp` is at the correct commit
 
@@ -25,6 +26,7 @@ cmake --build . --config Release -j $(nproc)
 > **Start from Root:** Perform these steps from the root of your `stable-diffusion.cpp` directory.
 
 1. Read the target commit hash from the submodule tracker:
+
 ```bash
 # Linux / WSL / macOS
 HASH=$(cat ggml/scripts/sync-llama.last)
@@ -39,8 +41,7 @@ cmake --build . --config Release -j $(nproc)
 cd llama.cpp
 git checkout $HASH
 ```
-
-To save on download time and storage, you can use a shallow clone to download only the target commit:
+To save on download time and storage, you can use a shallow clone to download only the target commit:
 ```bash
 mkdir -p llama.cpp
 cd llama.cpp
@@ -54,15 +55,16 @@ To save on download time and storage, you can use a shallow clone to download on
 
 The RPC server acts as the worker. You must explicitly enable the **backend** (the hardware interface, such as CUDA for Nvidia, Metal for Apple Silicon, or Vulkan) when building; otherwise the server will default to using only the CPU.
 
-To find the correct flags, refer to the official documentation for the `llama.cpp` repository.
+To find the correct flags for your system, refer to the official documentation for the [`llama.cpp`](https://github.com/ggml-org/llama.cpp/blob/master/docs/build.md) repository.
 
 > **Crucial:** You must include the compiler flags required to satisfy API compatibility with `stable-diffusion.cpp` (`-DGGML_MAX_NAME=128`). Without this flag, `GGML_MAX_NAME` will default to `64` on the server, and data transfers between the client and server will fail. Of course, `-DGGML_RPC=ON` must also be enabled.
 >
 > I recommend disabling the `LLAMA_CURL` flag to avoid unnecessary dependencies, and disabling shared library builds to avoid potential conflicts.
 
-> **Build Target:** We are specifically building the `rpc-server` target. This prevents the build system from compiling the entire `llama.cpp` suite (like `llama-cli`), making the build significantly faster.
+> **Build Target:** We are specifically building the `rpc-server` target. This prevents the build system from compiling the entire `llama.cpp` suite (like `llama-server`), making the build significantly faster.
 
 ### Linux / WSL (Vulkan)
+
 ```bash
 mkdir build
 cd build
@@ -76,6 +78,7 @@ cmake --build . --config Release --target rpc-server -j $(nproc)
 ```
 
 ### macOS (Metal)
+
 ```bash
 mkdir build
 cd build
@@ -89,6 +92,7 @@ cmake --build . --config Release --target rpc-server
 ```
 
 ### Windows (Visual Studio 2022, Vulkan)
+
 ```powershell
 mkdir build
 cd build
@@ -112,10 +116,13 @@ Start the server. It listens for connections on the default address (usually `lo
 
 **On the Server:**
 If running on the same machine, you can use the default address:
+
 ```bash
 ./rpc-server
 ```
+
 If you want to allow connections from other machines on the network:
+
 ```bash
 ./rpc-server --host 0.0.0.0
 ```
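
The server listens on port `50052` by default. If that port is unavailable, it can be moved elsewhere; a sketch, assuming the `--port` flag used in the multi-GPU example later in this guide applies to a single server as well:

```bash
# Assumption: --port overrides the default listening port (50052).
./rpc-server --host 0.0.0.0 --port 50053
```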
@@ -129,13 +136,16 @@ If you want to allow connections from other machines on the network:
 We're assuming the server is running on your local machine and listening on the default port `50052`. If it's running on a different machine, replace `localhost` with the IP address of the server.
 
 **On the Client:**
+
 ```bash
 ./sd-cli --rpc localhost:50052 --list-devices
 ```
+
 If the server is running and the client is able to connect, you should see `RPC0 localhost:50052` in the list of devices.
 
-Example output:
+Example output:
 (Client built without GPU acceleration, two GPUs available on the server)
+
 ```
 List of available GGML devices:
   Name        Description
@@ -166,23 +176,31 @@ Example: A main machine (192.168.1.10) with 3 GPUs, with one GPU running CUDA an
 **On the first machine (running two server instances):**
 
 **Terminal 1 (CUDA):**
+
 ```bash
-# Linux / macOS / WSL
+# Linux / WSL
 export CUDA_VISIBLE_DEVICES=0
-./rpc-server-cuda --host 0.0.0.0
+cd ./build_cuda/bin/Release
+./rpc-server --host 0.0.0.0
 
 # Windows PowerShell
 $env:CUDA_VISIBLE_DEVICES="0"
-./rpc-server-cuda --host 0.0.0.0
+cd .\build_cuda\bin\Release
+./rpc-server --host 0.0.0.0
 ```
 
 **Terminal 2 (Vulkan):**
+
 ```bash
-./rpc-server-vulkan --host 0.0.0.0 --port 50053 -d Vulkan1,Vulkan2
+cd ./build_vulkan/bin/Release
+# ignore the first GPU (used by CUDA server)
+./rpc-server --host 0.0.0.0 --port 50053 -d Vulkan1,Vulkan2
 ```
 
 **On the second machine:**
+
 ```bash
+cd ./build/bin/Release
 ./rpc-server --host 0.0.0.0
 ```
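
On the client, all three servers would then be passed together and mapped to sequential device IDs. A minimal sketch, assuming `--rpc` accepts a comma-separated list of `host:port` entries (as the `llama.cpp` RPC client does) and assuming `192.168.1.11` as the second machine's address:

```bash
# Sketch: register both servers on 192.168.1.10 plus the assumed second
# machine at 192.168.1.11, then check that RPC0..RPC2 appear in the list.
./sd-cli --rpc 192.168.1.10:50052,192.168.1.10:50053,192.168.1.11:50052 --list-devices
```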

@@ -199,4 +217,4 @@ The client will map these servers to sequential device IDs (e.g., RPC0 from the
 
 ## 6. Performance Considerations
 
-RPC performance is heavily dependent on network bandwidth, as large weights and activations must be transferred back and forth over the network, especially for large models or when using high resolutions. For best results, ensure your network connection is stable and has sufficient bandwidth (>1 Gbps recommended).
+RPC performance is heavily dependent on network bandwidth, as large weights and activations must be transferred back and forth over the network, especially for large models or when using high resolutions. For best results, ensure your network connection is stable and has sufficient bandwidth (>1 Gbps recommended).
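
As a rough worked example of why bandwidth matters: moving 5 GB of tensor data over a sustained 1 Gbps link takes about 40 seconds (5 GB × 8 bits per byte ÷ 1 Gbps), before any per-step activation traffic is counted.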
