Commit 719853c

Merge pull request #169 from realyota/mbak_kb_changes
Changes for clickhouse-keeper article
2 parents 8069ff3 + 82ac9ba commit 719853c

1 file changed

Lines changed: 157 additions & 74 deletions

File tree

content/en/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper.md

````diff
@@ -12,60 +12,79 @@ Since 2021 the development of built-in ClickHouse® alternative for Zookeeper is
 
 See slides: https://presentations.clickhouse.com/meetup54/keeper.pdf and video https://youtu.be/IfgtdU1Mrm0?t=2682
 
-## Current status (last updated: July 2023)
+## Current status (last updated: March 2026)
 
-Since version 23.3 we recommend using clickhouse-keeper for new installations.
+ClickHouse Keeper is the recommended choice for new installations. It often yields better performance thanks to new features such as async replication and multi-read, and some ClickHouse server features, for example S3Queue, cannot be used without Keeper.
 
-Even better if you will use the latest version of clickhouse-keeper (currently it's 23.7), and it's not necessary to use the same version of clickhouse-keeper as ClickHouse itself.
+- Use the latest Keeper version available in your supported upgrade path whenever possible.
+- The Keeper version does not need to match the ClickHouse server version.
+- Modern Keeper usually performs better than older versions because the codebase has matured significantly, new protocol feature flags have been added, and internal replication has improved.
 
 For existing systems that currently use Apache Zookeeper, you can consider upgrading to clickhouse-keeper especially if you will [upgrade ClickHouse](https://altinity.com/clickhouse-upgrade-overview/) also.
 
-But please remember that on very loaded systems the change can give no performance benefits or can sometimes lead to a worse performance.
+{{% alert title="Warning" color="warning" %}}
+Before upgrading ClickHouse Keeper from a version older than 23.9, please check the [upgrade caveat for async_replication](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper#upgrade-caveat-for-async_replication).
+{{% /alert %}}
 
-The development pace of keeper code is [still high](https://github.com/ClickHouse/ClickHouse/pulls?q=is%3Apr+keeper)
-so every new version should bring improvements / cover the issues, and stability/maturity grows from version to version, so
-if you want to play with clickhouse-keeper in some environment - please use [the most recent ClickHouse releases](https://altinity.com/altinity-stable/)! And of course: share your feedback :)
+## How does clickhouse-keeper differ from Zookeeper?
 
-## How does clickhouse-keeper work?
+Keeper is optimized for ClickHouse workloads and written in C++ (it can also be used as a single binary), so it does not need any external dependencies. It speaks the same **client** protocol, but the two implement different consensus protocols: Zookeeper uses ZAB, while ClickHouse Keeper builds on [eBay NuRaft](https://github.com/eBay/NuRaft), a C++ implementation of Raft core logic as a replication library, which improves the stability and performance of the base Raft protocol.
 
-Official docs: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
+ClickHouse Keeper can also run in embedded mode, operating as a separate thread within the ClickHouse server process. This may be suitable for testing purposes or for smaller instances where some performance can be sacrificed for simplicity.
 
-ClickHouse-keeper still need to be started additionally on few nodes (similar to 'normal' zookeeper) and speaks normal zookeeper protocol - needed to simplify A/B tests with real zookeeper.
+## Migration and upgrade guide
 
-To test that you need to run 3 instances of clickhouse-server (which will mimic zookeeper) with an extra config like that:
+- A mixed ZooKeeper / ClickHouse Keeper quorum is not supported; they are different consensus protocols.
+- ZooKeeper snapshots and transaction logs are not format-compatible with Keeper. For data migration use `clickhouse-keeper-converter`.
+- If the above is too complex, you can switch to a new, empty Keeper ensemble and recreate the Keeper metadata using `SYSTEM RESTORE REPLICA` calls. This method takes longer, but it is suitable for smaller clusters. Check the [procedure to restore multiple tables in read-only mode](https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-check-replication-ddl-queue/#procedure-to-restore-multiple-tables-in-read-only-mode-per-replica).
+- Keep in mind that some metadata is available in ZooKeeper only and will be lost if you don't migrate it with `clickhouse-keeper-converter` as described above, for example the Distributed DDL queue and RBAC data (if configured). Check [Keeper-dependent features](https://kb.altinity.com/altinity-kb-setup-and-maintenance/keeper-dependent-features) for more information.
````
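As a sketch of the converter path mentioned in the bullets above, the usual flow is to stop ZooKeeper (so its logs and snapshots are consistent) and then convert its data into a Keeper snapshot. The directory paths below are illustrative; adjust them to your ZooKeeper `dataDir` and Keeper `snapshot_storage_path`:

```shell
# Run on a stopped ZooKeeper node; paths are illustrative
clickhouse-keeper-converter \
    --zookeeper-logs-dir /var/lib/zookeeper/version-2 \
    --zookeeper-snapshots-dir /var/lib/zookeeper/version-2 \
    --output-dir /var/lib/clickhouse/coordination/snapshots
```

Copy the resulting snapshot to all Keeper nodes before starting them for the first time.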
````diff
 
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_multinode_simple/configs/enable_keeper1.xml)
+### Upgrade caveat for `async_replication`
 
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/integration/test_keeper_snapshots/configs/enable_keeper.xml)
+`async_replication` is an internal Keeper optimization for Raft replication that is turned on by default starting from [25.10](https://github.com/ClickHouse/ClickHouse/pull/88515). It does not change ClickHouse replicated table semantics, but it can improve Keeper performance.
 
-or event single instance with config like that: [https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/keeper_port.xml)
-[https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml](https://github.com/ClickHouse/ClickHouse/blob/master/tests/config/config.d/zookeeper.xml)
+If you upgrade directly from a version older than `23.9` to `25.10+`:
 
-And point all the ClickHouses (zookeeper config section) to those nodes / ports.
+- either upgrade Keeper to `23.9+` first, and then continue to `25.10+`
+- or temporarily set `keeper_server.coordination_settings.async_replication=0` during the upgrade and enable it after the upgrade is finished
````
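As a sketch, the temporary override from the last bullet could look like this in the Keeper config. The element placement is assumed from the dotted setting path `keeper_server.coordination_settings.async_replication` and the `<coordination_settings>` blocks shown later in this article; verify it against your Keeper version:

```xml
<clickhouse>
    <keeper_server>
        <coordination_settings>
            <!-- temporarily disable async replication for the upgrade window -->
            <async_replication>0</async_replication>
        </coordination_settings>
    </keeper_server>
</clickhouse>
```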
````diff
 
-Latest version is recommended (even testing / master builds). We will be thankful for any feedback.
+### Keeper in Kubernetes
+
+If you run ClickHouse on Kubernetes with the Altinity operator, Keeper can be managed as a dedicated `ClickHouseKeeperInstallation` resource (often abbreviated as CHK). That is usually the cleanest way to run and upgrade a separate Keeper ensemble on Kubernetes. Please check the examples [here](https://github.com/Altinity/clickhouse-operator/blob/master/docs/chk-examples/01-chi-simple-with-keeper.yaml).
 
 ## systemd service file
 
-See
-https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
+See https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-service/
 
 ## init.d script
 
-See
-https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/
+See https://kb.altinity.com/altinity-kb-setup-and-maintenance/altinity-kb-zookeeper/clickhouse-keeper-initd/
+
+## More than 3 Keeper nodes
+
+The main issue with a larger Keeper ensemble is that it takes more time to re-elect a leader, and commits take longer, which can slow down insertions and DDL queries.
+
+Larger ensembles do work, but we don't recommend running more than three Keeper nodes (excluding observers).
+
+Increasing the number of nodes offers no significant advantages (unless you need to tolerate the simultaneous failure of two Keeper nodes). In terms of performance it is no better, and may even be worse, while consuming additional resources (Keeper, like ZooKeeper, requires fast dedicated disks to perform well, as well as some RAM and CPU).
````
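The quorum arithmetic behind this recommendation is easy to check: a Raft ensemble of `n` voting members needs `n/2 + 1` acknowledgements per commit, so extra nodes add commit latency while fault tolerance only grows at 5, 7, and so on:

```shell
# Majority size and tolerated failures for common ensemble sizes
for n in 1 3 5 7; do
  majority=$(( n / 2 + 1 ))
  tolerated=$(( n - majority ))
  echo "$n nodes: quorum=$majority, tolerates $tolerated failure(s)"
done
```

A 3-node ensemble already tolerates one failure; going to 5 nodes only pays off if you must survive two simultaneous Keeper failures.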
````diff
 
-## Example of a simple cluster with 2 nodes of ClickHouse using built-in keeper
+## clickhouse-keeper-client
 
-For example you can start two ClickHouse nodes (hostname1, hostname2)
+In clickhouse-keeper-client, paths are now parsed more strictly and must be passed as string literals. In practice, this means using single quotes around paths: for example, `ls '/'` instead of `ls /`, and `get '/clickhouse/path'` instead of `get /clickhouse/path`.
+
+## Example of a simple cluster
+
+The Keeper ensemble size must be odd because it requires a majority (50% + 1 nodes) to form a quorum. A 2-node Keeper setup will lose quorum after a single node failure, so the recommended number of Keeper replicas is 3.
+
+On `hostname1` and `hostname2` below, ClickHouse can use the embedded Keeper cluster from `<keeper_server>`, so a separate client-side `<keeper>` section is not required. If your ClickHouse servers connect to an external Keeper or ZooKeeper ensemble, see [ClickHouse config for Keeper]({{< ref "clickhouse-keeper-clickhouse-config" >}}).
````
````diff
 
 ### hostname1
 
 ```xml
 $ cat /etc/clickhouse-server/config.d/keeper.xml
 
 <?xml version="1.0" ?>
-<yandex>
+<clickhouse>
     <keeper_server>
         <tcp_port>2181</tcp_port>
         <server_id>1</server_id>
````
````diff
@@ -76,46 +95,43 @@ $ cat /etc/clickhouse-server/config.d/keeper.xml
             <operation_timeout_ms>10000</operation_timeout_ms>
             <session_timeout_ms>30000</session_timeout_ms>
             <raft_logs_level>trace</raft_logs_level>
-    <rotate_log_storage_interval>10000</rotate_log_storage_interval>
+            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
         </coordination_settings>
 
-    <raft_configuration>
+        <raft_configuration>
             <server>
-        <id>1</id>
-        <hostname>hostname1</hostname>
-        <port>9444</port>
-    </server>
-    <server>
-        <id>2</id>
-        <hostname>hostname2</hostname>
-        <port>9444</port>
-    </server>
-    </raft_configuration>
-
+                <id>1</id>
+                <hostname>hostname1</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>2</id>
+                <hostname>hostname2</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>3</id>
+                <hostname>hostname3</hostname>
+                <port>9444</port>
+            </server>
+        </raft_configuration>
     </keeper_server>
 
-    <zookeeper>
-        <node>
-            <host>localhost</host>
-            <port>2181</port>
-        </node>
-    </zookeeper>
-
     <distributed_ddl>
         <path>/clickhouse/testcluster/task_queue/ddl</path>
     </distributed_ddl>
-</yandex>
+</clickhouse>
 
 $ cat /etc/clickhouse-server/config.d/macros.xml
 
 <?xml version="1.0" ?>
-<yandex>
+<clickhouse>
     <macros>
         <cluster>testcluster</cluster>
         <replica>replica1</replica>
         <shard>1</shard>
     </macros>
-</yandex>
+</clickhouse>
 ```
 
 ### hostname2
````
````diff
@@ -124,7 +140,7 @@ $ cat /etc/clickhouse-server/config.d/macros.xml
 $ cat /etc/clickhouse-server/config.d/keeper.xml
 
 <?xml version="1.0" ?>
-<yandex>
+<clickhouse>
     <keeper_server>
         <tcp_port>2181</tcp_port>
         <server_id>2</server_id>
````
````diff
@@ -135,55 +151,95 @@ $ cat /etc/clickhouse-server/config.d/keeper.xml
             <operation_timeout_ms>10000</operation_timeout_ms>
             <session_timeout_ms>30000</session_timeout_ms>
             <raft_logs_level>trace</raft_logs_level>
-    <rotate_log_storage_interval>10000</rotate_log_storage_interval>
+            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
         </coordination_settings>
 
-    <raft_configuration>
+        <raft_configuration>
             <server>
-        <id>1</id>
-        <hostname>hostname1</hostname>
-        <port>9444</port>
-    </server>
-    <server>
-        <id>2</id>
-        <hostname>hostname2</hostname>
-        <port>9444</port>
-    </server>
-    </raft_configuration>
-
+                <id>1</id>
+                <hostname>hostname1</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>2</id>
+                <hostname>hostname2</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>3</id>
+                <hostname>hostname3</hostname>
+                <port>9444</port>
+            </server>
+        </raft_configuration>
     </keeper_server>
 
-    <zookeeper>
-        <node>
-            <host>localhost</host>
-            <port>2181</port>
-        </node>
-    </zookeeper>
-
     <distributed_ddl>
         <path>/clickhouse/testcluster/task_queue/ddl</path>
     </distributed_ddl>
-</yandex>
+</clickhouse>
 
 $ cat /etc/clickhouse-server/config.d/macros.xml
 
 <?xml version="1.0" ?>
-<yandex>
+<clickhouse>
     <macros>
         <cluster>testcluster</cluster>
         <replica>replica2</replica>
         <shard>1</shard>
     </macros>
-</yandex>
+</clickhouse>
 ```
 
-### on both
+### hostname3
+
+```xml
+$ cat /etc/clickhouse-keeper/keeper_config.xml
+
+<?xml version="1.0" ?>
+<clickhouse>
+    <keeper_server>
+        <tcp_port>2181</tcp_port>
+        <server_id>3</server_id>
+        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
+        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
+
+        <coordination_settings>
+            <operation_timeout_ms>10000</operation_timeout_ms>
+            <session_timeout_ms>30000</session_timeout_ms>
+            <raft_logs_level>trace</raft_logs_level>
+            <rotate_log_storage_interval>10000</rotate_log_storage_interval>
+        </coordination_settings>
+
+        <raft_configuration>
+            <server>
+                <id>1</id>
+                <hostname>hostname1</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>2</id>
+                <hostname>hostname2</hostname>
+                <port>9444</port>
+            </server>
+            <server>
+                <id>3</id>
+                <hostname>hostname3</hostname>
+                <port>9444</port>
+            </server>
+        </raft_configuration>
+    </keeper_server>
+</clickhouse>
+
+$ clickhouse-keeper --config /etc/clickhouse-keeper/keeper_config.xml
+```
+
+### on both ClickHouse nodes
 
 ```xml
 $ cat /etc/clickhouse-server/config.d/clusters.xml
 
 <?xml version="1.0" ?>
-<yandex>
+<clickhouse>
     <remote_servers>
         <testcluster>
             <shard>
````
````diff
@@ -198,7 +254,7 @@ $ cat /etc/clickhouse-server/config.d/clusters.xml
             </shard>
         </testcluster>
     </remote_servers>
-</yandex>
+</clickhouse>
 ```
 
 Then create a table
````
````diff
@@ -213,3 +269,30 @@ insert into test select number, '' from numbers(100000000);
 -- on both nodes:
 select count() from test;
 ```
````
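After starting the ensemble, Keeper's health can be checked with ZooKeeper-style four-letter-word commands on the client port (the allowed commands are controlled by `four_letter_word_white_list` in the Keeper config). A quick sketch, assuming `nc` is available and the hostnames from the example above:

```shell
# "ruok" should answer "imok"; "mntr" dumps serving state and metrics
echo ruok | nc hostname1 2181
echo mntr | nc hostname1 2181
```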
````diff
+
+## Useful references
+
+- Official Keeper guide: https://clickhouse.com/docs/en/guides/sre/keeper/clickhouse-keeper/
+- `clickhouse-keeper-client`: https://clickhouse.com/docs/en/operations/utilities/clickhouse-keeper-client
+- Keeper HTTP API and dashboard (`26.1+`): https://clickhouse.com/docs/operations/utilities/clickhouse-keeper-http-api
+- `system.zookeeper`: https://clickhouse.com/docs/operations/system-tables/zookeeper
+- `system.zookeeper_connection`: https://clickhouse.com/docs/operations/system-tables/zookeeper_connection
+- `system.zookeeper_connection_log`: https://clickhouse.com/docs/operations/system-tables/zookeeper_connection_log
+- `system.zookeeper_info` (`26.1+`): https://clickhouse.com/docs/operations/system-tables/zookeeper_info
+- `system.zookeeper_log`: https://clickhouse.com/docs/operations/system-tables/zookeeper_log
+- `aggregated_zookeeper_log` upstream PR (resubmit): https://github.com/ClickHouse/ClickHouse/pull/87208
+- Altinity operator CHK examples: https://github.com/Altinity/clickhouse-operator/tree/master/docs/chk-examples
+- Altinity operator Keeper dashboard JSON: https://github.com/Altinity/clickhouse-operator/blob/master/grafana-dashboard/ClickHouseKeeper_dashboard.json
+- Altinity operator Keeper alert rules: https://github.com/Altinity/clickhouse-operator/blob/master/deploy/prometheus/prometheus-alert-rules-chkeeper.yaml
````
