107 changes: 107 additions & 0 deletions docs/configuring.md
@@ -213,6 +213,113 @@ liveness detection. Default: `300` (5 minutes).
spock.apply_idle_timeout = 300
```

### Logical Slot Failover (HA Standby)

Spock creates logical replication slots on each provider node. For high
availability with a physical standby, these slots must be synchronized to the
standby so that replication can resume without data loss after a failover.

See [Logical Slot Failover](logical_slot_failover.md) for full setup instructions.

The behaviour depends on the PostgreSQL version:

| PostgreSQL | Slot sync mechanism | Spock worker |
|---|---|---|
| 15, 16 | Spock built-in `spock_failover_slots` worker | Always runs |
| 17 | Spock worker OR native `sync_replication_slots` | Spock worker yields to native if enabled |
| 18+ | Native `sync_replication_slots` (required) | Not registered |

#### PostgreSQL 17 and Later (Native Slot Sync)

On PostgreSQL 17+, Spock marks every logical slot with the `FAILOVER` flag
at creation time. PostgreSQL's built-in slotsync worker then synchronizes
those slots automatically.

On **PostgreSQL 18+**, Spock's own failover worker is not registered. You
must configure the native mechanism:

**Primary (`postgresql.conf`):**
```ini
synchronized_standby_slots = 'physical_slot_name'
```

**Standby (`postgresql.conf`):**
```ini
sync_replication_slots = on
primary_conninfo = 'host=<primary_host> dbname=<dbname> ...'
primary_slot_name = 'physical_slot_name'
hot_standby_feedback = on
```

After a failover, subscribers only need to update the `host=` value in their
connection string. Replication resumes from the last synchronized LSN with
no data loss.

#### PostgreSQL 15, 16, and 17 (Spock Built-in Worker)

On PostgreSQL 15, 16, and 17, Spock's `spock_failover_slots` background worker
handles slot synchronization. On PostgreSQL 17 it yields to the native
slotsync worker when `sync_replication_slots = on` is enabled. Configure it
with the GUCs below.

### `spock.synchronize_slot_names`

List of slot name patterns to synchronize from primary to physical standby.
Accepts name prefixes (`name:foo`) or LIKE patterns (`name_like:spock%`).
Default: `name_like:%%` (synchronize all logical slots).

```ini
spock.synchronize_slot_names = 'name_like:%%'
```

Used on PostgreSQL 15 and 16, and on PostgreSQL 17 when
`sync_replication_slots` is not enabled. On PostgreSQL 17 with native
slotsync active, or on PostgreSQL 18+, this setting is ignored.

### `spock.drop_extra_slots`

When `on` (the default), the `spock_failover_slots` worker drops any slots
on the standby that do not match `spock.synchronize_slot_names`.

```ini
spock.drop_extra_slots = on
```

### `spock.primary_dsn`

Connection string used by the `spock_failover_slots` worker to connect to
the primary and read slot state. If empty, `primary_conninfo` from
`postgresql.conf` is used.

```ini
spock.primary_dsn = ''
```

### `spock.pg_standby_slot_names`

Comma-separated list of physical replication slot names that must confirm
durable flush of a given LSN before the walsender is allowed to replicate
logical changes beyond that LSN. This prevents a physical standby from
falling behind a logical subscriber.

```ini
spock.pg_standby_slot_names = 'physical_slot_1,physical_slot_2'
```

Used on PostgreSQL 15 and 16, and on PostgreSQL 17 when
`sync_replication_slots` is not enabled. On PostgreSQL 17 with native
slotsync active, or on PostgreSQL 18+, use `synchronized_standby_slots`
instead.

### `spock.standby_slots_min_confirmed`

Number of slots from `spock.pg_standby_slot_names` that must confirm a
given LSN before logical replication is allowed to proceed. The default
`-1` requires all listed slots to confirm. `0` disables the check.

```ini
spock.standby_slots_min_confirmed = -1
```
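
The interaction between `spock.pg_standby_slot_names` and
`spock.standby_slots_min_confirmed` can be sketched as follows. This is a
minimal illustration in Python, not Spock's implementation: the function name
`may_advance`, the use of plain integers for LSNs, and the capping of values
larger than the number of listed slots are all assumptions made for the example.

```python
from typing import Mapping


def may_advance(confirmed_flush: Mapping[str, int], target_lsn: int,
                min_confirmed: int = -1) -> bool:
    """Return True if logical replication may proceed past target_lsn.

    confirmed_flush maps each slot listed in spock.pg_standby_slot_names
    to the LSN it has confirmed (modelled here as a plain integer).
    """
    if min_confirmed == 0 or not confirmed_flush:
        # 0 disables the check. Treating an empty list as "nothing to
        # wait for" is an assumption of this sketch.
        return True
    if min_confirmed < 0:
        # -1 (the default): every listed slot must confirm.
        required = len(confirmed_flush)
    else:
        # Assumption: a value above the number of listed slots is capped.
        required = min(min_confirmed, len(confirmed_flush))
    confirmed = sum(1 for lsn in confirmed_flush.values() if lsn >= target_lsn)
    return confirmed >= required
```

With two listed slots of which only one has confirmed the target LSN, the
default of `-1` holds replication back, while a value of `1` lets it proceed.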

### `spock.include_ddl_repset`

`spock.include_ddl_repset` enables spock to automatically add tables to
159 changes: 159 additions & 0 deletions docs/logical_slot_failover.md
@@ -0,0 +1,159 @@
# Logical Slot Failover

Spock creates logical replication slots on each provider node. For high
availability with a physical standby, these slots must be synchronized to the
standby so that replication can resume without data loss after a failover.

## How It Works

When a primary server fails and a physical standby is promoted, any active
logical subscribers must be able to continue replicating from the new primary.
This requires the logical replication slots — which track each subscriber's
replication position — to be present and up to date on the standby before the
failover occurs.

Without slot synchronization, a failover would require manual slot recreation
and a full re-sync of all subscriber tables.

## PostgreSQL Version Behaviour

| PostgreSQL | Slot sync mechanism | Spock worker |
|---|---|---|
| 15, 16 | Spock built-in `spock_failover_slots` worker | Always runs on standby |
| 17 | Spock worker **or** native `sync_replication_slots` | Yields to native if enabled |
| 18+ | Native `sync_replication_slots` (required) | Not registered |

On **PostgreSQL 17+**, Spock marks every logical slot with the `FAILOVER` flag
at creation time. This enables PostgreSQL's built-in slotsync worker to pick
them up automatically.

On **PostgreSQL 18+**, Spock's own failover worker is not registered at all.
The native slotsync worker is the only mechanism.
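
The version rules above can be condensed into a small decision function. The
sketch below is illustrative only; the function name `slot_sync_mechanism` and
its return labels (`"spock"`, `"native"`, `"none"`) are hypothetical and do not
correspond to anything in Spock's code.

```python
def slot_sync_mechanism(pg_major: int, sync_replication_slots: bool) -> str:
    """Return which component synchronizes logical slots to the standby."""
    if pg_major < 15:
        raise ValueError("the table above only covers PostgreSQL 15 and later")
    if pg_major >= 18:
        # Spock's worker is not registered, so only the native worker can sync.
        return "native" if sync_replication_slots else "none"
    if pg_major == 17 and sync_replication_slots:
        # Spock's worker detects the setting and yields to the native worker.
        return "native"
    # PostgreSQL 15 and 16, or 17 without native slot sync enabled.
    return "spock"
```

The `"none"` result for PostgreSQL 18+ without `sync_replication_slots = on`
is the case to watch for: no slots are synchronized at all.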

## Setup: PostgreSQL 18+ (Native)

### 1. Create a physical replication slot on the primary

```sql
SELECT pg_create_physical_replication_slot('spock_standby_slot');
```

### 2. Configure the primary (`postgresql.conf`)

```ini
# Hold logical walsenders back until the standby using this slot has
# confirmed receipt of the WAL, preventing logical subscribers from
# getting ahead of the standby.
synchronized_standby_slots = 'spock_standby_slot'
```

### 3. Configure the standby (`postgresql.conf`)

```ini
sync_replication_slots = on
primary_conninfo = 'host=<primary_host> port=5432 dbname=<dbname> user=replicator'
primary_slot_name = 'spock_standby_slot'
hot_standby_feedback = on
```

### 4. Verify slot synchronization

On the standby, confirm that Spock's logical slots are synchronized:

```sql
SELECT slot_name, synced, failover, invalidation_reason
FROM pg_replication_slots
WHERE NOT temporary;
```

All Spock slots should show `synced = true` and `failover = true`.
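
If this check is automated, the rows returned by the query can be validated
with a helper like the one below. It is a sketch under stated assumptions: the
function name `unsynced_slots` is hypothetical, the rows are assumed to be
dictionaries keyed by column name and already limited to Spock's logical
slots, and treating a non-null `invalidation_reason` as a failure is an
extra condition added for the example.

```python
from typing import Iterable, List, Mapping


def unsynced_slots(rows: Iterable[Mapping[str, object]]) -> List[str]:
    """Return the names of slots that are not ready for failover."""
    problems = []
    for row in rows:
        ready = (
            row["synced"] is True
            and row["failover"] is True
            # Assumption: an invalidated slot is treated as not ready.
            and row["invalidation_reason"] is None
        )
        if not ready:
            problems.append(str(row["slot_name"]))
    return problems
```

An empty result means every slot passed; any names returned need attention
before a failover is attempted.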

### 5. After failover

After promoting the standby, subscribers only need to update their connection
string to point to the new primary. Replication resumes from the last
synchronized LSN with no data loss and no slot recreation required.
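
Rewriting the connection string can be scripted. The helper below is a
minimal sketch, assuming a keyword/value connection string whose values are
unquoted and contain no spaces; the function name `repoint_dsn` is
hypothetical, and applying the new string to the subscription is outside the
scope of the example.

```python
import re


def repoint_dsn(dsn: str, new_host: str) -> str:
    """Replace the host= value in a keyword/value connection string."""
    # Assumption: values are unquoted and contain no spaces.
    # The lookbehind keeps the match at the start of a keyword, and the
    # literal "host=" means other keywords such as hostaddr= are left alone.
    updated, count = re.subn(
        r"(?<!\S)host=\S*", lambda _match: f"host={new_host}", dsn
    )
    if count == 0:
        raise ValueError("connection string has no host= keyword")
    return updated
```

All other keywords, such as `port` and `dbname`, are left unchanged.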

## Setup: PostgreSQL 15 and 16 (Spock Worker)

On PostgreSQL 15 and 16, the `spock_failover_slots` background worker runs
on the standby and periodically copies slot state from the primary. The same
worker is used on PostgreSQL 17 when `sync_replication_slots` is not enabled.

### Requirements

- `hot_standby_feedback = on` on the standby (required for the worker to run)
- The standby must be able to connect to the primary

### Configuration GUCs

| GUC | Default | Description |
|---|---|---|
| `spock.synchronize_slot_names` | `name_like:%%` | Slot name patterns to sync (all by default) |
| `spock.drop_extra_slots` | `on` | Drop standby slots not matching the pattern |
| `spock.primary_dsn` | `''` | DSN to connect to primary (falls back to `primary_conninfo`) |
| `spock.pg_standby_slot_names` | `''` | Physical slots that must confirm LSN before logical replication advances |
| `spock.standby_slots_min_confirmed` | `-1` | How many slots from `spock.pg_standby_slot_names` must confirm (`-1` = all, `0` = check disabled) |

### Example (`postgresql.conf` on standby)

```ini
hot_standby_feedback = on
spock.synchronize_slot_names = 'name_like:%%'
spock.drop_extra_slots = on

# Optional: hold walsenders on primary until this standby confirms
# (set this on the PRIMARY, not the standby)
# spock.pg_standby_slot_names = 'physical_slot_name'
```

## Monitoring

### Check slot sync status (PG17+)

```sql
SELECT slot_name,
       failover,
       synced,
       active,
       invalidation_reason,
       confirmed_flush_lsn
FROM pg_replication_slots
WHERE NOT temporary
ORDER BY slot_name;
```

### Check if native slotsync worker is active (PG17+)

```sql
SELECT pid, wait_event_type, wait_event, state
FROM pg_stat_activity
-- The pattern matches both 'slotsync worker' and 'slot sync worker'.
WHERE backend_type LIKE 'slot%sync worker';
```

### Check that the Spock worker is running (PG15/16)

```sql
SELECT pid, application_name, state
FROM pg_stat_activity
WHERE application_name = 'spock_failover_slots worker';
```

## FAQ

**Q: Do I need to do anything after a failover?**

On PG17+: Update `host=` in each subscriber's DSN to point to the new
primary. No slot recreation is needed.

On PG15/16: Spock's worker on the standby (now primary) stops running
since it is no longer in recovery. Subscribers reconnect automatically.

**Q: What if `sync_replication_slots` is not configured on PG18?**

Spock's worker is not registered on PG18. If `sync_replication_slots = on`
is not set, logical slots will **not** be synchronized to standbys, and a
failover will require manual slot recreation and table re-sync.

**Q: Can I use both mechanisms on PG17?**

No. If `sync_replication_slots = on` is set on PG17, Spock's worker detects
this and skips its sync loop, deferring to the native worker entirely.
16 changes: 16 additions & 0 deletions docs/spock_release_notes.md
@@ -2,6 +2,22 @@

## Spock 5.1 on xxx

### Logical Slot Failover Improvements

* On **PostgreSQL 17+**, Spock now creates all logical replication slots with
the `FAILOVER` flag, allowing PostgreSQL's built-in slotsync worker
(`sync_replication_slots = on`) to automatically synchronize them to
physical standbys.
* On **PostgreSQL 18+**, Spock's own `spock_failover_slots` background worker
is no longer registered. The native PostgreSQL slotsync worker fully
replaces it. See the [Logical Slot Failover](configuring.md#logical-slot-failover-ha-standby)
section in the configuration guide for required `postgresql.conf` settings.
* On **PostgreSQL 17**, Spock's worker remains active but automatically yields
to the native slotsync worker if `sync_replication_slots = on` is set,
preventing conflicts.



This release deprecates the spock.exception_replay_queue_size GUC. Previously Spock restored transaction changes up to the size defined by the spock.exception_replay_queue_size GUC. If an error occurred, the transaction was replayed, and if the size was less than the exception queue, the cache was used. If the size was greater than the limit, it was resent from the origin.

Now no restriction exists. Spock will use memory until memory is exhausted (improving performance for huge transactions). If an allocation fails, Spock performs as specified by the spock.exception_behavior GUC:
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -53,6 +53,7 @@ nav:
- Installing and Configuring Spock: install_spock.md
- Creating a Two-Node Cluster: two_node_cluster.md
- Using Advanced Configuration Options: configuring.md
- Logical Slot Failover (HA Standby): logical_slot_failover.md
- Upgrading a Spock Installation: upgrading_spock.md
- Conflict Types and Resolution: conflict_types.md
- Conflict Avoidance and Delta-Apply Columns: conflicts.md