docs/how-xtrabackup-works-explanation.md

# Explanation: how Percona XtraBackup achieves consistency

This page explains why Percona XtraBackup (PXB) behaves as it does: redo capture, undo, locks, and how far “lockless” backups can go. For procedures, see [How-to: backup lifecycle operations](how-xtrabackup-works-how-to.md). For facts and defaults, see [Reference: backup lifecycle](how-xtrabackup-works-reference.md).

## Why a live copy isn’t consistent yet

InnoDB keeps writing while PXB reads files, so the copied pages alone do not represent one committed state. To close that gap, PXB continuously captures redo (the log of physical page changes) from backup start to backup end. `--prepare` then replays that redo on the copy: the same idea as InnoDB crash recovery, but run offline against the backup directory.

Consistency appears in the prepare step, not in the raw copy.

## The redo thread

PXB records a starting log sequence number (LSN) and copies InnoDB data while a background thread streams redo for the whole run. Redo files wrap and are recycled, so the thread must read every record that prepare will need, up to the backup's end point, before the server reuses that space. The data files plus the captured redo let prepare roll every page forward to one moment.

The optional `--register-redo-log-consumer` registers PXB as a redo log consumer, so the server does not purge redo that PXB has not read yet. This matters most on very write-heavy hosts; the cost is more redo kept on disk and less free space.

## Why locking splits into three beats

PXB stages work so InnoDB DML keeps moving during the bulk copy:

1. Open copy: copy InnoDB data and redo while transactions run.

2. Backup lock: copy non-InnoDB tables without a global read lock such as `FLUSH TABLES WITH READ LOCK`. DDL is restricted here; exactly what is blocked depends on the server.

3. Short binlog lock: `LOCK BINLOG FOR BACKUP` briefly stops the binary log coordinates from changing while PXB reads `performance_schema.log_status` and finishes copying redo up to that point.

`--lock-ddl=ON` takes the backup lock at backup start; `--lock-ddl=REDUCED` waits until InnoDB data is copied. The trade-off: `ON` restricts DDL for longer, while `REDUCED` shortens that window but lets DDL run concurrently with the InnoDB copy.

## When “lockless” is even possible

PXB can skip backup locks only if every table in every schema, including `mysql`, uses InnoDB. Most servers still have CSV or MyISAM tables in the `mysql` schema (for example, `general_log`, which uses CSV), so truly lockless runs are rare unless you have moved all storage to InnoDB.

Replication behavior differs by server: Percona Server for MySQL may include relay coordinates in `log_status`, which lets `--slave-info` work without extra locks; upstream MySQL may still need `FLUSH TABLES WITH READ LOCK` when you need the relay log position.
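
To see whether a lockless run is possible on your server, list the base tables that do not use InnoDB. This sketch assumes the `mysql` client connects with your default option files and that `performance_schema` can be ignored because its tables have no data files to copy (confirm this for your version); the function names `non_innodb_tables` and `lockless_possible` are hypothetical.

```bash
# non_innodb_tables: print "schema<TAB>table<TAB>engine" for every base
# table outside performance_schema whose engine is not InnoDB.
# Assumes the mysql client can connect using your default option files.
non_innodb_tables() {
  mysql -N -B -e "
    SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE
    FROM information_schema.TABLES
    WHERE TABLE_TYPE = 'BASE TABLE'
      AND TABLE_SCHEMA <> 'performance_schema'
      AND ENGINE <> 'InnoDB'"
}

# lockless_possible: read tab-separated lines on stdin, report each
# non-InnoDB table, and succeed only if there were none.
lockless_possible() {
  local found=0 schema table engine
  while IFS=$'\t' read -r schema table engine; do
    echo "non-InnoDB: ${schema}.${table} (${engine})"
    found=1
  done
  [ "$found" -eq 0 ]
}
```

Run `non_innodb_tables | lockless_possible`; on a typical server it reports tables such as `mysql.general_log` and exits non-zero, which means PXB will take backup locks.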

## Prepare: redo alone isn’t enough

Redo replay is physical: PXB reapplies the logged page changes. Those changes can include uncommitted work, because the server writes redo (and may flush dirty pages) for transactions that are still open when the backup ends.

After redo replay, undo logs plus the Serialized Dictionary Information (SDI) stored in the tablespaces drive the logical rollback of that in-flight work; SDI supplies the table definitions that `.frm` files provided in older MySQL versions. When prepare finishes, InnoDB data reflects the backup's end point with uncommitted transactions rolled back. Non-InnoDB files already match that point because PXB copied them under the backup locks.

## Cloud and streaming

Streaming, or uploading with `xbcloud`, does not change this model. Slow networks lengthen the final sync, so the short locks at the end of the backup can be held longer; plan for longer critical sections.

## See also

* [Tutorial: walk through a physical backup](how-xtrabackup-works-tutorial.md)

* [How Percona XtraBackup works](how-xtrabackup-works.md) (hub)

docs/how-xtrabackup-works-how-to.md

# Backup lifecycle

* Grant only the privileges your workflow requires.

* Run backup, prepare, and restore in sequence.

* Capture binary log coordinates for recovery and replication.

* Choose operational flags such as `--lock-ddl` and `--register-redo-log-consumer` to match your workload and lock tolerance.

## Grant the minimum privileges

1. Grant `BACKUP_ADMIN` so PXB can read `performance_schema.log_status` and take backup locks (`LOCK INSTANCE FOR BACKUP`, `LOCK TABLES FOR BACKUP`, or `LOCK BINLOG FOR BACKUP`) as your server and options require.

2. Add `RELOAD`, `LOCK TABLES`, or `REPLICATION CLIENT` only when the workflow requires them (for example, `FLUSH TABLES WITH READ LOCK` or `--slave-info`).

See lists and examples in [Connection and privileges needed](privileges.md).
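
A minimal sketch of the grants above, as a Bash function that prints the SQL rather than running it, so you can review the statements before piping them to the `mysql` client. The function name `backup_grants` and the account `'bkpuser'@'localhost'` are hypothetical, and the sketch covers only the privileges named on this page; check [Connection and privileges needed](privileges.md) for anything else your version requires.

```bash
# backup_grants ACCOUNT [EXTRA_PRIVILEGE ...]
# Prints GRANT statements for a backup account (hypothetical helper).
# BACKUP_ADMIN is always granted; extras such as RELOAD, "LOCK TABLES",
# or "REPLICATION CLIENT" are added only when you pass them.
backup_grants() {
  local account="$1"
  shift
  echo "GRANT BACKUP_ADMIN ON *.* TO ${account};"
  if [ "$#" -gt 0 ]; then
    local extras
    # Join the remaining arguments into one comma-separated list.
    extras=$(printf '%s, ' "$@")
    extras="${extras%, }"
    echo "GRANT ${extras} ON *.* TO ${account};"
  fi
}
```

For example, `backup_grants "'bkpuser'@'localhost'" RELOAD "REPLICATION CLIENT" | mysql -u root -p` grants `BACKUP_ADMIN` plus `RELOAD` and `REPLICATION CLIENT`.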

## Run a backup and capture binary log coordinates

1. Run `xtrabackup --backup` with `--target-dir` set to your backup directory, plus your usual options.

2. Redirect STDERR to a file to keep the binary log coordinates, because PXB writes them there; for example, `xtrabackup … 2> backupout.log`.

3. Confirm the command returns exit code `0`.
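
The three steps above, sketched as a Bash function. It assumes `xtrabackup` is on `PATH` and reads its connection settings from your usual option files or from extra options you pass; the function name `run_backup` is hypothetical.

```bash
# run_backup TARGET_DIR LOG_FILE [EXTRA_XTRABACKUP_OPTIONS ...]
# Runs a backup, keeps STDERR (where PXB writes the binary log
# coordinates) in LOG_FILE, and returns xtrabackup's exit code.
run_backup() {
  local target_dir="$1" log_file="$2"
  shift 2
  xtrabackup --backup --target-dir="$target_dir" "$@" 2> "$log_file"
  local rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "backup failed with exit code $rc; see $log_file" >&2
    return "$rc"
  fi
  echo "backup succeeded; binary log coordinates are in $log_file"
}
```

For example, `run_backup /data/backups/ backupout.log --user=bkpuser` (with a hypothetical `bkpuser` account) keeps the coordinates in `backupout.log` and passes a non-zero exit code back to the caller.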

See the file list created by a backup in [Index of files created by Percona XtraBackup](xtrabackup-files.md).

## Pick `--lock-ddl` (backup locks)

When backup locks are available, PXB uses them to copy non-InnoDB files without stalling InnoDB DML.

| Option | Effect |
|--------|--------|
| `--lock-ddl=ON` (default) | Take the backup lock at start. DDL stays blocked for the full run (unless you change the workflow). |
| `--lock-ddl=REDUCED` | Take the lock after InnoDB is copied. DDL is blocked for a shorter time, but DDL can run while InnoDB data is being copied. |
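
If you script backups, a small guard like this hypothetical `lock_ddl_option` function keeps the choice explicit and rejects anything other than the two values in the table. It is a sketch only; your PXB version may accept other values, so check the [option reference](xtrabackup-option-reference.md).

```bash
# lock_ddl_option MODE
# Prints the --lock-ddl option for MODE (ON or REDUCED), or fails.
lock_ddl_option() {
  case "$1" in
    ON)      echo "--lock-ddl=ON" ;;       # backup lock at backup start
    REDUCED) echo "--lock-ddl=REDUCED" ;;  # backup lock after the InnoDB copy
    *)
      echo "unsupported --lock-ddl mode: $1" >&2
      return 1
      ;;
  esac
}
```

For example: `xtrabackup --backup "$(lock_ddl_option REDUCED)" --target-dir=/data/backups/`.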

See a compact comparison in [Reference: backup lifecycle](how-xtrabackup-works-reference.md).

## Turn on `--register-redo-log-consumer` or not

Enable this option when heavy write throughput could make the server purge redo before PXB finishes reading it. Enabling it prevents backup failures caused by missing redo.

Review these checks before you enable it.

* Plan for redo bloat and possible disk exhaustion, because the server retains redo longer.

* Monitor free space and I/O throughout the run.

* Abort the backup (Ctrl+C or SIGTERM to `xtrabackup`) if free disk space runs critically low. PXB then releases its consumer registration, the server can purge redo, and you can free space and retry.

The default is off. Enable it only when you have spare disk.
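
A minimal watchdog sketch for the monitoring and abort checks above. It assumes a POSIX `df -P` (with `-k` for 1024-byte blocks), the PID of an `xtrabackup` started in the background, and a free-space threshold you choose; the function names `free_kb` and `watch_redo_disk`, and the default 60-second interval, are hypothetical. Point it at the filesystem that holds the server's redo log, since that is where retained redo accumulates.

```bash
# free_kb DIR: print free space, in KiB, on the filesystem holding DIR.
free_kb() {
  # -P gives one POSIX-format line per filesystem; column 4 is "Available".
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# watch_redo_disk PID DIR MIN_FREE_KB [INTERVAL_SECONDS]
# Polls free space on DIR while PID (the xtrabackup process) runs.
# Returns 1 after sending SIGTERM to PID because space fell below
# MIN_FREE_KB, or 0 once PID has exited on its own.
watch_redo_disk() {
  local pid="$1" dir="$2" min_kb="$3" interval="${4:-60}"
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$(free_kb "$dir")" -lt "$min_kb" ]; then
      echo "free space on $dir below ${min_kb} KiB; stopping PID $pid" >&2
      kill -TERM "$pid"
      return 1
    fi
    sleep "$interval"
  done
  return 0
}
```

For example, start `xtrabackup --backup --register-redo-log-consumer --target-dir=/data/backups/ 2> backupout.log &`, then run `watch_redo_disk $! /var/lib/mysql 5242880` to send SIGTERM if less than 5 GiB (5242880 KiB) remains under `/var/lib/mysql`.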

## Prepare (`--prepare`)

1. Run `xtrabackup --prepare` with `--target-dir` set to the backup directory.

2. On large or incremental backups, tune prepare performance:

    * Raise `--use-memory` to 1G–2G when RAM allows (the default is 100MB).

    * Pass `--parallel` to apply `.delta` files in parallel on incremental backups (8.4.0-3+). It does not parallelize first-pass redo on a full backup.

    ```bash
    xtrabackup --prepare --use-memory=2G --parallel=4 --target-dir=/data/backups/
    ```

If prepare runs on the same host as production `mysqld`, leave RAM headroom for the OS and server to avoid OOM kills.

See [`--use-memory`](xtrabackup-option-reference.md#use-memory) and [`--parallel`](xtrabackup-option-reference.md#parallel) in the option reference.

## Restore (`--copy-back` or `--move-back`)

1. Stop `mysqld` if it owns the destination `datadir`.

2. Run `xtrabackup --copy-back` or `--move-back` with `--target-dir` set to the prepared backup. PXB reads destination paths from config (`datadir`, InnoDB paths, log paths, and related settings).

3. Set correct ownership and permissions before starting `mysqld` (for example, `chown -R mysql:mysql /var/lib/mysql`). Backup files belong to the user who ran `xtrabackup`, but the server expects the `mysql` user.

4. Start `mysqld`.

Use `--move-back` only when you have limited disk space. Unlike `--copy-back`, this command moves files out of the backup directory instead of copying them, which destroys the original backup during the restore.
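
The restore steps above as a Bash sketch. It assumes `mysqld` is already stopped, the MySQL option file points `datadir` at the directory you pass, the server runs as `mysql:mysql`, and the destination is empty (PXB refuses to copy into a non-empty `datadir` by default). The function name `restore_backup` is hypothetical, and it fixes ownership only under `DATADIR`; change ownership of any separate InnoDB or log directories too, then start `mysqld` with your service manager.

```bash
# restore_backup MODE TARGET_DIR DATADIR
# MODE is copy-back or move-back. Assumes mysqld is stopped and the
# MySQL option file sets datadir to DATADIR.
restore_backup() {
  local mode="$1" target_dir="$2" datadir="$3"
  case "$mode" in
    copy-back|move-back) ;;
    *) echo "mode must be copy-back or move-back" >&2; return 2 ;;
  esac
  # Refuse to restore over existing files.
  if [ -n "$(ls -A "$datadir" 2>/dev/null)" ]; then
    echo "$datadir is not empty" >&2
    return 1
  fi
  xtrabackup --"$mode" --target-dir="$target_dir" || return $?
  # Restored files belong to whoever ran xtrabackup; mysqld runs as mysql.
  chown -R mysql:mysql "$datadir"
}
```

For example, `restore_backup copy-back /data/backups/ /var/lib/mysql` keeps the backup intact, while `move-back` consumes it.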

See full restore flows in [Restore full, incremental, and compressed backups](restore-a-backup.md).

## Stream or send backups to the cloud

The steps are the same. Slow networks stretch PXB’s final sync and hold the short backup locks, including the binary log lock, longer. Run streaming backups inside a maintenance window, and size your bandwidth and destination throughput to keep the final sync short.
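
A minimal sketch of a streaming backup sent over SSH and unpacked on the destination with `xbstream`. The host, remote directory, and function name `stream_backup` are hypothetical; `--stream=xbstream` writes the backup to STDOUT while STDERR still carries the binary log coordinates, and the function fails if either side of the pipe fails.

```bash
# stream_backup REMOTE REMOTE_DIR LOG_FILE
# Streams a backup to REMOTE over ssh and extracts it into REMOTE_DIR.
stream_backup() {
  local remote="$1" remote_dir="$2" log_file="$3"
  local -a st
  xtrabackup --backup --stream=xbstream --target-dir=./ 2> "$log_file" \
    | ssh "$remote" "xbstream -x -C '$remote_dir'"
  # Capture both exit codes: [0] is xtrabackup, [1] is ssh/xbstream.
  st=("${PIPESTATUS[@]}")
  [ "${st[0]}" -eq 0 ] && [ "${st[1]}" -eq 0 ]
}
```

For example, `stream_backup backup-host /data/backups backupout.log`. Because the remote directory is wrapped in single quotes, avoid paths that contain them.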

* [Take a streaming backup](take-streaming-backup.md)

* [xbcloud binary overview](xbcloud-binary-overview.md)

docs/how-xtrabackup-works-reference.md

# Backup lifecycle

Quick-scan facts: privileges, phases, defaults, and behavior. For a walkthrough, see the [tutorial](how-xtrabackup-works-tutorial.md); for procedures, see the [how-to](how-xtrabackup-works-how-to.md).

## Privileges (summary)

| Privilege | Typical use |
|-----------|-------------|
| `BACKUP_ADMIN` | Read `performance_schema.log_status`; use `LOCK INSTANCE FOR BACKUP`, `LOCK TABLES FOR BACKUP`, `LOCK BINLOG FOR BACKUP` when applicable |
| `RELOAD` | Workflows using `FLUSH TABLES WITH READ LOCK` or similar |
| `LOCK TABLES` | Workflows that issue table-level locks |
| `REPLICATION CLIENT` | `--slave-info` and related replication metadata |

[Connection and privileges needed](privileges.md) has the full list.

## Three operator-facing phases

| Step | Phase | What happens |
|------|-------|----------------|
| 1 | Backup (hot copy) | Copy data files; stream redo for the whole run |
| 2 | Prepare | Replay redo; roll back uncommitted work |
| 3 | Restore | `--copy-back` or `--move-back` into `datadir` |

## Three in-backup sub-phases (locking)

With backup locks, Percona XtraBackup (PXB) typically runs the stages below in order. The Locks column shows the global lock held during each stage, and the Work column shows what PXB copies or records under it: first InnoDB pages and redo (usually with no global backup lock), then non-InnoDB files under the backup lock (DDL restricted), then a short `LOCK BINLOG FOR BACKUP` to pin binary log or replica coordinates. Exact steps depend on the server version and the storage engines in use.

| # | Locks | Work |
|---|--------|------|
| 1 | None global | Copy InnoDB files and redo while data manipulation language (DML) runs |
| 2 | Backup lock | Copy non-InnoDB files (e.g. `.frm`, `.MRG`, `.MYD`, `.MYI`, `.CSM`, `.CSV`, `.sdi` (serialized dictionary information), `.par`); data definition language (DDL) restricted, DML usually allowed—details are server-specific |
| 3 | `LOCK BINLOG FOR BACKUP` (short) | Pin binlog/replica coordinates; read `performance_schema.log_status`; drop locks |

Backup locks vs FTWRL (`FLUSH TABLES WITH READ LOCK`): [Percona Server backup locks](https://docs.percona.com/percona-server/innovation-release/backup-locks.html).
MySQL {{vers}}: [`LOCK INSTANCE FOR BACKUP`](https://dev.mysql.com/doc/refman/{{vers}}/en/lock-instance-for-backup.html).

## `--lock-ddl`

| Value | When the backup lock is taken (typical) |
|-------|----------------------------------------|
| `ON` (default) | Backup start |
| `REDUCED` | After InnoDB data copy finishes |

`ON` blocks DDL from the start of the backup; `REDUCED` delays the block until InnoDB is copied, giving a shorter DDL stall but allowing DDL during the InnoDB phase. InnoDB DML usually keeps running through the main copy.

## `--register-redo-log-consumer`

| Field | Detail |
|-------|--------|
| Default | Off |
| Does | Registers PXB as a redo consumer so the server won’t purge a redo file until PXB copies that file |
| Cost | Redo can pile up (“redo bloat”); disk use may spike |
| Writes | Server may stall writes briefly while the consumer advances |

## Prepare options (subset)

| Option | Notes |
|--------|--------|
| `--use-memory` | Random-access memory (RAM) for prepare only (default 100MB); larger often helps |
| `--parallel` | 8.4.0-3+: parallel `.delta` apply for incrementals; not the same as parallel full-backup redo replay |

Full flags: [xtrabackup option reference](xtrabackup-option-reference.md).

## Restore (facts)

* Pulls paths from `my.cnf` (e.g. `datadir`, `innodb_data_home_dir`, `innodb_data_file_path`, `innodb_log_group_home_dir`).

* Order: non-InnoDB files first (`.MRG`, `.MYD`, `.MYI`, `.CSM`, `.CSV`, `.sdi`, `.par`), then InnoDB tables and indexes, then logs.

* Keeps file attributes.

* Successful backup prints binlog coordinates to standard error (STDERR)—redirect if you need a file.

## When backup locks are skipped (“lockless”)

Locks stay off only if every table in every schema—including `mysql`—is InnoDB. Commonly `mysql` still holds CSV/MyISAM tables (e.g. `general_log`), so PXB usually takes backup locks anyway.

| Server | Notes |
|--------|--------|
| Percona Server for MySQL {{vers}} | `log_status` may carry relay coordinates; `--slave-info` can skip extra locks |
| Standard MySQL {{vers}} | May still need `FLUSH TABLES WITH READ LOCK` with `--slave-info` when you need relay position |

## Cloud / streaming

Same phases; slow networks stretch the final sync and can lengthen the short locks. See [Streaming backup](take-streaming-backup.md) and [xbcloud overview](xbcloud-binary-overview.md).

## See also

* [Index of files created by Percona XtraBackup](xtrabackup-files.md)

* [Restore full, incremental, and compressed backups](restore-a-backup.md)

docs/how-xtrabackup-works-tutorial.md

# PXB in practice: backup, prepare, restore

Follow PXB’s order (backup, prepare, restore) and you’ll see where consistency appears. Use the [hub](how-xtrabackup-works.md) to jump to runbooks, reference tables, or deeper theory.

## Outcomes

You’ll be able to:

* Name the three phases: backup, prepare, restore.

* Say why a live file copy isn’t consistent yet, and what makes the dataset consistent.

* Explain why the redo thread runs in parallel with the file copy, and why `.ibd` files alone would not suffice.

* Describe roll forward versus roll back, and how undo plus SDI handle transactions still open when backup ends.

* Sketch when `--register-redo-log-consumer` helps on busy servers, and why redo bloat threatens disk space.

* Contrast `--lock-ddl=ON` with `--lock-ddl=REDUCED` in terms of when the backup lock appears.

## Step 1 — Hot copy (backup)

PXB copies InnoDB and other files while MySQL keeps serving traffic. The copy takes time, so on-disk pages at any instant don’t match one commit point—that’s normal. You still can’t start `mysqld` on that raw tree.

PXB records a start LSN and runs a redo thread until backup ends. That stream records every change in the window so a later step can bring the copied pages current.

Takeaway: backup holds files plus redo, not a frozen disk image.

### Why a parallel redo stream matters

The `.ibd` read spans real clock time: pages copied early reflect older LSNs than pages copied later. The redo thread exists so PXB also captures every InnoDB change from backup start through backup end. Without that continuous redo capture, there would be no complete log to replay and converge on a single consistent point.

If PXB copied only `.ibd` data files and skipped redo, the directory would hold physically mixed page images with no way to run InnoDB’s normal recovery to one LSN—the dataset would not be a valid starting point for a server.

### When `--register-redo-log-consumer` enters the picture

On a high-traffic host, redo files can rotate or get purged before PXB finishes reading them, which risks a failed or unusable backup. Optional `--register-redo-log-consumer` registers PXB so the server retains redo until PXB has copied the needed files (see [How-to: backup lifecycle operations](how-xtrabackup-works-how-to.md)). The trade-off is redo bloat: redo stays on disk longer while backup runs, so a long backup on a heavy-write system can consume a lot of space—monitor free disk closely.

### `--lock-ddl=ON` versus `--lock-ddl=REDUCED`

Both modes relate to when PXB takes the backup lock used for copying non-InnoDB files without stopping InnoDB DML.

* With `--lock-ddl=ON` (default), the backup lock is taken at backup start, so DDL faces restriction from the first moment of the run.

* With `--lock-ddl=REDUCED`, the backup lock is taken only after InnoDB data has been copied. That shortens how long DDL must wait under the lock, but DDL can now run while InnoDB is being copied.

Compact timing table: [Reference: backup lifecycle](how-xtrabackup-works-reference.md).

## Step 2 — Prepare (make the dataset consistent)

After backup completes, the target directory is still raw: InnoDB pages reflect mixed points in time even though redo for the whole window sits beside them. A MySQL server must not use that tree as a live datadir until `--prepare` runs.

`--prepare` performs two coordinated steps:

* Roll forward: replay captured redo onto the copied tablespaces (physical page updates).

* Roll back: remove changes from transactions that had not committed at the backup end point. Redo may have replayed those changes; undo records describe how to reverse row-level effects, and Serialized Dictionary Information (SDI) in the tablespaces supplies table and index definitions so rollback can interpret undo logically (the role older `.frm` files once played).

Non-InnoDB tables match the same moment because PXB copied them under the right locks during backup.

Takeaway: prepare, not the copy step, turns raw files plus redo into a crash-recovery-consistent dataset, the same kind of state InnoDB reaches after crash recovery.
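
Before you restore, you can check that prepare has run by reading the `xtrabackup_checkpoints` file that PXB writes into the backup directory (see [Index of files created by Percona XtraBackup](xtrabackup-files.md)). This sketch assumes a line of the form `backup_type = full-prepared`, whose value is `full-backuped` after a full backup and `full-prepared` after `--prepare`; confirm the values for your PXB version. The function name `is_prepared` is hypothetical.

```bash
# is_prepared TARGET_DIR
# Succeeds if TARGET_DIR holds a prepared full backup, judged by the
# backup_type line in xtrabackup_checkpoints.
is_prepared() {
  local file="$1/xtrabackup_checkpoints" type
  [ -r "$file" ] || { echo "no checkpoints file in $1" >&2; return 2; }
  # Assumed line format: "backup_type = full-prepared"; keep the value.
  type=$(awk -F' = ' '$1 == "backup_type" { print $2 }' "$file")
  echo "backup_type: ${type:-unknown}"
  [ "$type" = "full-prepared" ]
}
```

For example, `is_prepared /data/backups/ && echo "safe to restore"` prints the recorded type and only continues when prepare has finished.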

## Step 3 — Restore (deploy)

`--copy-back` or `--move-back` puts the prepared files in the datadir; set ownership and permissions, then start `mysqld`.

Takeaway: restore mostly moves files; the prepare step did the hard part.

## How the phases fit together

InnoDB crash recovery does the same thing at server startup: redo advances pages, and undo rolls back half-finished transactions. The backup workflow runs that pipeline offline, on the backup.

## Where to go next

| Goal | Page |
|------|------|
| Run commands, pick options | [How-to: backup lifecycle operations](how-xtrabackup-works-how-to.md) |
| Look up privileges and phases | [Reference: backup lifecycle](how-xtrabackup-works-reference.md) |
| Redo, undo, locks, lockless | [Explanation: how PXB achieves consistency](how-xtrabackup-works-explanation.md) |