
docs(acp): add StatefulSet application DR solution#129

Merged
tossmilestone merged 1 commit into main from feat/statefulset-dr on Apr 14, 2026
Conversation

@tossmilestone
Member

@tossmilestone tossmilestone commented Mar 12, 2026

Add StatefulSet application DR solution with VolSync.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive Cross‑Cluster Disaster Recovery guide for stateful applications: Active‑Passive model, data replication with VolSync, and GitOps‑based workload/config overrides.
    • Covers prerequisites, deployment and replication setup (scheduled and one‑time sync), verification, and runbooks for planned migration, emergency failover, and failback, plus limitations (RPO/RTO, cold‑standby cost, snapshot compatibility).

@coderabbitai
Contributor

coderabbitai bot commented Mar 12, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds a new documentation page that describes an Active‑Passive cross‑cluster disaster recovery workflow for StatefulSets using VolSync (asynchronous CSI snapshots + rsync‑tls), GitOps with replica‑override patterns, deployment prerequisites, ReplicationSource/Destination configs, operational runbooks for migration/failover/failback, and limitations (RPO/RTO, snapshot caveats).
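
For orientation, the replication pair summarized in the walkthrough might look roughly like the sketch below. It is not copied from the PR. The resource name `rs-pvc-01` appears in the review comments, but `rd-pvc-01`, the PVC name, the Secret name, the capacity, the service type, and the ten-minute schedule are all illustrative assumptions.

```yaml
# Secondary cluster: receives data over rsync-tls (all values are assumptions)
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: rd-pvc-01                        # hypothetical name
  namespace: <application-namespace>
spec:
  rsyncTLS:
    copyMethod: Snapshot                 # expose each completed sync as a CSI snapshot
    capacity: 10Gi                       # assumed size; should match the source PVC
    accessModes: [ReadWriteOnce]
    keySecret: volsync-rsync-tls-psk     # hypothetical Secret holding psk.txt
    serviceType: LoadBalancer            # assumed way to make the endpoint reachable
---
# Primary cluster: pushes changes on a cron schedule
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: rs-pvc-01
  namespace: <application-namespace>
spec:
  sourcePVC: data-my-stateful-app-0      # hypothetical PVC name
  trigger:
    schedule: "*/10 * * * *"             # assumed interval; it bounds the achievable RPO
  rsyncTLS:
    copyMethod: Snapshot
    keySecret: volsync-rsync-tls-psk     # must hold the same key as the destination
    address: <destination-address>       # placeholder for the destination endpoint
```

The same pre-shared key has to exist on both clusters, and the schedule interval is the main lever on data loss after an unplanned failover.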

Changes

| Cohort / File(s) | Summary |
|:--|:--|
| Documentation: `docs/en/solutions/acp/StatefulSet_Application_DR.md` | New comprehensive guide for Active‑Passive cross‑cluster DR for StatefulSets: architecture and roles, prerequisites (VolSync, CSI snapshots), GitOps base/overrides (replica=0 for standby), ReplicationSource/ReplicationDestination examples (cron & one‑time), verification via .status fields, runbooks for planned migration, emergency failover, failback, and limitations/RPO‑RTO considerations. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GSLB
    participant GitOps
    participant PrimaryCluster
    participant VolSync
    participant StandbyCluster

    User->>GSLB: Resolve service DNS
    GSLB->>PrimaryCluster: Route traffic to Primary
    Note right of PrimaryCluster: Primary serves live workloads
    PrimaryCluster->>VolSync: Publish PV diffs (snapshots + rsync‑tls)
    VolSync->>StandbyCluster: Apply replicated data to PVCs
    GitOps->>StandbyCluster: Sync manifests & overrides (replica=0)
    alt Planned migration / failover
        GitOps->>PrimaryCluster: Apply replica override (scale to 0)
        PrimaryCluster->>VolSync: Trigger final one‑time sync
        GSLB->>StandbyCluster: Switch traffic
        StandbyCluster->>User: Serve traffic
    end
    alt Failback
        StandbyCluster->>VolSync: Replicate back to Primary (reverse roles)
        GitOps->>PrimaryCluster: Restore primary manifests (replicas >0)
        GSLB->>PrimaryCluster: Route traffic back
    end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • tyzhou111
  • sinbadonline

Poem

🐰 I hopped through YAML and secrets tight,
VolSync whispered snapshots through the night,
GitOps kept replicas in tidy rows,
When storms arise, the standby grows,
I munch on uptime — steady, soft, and bright. 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|:--|:--|:--|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding documentation for a StatefulSet disaster recovery solution using VolSync. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/solutions/acp/StatefulSet_Application_DR.md`:
- Around line 291-295: Replace the indented command block for the
ReplicationSource status check with a fenced shell code block (```bash ... ```)
so it no longer triggers MD046, and update the explanatory text that references
trigger.manual to describe it as a spec field (e.g., "trigger.manual is a spec
field") rather than a label; apply the same fenced-block and wording correction
to the later occurrence that spans the other referenced lines and ensure the
command shown is the kubectl -n <application-namespace> get ReplicationSource
rs-pvc-01 -o jsonpath='{.status}' example.
- Around line 135-148: The example only creates a headless Service
(metadata.name: my-stateful-app-headless with clusterIP: None and selector app:
my-stateful-app) but omits the client-facing exposure needed for GSLB traffic
switching; update the doc to either add a separate external
Service/Ingress/Gateway example (showing a Service with a clusterIP and/or type:
LoadBalancer or an Ingress/Gateway pointing to the headless selector) or add a
clear sentence after the headless Service block explicitly stating that a
separate client-facing Service/Ingress/Gateway is required for external traffic
and must target the same selector (app: my-stateful-app) to enable the
traffic-switching flow.
- Around line 174-178: The Parameter tables (the block starting with
"**Parameters**:" that lists **application-namespace** and **psk.txt**) need
blank lines inserted immediately before and after the table to satisfy MD058 and
ensure proper rendering; update the markdown around the "**Parameters**:" table
and the other similar tables (the ones containing **application-namespace** /
**psk.txt** entries) by adding one empty line above the table header and one
empty line below the table end so each table is separated from surrounding
paragraphs.
- Around line 40-45: The diagram incorrectly shows a StatefulSet Pod in the
passive cluster; remove or hide the Pod node (PodS / "StatefulSet Pod
<br>replicas=0") from the "Secondary Cluster Passive" subgraph so the visual
matches the text (replicas=0), leaving only the Service (SvcS), Destination
PVC/Snapshot (PVCS), and VolSync ReplicationDestination (VSR_Dest) nodes in that
subgraph.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e347fff8-7779-4eed-b814-323c0b74c76b

📥 Commits

Reviewing files that changed from the base of the PR and between cfa09a1 and 50e0a99.

📒 Files selected for processing (1)
  • docs/en/solutions/acp/StatefulSet_Application_DR.md

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (4)
docs/en/solutions/acp/StatefulSet_Application_DR.md (4)

174-178: ⚠️ Potential issue | 🟡 Minor

Add blank lines before each parameters table (MD058).

At Line 175, Line 208, and Line 248, the table starts immediately after **Parameters**:. Markdownlint MD058 expects a blank line before the table header.

Also applies to: 207-216, 247-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 174 - 178,
The tables after each "**Parameters**:" heading start immediately on the next
line and violate MD058; insert a single blank line between every
"**Parameters**:" line and its following table header (the lines beginning with
"| **Parameter** | **Explanation** |" / the table rows such as "|
**application-namespace** | ..." and "| **psk.txt**| ...") so each Parameters
section has one blank line before the table.

40-43: ⚠️ Potential issue | 🟡 Minor

Remove the standby Pod node from the passive-cluster diagram.

At Line 40–43, showing StatefulSet Pod <br>replicas=0 in the passive subgraph conflicts with the “no running workload” standby model and can mislead readers during runbook execution.

Suggested doc patch
 subgraph Secondary[Secondary Cluster Passive]
     SvcS[Service]
-    PodS[StatefulSet Pod <br>replicas=0]
     PVCS[(Destination PVC / Snapshot)]
     VSR_Dest[VolSync ReplicationDestination]
 end
@@
-GSLB -. "Standby Traffic <br>(0 Weight)" .-> SvcS
+GSLB -. "Standby Traffic <br>(0 Weight)" .-> SvcS
@@
-SvcP --> PodP
+SvcP --> PodP
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 40 - 43,
The passive-cluster diagram incorrectly includes the standby Pod node
"PodS[StatefulSet Pod <br>replicas=0]" which contradicts the intended "no
running workload" standby model; remove the PodS node (and its label
"StatefulSet Pod <br>replicas=0") from the Secondary subgraph so only Service
(SvcS) and PVC/Snapshot (PVCS) remain, ensuring the diagram reflects a passive
cluster with no pods running.

135-148: ⚠️ Potential issue | 🟠 Major

Document a separate client-facing Service/Ingress for GSLB traffic switching.

At Line 135–148, only a headless Service is defined (clusterIP: None). That is fine for StatefulSet identity, but it does not represent the client-facing endpoint required by the GSLB flow described earlier.

Suggested doc patch
 apiVersion: v1
 kind: Service
 metadata:
   name: my-stateful-app-headless
@@
   - name: tcp
     port: 80
     targetPort: 8080
+---
+apiVersion: v1
+kind: Service
+metadata:
+  name: my-stateful-app
+  namespace: <application-namespace>
+spec:
+  selector:
+    app: my-stateful-app
+  ports:
+  - name: tcp
+    port: 80
+    targetPort: 8080

Or add one explicit sentence after the block: “A separate client-facing Service/Ingress/Gateway is required for GSLB and must target app: my-stateful-app.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 135 - 148,
The current manifest only defines a headless Service (clusterIP: None) for
StatefulSet identity but lacks a client-facing endpoint for GSLB; update the doc
by adding either a separate Service/Ingress/Gateway example that selects app:
my-stateful-app (i.e., a normal ClusterIP/LoadBalancer/Ingress/Gateway resource)
or insert one clear sentence immediately after the headless Service block
stating: “A separate client-facing Service/Ingress/Gateway is required for GSLB
and must target app: my-stateful-app.” Reference the headless Service block
(name: my-stateful-app-headless / selector: app: my-stateful-app) when adding
the new Service/Ingress/Gateway guidance.

291-295: ⚠️ Potential issue | 🟡 Minor

Use a proper fenced block and call trigger.manual a spec field.

At Line 291–295, the command block is still indented, which triggers MD046. At Line 313, trigger.manual is described as a “label,” but in your YAML it is a spec field.

Suggested doc patch
-    Check synchronization from the `ReplicationSource`.
-
-    ```bash
-    kubectl -n <application-namespace> get ReplicationSource rs-pvc-01 -o jsonpath='{.status}'
-    ```
+Check synchronization from the `ReplicationSource`.
+
+```bash
+kubectl -n <application-namespace> get ReplicationSource rs-pvc-01 -o jsonpath='{.status}'
+```
@@
-- Create a One-Time `ReplicationSource` with the `trigger.manual` label on the primary cluster.
+- Create a One-Time `ReplicationSource` with the `trigger.manual` spec field on the primary cluster.

Also applies to: 313-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 291 - 295,
Remove the indentation for the bash snippet so it uses a proper fenced code
block (```bash ... ```) around the kubectl command for ReplicationSource and
update the text that mentions trigger.manual to describe it as a spec field
rather than a label (i.e., change "trigger.manual label" to "trigger.manual spec
field"); ensure references to ReplicationSource and trigger.manual are corrected
consistently (e.g., the Get command block and the "Create a One-Time
ReplicationSource" sentence).
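
To illustrate why the "spec field" wording matters, a one-time source could be sketched as follows. This is an assumption-laden example rather than the manifest from the PR: the resource name, the PVC name, the Secret name, and the trigger string `final-sync-1` are all hypothetical.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: rs-pvc-01-final                  # hypothetical name for the one-time sync
  namespace: <application-namespace>
spec:
  sourcePVC: data-my-stateful-app-0      # hypothetical PVC name
  trigger:
    manual: final-sync-1                 # a spec field under spec.trigger, not a metadata label
  rsyncTLS:
    copyMethod: Snapshot
    keySecret: volsync-rsync-tls-psk     # hypothetical Secret holding psk.txt
    address: <destination-address>       # placeholder for the destination endpoint
```

With a manual trigger, VolSync reports completion by setting `.status.lastManualSync` to the same string, which is what the `jsonpath='{.status}'` check quoted above would surface.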

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 9aaeeecf-2453-4f9a-9f56-81069ae4dfe9

📥 Commits

Reviewing files that changed from the base of the PR and between 50e0a99 and 425569e.

📒 Files selected for processing (1)
  • docs/en/solutions/acp/StatefulSet_Application_DR.md

Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
docs/en/solutions/acp/StatefulSet_Application_DR.md (3)

178-178: Use standard line break syntax.

The <br></br> tag is non-standard HTML. Use either <br/> (self-closing tag) or double-space markdown line breaks for better compatibility.

✏️ Proposed fix
-| **psk.txt**| This field adheres to the format expected by stunnel: `<id>:<at least 32 hex digits>`. <br></br>for example, `1:23b7395fafc3e842bd8ac0fe142e6ad1`. |
+| **psk.txt**| This field adheres to the format expected by stunnel: `<id>:<at least 32 hex digits>`. <br/>For example, `1:23b7395fafc3e842bd8ac0fe142e6ad1`. |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` at line 178, Replace the
non-standard <br></br> usage in the table cell that describes **psk.txt** with a
standard line break: either use a self-closing HTML break tag (<br/>) or convert
to a Markdown line break (end the previous line with two spaces) so the example
`1:23b7395fafc3e842bd8ac0fe142e6ad1` renders consistently; update the table cell
text containing "**psk.txt**| This field adheres to the format expected by
stunnel: `<id>:<at least 32 hex digits>`. <br></br>for example, ..."
accordingly.
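
For context, the `psk.txt` entry discussed in this comment is a key inside a Kubernetes Secret. A minimal sketch follows, with a hypothetical Secret name and the key material left as a placeholder:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: volsync-rsync-tls-psk            # hypothetical name, referenced as keySecret
  namespace: <application-namespace>     # same namespace as the application
type: Opaque
stringData:
  # stunnel PSK format: <id>:<at least 32 hex digits>
  psk.txt: "1:<at least 32 hex digits>"
```

The same Secret content would need to be present on both clusters for the rsync-tls connection to authenticate.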

403-403: Consider using "before" instead of "prior to".

The phrase "prior to a disaster" can be simplified to "before a disaster" for more concise writing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` at line 403, Replace the
phrase "prior to a disaster" in the sentence that reads "The storage and
computational resources sit idle prior to a disaster, inaccessible to the
external network, and incapable of offloading query pressure like an
Active-Active architecture would." with the simpler "before a disaster" so the
sentence reads "...sit idle before a disaster..."; update the document text in
StatefulSet_Application_DR.md accordingly.

80-80: Consider simplifying "In order to construct".

The phrase "In order to construct" can be shortened to "To construct" for more concise writing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` at line 80, Replace the
verbose phrase "In order to construct the \"Active-Passive\" setup" with the
concise "To construct the \"Active-Passive\" setup" in the sentence that
describes using Kustomize overrides to set `replicas` (the line mentioning
GitOps Base Configuration Sync and Kustomize overrides manipulating `replicas`
so Primary is `N` and Secondary is `replicas=0`).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a25e62d4-791b-4fae-a089-b14154f78d96

📥 Commits

Reviewing files that changed from the base of the PR and between 425569e and 4b96dd6.

📒 Files selected for processing (1)
  • docs/en/solutions/acp/StatefulSet_Application_DR.md

@tossmilestone tossmilestone requested review from vivindu-1213 and removed request for vivindu-1213 March 27, 2026 08:46
@tossmilestone tossmilestone self-assigned this Mar 27, 2026

@vivindu-1213 vivindu-1213 left a comment

Use proper nouns and technical terms consistently.

Contributor

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (4)
docs/en/solutions/acp/StatefulSet_Application_DR.md (4)

40-45: ⚠️ Potential issue | 🟡 Minor

Remove the standby Pod from the passive-cluster diagram.

The secondary cluster is described as replicas=0, but the diagram still renders a StatefulSet Pod node, which is contradictory to the topology narrative.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 40 - 45,
The diagram shows a StatefulSet Pod node (PodS) in the Secondary Cluster despite
replicas=0; remove the PodS node (or the label "StatefulSet Pod") from the
subgraph Secondary declaration so the passive cluster no longer renders a pod;
update the subgraph block containing Secondary, PodS, PVCS, and VSR_Dest to omit
PodS (or replace it with a comment/placeholder) so the visuals match the
replicas=0 narrative.

291-295: ⚠️ Potential issue | 🟡 Minor

Use fenced shell block formatting and fix trigger.manual terminology.

Line 291 still introduces an indented code block context (MD046), and Line 313 refers to trigger.manual as a label even though it is a spec field.

Also applies to: 311-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 291 - 295,
Replace the indented code block for the ReplicationSource check with a fenced
shell block (triple backticks + bash) to satisfy MD046 and update the wording
that refers to trigger.manual: clarify that trigger.manual is a spec field
(e.g., "the spec field trigger.manual") rather than a label; apply the same
fenced-block and spec-field wording fix to the other occurrence covering the
311-313 area and ensure references to ReplicationSource and rs-pvc-01 remain
unchanged.

135-148: ⚠️ Potential issue | 🟠 Major

Document client-facing exposure separately from the headless Service.

This sample defines only a headless Service (clusterIP: None). Please add either a client-facing Service/Ingress/Gateway example or an explicit note that external traffic switching requires a separate exposure resource targeting the same selector.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 135 - 148,
The sample only creates a headless Service (metadata.name:
my-stateful-app-headless with spec.clusterIP: None and selector app:
my-stateful-app) but does not show how to expose the app to clients; update the
document by either adding a second example resource that exposes the same
selector (e.g., a client-facing Service/LoadBalancer or Service type: ClusterIP
+ Ingress/Gateway) targeting selector: app: my-stateful-app, or insert an
explicit note right after the headless Service block stating that
external/client traffic requires a separate exposure resource
(Service/Ingress/Gateway) that selects the same app label and demonstrate the
required selector/name to use.

174-179: ⚠️ Potential issue | 🟡 Minor

Add blank lines before and after each parameter table.

These tables still violate MD058 and may render inconsistently across Markdown processors.

Also applies to: 207-216, 247-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 174 - 179,
The parameter tables under the "Parameters" section (the table containing rows
for "application-namespace" and "psk.txt") must have a blank line before and
after the table to satisfy MD058 and ensure consistent rendering; update each
table instance (including the other two occurrences that list parameters) by
inserting one empty line directly above the table header and one empty line
directly below the final | row so the tables are separated from surrounding
text.
🧹 Nitpick comments (1)
docs/en/solutions/acp/StatefulSet_Application_DR.md (1)

14-21: Unify terminology for secondary role naming.

The doc alternates between “Secondary Cluster” and “Standby.” Consider using one primary term consistently to reduce cognitive load in runbook steps.

Also applies to: 79-82

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 14 - 21,
The document mixes the terms "Secondary Cluster" and "Standby" for the same
role; pick one term (e.g., "Secondary Cluster") and replace all occurrences of
the alternate term (including the phrase "Secondary Cluster (Standby)" and any
uses of "Standby") so terminology is consistent across headings, bullet points,
runbook steps, and later references (e.g., where the role is described under
"Cluster Roles" and the later mentions around the runbook steps). Update any
parenthetical mentions and any inline references so the chosen term is used
everywhere and preserve existing meaning and context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92984c46-4725-4667-8ce0-da8db7dce130

📥 Commits

Reviewing files that changed from the base of the PR and between 4b96dd6 and 381abaf.

📒 Files selected for processing (1)
  • docs/en/solutions/acp/StatefulSet_Application_DR.md

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
docs/en/solutions/acp/StatefulSet_Application_DR.md (4)

291-295: ⚠️ Potential issue | 🟡 Minor

Use consistent fenced block formatting and fix trigger.manual terminology.

At Lines 291-295, the current indentation style is inconsistent with fenced-block linting expectations.
At Line 313, trigger.manual is a spec field, not a label.

Suggested fix
-    Check synchronization from the `ReplicationSource`.
-
-    ```bash
-    kubectl -n <application-namespace> get ReplicationSource rs-pvc-01 -o jsonpath='{.status}'
-    ```
+Check synchronization from the `ReplicationSource`.
+
+```bash
+kubectl -n <application-namespace> get ReplicationSource rs-pvc-01 -o jsonpath='{.status}'
+```
@@
-   - Create a One-Time `ReplicationSource` with the `trigger.manual` label on the primary cluster. This ensures that any tail data generated just as the primary cluster shut down is transmitted to the secondary cluster.
+   - Create a One-Time `ReplicationSource` with the `trigger.manual` spec field on the primary cluster. This ensures that any tail data generated just as the primary cluster shut down is transmitted to the secondary cluster.

Also applies to: 313-313

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 291 - 295,
Reformat the inline command into a proper fenced bash block and correct the
terminology for trigger.manual: move the "Check synchronization..." sentence
above a triple-backtick fenced block labeled bash, place the kubectl command
inside that block with correct indentation, and close the fence; update the
descriptive bullet that references trigger.manual to say "trigger.manual spec
field" (or "trigger.manual in the spec") and ensure this text appears in the
same bulleted step that describes creating a one-time ReplicationSource
(reference ReplicationSource and trigger.manual).

135-148: ⚠️ Potential issue | 🟠 Major

Clarify client-facing exposure vs headless Service in the sample.

This section only defines a headless Service. Please add an explicit note (or sample) for the separate client-facing Service/Ingress/Gateway required for DNS/GSLB switching.

Suggested doc addition after the Service block
   apiVersion: v1
   kind: Service
   metadata:
     name: my-stateful-app-headless
@@
      - name: tcp
        port: 80
        targetPort: 8080

+> Note: This headless Service is used for StatefulSet identity and internal discovery.
+> For client traffic switching (DNS/GSLB), define a separate client-facing Service/Ingress/Gateway targeting the same selector (app: my-stateful-app).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 135 - 148,
Add a clarifying note and optional example for client-facing exposure after the
headless Service block: explain that the shown Service (metadata.name:
my-stateful-app-headless, selector: app: my-stateful-app) is headless and for
internal StatefulSet identity only, and instruct readers to create a separate
client-facing Service/Ingress/Gateway (targeting the same selector
`app: my-stateful-app`) for DNS/GSLB switching; include a short sample or one-line
snippet suggestion for such a client-facing Service/Ingress/Gateway immediately
after the headless Service example.
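
A minimal sketch of the kind of client-facing Service the comment above asks for, assuming the same selector and ports as the headless Service sample. The Service name and the `LoadBalancer` type are assumptions; an Ingress or Gateway in front of a `ClusterIP` Service would serve the same purpose.

```yaml
# Sketch only: client-facing Service, separate from the headless one.
apiVersion: v1
kind: Service
metadata:
  name: my-stateful-app            # hypothetical name, distinct from my-stateful-app-headless
spec:
  type: LoadBalancer               # assumption: exposes an address that DNS/GSLB can point to
  selector:
    app: my-stateful-app           # same selector as the headless Service
  ports:
    - name: tcp
      port: 80
      targetPort: 8080
```

With `replicas` at `0` on the secondary cluster this Service has no endpoints, so DNS/GSLB should only be switched to it after the workload has been scaled up there.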

---

174-178: ⚠️ Potential issue | 🟡 Minor

Add blank lines around parameter tables for renderer/linter compatibility.

The three tables are missing surrounding blank lines.

Suggested pattern

```diff
 **Parameters**:
+
 | **Parameter** | **Explanation** |
 |:--|:--|
 | **application-namespace** | The namespace of secret, should same as application |
 | **psk.txt**| This field adheres to the format expected by stunnel: `<id>:<at least 32 hex digits>`. <br></br>for example, `1:23b7395fafc3e842bd8ac0fe142e6ad1`. |
+
```

Also applies to: 207-216, 247-257

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 174 - 178,
The parameter tables under the "**Parameters**:" headings (the one containing
the rows "**application-namespace**" and "**psk.txt**" and the two subsequent
parameter tables later in the document) need a blank line inserted before and
after each Markdown table so the renderer/linter recognizes them; update each
table by adding an empty line immediately above the line starting with "|
**Parameter**" and an empty line immediately after the table’s last "|" row.

40-45: ⚠️ Potential issue | 🟡 Minor

Align passive-cluster diagram with replicas=0 at Lines 40-45.

The secondary subgraph still includes a StatefulSet Pod node, which conflicts with the standby model described elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/en/solutions/acp/StatefulSet_Application_DR.md` around lines 40 - 45,
The secondary subgraph currently defines a StatefulSet Pod node PodS[StatefulSet
Pod <br>replicas=0] which contradicts the standby model; update the Secondary
subgraph by removing the PodS node (or replace it with a clear placeholder
indicating no active pods, e.g., remove PodS entirely or change its label to "No
Pods / replicas=0") and ensure PVCS and VSR_Dest remain to reflect the passive
cluster; adjust any references to PodS in the same diagram so the diagram aligns
with replicas=0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/en/solutions/acp/StatefulSet_Application_DR.md`:
- Line 12: Fix the broken Markdown link at the snippet '(**Alauda Build of
VolSync**)[https://docs.alauda.cn/container_platform/main/en/configure/storage/how_to/configuring_pvc_dr.html#deploy-alauda-build-of-volsync]'
by switching to standard Markdown link order so the link text comes before the
URL (e.g., [Alauda Build of VolSync](url)); ensure the emphasis (bold) is
applied inside the square brackets if you want bold link text (e.g., [**Alauda
Build of VolSync**](url)) and remove the reversed `(**text**)[url]` pattern so
parsers render the link correctly.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 16f2b684-8a4f-4cff-a2e1-3bc65961c944

📥 Commits

Reviewing files that changed from the base of the PR and between 381abaf and 9bca13e.

📒 Files selected for processing (1)
  • docs/en/solutions/acp/StatefulSet_Application_DR.md


@vivindu-1213 vivindu-1213 left a comment


/lgtm

@tossmilestone tossmilestone requested a review from zhhray April 14, 2026 05:24
Add StatefulSet application DR solution with VolSync.

Signed-off-by: Xiaoxi He <xxhe@alauda.io>
@zhhray
Contributor

zhhray commented Apr 14, 2026

/lgtm

- The instance `replicas` for the **primary cluster** is set to `N`.
- The instance `replicas` for the **secondary cluster** must be set to `0` using `patches` in kustomize (to avoid invalid connections and data corruption when failover has not occurred).
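
A minimal sketch of how the secondary-cluster override could be expressed with kustomize `patches`. The overlay layout, base path, and StatefulSet name are assumptions for illustration only.

```yaml
# Sketch only: kustomization.yaml for a secondary-cluster overlay.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                     # assumed path to the manifests shared by both clusters
patches:
  - target:
      group: apps
      version: v1
      kind: StatefulSet
      name: my-stateful-app        # assumed StatefulSet name
    patch: |-
      # JSON 6902 patch: keep the standby scaled to zero until failover
      - op: replace
        path: /spec/replicas
        value: 0
```

The `replace` operation expects `spec.replicas` to be set in the base manifest. During failover, changing `value` to `N` in Git is what brings the workload up on the secondary cluster.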

### 2.2 VolSync Asynchronous Data Sync Configuration
Contributor


Suggest adding a note:
Note: Initial full synchronization and large-volume data transfers may incur significant time and bandwidth costs. Proper network capacity planning is required for cross-region deployments.


1. **Halt Traffic and Writes**: Use GitOps to scale down the stateful workloads on the primary cluster by setting `replicas` to `0`, completely isolating underlying writes from the primary application.
2. **Execute Final Full Synchronization**:
- Delete the original periodic `ReplicationSource` on the primary cluster.
Contributor


For the deletion step, add a risk note:
(this stops further synchronization but does not remove already replicated data on the destination)

@tossmilestone tossmilestone merged commit 9072a03 into main Apr 14, 2026
2 checks passed
@tossmilestone tossmilestone deleted the feat/statefulset-dr branch April 14, 2026 09:52
