fix(kubeovn): derive MASTER_NODES from server group for multi-master support #12
Conversation
…r kube-ovn `MASTER_NODES` was set to `cozystack_api_server_host` (always a single IP), breaking OVN RAFT consensus on multi-master clusters. It is now auto-detected from the `server` inventory group host keys, with an explicit override available via the `cozystack_master_nodes` variable.

Ref: cozystack/cozystack#2242
Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
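The derivation described here can be sketched as a small Jinja expression in a `set_fact` task (a minimal sketch, not the role's exact code; the task name and the `default('')` guard are illustrative):

```yaml
# Sketch: prefer an explicit override, otherwise join the 'server'
# group host keys (assumed to be IPs) into a comma-separated list.
- name: Compute master node list
  ansible.builtin.set_fact:
    _cozystack_master_nodes: >-
      {%- if cozystack_master_nodes | default('') | length > 0 -%}
      {{ cozystack_master_nodes }}
      {%- else -%}
      {{ groups['server'] | join(',') }}
      {%- endif -%}
```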
CI inventory uses `localhost` as the host key (not an IP), so auto-detection would produce an incorrect value. Set the IP explicitly.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Expand the `server` group to three control-plane nodes to demonstrate a multi-master setup. Add a commented `cozystack_master_nodes` override.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
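A three-node inventory of the kind described might look like this (host IPs are placeholders, not taken from the PR):

```yaml
all:
  children:
    server:
      hosts:
        192.168.100.11:  # placeholder control-plane IPs
        192.168.100.12:
        192.168.100.13:
  vars:
    # Explicit override for non-standard inventories:
    # cozystack_master_nodes: "192.168.100.11,192.168.100.12,192.168.100.13"
```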
Add the variable to the optional-variables table and a new section explaining auto-detection from the `server` inventory group.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…tion

Verify that `MASTER_NODES` is correctly derived from the `server` inventory group with both 3-node and single-node inventories, and that the explicit `cozystack_master_nodes` override works.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Code Review
This pull request effectively addresses the limitation of hardcoded MASTER_NODES by introducing a flexible mechanism to derive these nodes from the server inventory group or allow explicit override. The changes are well-implemented across the Ansible roles, including defaults, tasks, and templates. Comprehensive documentation updates in README.md and example inventory files (examples/ubuntu/inventory.yml) clearly explain the new functionality and usage. Furthermore, the addition of dedicated test playbooks (tests/test-master-nodes.yml and tests/test-multi-master-inventory.yml) ensures the correctness and robustness of the multi-master support. The update to tests/ci-inventory.yml correctly handles the CI environment's specific needs. Overall, this is a solid improvement that enhances the flexibility and reliability of kube-ovn deployments in multi-master clusters.
🧹 Nitpick comments (1)
tests/test-master-nodes.yml (1)
63-72: Consider a more robust assertion for MASTER_NODES verification.

The assertion at line 67 uses `to_json`, which adds JSON-style quoting. This works if the rendered YAML contains the value as a quoted string, but could be fragile if the template format changes. A more explicit check could parse the YAML and verify the actual value. That said, for the current use case this approach is acceptable and functional.
♻️ Optional: More robust YAML parsing approach
```diff
 - name: Assert rendered MASTER_NODES contains all server IPs
   ansible.builtin.assert:
     that:
       - "'MASTER_NODES' in _rendered_content"
-      - "(_cozystack_master_nodes | to_json) in _rendered_content"
+      - "_cozystack_master_nodes in _rendered_content"
     fail_msg: >-
       Rendered template does not contain expected MASTER_NODES value.
-      Expected {{ _cozystack_master_nodes | to_json }} in output.
+      Expected '{{ _cozystack_master_nodes }}' in output.
   vars:
     _rendered_content: "{{ _rendered_manifest.content | b64decode }}"
```

Alternatively, if stricter validation is needed, consider using the `from_yaml` filter to parse and extract the actual value.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test-master-nodes.yml` around lines 63 - 72, The current assert task for MASTER_NODES is fragile because it checks the rendered string with (_cozystack_master_nodes | to_json) inside _rendered_content; instead parse the rendered manifest and compare structured values: b64decode _rendered_manifest.content into _rendered_content, apply from_yaml to parse it, then assert that the parsed data's MASTER_NODES key equals or contains the _cozystack_master_nodes list; update the ansible.builtin.assert task to reference the parsed variable (using from_yaml and b64decode) rather than a quoted JSON string so the check is robust to formatting changes.
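The stricter `from_yaml` variant suggested above could look roughly like this (a sketch only; where exactly the `MASTER_NODES` key sits in the parsed manifest is an assumption):

```yaml
# Sketch: parse the rendered manifest and compare structured values
# instead of substring-matching the rendered text.
- name: Assert parsed MASTER_NODES equals the computed value
  ansible.builtin.assert:
    that:
      - _parsed_manifest.MASTER_NODES is defined
      - _parsed_manifest.MASTER_NODES == _cozystack_master_nodes
    fail_msg: >-
      Parsed MASTER_NODES does not match '{{ _cozystack_master_nodes }}'.
  vars:
    _parsed_manifest: "{{ _rendered_manifest.content | b64decode | from_yaml }}"
```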
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: dcabcb0d-a4f9-4758-8598-96af4c9c9393
📒 Files selected for processing (9)
- .github/workflows/test.yml
- README.md
- examples/ubuntu/inventory.yml
- roles/cozystack/defaults/main.yml
- roles/cozystack/tasks/main.yml
- roles/cozystack/templates/platform-package.yml.j2
- tests/ci-inventory.yml
- tests/test-master-nodes.yml
- tests/test-multi-master-inventory.yml
Move master-node computation and validation to a dedicated task file (`compute-master-nodes.yml`) so both the role and the tests use the same code path. Add IP address validation using `ansible.utils.ipaddr` to catch hostname-based inventory keys at Ansible time instead of surfacing as kube-ovn failures at runtime.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
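The validation step described here could be sketched as follows (the task name is taken from the review comments in this thread; the loop shape and error text are assumptions):

```yaml
# Sketch: fail fast at Ansible time when a host key is not an IP.
- name: Validate master node IPs are valid addresses
  ansible.builtin.assert:
    that:
      - item | ansible.utils.ipaddr
    fail_msg: >-
      '{{ item }}' is not a valid IP address in MASTER_NODES. Use IP host
      keys in the 'server' group or set cozystack_master_nodes explicitly.
  loop: "{{ _cozystack_master_nodes.split(',') }}"
```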
…master auto-detection

The `ipaddr` filter requires the `ansible.utils` collection and the `netaddr` Python package. Add both to CI and declare `ansible.utils` in `requirements.yml`. Add a single-master test inventory with an IP host key (127.0.0.1) to exercise the auto-detection path that was previously untested.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
Verify that IP validation correctly rejects hostname-based host keys in the `server` group, with a clear error message directing users to set `cozystack_master_nodes` explicitly.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
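A negative-test inventory with hostname host keys might look like this (hypothetical contents; the real `tests/test-hostname-inventory.yml` may differ):

```yaml
# Hostname host keys, which the IP validation should reject.
all:
  children:
    server:
      hosts:
        master-1.example.internal:
        master-2.example.internal:
```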
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@roles/cozystack/tasks/compute-master-nodes.yml`:
- Around line 7-10: The current lookup uses groups['server'] which will
hard-fail when the 'server' group is absent; change the compute expression to
safely default to an empty list before joining (e.g. use groups.get('server',
[]) or groups['server'] | default([])) so in the template block that references
cozystack_master_nodes and groups['server'] you replace the failing join with a
guarded join (e.g. groups.get('server', []) | join(',') or groups['server'] |
default([]) | join(',')), preserving the existing cozystack_master_nodes check
and assert flow.
- Around line 21-24: Add the missing ansible.utils collection to test
dependencies instead of changing filter syntax: update tests/requirements.yml
(and optionally examples/*/requirements.yml for consistency) to include
ansible.utils, keep the explicit ansible.utils.ipaddr usage in the task named
"Validate master node IPs are valid addresses" in
roles/cozystack/tasks/compute-master-nodes.yml, and adjust the loop handling
there to use whitespace-safe looping (e.g., proper loop/loop_control or
with_items handling) so the assert runs correctly for each item.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29293594-45eb-414c-bd88-3e1851fbf096
📒 Files selected for processing (3)
- roles/cozystack/tasks/compute-master-nodes.yml
- roles/cozystack/tasks/main.yml
- tests/test-master-nodes.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test-master-nodes.yml
```jinja
{%- if cozystack_master_nodes | length > 0 -%}
{{ cozystack_master_nodes }}
{%- else -%}
{{ groups['server'] | join(',') }}
```
Guard inventory lookups to avoid pre-assert crashes.
At Line 10, groups['server'] hard-fails if the group is absent, which bypasses your clearer assert message. Use safe defaults in the compute step.
Proposed hardening
```diff
 _cozystack_master_nodes: >-
-  {%- if cozystack_master_nodes | length > 0 -%}
+  {%- if (cozystack_master_nodes | default('')) | length > 0 -%}
   {{ cozystack_master_nodes }}
   {%- else -%}
-  {{ groups['server'] | join(',') }}
+  {{ groups.get('server', []) | join(',') }}
   {%- endif -%}
```
The `ansible.utils.ipaddr` filter requires the `netaddr` Python package on the target node. Add it to the E2E job's pip install.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
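As a workflow step, this could be as simple as (a sketch; the actual step name and package pins in `.github/workflows/test.yml` may differ):

```yaml
- name: Install Python dependencies
  run: pip install ansible netaddr
```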
Actionable comments posted: 1
🧹 Nitpick comments (1)
.github/workflows/test.yml (1)
82-84: Prefer installing collections from `requirements.yml` to avoid CI drift.

At Line 83, installing only `ansible.utils` duplicates dependency declarations and can diverge from the repo's canonical list.

💡 Proposed refactor
```diff
-      - name: Install required collections
-        run: ansible-galaxy collection install ansible.utils
+      - name: Install required collections
+        run: ansible-galaxy collection install --requirements-file requirements.yml
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test.yml around lines 82 - 84, The CI step named "Install required collections" is hardcoding a single collection (ansible.utils) which risks drift; change the step so it installs collections from the repository's canonical requirements file instead of the single package—replace the ansible-galaxy command that runs "ansible-galaxy collection install ansible.utils" with a command that uses the requirements file (e.g., "ansible-galaxy collection install -r requirements.yml") and ensure the workflow references the correct requirements.yml path so all declared collections are installed consistently.
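The canonical requirements file would then declare the collection, roughly like this (a sketch; any version pin is an assumption):

```yaml
# requirements.yml (sketch)
collections:
  - name: ansible.utils
```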
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test.yml:
- Around line 100-108: The current negative test treats any ansible-playbook
failure as success; modify the step named "Test hostname host keys are rejected"
to capture the playbook output (run ansible-playbook tests/test-master-nodes.yml
--inventory tests/test-hostname-inventory.yml) into a variable and assert the
failure contains the specific hostname-key validation message (e.g. grep/check
for a known substring like "host key validation" or the exact error emitted when
hostname keys are rejected) before declaring success; if the playbook fails but
the output does not contain that expected message, fail the job to avoid false
positives.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 301c06a9-0d24-48c1-ab06-4c441883b9b3
📒 Files selected for processing (4)
- .github/workflows/test.yml
- requirements.yml
- tests/test-hostname-inventory.yml
- tests/test-single-master-inventory.yml
✅ Files skipped from review due to trivial changes (1)
- tests/test-single-master-inventory.yml
```yaml
- name: Test hostname host keys are rejected
  run: |
    if ansible-playbook tests/test-master-nodes.yml \
        --inventory tests/test-hostname-inventory.yml 2>&1; then
      echo "ERROR: Expected failure for hostname host keys, but playbook succeeded"
      exit 1
    else
      echo "OK: Hostname host keys correctly rejected"
    fi
```
Tighten negative test assertion to prevent false positives.
At Lines 102-108, any playbook failure is considered success. This can pass even when failure is unrelated to hostname-key validation.
✅ Proposed fix (assert failure reason)
```diff
 - name: Test hostname host keys are rejected
   run: |
-    if ansible-playbook tests/test-master-nodes.yml \
-        --inventory tests/test-hostname-inventory.yml 2>&1; then
+    set +e
+    output="$(ansible-playbook tests/test-master-nodes.yml \
+      --inventory tests/test-hostname-inventory.yml 2>&1)"
+    status=$?
+    set -e
+
+    if [ "$status" -eq 0 ]; then
       echo "ERROR: Expected failure for hostname host keys, but playbook succeeded"
       exit 1
-    else
-      echo "OK: Hostname host keys correctly rejected"
     fi
+
+    if ! grep -q "not a valid IP address in MASTER_NODES" <<<"$output"; then
+      echo "ERROR: Playbook failed, but not due to hostname/IP validation"
+      echo "$output"
+      exit 1
+    fi
+
+    echo "OK: Hostname host keys correctly rejected"
```
The role uses `ansible.utils.ipaddr` in `compute-master-nodes.yml`. Without this dependency in `galaxy.yml`, users installing the collection via `ansible-galaxy` would get a missing-filter error.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
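In `galaxy.yml` this appears as a collection dependency, roughly (a sketch; the version constraint is an assumption):

```yaml
# galaxy.yml (sketch)
dependencies:
  ansible.utils: ">=2.0.0"
```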
Summary
kube-ovn `MASTER_NODES` was hardcoded to `cozystack_api_server_host` — always a single IP. On multi-master clusters this breaks OVN RAFT consensus because `ovn-central` only gets one node in `NODE_IPS`.

The role now auto-detects all control-plane node IPs from the `server` inventory group, with an explicit `cozystack_master_nodes` override for non-standard inventories.

Ref: cozystack/cozystack#2242
Changes
- New `cozystack_master_nodes` variable (default: auto-detect from `server` group host keys)
- Compute `_cozystack_master_nodes` via `set_fact` with validation
- Template uses the computed value instead of `cozystack_api_server_host`
- Set `cozystack_master_nodes` explicitly in the CI inventory (`localhost` host key is not an IP)
- Document `cozystack_master_nodes` and multi-master setup in the README
- `ansible-lint` passes
- `ansible-test sanity` passes (CI)
- Rendered `MASTER_NODES` containing all three control-plane IPs
- `ovn-central` started without `host ip not in NODE_IPS` errors