diff --git a/docs/component-argo-workflows.md b/docs/component-argo-workflows.md
index bd574a6a4..9c6f17afd 100644
--- a/docs/component-argo-workflows.md
+++ b/docs/component-argo-workflows.md
@@ -14,6 +14,7 @@ set of WorkflowTemplates below.
 | nautobot-api | HTTP Template Workflow to query the Nautobot API | method,nautobot_url,uri | result | |
 | bmc-sync-creds | Sync's a devices BMC password with what we have on record | device_id | | |
 | resync-ironic-nautobot | Resync Ironic nodes to Nautobot | node (optional) | | |
+| resync-neutron-nautobot | Resync Neutron networks/subnets to Nautobot | network (optional) | | |
 
 \* WorkflowTemplate which requires a manual / custom implementation.
diff --git a/docs/operator-guide/index.md b/docs/operator-guide/index.md
index 2abb90699..1bac29615 100644
--- a/docs/operator-guide/index.md
+++ b/docs/operator-guide/index.md
@@ -37,7 +37,7 @@ be the regular project area.
 - [Gateway API Migration Guide](gateway-api.md) - Migration from ingress-nginx to Kubernetes Gateway API with Envoy Gateway
 - [Argo Workflows](workflows.md) - Workflow orchestration and troubleshooting
-- [Ironic to Nautobot Sync](ironic-nautobot-sync.md) - Event-driven sync and bulk resync operations
+- [OpenStack to Nautobot Sync](openstack-nautobot-sync.md) - Event-driven sync and bulk resync operations
 - [Monitoring Stack](monitoring.md) - Prometheus and Grafana monitoring
 
 [cli]: <../user-guide/openstack-cli.md>
diff --git a/docs/operator-guide/ironic-nautobot-sync.md b/docs/operator-guide/ironic-nautobot-sync.md
deleted file mode 100644
index a1c371053..000000000
--- a/docs/operator-guide/ironic-nautobot-sync.md
+++ /dev/null
@@ -1,110 +0,0 @@
-# Ironic to Nautobot Synchronization
-
-This guide explains how OpenStack Ironic data is synchronized to Nautobot
-and how to handle situations when they get out of sync.
-
-## Event-Driven Sync
-
-Under normal operation, Ironic data is automatically synchronized to Nautobot
-via Oslo notifications. When changes occur in Ironic, events are published to
-RabbitMQ and processed by Argo Events workflows.
-
-### How It Works
-
-1. Ironic publishes Oslo notifications to RabbitMQ when nodes change
-2. An Argo Events EventSource consumes messages from the `ironic` queue
-3. A Sensor filters for relevant events and triggers workflows
-4. The `openstack-oslo-event` workflow processes the event and updates Nautobot
-
-### Supported Events
-
-The following Ironic events trigger Nautobot updates:
-
-| Event Type | Action |
-|------------|--------|
-| `baremetal.node.create.end` | Creates device in Nautobot |
-| `baremetal.node.update.end` | Updates device in Nautobot |
-| `baremetal.node.delete.end` | Deletes device from Nautobot |
-| `baremetal.node.provision_set.end` | Updates device status and syncs inspection data |
-| `baremetal.port.create.end` | Creates interface in Nautobot |
-| `baremetal.port.update.end` | Updates interface in Nautobot |
-| `baremetal.port.delete.end` | Deletes interface from Nautobot |
-
-### Data Synchronized
-
-For each Ironic node, the following data is synced to Nautobot:
-
-- Device name (generated from manufacturer and service tag)
-- Serial number
-- Manufacturer and model
-- Hardware specs (memory, CPUs, local storage)
-- Provision state (mapped to Nautobot status)
-- Location and rack (derived from connected switches)
-- Tenant (from Ironic lessee field)
-- Network interfaces and their connections
-
-## Bulk Resync
-
-When Nautobot gets out of sync with Ironic, you can perform a bulk resync.
-
-### Scheduled Resync (CronWorkflow)
-
-A CronWorkflow runs daily at 2:00 AM UTC to catch any drift between Ironic
-and Nautobot. This provides a safety net for missed events.
-
-Check the schedule:
-
-```bash
-argo -n argo-events cron list
-```
-
-Manually trigger the scheduled workflow:
-
-```bash
-argo -n argo-events submit --from cronworkflow/resync-ironic-nautobot
-```
-
-Suspend/resume the schedule:
-
-```bash
-argo -n argo-events cron suspend resync-ironic-nautobot
-argo -n argo-events cron resume resync-ironic-nautobot
-```
-
-### On-Demand Resync (WorkflowTemplate)
-
-Resync all nodes:
-
-```bash
-argo -n argo-events submit --from workflowtemplate/resync-ironic-nautobot
-```
-
-Resync a specific node:
-
-```bash
-argo -n argo-events submit --from workflowtemplate/resync-ironic-nautobot \
-  -p node=""
-```
-
-### Using CLI Directly
-
-For debugging or running outside the cluster:
-
-```bash
-# Resync all nodes
-resync-ironic-nautobot \
-  --nautobot_url https://nautobot.example.com \
-  --nautobot_token
-
-# Resync a specific node
-resync-ironic-nautobot \
-  --node \
-  --nautobot_url https://nautobot.example.com \
-  --nautobot_token
-
-# Dry run to see what would be synced
-resync-ironic-nautobot \
-  --dry-run \
-  --nautobot_url https://nautobot.example.com \
-  --nautobot_token
-```
diff --git a/docs/operator-guide/openstack-nautobot-sync.md b/docs/operator-guide/openstack-nautobot-sync.md
new file mode 100644
index 000000000..45907090e
--- /dev/null
+++ b/docs/operator-guide/openstack-nautobot-sync.md
@@ -0,0 +1,170 @@
+# OpenStack to Nautobot Synchronization
+
+This guide explains how OpenStack data (Ironic and Neutron) is synchronized
+to Nautobot and how to handle situations when they get out of sync.
+
+## Event-Driven Sync
+
+Under normal operation, OpenStack data is automatically synchronized to Nautobot
+via Oslo notifications. When changes occur, events are published to RabbitMQ
+and processed by Argo Events workflows.
+
+### How It Works
+
+1. OpenStack services publish Oslo notifications to RabbitMQ
+2. Argo Events EventSources consume messages from the queues
+3. Sensors filter for relevant events and trigger workflows
+4. The `openstack-oslo-event` workflow processes events and updates Nautobot
+
+### Supported Ironic Events
+
+| Event Type | Action |
+|------------|--------|
+| `baremetal.node.create.end` | Creates device in Nautobot |
+| `baremetal.node.update.end` | Updates device in Nautobot |
+| `baremetal.node.delete.end` | Deletes device from Nautobot |
+| `baremetal.node.provision_set.end` | Updates device status and syncs inspection data |
+| `baremetal.port.create.end` | Creates interface in Nautobot |
+| `baremetal.port.update.end` | Updates interface in Nautobot |
+| `baremetal.port.delete.end` | Deletes interface from Nautobot |
+
+### Supported Neutron Events
+
+| Event Type | Action |
+|------------|--------|
+| `network.create.end` | Creates UCVNI and IPAM namespace in Nautobot |
+| `network.update.end` | Updates UCVNI in Nautobot |
+| `network.delete.end` | Deletes UCVNI and namespace from Nautobot |
+| `subnet.create.end` | Creates prefix in Nautobot |
+| `subnet.update.end` | Updates prefix in Nautobot |
+| `subnet.delete.end` | Deletes prefix from Nautobot |
+
+### Data Synchronized
+
+**From Ironic:**
+
+- Device name (generated from manufacturer and service tag)
+- Serial number
+- Manufacturer and model
+- Hardware specs (memory, CPUs, local storage)
+- Provision state (mapped to Nautobot status)
+- Location and rack (derived from connected switches)
+- Tenant (from Ironic lessee field)
+- Network interfaces and their connections
+
+**From Neutron:**
+
+- Networks → UCVNI (undercloud VNI) with segmentation ID
+- Networks → IPAM Namespaces
+- Subnets → IPAM Prefixes
+
+## Bulk Resync
+
+When Nautobot gets out of sync with OpenStack, you can perform a bulk resync.
+Run Neutron resync first to ensure IPAM namespaces and prefixes exist before
+syncing Ironic devices and interfaces.
+
+### Resync Order
+
+For a full resync, run in this order:
+
+1. Neutron (creates namespaces and prefixes)
+2. Ironic (creates devices and interfaces)
+
+### Scheduled Resync (CronWorkflow)
+
+A CronWorkflow runs daily at 2:00 AM UTC to catch any drift between Ironic
+and Nautobot. This provides a safety net for missed events.
+
+Check the schedule:
+
+```bash
+argo -n argo-events cron list
+```
+
+Manually trigger the scheduled workflow:
+
+```bash
+argo -n argo-events submit --from cronworkflow/resync-ironic-nautobot
+```
+
+Suspend/resume the schedule:
+
+```bash
+argo -n argo-events cron suspend resync-ironic-nautobot
+argo -n argo-events cron resume resync-ironic-nautobot
+```
+
+### On-Demand Resync (WorkflowTemplate)
+
+**Neutron (run first):**
+
+```bash
+# Resync all networks and subnets
+argo -n argo-events submit --from workflowtemplate/resync-neutron-nautobot
+
+# Resync a specific network
+argo -n argo-events submit --from workflowtemplate/resync-neutron-nautobot \
+  -p network=""
+```
+
+**Ironic (run after Neutron):**
+
+Resync all nodes:
+
+```bash
+argo -n argo-events submit --from workflowtemplate/resync-ironic-nautobot
+```
+
+Resync a specific node:
+
+```bash
+argo -n argo-events submit --from workflowtemplate/resync-ironic-nautobot \
+  -p node=""
+```
+
+### Using CLI Directly
+
+For debugging or running outside the cluster:
+
+**Neutron:**
+
+```bash
+# Resync all networks and subnets
+resync-neutron-nautobot \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+
+# Resync a specific network
+resync-neutron-nautobot \
+  --network \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+
+# Dry run
+resync-neutron-nautobot \
+  --dry-run \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+```
+
+**Ironic:**
+
+```bash
+# Resync all nodes
+resync-ironic-nautobot \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+
+# Resync a specific node
+resync-ironic-nautobot \
+  --node \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+
+# Dry run to see what would be synced
+resync-ironic-nautobot \
+  --dry-run \
+  --nautobot_url https://nautobot.example.com \
+  --nautobot_token
+```
diff --git a/mkdocs.yml b/mkdocs.yml
index befa1d797..03e883a9a 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -174,7 +174,7 @@ nav:
     - 'Infrastructure':
       - operator-guide/argocd-helm-chart.md
       - operator-guide/workflows.md
-      - operator-guide/ironic-nautobot-sync.md
+      - operator-guide/openstack-nautobot-sync.md
       - operator-guide/monitoring.md
       - operator-guide/gateway-api.md
       - operator-guide/bmc-password.md
diff --git a/python/understack-workflows/pyproject.toml b/python/understack-workflows/pyproject.toml
index 2cc1da9bd..1e9e8d01e 100644
--- a/python/understack-workflows/pyproject.toml
+++ b/python/understack-workflows/pyproject.toml
@@ -39,6 +39,7 @@ netapp-configure-interfaces = "understack_workflows.main.netapp_configure_net:ma
 netapp-create-svm = "understack_workflows.main.netapp_create_svm:main"
 openstack-oslo-event = "understack_workflows.main.openstack_oslo_event:main"
 resync-ironic-nautobot = "understack_workflows.main.resync_ironic_to_nautobot:main"
+resync-neutron-nautobot = "understack_workflows.main.resync_neutron_to_nautobot:main"
 sync-keystone = "understack_workflows.main.sync_keystone:main"
 sync-network-segment-range = "understack_workflows.main.sync_ucvni_group_range:main"
 undersync-switch = "understack_workflows.main.undersync_switch:main"
diff --git a/python/understack-workflows/understack_workflows/main/resync_neutron_to_nautobot.py b/python/understack-workflows/understack_workflows/main/resync_neutron_to_nautobot.py
new file mode 100644
index 000000000..511eca811
--- /dev/null
+++ b/python/understack-workflows/understack_workflows/main/resync_neutron_to_nautobot.py
@@ -0,0 +1,295 @@
+"""Resync Neutron networks and subnets to Nautobot.
+
+Use when Nautobot gets out of sync with Neutron, e.g., after:
+- Nautobot database restore
+- Missed events
+- Manual Nautobot changes
+
+Should be run before resync-ironic-nautobot to ensure IPAM namespaces
+and prefixes exist before device/interface sync.
+"""
+
+import argparse
+import logging
+from dataclasses import dataclass
+
+import pynautobot
+
+from understack_workflows.helpers import credential
+from understack_workflows.helpers import parser_nautobot_args
+from understack_workflows.helpers import setup_logger
+from understack_workflows.nautobot import NautobotRequestError
+from understack_workflows.openstack.client import get_openstack_client
+from understack_workflows.oslo_event.neutron_network import NetworkEvent
+from understack_workflows.oslo_event.neutron_network import _create_nautobot_ucvni
+from understack_workflows.oslo_event.neutron_network import (
+    _ensure_nautobot_ipam_namespace_exists,
+)
+from understack_workflows.oslo_event.neutron_subnet import _create_nautobot_prefix
+from understack_workflows.oslo_event.neutron_subnet import _update_nautobot_prefix
+
+logger = logging.getLogger(__name__)
+
+_EXIT_SUCCESS = 0
+_EXIT_SYNC_FAILURES = 1
+
+
+@dataclass
+class SyncResult:
+    """Result of a sync operation."""
+
+    total: int = 0
+    failed: int = 0
+
+    @property
+    def succeeded(self) -> int:
+        return self.total - self.failed
+
+
+def argument_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Resync Neutron to Nautobot")
+    parser.add_argument(
+        "--network",
+        type=str,
+        default="",
+        help="Sync specific network UUID (default: all networks)",
+    )
+    parser.add_argument(
+        "--dry-run", action="store_true", help="List resources without syncing"
+    )
+    parser.add_argument(
+        "--ucvni-group",
+        type=str,
+        help="UCVNI group name (defaults to UCVNI_GROUP_NAME env var)",
+    )
+    parser = parser_nautobot_args(parser)
+    return parser
+
+
+def _update_nautobot_ucvni(
+    nautobot: pynautobot.api,
+    event: NetworkEvent,
+    ucvni_group_name: str | None = None,
+) -> bool:
+    """Update existing UCVNI. Returns True if updated, False if not found."""
+    import os
+
+    ucvni_id = str(event.network_uuid)
+
+    if ucvni_group_name is None:
+        ucvni_group_name = os.getenv("UCVNI_GROUP_NAME")
+    if ucvni_group_name is None:
+        raise RuntimeError("Please set environment variable UCVNI_GROUP_NAME")
+
+    payload = {
+        "name": event.network_name,
+        "status": {"name": "Active"},
+        "tenant": str(event.tenant_id),
+        "ucvni_group": {"name": ucvni_group_name},
+        "ucvni_id": event.provider_segmentation_id,
+    }
+
+    try:
+        response = nautobot.plugins.undercloud_vni.ucvnis.update(
+            id=ucvni_id, data=payload
+        )
+        logger.info("Updated Nautobot UCVNI: %s", response)
+        return True
+    except pynautobot.RequestError as e:
+        if e.req.status_code == 404:
+            logger.debug("No pre-existing Nautobot UCVNI with id=%s", ucvni_id)
+            return False
+        raise NautobotRequestError(e) from e
+
+
+def sync_network(
+    network,
+    nautobot: pynautobot.api,
+    ucvni_group_name: str | None = None,
+) -> bool:
+    """Sync a single network to Nautobot.
+
+    Returns True on success, False on failure.
+    """
+    try:
+        network_id = network.id
+
+        # Create NetworkEvent-like object for reuse of existing functions
+        event = NetworkEvent(
+            event_type="network.sync",
+            network_uuid=network_id,
+            network_name=network.name,
+            tenant_id=network.project_id,
+            external=network.is_router_external or False,
+            provider_segmentation_id=network.provider_segmentation_id or 0,
+        )
+
+        # Create IPAM namespace
+        _ensure_nautobot_ipam_namespace_exists(nautobot, str(network_id))
+
+        # Create or update UCVNI if segmentation ID exists
+        if event.provider_segmentation_id:
+            if not _update_nautobot_ucvni(nautobot, event, ucvni_group_name):
+                _create_nautobot_ucvni(nautobot, event, ucvni_group_name)
+
+        return True
+    except Exception:
+        logger.exception("Failed to sync network %s", network.id)
+        return False
+
+
+def sync_subnet(
+    subnet,
+    nautobot: pynautobot.api,
+    network_external_map: dict[str, bool],
+) -> bool:
+    """Sync a single subnet to Nautobot.
+
+    Args:
+        subnet: OpenStack subnet object
+        nautobot: Nautobot API instance
+        network_external_map: Dict mapping network_id -> is_router_external
+
+    Returns True on success, False on failure.
+    """
+    try:
+        subnet_id = str(subnet.id)
+        network_id = str(subnet.network_id)
+
+        # Determine namespace - external network subnets go to Global namespace
+        is_external = network_external_map.get(network_id, False)
+        namespace = "Global" if is_external else network_id
+
+        payload = {
+            "id": subnet_id,
+            "prefix": subnet.cidr,
+            "status": "Active",
+            "namespace": {"name": namespace},
+            "tenant": {"id": str(subnet.project_id)},
+        }
+
+        # Try update first, then create
+        if not _update_nautobot_prefix(nautobot, subnet_id, payload):
+            _create_nautobot_prefix(nautobot, payload)
+
+        return True
+    except Exception:
+        logger.exception("Failed to sync subnet %s", subnet.id)
+        return False
+
+
+def sync_neutron_to_nautobot(
+    nautobot: pynautobot.api,
+    network_id: str | None = None,
+    ucvni_group_name: str | None = None,
+    dry_run: bool = False,
+) -> tuple[SyncResult, SyncResult]:
+    """Sync Neutron networks and subnets to Nautobot.
+
+    Args:
+        nautobot: Nautobot API instance
+        network_id: Optional specific network UUID to sync
+        ucvni_group_name: UCVNI group name for network sync
+        dry_run: If True, only log what would be synced
+
+    Returns:
+        Tuple of (network_result, subnet_result)
+    """
+    conn = get_openstack_client()
+
+    network_result = SyncResult()
+    subnet_result = SyncResult()
+
+    # Get networks
+    if network_id:
+        networks = [conn.network.get_network(network_id)]  # type: ignore[attr-defined]
+    else:
+        networks = list(conn.network.networks())  # type: ignore[attr-defined]
+
+    # Build network external map for subnet sync
+    network_external_map: dict[str, bool] = {}
+
+    # Sync networks first (creates namespaces)
+    for network in networks:
+        network_result.total += 1
+        network_external_map[network.id] = network.is_router_external or False
+
+        if dry_run:
+            logger.info(
+                "Would sync network: %s (%s) seg_id=%s external=%s",
+                network.id,
+                network.name,
+                network.provider_segmentation_id,
+                network.is_router_external,
+            )
+            continue
+
+        logger.info("Syncing network: %s (%s)", network.id, network.name)
+        if not sync_network(network, nautobot, ucvni_group_name):
+            network_result.failed += 1
+
+    # Get subnets
+    if network_id:
+        subnets = list(conn.network.subnets(network_id=network_id))  # type: ignore[attr-defined]
+    else:
+        subnets = list(conn.network.subnets())  # type: ignore[attr-defined]
+
+    # Sync subnets (creates prefixes)
+    for subnet in subnets:
+        subnet_result.total += 1
+
+        if dry_run:
+            network_is_external = network_external_map.get(subnet.network_id, False)
+            logger.info(
+                "Would sync subnet: %s (%s) cidr=%s network_external=%s",
+                subnet.id,
+                subnet.name,
+                subnet.cidr,
+                network_is_external,
+            )
+            continue
+
+        logger.info(
+            "Syncing subnet: %s (%s) cidr=%s", subnet.id, subnet.name, subnet.cidr
+        )
+        if not sync_subnet(subnet, nautobot, network_external_map):
+            subnet_result.failed += 1
+
+    return network_result, subnet_result
+
+
+def main() -> int:
+    setup_logger(level=logging.INFO)
+    args = argument_parser().parse_args()
+
+    nb_token = args.nautobot_token or credential("nb-token", "token")
+    nautobot = pynautobot.api(args.nautobot_url, token=nb_token)
+
+    network_result, subnet_result = sync_neutron_to_nautobot(
+        nautobot,
+        network_id=args.network or None,
+        ucvni_group_name=args.ucvni_group,
+        dry_run=args.dry_run,
+    )
+
+    if args.dry_run:
+        logger.info(
+            "Dry run complete. %d networks and %d subnets would be synced.",
+            network_result.total,
+            subnet_result.total,
+        )
+    else:
+        logger.info(
+            "Sync complete. Networks: %d/%d, Subnets: %d/%d synced successfully.",
+            network_result.succeeded,
+            network_result.total,
+            subnet_result.succeeded,
+            subnet_result.total,
+        )
+
+    total_failed = network_result.failed + subnet_result.failed
+    if total_failed:
+        logger.error("Failed to sync %d resources", total_failed)
+        return _EXIT_SYNC_FAILURES
+
+    return _EXIT_SUCCESS
diff --git a/workflows/argo-events/kustomization.yaml b/workflows/argo-events/kustomization.yaml
index a64e56edf..ead420f54 100644
--- a/workflows/argo-events/kustomization.yaml
+++ b/workflows/argo-events/kustomization.yaml
@@ -26,6 +26,7 @@ resources:
   - eventsources/alertmanager-webhook-eventsource.yaml
   - workflowtemplates/alert-automation-neutron-agent-down.yaml
   - workflowtemplates/resync-ironic-nautobot.yaml
+  - workflowtemplates/resync-neutron-nautobot.yaml
 
   # CronWorkflows
   - cronworkflows/resync-ironic-nautobot.yaml
diff --git a/workflows/argo-events/workflowtemplates/resync-neutron-nautobot.yaml b/workflows/argo-events/workflowtemplates/resync-neutron-nautobot.yaml
new file mode 100644
index 000000000..fddeabb15
--- /dev/null
+++ b/workflows/argo-events/workflowtemplates/resync-neutron-nautobot.yaml
@@ -0,0 +1,63 @@
+---
+apiVersion: argoproj.io/v1alpha1
+metadata:
+  name: resync-neutron-nautobot
+  annotations:
+    workflows.argoproj.io/title: Resync Neutron networks and subnets to Nautobot
+    workflows.argoproj.io/description: |
+      Resyncs Neutron network and subnet data to Nautobot. Use when Nautobot
+      gets out of sync with Neutron, e.g., after database restore or missed events.
+
+      Should be run before resync-ironic-nautobot to ensure IPAM namespaces
+      and prefixes exist before device/interface sync.
+
+      To resync all networks:
+      ```
+      argo -n argo-events submit --from workflowtemplate/resync-neutron-nautobot
+      ```
+
+      To resync a specific network:
+      ```
+      argo -n argo-events submit --from workflowtemplate/resync-neutron-nautobot \
+      -p network=""
+      ```
+
+      Defined in `workflows/argo-events/workflowtemplates/resync-neutron-nautobot.yaml`
+kind: WorkflowTemplate
+spec:
+  entrypoint: main
+  serviceAccountName: workflow
+  arguments:
+    parameters:
+      - name: network
+        value: ""  # empty = all networks
+  templates:
+    - name: main
+      container:
+        image: ghcr.io/rackerlabs/understack/ironic-nautobot-client:latest
+        command:
+          - resync-neutron-nautobot
+        args:
+          - "--network"
+          - "{{workflow.parameters.network}}"
+        volumeMounts:
+          - mountPath: /etc/nb-token/
+            name: nb-token
+            readOnly: true
+          - mountPath: /etc/openstack
+            name: baremetal-manage
+            readOnly: true
+        envFrom:
+          - configMapRef:
+              name: cluster-metadata
+              optional: false
+      volumes:
+        - name: nb-token
+          secret:
+            secretName: nautobot-token
+        - name: baremetal-manage
+          secret:
+            secretName: baremetal-manage
+            items:
+              - key: clouds.yaml
+                path: clouds.yaml
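
The update-then-create flow that `sync_network` and `sync_subnet` follow in this change, together with the `SyncResult` tally, can be sketched in isolation. This is a minimal illustration only: `FakeStore`, `upsert`, and `sync_all` are hypothetical names that stand in for the Nautobot helpers and are not part of the patch or of pynautobot.

```python
from dataclasses import dataclass, field


@dataclass
class SyncResult:
    """Tally of attempted and failed syncs (same shape as in the patch)."""

    total: int = 0
    failed: int = 0

    @property
    def succeeded(self) -> int:
        return self.total - self.failed


@dataclass
class FakeStore:
    """Hypothetical in-memory stand-in for Nautobot; not a real API."""

    records: dict[str, dict] = field(default_factory=dict)

    def update(self, record_id: str, payload: dict) -> bool:
        # Report "not found" with False, the way the patch treats a 404.
        if record_id not in self.records:
            return False
        self.records[record_id] = payload
        return True

    def create(self, payload: dict) -> None:
        self.records[payload["id"]] = payload


def upsert(store: FakeStore, payload: dict) -> str:
    """Try an update first and fall back to create when nothing exists."""
    if store.update(payload["id"], payload):
        return "updated"
    store.create(payload)
    return "created"


def sync_all(store: FakeStore, payloads: list[dict]) -> SyncResult:
    """Sync every payload, counting failures instead of aborting the run."""
    result = SyncResult()
    for payload in payloads:
        result.total += 1
        try:
            upsert(store, payload)
        except Exception:
            result.failed += 1
    return result
```

Counting failures rather than raising lets one bad record be reported at the end without stopping the remaining records from syncing, which appears to be the intent of the non-zero `_EXIT_SYNC_FAILURES` exit code.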