SREP-3620: Add feature-testing evidence collection commands #862
MitaliBhalla wants to merge 1 commit into openshift:master from
Conversation
@MitaliBhalla: This pull request references SREP-3620 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Walkthrough
Adds new CLI command groups for AWS and evidence collection, plus cluster snapshot and diff utilities. Implements CloudTrail error querying, cluster snapshot creation, snapshot diffing, and an evidence collection orchestrator. Wires the commands into the root CLI and adds documentation and README content. All changes are additive.
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as CloudTrail CLI
participant STS as AWS STS
participant CT as AWS CloudTrail
participant Parser as Event Parser
participant Output as Formatter
User->>CLI: Run `osdctl aws cloudtrail errors` (cluster-id, since, flags)
CLI->>CLI: Parse flags -> compute startTime
CLI->>STS: GetCallerIdentity
STS-->>CLI: ARN/AccountID
CLI->>Output: Print identity (unless --json)
CLI->>CT: LookupEvents (cluster region, startTime)
CT-->>CLI: Regional events
alt cluster region != us-east-1
CLI->>CT: LookupEvents (us-east-1, startTime)
CT-->>CLI: Global events
CLI->>CLI: Merge events
end
CLI->>Parser: Filter/parse events (error-types or defaults)
Parser-->>CLI: ErrorEvent list (with optional ConsoleLink)
alt --json
CLI->>Output: Emit JSON array
else
CLI->>Output: Render human-readable table
end
Output-->>User: Display results
sequenceDiagram
participant User
participant CLI as Snapshot/Diff CLI
participant K8s as Kubernetes API
participant FS as File System
participant Diff as Diff Engine
User->>CLI: Run `osdctl cluster snapshot` (cluster-id, output)
CLI->>K8s: oc get nodes/namespaces/clusteroperators/...
K8s-->>CLI: JSON resources
CLI->>CLI: Build ClusterSnapshot struct
CLI->>FS: Write YAML snapshot
FS-->>CLI: Snapshot saved
User->>CLI: Run `osdctl cluster diff` (snap1, snap2)
CLI->>FS: Read snap1 & snap2 YAML
FS-->>CLI: ClusterSnapshot before/after
CLI->>Diff: compareSnapshots(before, after)
Diff-->>CLI: DiffResult (nodes/operators/namespaces/resources)
alt --json
CLI->>FS: Write structured diff
else
CLI->>CLI: Render human diff (+ / - / ~)
end
FS-->>User: Diff output
sequenceDiagram
participant User
participant CLI as Evidence Collect CLI
participant K8s as Kubernetes API
participant AWS as AWS (STS/CloudTrail)
participant MustGather as must-gather
participant FS as File System
User->>CLI: Run `osdctl evidence collect` (cluster-id, output-dir, since, flags)
CLI->>CLI: Validate inputs, compute startTime
CLI->>K8s: Verify cluster connectivity
K8s-->>CLI: Connected
CLI->>FS: Create output directory
alt not skip-cluster-state
CLI->>K8s: Fetch nodes, operators, machineconfigs
K8s-->>CLI: ClusterState data
CLI->>FS: Store ClusterState in evidence.yaml
end
alt include-events
CLI->>K8s: Get events (--all-namespaces, since)
K8s-->>CLI: Event list
end
alt AWS cluster AND not skip-cloudtrail
CLI->>AWS: whoami (STS)
AWS-->>CLI: ARN/ID
CLI->>AWS: CloudTrail LookupEvents (startTime)
AWS-->>CLI: CloudTrail events
CLI->>FS: Store CloudTrailData in evidence.yaml
end
alt include-must-gather
CLI->>MustGather: Run `oc adm must-gather`
MustGather-->>FS: must-gather artifacts
CLI->>FS: Record diagnostics path
end
CLI->>FS: Write evidence.yaml and summary.txt
FS-->>User: Evidence package ready
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: MitaliBhalla. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Actionable comments posted: 13
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 76-87: The examples block in the cloudtrail errors command uses
the lowercase shorthand "-c" for the cluster-id flag which is incorrect; update
all occurrences in the examples (including the other block referenced) to use
the registered uppercase shorthand "-C" so pasted commands match the actual flag
definition (update the examples string in cmd/aws/cloudtrail/errors.go where the
help/examples for the errors command are defined).
- Around line 339-345: The parseDurationToUTC function currently accepts zero or
negative durations which yields a start time in the future; update
parseDurationToUTC to validate the parsed duration and return an error if
duration <= 0 (e.g., "lookback window must be a positive duration: %s"), so
callers (e.g., --since) get a clear validation error instead of an empty/future
query; keep the existing time.ParseDuration call and only add a check after
parsing that returns a wrapped fmt.Errorf when the duration is non‑positive.
In `@cmd/cluster/diff.go`:
- Around line 104-105: The --json flag (opts.OutputJSON) is still serializing
output with yaml.Marshal; update the code paths that check opts.OutputJSON (the
diffCmd output block and the similar block around the 410-417 area) to use
json.Marshal or json.MarshalIndent instead of yaml.Marshal so the --json flag
actually emits JSON, retain existing error handling and write the marshaled
bytes to stdout/stderr as before; ensure
diffCmd.Flags().BoolVar(&opts.OutputJSON, "json", false, "Output diff in JSON
format") behavior matches the new json serialization.
- Around line 189-231: compareNodes currently only detects Status changes and
treats added/removed nodes, so role, version, condition, and label changes
recorded in NodeSnapshot are never reported; update compareNodes to compare all
relevant NodeSnapshot fields (Roles, Version, Conditions, Labels in addition to
Status) and produce NodeDiff entries for modifications that include Before/After
strings for any changed field(s) using the NodeSnapshot.Name key; do the
same-change logic for the corresponding namespace comparator (the function
handling namespace diffs referenced around lines 315-357, e.g.,
compareNamespaces) so label changes there are also detected and reported in
NamespaceDiff entries.
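The field-wise comparison the comment calls for might look like this; `NodeSnapshot` here is a simplified stand-in for the snapshot type, and the change-message format is illustrative:

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// NodeSnapshot mirrors the fields the review says are captured but never
// diffed; the real struct lives in cmd/cluster/snapshot.go.
type NodeSnapshot struct {
	Name    string
	Status  string
	Roles   []string
	Version string
	Labels  map[string]string
}

// diffNode compares every captured field, not just Status, returning
// "field: before -> after" entries for a modified node.
func diffNode(before, after NodeSnapshot) []string {
	var changes []string
	if before.Status != after.Status {
		changes = append(changes, fmt.Sprintf("status: %s -> %s", before.Status, after.Status))
	}
	if before.Version != after.Version {
		changes = append(changes, fmt.Sprintf("version: %s -> %s", before.Version, after.Version))
	}
	// Compare roles order-insensitively.
	b := append([]string(nil), before.Roles...)
	a := append([]string(nil), after.Roles...)
	sort.Strings(b)
	sort.Strings(a)
	if !reflect.DeepEqual(b, a) {
		changes = append(changes, fmt.Sprintf("roles: %s -> %s", strings.Join(b, ","), strings.Join(a, ",")))
	}
	if !reflect.DeepEqual(before.Labels, after.Labels) {
		changes = append(changes, "labels changed")
	}
	return changes
}

func main() {
	before := NodeSnapshot{Name: "worker-0", Status: "Ready", Version: "v1.28.3", Roles: []string{"worker"}}
	after := NodeSnapshot{Name: "worker-0", Status: "Ready", Version: "v1.29.1", Roles: []string{"worker", "infra"}}
	for _, c := range diffNode(before, after) {
		fmt.Println("~", before.Name, c)
	}
}
```

The same pattern applies to the namespace comparator: compare the captured label map with `reflect.DeepEqual` rather than reacting only to status.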
In `@cmd/cluster/snapshot.go`:
- Around line 97-105: The help examples for the "osdctl cluster snapshot"
command use the lowercase short flag "-c" but the command actually registers the
uppercase short flag "-C"; update all example lines in the snapshot help block
to use "-C" (e.g., replace "osdctl cluster snapshot -c <cluster-id>" with
"osdctl cluster snapshot -C <cluster-id>") so copy/pasting from the examples
matches the registered short flag for the snapshot command.
- Around line 63-69: ResourceInfo currently only stores Status (phase) which is
too lossy; add a stable per-object digest (and/or richer fields) and populate it
when building snapshots so diffs detect spec/rollout changes. Modify the
ResourceInfo struct (add Digest string `yaml:"digest,omitempty"` and optionally
Generation/ResourceVersion or a SpecSummary field), then in the snapshot
construction code that produces ResourceInfo instances (the code referenced
around ResourceInfo usage and the other block ~382-411) compute a deterministic
hash (e.g., SHA256) over the object's meaningful content — for example
metadata.generation/resourceVersion plus the object spec (or the full
unstructured JSON/YAML of the object) — and assign that string to
ResourceInfo.Digest; ensure the same hashing logic is applied in both places
mentioned so future diffs compare Digest (and any added fields) rather than just
Status.
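The deterministic digest described above could be computed like this; the choice of inputs (generation plus a JSON rendering of the spec) follows the comment's suggestion, but which fields osdctl ultimately hashes is an open design choice:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// objectDigest sketches a stable per-object digest: hash the fields that
// matter for change detection so two snapshots compare equal only when
// nothing meaningful changed. encoding/json sorts map keys, which keeps
// the serialization deterministic for map-shaped specs.
func objectDigest(generation int64, spec map[string]interface{}) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	fmt.Fprintf(h, "%d\n", generation)
	h.Write(raw)
	return hex.EncodeToString(h.Sum(nil)), nil
}

func main() {
	spec := map[string]interface{}{"replicas": 3, "image": "quay.io/app:v1"}
	d1, _ := objectDigest(2, spec)
	spec["replicas"] = 4
	d2, _ := objectDigest(2, spec)
	fmt.Println("digests differ:", d1 != d2)
}
```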
- Around line 121-205: The run() flow fetches the target cluster via OCM but
then shells out to oc in captureNodes(), captureNamespaces(),
captureClusterOperators(), captureResources() (and other capture paths), which
can operate against whatever kubecontext is active; update run() to resolve or
obtain the correct kubeconfig/context for o.ClusterID (reuse existing
utils/CreateConnection/GetClusterAnyStatus flow or call the utility that returns
the cluster's kubeconfig/OCM login token) and ensure every capture function
(captureNodes, captureNamespaces, captureClusterOperators, captureResources)
uses that kubeconfig/context (pass it as an argument or set KUBECONFIG/--context
for oc invocations) or else fail fast with an error if we cannot establish the
correct context; apply the same check/enforcement to the other capture call
sites noted in the review so metadata and captured resources always come from
the requested cluster.
In `@cmd/evidence/collect.go`:
- Around line 333-355: The collectClusterState function currently ignores errors
from sub-collectors (collectNodes, collectOperators, collectMachineConfigs);
change it to capture and return collector errors instead of swallowing them:
call each sub-collector, append any non-nil error to a slice (or use
errors.Join) and only return the partially-built state if no errors occurred; if
there are errors, return nil along with a combined/wrapped error that includes
context identifying which collector failed (mentioning collectNodes,
collectOperators, collectMachineConfigs in the message) so the caller can detect
and report failed sub-collections.
- Around line 279-290: The events collection ignores the user-specified
--since/startTime; modify the flow so collectKubernetesEvents(startTime)
receives the parsed start time (or make startTime accessible to it) and uses it
in the Kubernetes client list options (e.g., metav1.ListOptions fieldSelector or
metav1.ListOptions with SinceTime via metav1.Time) to limit returned events;
update references where events are collected (function collectKubernetesEvents
and call sites around the IncludeEvents handling and the other occurrences noted
around lines 502-540) to pass the startTime and ensure the client query filters
events by that time before assigning evidence.ClusterState.Events.
- Around line 261-275: The code assigns a placeholder CloudTrail object as
evidence even when nothing was actually queried; update the logic so you only
set evidence.CloudTrailData when o.collectCloudTrailData returns meaningful data
(e.g., non-nil and contains events/records) — otherwise leave
evidence.CloudTrailData nil and print a warning. Concretely, modify
o.collectCloudTrailData (or its callers) to return nil (not &CloudTrailData{})
when no events were found, or add a post-call check in the block that calls
o.collectCloudTrailData (the same block using CreateAWSV2Config and the other
occurrences mentioned) to verify the returned struct has real content (e.g.,
len(events) > 0 or other key fields are non-zero) before assigning to
evidence.CloudTrailData and reporting success.
- Around line 161-172: The help examples in the Examples block for the evidence
collect command use the wrong shorthand `-c`; update all example lines to use
the registered uppercase shorthand `-C` for the `cluster-id` flag (the Examples
string in cmd/evidence/collect.go for the evidence collect command), making sure
you replace every `-c <cluster-id>` occurrence (including the later Examples
block referenced at lines 178-186) with `-C <cluster-id>` so copy-pasted
commands work as expected.
- Around line 248-307: Bind all external cluster commands to the target cluster
and tighten error handling: ensure every oc/oc adm invocation used in
runMustGather(), collectKubernetesEvents(), collectClusterState(), and any other
collectors is executed with the target cluster's kubeconfig/context (use the OCM
cluster object returned earlier—e.g., obtain and pass its kubeconfig or context
into command invocations rather than relying on current kubeconfig), propagate
and return errors from the three sub-collectors inside collectClusterState()
instead of swallowing them (make collectClusterState() return non-nil error when
any sub-collector fails), pass startTime into collectKubernetesEvents(startTime)
and use it to add a --since filter to the underlying oc events call, change
collectCloudTrailData() to return an explicit error or real data (do not return
an empty slice on unimplemented; return an error so callers can distinguish “no
data” from “not collected”), and fix CLI examples/flag usage to match the
defined flag (-C) instead of -c.
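One way to pin every `oc` invocation to the resolved cluster, as the comment asks, is a small helper that always injects an explicit `--kubeconfig`. The helper name and the idea that a kubeconfig path comes out of the OCM/login flow are assumptions for illustration:

```go
package main

import (
	"fmt"
	"os/exec"
)

// ocForCluster builds an oc command bound to one cluster by passing an
// explicit --kubeconfig instead of inheriting whatever context is active.
// kubeconfigPath would come from the cluster login flow (hypothetical here).
func ocForCluster(kubeconfigPath string, args ...string) *exec.Cmd {
	pinned := append([]string{"--kubeconfig", kubeconfigPath}, args...)
	return exec.Command("oc", pinned...)
}

func main() {
	cmd := ocForCluster("/tmp/target-cluster.kubeconfig", "get", "nodes", "-o", "json")
	fmt.Println(cmd.Args) // inspect the command line; not executed here
}
```

Routing every capture/collector call through such a helper makes it impossible for a stray kubecontext to leak into the evidence.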
In `@README.md`:
- Around line 623-624: Update the heading/text that currently reads "Compare
snapshots with YAML output" to indicate JSON output instead; the example command
`osdctl cluster diff before.yaml after.yaml --json` uses the `--json` flag, so
change the header or surrounding prose to "Compare snapshots with JSON output"
(or otherwise mention JSON) to match the `--json` option shown.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fd9c777b-4301-4150-82af-5343f24124a9
📒 Files selected for processing (10)
README.md
cmd/aws/cloudtrail/cmd.go
cmd/aws/cloudtrail/errors.go
cmd/aws/cmd.go
cmd/cluster/cmd.go
cmd/cluster/diff.go
cmd/cluster/snapshot.go
cmd/cmd.go
cmd/evidence/cmd.go
cmd/evidence/collect.go
Actionable comments posted: 10
♻️ Duplicate comments (7)
cmd/evidence/collect.go (4)
333-355: ⚠️ Potential issue | 🟠 Major: Don't report partial cluster state as success.
`collectClusterState()` drops errors from all three sub-collectors and still returns `nil`. A failed `oc get` becomes an empty section plus a success message, which makes incomplete evidence look healthy. Suggested fix:

```diff
+import "errors"
+
 func (o *collectOptions) collectClusterState() (*ClusterState, error) {
 	state := &ClusterState{}
+	var errs []error

 	// Collect nodes
 	nodes, err := o.collectNodes()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("collect nodes: %w", err))
+	} else {
 		state.Nodes = nodes
 	}

 	// Collect operators
 	operators, err := o.collectOperators()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("collect operators: %w", err))
+	} else {
 		state.Operators = operators
 	}

 	// Collect machine configs
 	machineConfigs, err := o.collectMachineConfigs()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("collect machineconfigs: %w", err))
+	} else {
 		state.MachineConfigs = machineConfigs
 	}

-	return state, nil
+	if len(errs) > 0 {
+		return nil, errors.Join(errs...)
+	}
+	return state, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 333 - 355, collectClusterState currently swallows errors from collectNodes, collectOperators, and collectMachineConfigs and always returns nil; update collectClusterState to detect and surface failures from those sub-collectors (collectNodes, collectOperators, collectMachineConfigs) instead of silently ignoring them: call each collector, if any returns a non-nil error record it (either return the first error immediately or aggregate multiple errors) and return a non-nil error alongside any partially-collected ClusterState so callers know the cluster state is incomplete.
543-549: ⚠️ Potential issue | 🟠 Major: Don't serialize placeholder CloudTrail data as real evidence.
This function never queries CloudTrail but still returns success with an empty struct. The caller and summary then treat “not collected” as “collected and empty.” Suggested fix:

```diff
 func (o *collectOptions) collectCloudTrailData(startTime time.Time) (*CloudTrailData, error) {
 	// Note: CloudTrail data collection is handled by the 'osdctl aws cloudtrail errors' command
-	// This function provides a placeholder for the evidence collection summary
-	// For detailed CloudTrail analysis, use: osdctl aws cloudtrail errors -c <cluster-id> --since <duration>
+	// For detailed CloudTrail analysis, use: osdctl aws cloudtrail errors -C <cluster-id> --since <duration>
 	_ = startTime
-	return &CloudTrailData{}, nil
+	return nil, fmt.Errorf("cloudtrail collection is not implemented for evidence bundles yet")
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 543 - 549, The collectCloudTrailData placeholder in collectOptions (function collectCloudTrailData) currently returns an empty CloudTrailData struct and nil error, causing callers to treat “not collected” as valid empty evidence; change it to indicate non-collection by returning (nil, error) instead of &CloudTrailData{} — e.g. return nil, fmt.Errorf("cloudtrail data collection not implemented") or a package-level sentinel error like ErrNotCollected — and update any callers to handle a nil *CloudTrailData (or that sentinel) so placeholder data is never serialized as real evidence.
279-282: ⚠️ Potential issue | 🟠 Major: `--since` is ignored for Kubernetes events.
`startTime` never reaches `collectKubernetesEvents()`, so `--include-events --since 1h` still dumps the full event history. On busy clusters that can massively inflate the bundle and blur the validation window.
Also applies to: 502-540
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 279 - 282, The code ignores the --since/startTime when collecting Kubernetes events because startTime isn't passed into collectKubernetesEvents; update the call site to pass startTime (or set it on the receiver) and change collectKubernetesEvents (the method named collectKubernetesEvents) to accept and use a startTime parameter (or read the receiver field) to filter events by event.Timestamp >= startTime so --include-events --since 1h only returns recent events; ensure any auxiliary logic that parses --since (the startTime variable) is reused rather than recomputed.
357-358: ⚠️ Potential issue | 🔴 Critical: Bind every `oc` invocation to the requested cluster.
All of these calls run against whatever kubeconfig context the operator already has active. `-C` is only used for OCM lookup and metadata, so this command can collect evidence from the wrong cluster while labeling it with the requested ClusterID.
Also applies to: 415, 473, 503, 558
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 357 - 358, The oc calls in collectNodes currently run against the active kubeconfig; update the exec.CommandContext invocations in collectNodes (and the other occurrences noted) to pass the requested cluster binding by appending the appropriate flags (either "--kubeconfig" with the requested kubeconfig path or "--context" with the requested cluster context) sourced from the options struct (e.g., use o.KubeconfigPath or o.ClusterContext) so every oc invocation targets the intended cluster.
cmd/cluster/snapshot.go (2)
63-69: ⚠️ Potential issue | 🟠 Major: Scope and enrich extra resource snapshots.
Line 383 always queries `--all-namespaces`, so `--namespaces` only narrows the namespace list, not the additional resources. On top of that, `ResourceInfo` still only persists `status.phase`, which means later diffs miss most rollout/spec updates even when the object changed.
Also applies to: 382-411
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 63 - 69, The snapshot logic currently always queries extra resources with --all-namespaces (so --namespaces doesn't scope extras) and the ResourceInfo struct only stores status.phase, causing diffs to miss spec/rollout changes; update the query that builds extra resource lists (the code around the call that currently forces "--all-namespaces") to respect the provided namespaces flag and only include --all-namespaces when no namespaces are specified, and extend the ResourceInfo type (and any serialization logic that populates it) to persist richer fields such as Metadata.ResourceVersion, Labels, Annotations, and the full Spec and Status (as raw JSON/YAML strings) instead of just status.phase so downstream diffs detect spec/status changes (refer to ResourceInfo and the function that collects extra resources).
121-205: ⚠️ Potential issue | 🔴 Critical: Bind every `oc` call to the requested cluster and fail if core captures are incomplete.
Line 132 resolves the target cluster through OCM, but the capture paths started on Lines 162, 171, 180, and 190 still shell out to whichever kubecontext `oc` is already using. Because those errors are only logged, the command can still exit 0 after writing a snapshot from the wrong cluster or with missing core sections.
Also applies to: 207-209, 269-274, 323-326, 382-385
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 121 - 205, The snapshot routine resolves the target cluster via utils.GetClusterAnyStatus in snapshotOptions.run but then calls captureNodes, captureNamespaces, captureClusterOperators and captureResources which shell out to oc without being bound to that cluster and currently only log errors, allowing successful exit with wrong or partial data; modify the capture* functions (captureNodes, captureNamespaces, captureClusterOperators, captureResources) to accept an explicit cluster binding parameter (e.g., cluster.ID or the cluster kubeconfig/API endpoint token returned by utils or a new utils.GetClusterKubeconfig function) and ensure every oc invocation is invoked with that binding (or set KUBECONFIG/oc --server/--context accordingly) so it always targets the resolved cluster, and change snapshotOptions.run to treat errors from these core captures as fatal by returning the error (instead of only logging) so writeSnapshot is only called when all core captures succeed; update any callers of these methods and ensure writeSnapshot remains unchanged for saving ClusterSnapshot.
cmd/cluster/diff.go (1)
190-232: ⚠️ Potential issue | 🟠 Major: Diff the node and namespace fields you already snapshot.
`cmd/cluster/snapshot.go:37-50` captures node roles, version, conditions, and labels, and `cmd/cluster/snapshot.go:36-41` captures namespace labels. These comparators still only react to `Status`, so upgrades, role flips, condition changes, and label mutations never surface unless the status string also changes.
Also applies to: 316-359
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/diff.go` around lines 190 - 232, compareNodes only compares Status (and only includes Roles on add/remove), so changes to Roles, Version, Conditions, or Labels from NodeSnapshot never produce "modified" diffs; do the same deeper field-wise comparison used in snapshot.go by comparing Roles (order-insensitive), Version, Conditions, and Labels (use reflect.DeepEqual or explicit map/slice comparison) and when any of those differ append a "modified" NodeDiff for that node with Before/After strings that include the relevant fields (e.g., Status, Roles, Version, Conditions, Labels) using the NodeSnapshot and NodeDiff types; apply the same fix to the analogous comparator (compareNamespaces or the function around lines 316-359) so namespace label changes are detected and included in Before/After.
🧹 Nitpick comments (1)
docs/README.md (1)
1107-1125: Add language labels to the new fenced blocks.
All of the new synopsis/flags fences are unlabeled, which is why markdownlint is reporting MD040 throughout these sections. Tagging them as `text` and the example fences as `bash` will keep the README lint-clean. Suggested doc fix:

````diff
-```
+```text
 osdctl aws [flags]
-```
+```
...
-```
+```text
   --as string       Username to impersonate for the operation. User could be a regular user or a service account in a namespace.
   --cluster string  The name of the kubeconfig cluster to use
   --context string  The name of the kubeconfig context to use
...
-```
+```
````

Also applies to: 1134-1152, 1175-1198, 1606-1625, 2193-2214, 3018-3036, 3064-3088
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/README.md` around lines 1107 - 1125, The unlabeled fenced code blocks (e.g., the synopsis block starting with "osdctl aws [flags]" and the flags block containing lines like "--as string Username to impersonate...") need language labels to satisfy markdownlint MD040; update those backtick fences to use ```text for the synopsis/flags blocks and use ```bash for any example/command blocks, ensuring you add the label on the opening fence only (e.g., replace ``` with ```text or ```bash as appropriate) and apply the same change to the other similar sections mentioned in the comment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Line 98: The call to errorsCmd.MarkFlagRequired("cluster-id") is unchecked;
capture its returned error and handle it instead of ignoring it—wrap the call in
an err := ... assignment and if err != nil then return the error (or call the
command init/fatal logger) so wiring mistakes surface at startup; update the
initialization code that calls errorsCmd.MarkFlagRequired to check the error and
propagate or log it (reference: errorsCmd.MarkFlagRequired).
In `@cmd/cluster/diff.go`:
- Around line 152-157: compareOperators currently appends one change record per
changed field, so using len(result.OperatorChanges) overcounts operators;
instead compute the number of unique operators and assign that to
result.Summary.OperatorsChanged. Modify the code after compareOperators(...)
(referencing result.OperatorChanges and result.Summary.OperatorsChanged) to
build a set keyed by the operator identity used in change records (e.g.,
operator name/namespace or the unique ID present in each change entry), count
the unique keys and assign that count; apply the same dedup logic to the other
occurrence mentioned (lines ~235-309) where operator changes are summarized.
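The dedup described above is a small set-over-map idiom; `OperatorChange` below is a stand-in for the real change-record type:

```go
package main

import "fmt"

// OperatorChange is a stand-in for the per-field change records in
// cmd/cluster/diff.go; one operator may contribute several entries.
type OperatorChange struct {
	Name  string
	Field string
}

// uniqueOperators counts distinct operators across the change records,
// which is what the summary should report instead of len(changes).
func uniqueOperators(changes []OperatorChange) int {
	seen := map[string]struct{}{}
	for _, c := range changes {
		seen[c.Name] = struct{}{}
	}
	return len(seen)
}

func main() {
	changes := []OperatorChange{
		{Name: "ingress", Field: "Available"},
		{Name: "ingress", Field: "Degraded"},
		{Name: "dns", Field: "Progressing"},
	}
	fmt.Println("records:", len(changes), "operators:", uniqueOperators(changes))
}
```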
In `@cmd/cluster/snapshot.go`:
- Around line 417-433: The snapshot file is currently written with
world-readable permissions (0644) in snapshotOptions.writeSnapshot; change the
file write to use owner-only permissions (0600) so the evidence file is
restricted to the invoking user—update the os.WriteFile call that writes
o.OutputFile (and any related write path in writeSnapshot) to use 0600 instead
of 0644 so only the file owner can read/write the snapshot.
- Around line 111-116: The two calls that ignore errors are
snapshotCmd.MarkFlagRequired("cluster-id") and
snapshotCmd.MarkFlagRequired("output"); change them to use cobra.CheckErr(...)
so the returned error is handled (e.g.,
cobra.CheckErr(snapshotCmd.MarkFlagRequired("cluster-id")) and
cobra.CheckErr(snapshotCmd.MarkFlagRequired("output"))), leaving the rest of the
flag registration (snapshotCmd.Flags().StringVarP/ StringSliceVar and opts
fields) unchanged.
In `@cmd/evidence/collect.go`:
- Around line 569-575: The file writes sensitive evidence files with
world-readable permissions; update the writer to use restrictive file mode
(e.g., 0600) instead of 0644: in the saveEvidence method
(collectOptions.saveEvidence) replace the os.WriteFile call so the file is
created/written with owner-only read/write permissions, and apply the same
change to the other evidence/summary writer (the other os.WriteFile usage
referenced in the review) so both evidence.yaml and summary.txt are written with
restrictive permissions.
- Around line 152-156: The help text in cmd/evidence/collect.go promises that
cluster state includes "namespaces" but the implementation doesn't collect or
serialize namespaces; either remove "namespaces" from the descriptive
comment/help string or implement namespace collection. To fix, either update the
help/description text in collect.go to drop "namespaces" so it matches the
actual outputs, or add namespace collection by calling the Kubernetes API
(client.CoreV1().Namespaces().List) in the same cluster-state collection path
and include the result in the same serialization/writing code path used by the
existing cluster state writer (the cluster-state collection/serialization
functions in collect.go such as the cluster state writer or
serializeClusterState/WriteClusterState helpers). Ensure the change is applied
to the help string and/or the cluster-state collection code so documentation
matches behavior.
- Around line 654-660: The parseDurationToUTC function currently allows zero and
negative durations which produce future start times; after calling
time.ParseDuration in parseDurationToUTC, validate that duration is strictly
positive (duration > 0) and return a descriptive error (e.g., "lookback must be
greater than zero") for zero or negative values instead of computing a time;
keep the function name parseDurationToUTC and its existing error wrapping for
parse failures, but add this extra validation branch to prevent future start
times.
In `@docs/osdctl_aws_cloudtrail_errors.md`:
- Around line 13-28: Wrap the examples for the command "osdctl aws cloudtrail
errors" in a fenced bash code block so the lines beginning with "#" are treated
as shell comments rather than markdown headings; replace the current loose
example lines with a ```bash block containing the comment lines (e.g., "# Get
permission errors...") followed by the corresponding commands and close the
fence with ```, ensuring the usage snippet remains inside the fenced block.
In `@docs/README.md`:
- Around line 1162-1174: The example command blocks (e.g., lines containing
"osdctl aws cloudtrail errors -C <cluster-id> --since 1h" and the other example
commands) are currently indented and parsed as Markdown headings; wrap each
example set in a fenced code block using triple backticks with "bash" (```bash)
at the start and a closing ``` at the end, ensuring the leading "#" lines remain
as comment lines inside the fence; apply the same fix for the other example
groups mentioned (around the sections that contain the commands ending with
"--json", "--link", and "--error-types AccessDenied,UnauthorizedOperation").
- Around line 2183-2191: Update the example commands for osdctl cluster snapshot
to use the documented flag name -C (or --cluster-id) instead of -c so
copy-pasted commands work; locate the example lines containing "osdctl cluster
snapshot -c <cluster-id> ..." and replace -c with -C, and ensure the note about
the flag table still matches the examples (refer to the osdctl cluster snapshot
examples in the README).
---
Duplicate comments:
In `@cmd/cluster/diff.go`:
- Around line 190-232: compareNodes only compares Status (and only includes
Roles on add/remove), so changes to Roles, Version, Conditions, or Labels from
NodeSnapshot never produce "modified" diffs; do the same deeper field-wise
comparison used in snapshot.go by comparing Roles (order-insensitive), Version,
Conditions, and Labels (use reflect.DeepEqual or explicit map/slice comparison)
and when any of those differ append a "modified" NodeDiff for that node with
Before/After strings that include the relevant fields (e.g., Status, Roles,
Version, Conditions, Labels) using the NodeSnapshot and NodeDiff types; apply
the same fix to the analogous comparator (compareNamespaces or the function
around lines 316-359) so namespace label changes are detected and included in
Before/After.
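A minimal sketch of the field-wise comparison this prompt asks for, using an order-insensitive `Roles` check and `reflect.DeepEqual` on `Labels`. The trimmed `NodeSnapshot` struct here is a stand-in; the real type in `snapshot.go` also carries `Conditions`, which would be compared the same way.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// NodeSnapshot is a trimmed stand-in for the real snapshot type.
type NodeSnapshot struct {
	Status  string
	Roles   []string
	Version string
	Labels  map[string]string
}

// nodeChanged reports whether any captured field differs, not just Status.
func nodeChanged(before, after NodeSnapshot) bool {
	// Compare roles order-insensitively on copies.
	b := append([]string(nil), before.Roles...)
	a := append([]string(nil), after.Roles...)
	sort.Strings(b)
	sort.Strings(a)
	return before.Status != after.Status ||
		before.Version != after.Version ||
		!reflect.DeepEqual(b, a) ||
		!reflect.DeepEqual(before.Labels, after.Labels)
}

func main() {
	old := NodeSnapshot{Status: "Ready", Roles: []string{"worker"}, Version: "v1.29.1"}
	upgraded := old
	upgraded.Version = "v1.29.5"
	fmt.Println(nodeChanged(old, old), nodeChanged(old, upgraded))
}
```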
In `@cmd/cluster/snapshot.go`:
- Around line 63-69: The snapshot logic currently always queries extra resources
with --all-namespaces (so --namespaces doesn't scope extras) and the
ResourceInfo struct only stores status.phase, causing diffs to miss spec/rollout
changes; update the query that builds extra resource lists (the code around the
call that currently forces "--all-namespaces") to respect the provided
namespaces flag and only include --all-namespaces when no namespaces are
specified, and extend the ResourceInfo type (and any serialization logic that
populates it) to persist richer fields such as Metadata.ResourceVersion, Labels,
Annotations, and the full Spec and Status (as raw JSON/YAML strings) instead of
just status.phase so downstream diffs detect spec/status changes (refer to
ResourceInfo and the function that collects extra resources).
- Around line 121-205: The snapshot routine resolves the target cluster via
utils.GetClusterAnyStatus in snapshotOptions.run but then calls captureNodes,
captureNamespaces, captureClusterOperators and captureResources which shell out
to oc without being bound to that cluster and currently only log errors,
allowing successful exit with wrong or partial data; modify the capture*
functions (captureNodes, captureNamespaces, captureClusterOperators,
captureResources) to accept an explicit cluster binding parameter (e.g.,
cluster.ID or the cluster kubeconfig/API endpoint token returned by utils or a
new utils.GetClusterKubeconfig function) and ensure every oc invocation is
invoked with that binding (or set KUBECONFIG/oc --server/--context accordingly)
so it always targets the resolved cluster, and change snapshotOptions.run to
treat errors from these core captures as fatal by returning the error (instead
of only logging) so writeSnapshot is only called when all core captures succeed;
update any callers of these methods and ensure writeSnapshot remains unchanged
for saving ClusterSnapshot.
In `@cmd/evidence/collect.go`:
- Around line 333-355: collectClusterState currently swallows errors from
collectNodes, collectOperators, and collectMachineConfigs and always returns
nil; update collectClusterState to detect and surface failures from those
sub-collectors (collectNodes, collectOperators, collectMachineConfigs) instead
of silently ignoring them: call each collector, if any returns a non-nil error
record it (either return the first error immediately or aggregate multiple
errors) and return a non-nil error alongside any partially-collected
ClusterState so callers know the cluster state is incomplete.
- Around line 543-549: The collectCloudTrailData placeholder in collectOptions
(function collectCloudTrailData) currently returns an empty CloudTrailData
struct and nil error, causing callers to treat “not collected” as valid empty
evidence; change it to indicate non-collection by returning (nil, error) instead
of &CloudTrailData{} — e.g. return nil, fmt.Errorf("cloudtrail data collection
not implemented") or a package-level sentinel error like ErrNotCollected — and
update any callers to handle a nil *CloudTrailData (or that sentinel) so
placeholder data is never serialized as real evidence.
- Around line 279-282: The code ignores the --since/startTime when collecting
Kubernetes events because startTime isn't passed into collectKubernetesEvents;
update the call site to pass startTime (or set it on the receiver) and change
collectKubernetesEvents (the method named collectKubernetesEvents) to accept and
use a startTime parameter (or read the receiver field) to filter events by
event.Timestamp >= startTime so --include-events --since 1h only returns recent
events; ensure any auxiliary logic that parses --since (the startTime variable)
is reused rather than recomputed.
- Around line 357-358: The oc calls in collectNodes currently run against the
active kubeconfig; update the exec.CommandContext invocations in collectNodes
(and the other occurrences noted) to pass the requested cluster binding by
appending the appropriate flags (either "--kubeconfig" with the requested
kubeconfig path or "--context" with the requested cluster context) sourced from
the options struct (e.g., use o.KubeconfigPath or o.ClusterContext) so every oc
invocation targets the intended cluster.
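One way to make every `oc` invocation carry the cluster binding is a small argument-builder helper, sketched below. The `--kubeconfig` flag is standard `oc`/`kubectl` behavior; the helper name and the options-struct field it would read from are assumptions.

```go
package main

import "fmt"

// ocArgs prepends an explicit kubeconfig binding to every oc invocation so
// commands target the resolved cluster rather than the active context.
func ocArgs(kubeconfigPath string, args ...string) []string {
	if kubeconfigPath != "" {
		return append([]string{"--kubeconfig", kubeconfigPath}, args...)
	}
	return args
}

func main() {
	args := ocArgs("/tmp/target-cluster.kubeconfig", "get", "nodes", "-o", "json")
	// exec.CommandContext(ctx, "oc", args...) would then always target the
	// resolved cluster.
	fmt.Println(args[0], args[1])
}
```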
---
Nitpick comments:
In `@docs/README.md`:
- Around line 1107-1125: The unlabeled fenced code blocks (e.g., the synopsis
block starting with "osdctl aws [flags]" and the flags block containing lines
like "--as string Username to impersonate...") need
language labels to satisfy markdownlint MD040; update those backtick fences to
use ```text for the synopsis/flags blocks and use ```bash for any
example/command blocks, ensuring you add the label on the opening fence only
(e.g., replace ``` with ```text or ```bash as appropriate) and apply the same
change to the other similar sections mentioned in the comment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 2570e793-1ae3-4e89-8bbf-73076384ad8e
📒 Files selected for processing (15)
- README.md
- cmd/aws/cloudtrail/cmd.go
- cmd/aws/cloudtrail/errors.go
- cmd/aws/cmd.go
- cmd/cluster/cmd.go
- cmd/cluster/diff.go
- cmd/cluster/snapshot.go
- cmd/cmd.go
- cmd/evidence/cmd.go
- cmd/evidence/collect.go
- docs/README.md
- docs/osdctl.md
- docs/osdctl_aws.md
- docs/osdctl_aws_cloudtrail.md
- docs/osdctl_aws_cloudtrail_errors.md
🚧 Files skipped from review as they are similar to previous changes (5)
- cmd/aws/cmd.go
- cmd/evidence/cmd.go
- cmd/cluster/cmd.go
- cmd/aws/cloudtrail/cmd.go
- cmd/cmd.go
Actionable comments posted: 7
♻️ Duplicate comments (10)
docs/README.md (1)
1162-1174: ⚠️ Potential issue | 🟡 Minor

Wrap these example groups in fenced `bash` blocks again.

The indented `# ...` lines are rendering as headings instead of shell comments, so these four sections are back to triggering the same Markdown issue.

Suggested doc fix

````diff
 Examples:
-  # Get permission errors from the last hour
-  osdctl aws cloudtrail errors -C <cluster-id> --since 1h
-
-  # Get errors from the last 30 minutes with JSON output
-  osdctl aws cloudtrail errors -C <cluster-id> --since 30m --json
+```bash
+# Get permission errors from the last hour
+osdctl aws cloudtrail errors -C <cluster-id> --since 1h
+
+# Get errors from the last 30 minutes with JSON output
+osdctl aws cloudtrail errors -C <cluster-id> --since 30m --json
+```
````

Apply the same pattern to the `cluster diff`, `cluster snapshot`, and `evidence collect` example groups.

Also applies to: 1599-1604, 2183-2191, 3051-3062
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/README.md` around lines 1162 - 1174, The Markdown example blocks containing shell comments (e.g., the groups showing "osdctl aws cloudtrail errors -C <cluster-id> ..." and the similar "cluster diff", "cluster snapshot", and "evidence collect" example groups) must be wrapped in fenced code blocks with a bash language tag so leading `#` lines render as comments instead of headings; update each example group by adding a starting "```bash" before the first commented line and a closing "```" after the last command for the cloudtrail, cluster diff, cluster snapshot and evidence collect example sections.

cmd/cluster/snapshot.go (3)
115-116: ⚠️ Potential issue | 🟡 Minor

Handle the `MarkFlagRequired` return values.

These calls still discard Cobra's returned error, so `errcheck` will continue to flag the constructor path. Prefer `cobra.CheckErr(...)` or propagate the error.

Suggested fix

```diff
-	_ = snapshotCmd.MarkFlagRequired("cluster-id")
-	_ = snapshotCmd.MarkFlagRequired("output")
+	cobra.CheckErr(snapshotCmd.MarkFlagRequired("cluster-id"))
+	cobra.CheckErr(snapshotCmd.MarkFlagRequired("output"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 115 - 116, The two calls to snapshotCmd.MarkFlagRequired currently ignore returned errors; update the constructor to capture and handle these by checking the error (e.g., err := snapshotCmd.MarkFlagRequired("cluster-id"); cobra.CheckErr(err) and similarly for "output") or propagate the error up, ensuring both snapshotCmd.MarkFlagRequired invocations are not discarding their returned error (reference snapshotCmd.MarkFlagRequired and use cobra.CheckErr or return the err).
63-69: ⚠️ Potential issue | 🟠 Major

`ResourceInfo` is still too lossy for meaningful diffs.

Only `status.phase` is persisted for arbitrary resources. For common types like Deployments, Services, ConfigMaps, and many CRs, that field is empty or unrelated to the change users care about, so later diffs mostly collapse to add/remove detection and miss rollout/spec changes.

Also applies to: 404-411
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 63 - 69, The ResourceInfo struct only persists name/namespace/kind/status and loses meaningful details for diffs; update ResourceInfo to include either the full unstructured object (e.g., Raw map[string]interface{} or JSON/YAML blob) or at minimum include metadata.generation, metadata.resourceVersion, metadata.labels/annotations, and the spec/template (or a computed hash of spec/template + important status fields) so rollout/spec changes are detectable; locate the ResourceInfo type and any code that serializes/deserializes snapshots and change them to capture and persist these additional fields (or the full object) and adjust comparison logic to use the new spec/template or hash fields when generating diffs.
121-205: ⚠️ Potential issue | 🔴 Critical

Collect from the requested cluster, not whichever `oc` context happens to be active.

`run()` resolves `o.ClusterID` through OCM, but every capture path later shells out to `oc` without passing a kubeconfig/context or validating the current one. If the caller is logged into a different cluster, the snapshot metadata comes from one cluster while nodes/namespaces/operators come from another.
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 121 - 205, run() currently resolves o.ClusterID via OCM but then lets captureNodes(), captureNamespaces(), captureClusterOperators(), and captureResources() shell out to oc using whatever context is active; fetch the target cluster kubeconfig from OCM (or via an existing helper like utils.GetClusterKubeconfig/GetClusterAdminKubeconfig) immediately in run(), save it to a temp file or to o.Kubeconfig, and ensure each capture function (captureNodes, captureNamespaces, captureClusterOperators, captureResources) uses that kubeconfig (or the temp file path) when invoking oc (or pass it down as an argument) rather than relying on the current oc context; update run() to set/clean the temp kubeconfig and adjust the capture function signatures to accept the kubeconfig if needed so all captured resources come from the same cluster resolved by run().

docs/osdctl_aws_cloudtrail_errors.md (1)
13-28: ⚠️ Potential issue | 🟡 Minor

Fence the examples block.

The lines beginning with `#` are still parsed as markdown headings instead of shell comments. Please wrap the examples in a fenced `bash` block.
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_aws_cloudtrail_errors.md` around lines 13 - 28, The examples block is not fenced with a bash code fence so lines starting with # are being treated as Markdown headings; wrap the entire examples snippet (the four example commands and the usage line ``osdctl aws cloudtrail errors [flags]``) inside a fenced code block that specifies bash (i.e., start with ```bash and end with ```), ensuring the existing `#` comment prefixes remain inside the fenced block; update the block around the examples in the docs/osdctl_aws_cloudtrail_errors.md file where the Examples section and the usage line appear.

cmd/cluster/diff.go (1)
216-223: ⚠️ Potential issue | 🟠 Major

Node and namespace diffs still miss fields already captured in the snapshot.

`compareNodes` only reports `Status` changes, and `compareNamespaces` does the same. That drops node `Roles`, `Version`, `Conditions`, `Labels`, and namespace `Labels` changes even though those fields are present in the snapshot model.

Also applies to: 342-348
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/diff.go` around lines 216 - 223, compareNodes and compareNamespaces only record Status changes; update compareNodes and compareNamespaces to also detect and append NodeDiff/NamespaceDiff entries for Roles, Version, Conditions, and Labels (for nodes) and Labels (for namespaces) when values differ between beforeNode/beforeNamespace and afterNode/afterNamespace. Locate the compareNodes and compareNamespaces functions and, alongside the existing Status check, add equality checks for Roles (slice/set comparison), Version (string), Conditions (structured comparison), and Labels (map comparison), creating NodeDiff/NamespaceDiff items with ChangeType "modified" and Before/After formatted values; apply the same additional checks in the other similar compare block later in the file where node/namespace comparisons are duplicated.

cmd/evidence/collect.go (4)
276-289: ⚠️ Potential issue | 🟠 Major

`--since` flag is not applied to Kubernetes events collection.

The `startTime` is computed from the `--since` flag but never passed to `collectKubernetesEvents()`. The function retrieves all events regardless of the time window, which can produce a large, unfocused dataset on busy clusters.

🔧 Proposed fix

Update the function signature to accept startTime and filter events:

```diff
-func (o *collectOptions) collectKubernetesEvents() ([]EventInfo, error) {
+func (o *collectOptions) collectKubernetesEvents(startTime time.Time) ([]EventInfo, error) {
```

Then filter events by timestamp after parsing, or use `--field-selector` with the `oc` command if supported.
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 276 - 289, The events collection ignores the computed startTime from the --since flag because startTime isn’t passed into collectKubernetesEvents; update the call in the IncludeEvents block to pass the startTime and change the collectKubernetesEvents signature to accept a time.Time (or pointer) parameter, then filter returned events inside collectKubernetesEvents by their timestamps (e.g., Event.LastTimestamp / Event.FirstTimestamp) to only include events after startTime (alternatively use a field-selector when executing oc/kubectl inside collectKubernetesEvents); ensure ClusterState.Events is still set with the filtered slice.
540-547: ⚠️ Potential issue | 🟠 Major

CloudTrail placeholder returns empty data indistinguishable from "collected but empty".

`collectCloudTrailData` always returns `&CloudTrailData{}` without actually collecting anything. The caller assigns this to `evidence.CloudTrailData` and prints success, making "not implemented" indistinguishable from "no errors found."

Either:
- Return `nil` and don't set `evidence.CloudTrailData` when collection isn't performed
- Integrate with the actual CloudTrail collection logic from `cmd/aws/cloudtrail/errors.go`
- Return an error indicating the feature is not yet implemented

🔧 Minimal fix - return nil instead of empty struct

```diff
 func (o *collectOptions) collectCloudTrailData(startTime time.Time) (*CloudTrailData, error) {
 	// Note: CloudTrail data collection is handled by the 'osdctl aws cloudtrail errors' command
-	// This function provides a placeholder for the evidence collection summary
-	// For detailed CloudTrail analysis, use: osdctl aws cloudtrail errors -c <cluster-id> --since <duration>
-	_ = startTime
-
-	return &CloudTrailData{}, nil
+	// For detailed CloudTrail analysis, use: osdctl aws cloudtrail errors -C <cluster-id> --since <duration>
+	return nil, nil
 }
```

Then update the caller to check for nil before assigning.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 540 - 547, The collectCloudTrailData function currently returns an empty &CloudTrailData{} which is indistinguishable from a successful empty result; change collectCloudTrailData to return nil (and a nil error or a specific "not implemented" error) when CloudTrail collection is not performed, and update the caller sites that assign evidence.CloudTrailData to check for nil before setting it (or propagate the error) so callers can differentiate "not implemented" from "collected but empty"; alternatively, if you prefer to implement collection now, wire in the real logic from cmd/aws/cloudtrail/errors.go into collectCloudTrailData instead of returning an empty struct.
330-352: ⚠️ Potential issue | 🟠 Major

`collectClusterState` silently swallows all sub-collector errors.

The function ignores errors from `collectNodes`, `collectOperators`, and `collectMachineConfigs`, always returning `nil` error. This makes failed collections indistinguishable from successful but empty ones. The caller at line 248-255 will report success even when all sub-collectors failed.

🔧 Proposed fix - propagate errors

```diff
 func (o *collectOptions) collectClusterState() (*ClusterState, error) {
 	state := &ClusterState{}
+	var errs []error

 	// Collect nodes
 	nodes, err := o.collectNodes()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("nodes: %w", err))
+	} else {
 		state.Nodes = nodes
 	}

 	// Collect operators
 	operators, err := o.collectOperators()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("operators: %w", err))
+	} else {
 		state.Operators = operators
 	}

 	// Collect machine configs
 	machineConfigs, err := o.collectMachineConfigs()
-	if err == nil {
+	if err != nil {
+		errs = append(errs, fmt.Errorf("machineconfigs: %w", err))
+	} else {
 		state.MachineConfigs = machineConfigs
 	}

-	return state, nil
+	if len(errs) > 0 {
+		return state, fmt.Errorf("partial collection failure: %v", errs)
+	}
+	return state, nil
 }
```
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 330 - 352, collectClusterState currently swallows errors from collectNodes, collectOperators, and collectMachineConfigs and always returns nil error; change it to propagate failures by checking each err and returning an error if any sub-collector fails (e.g., return nil, err or aggregate errors into one descriptive error using fmt.Errorf or a multi-error approach) while still filling state with successful parts; update the function around collectNodes, collectOperators, and collectMachineConfigs to return immediately on non-nil err (or collect and combine errors) so callers of collectClusterState can detect failures.
354-358: ⚠️ Potential issue | 🟠 Major

All `oc` commands execute against the current kubeconfig context, not the target cluster.

The cluster is resolved via OCM (line 199), but none of the `exec.CommandContext` calls for `oc get nodes`, `oc get clusteroperators`, `oc get machineconfigs`, `oc get events`, or `oc adm must-gather` are bound to the target cluster. This can collect evidence from the wrong cluster while labeling it with the correct cluster ID.

Consider either:
- Using `--context` or `--kubeconfig` flags with the target cluster's configuration
- Documenting that users must ensure their current kubeconfig context matches the target cluster
- Using the Kubernetes client-go library with cluster credentials from OCM/backplane instead of shelling out

Also applies to: 411-415, 469-473, 499-503, 549-561
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 354 - 358, The shelled-out oc commands (e.g., in collectNodes()) run against the current kubeconfig context instead of the target cluster resolved via OCM; update all exec.CommandContext usages (collectNodes, collectClusterOperators, collectMachineConfigs, collectEvents, runMustGather or similar) to bind to the target cluster by passing the cluster kubeconfig/context obtained during cluster resolution: either add "--kubeconfig", or "--context <name>" to the oc command args (or set KUBECONFIG in cmd.Env) using the kubeconfig returned by your OCM/backplane resolver, or replace these shells with a client-go implementation using the resolved credentials; ensure the same resolver symbol that collects the cluster kubeconfig is used so each oc invocation targets the intended cluster.
🧹 Nitpick comments (3)
docs/osdctl_aws.md (1)
13-15: Consider adding language specifiers to fenced code blocks.

The fenced code blocks lack language specifiers, which can affect syntax highlighting and accessibility. For CLI help output, consider using `text` or `shell` identifiers.

This is a minor documentation nit, especially if these docs are auto-generated by Cobra.
Also applies to: 19-21, 25-36
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_aws.md` around lines 13 - 15, Update the fenced code blocks in the docs that show CLI examples (e.g., the block containing "osdctl aws [flags]" and the other blocks noted at lines 19-21 and 25-36) to include a language specifier such as "shell" or "text" (e.g., change ``` to ```shell) so syntax highlighting and accessibility are improved; ensure all CLI/example blocks in the file use the same appropriate specifier.

cmd/aws/cloudtrail/errors.go (1)
98-98: Handle the `MarkFlagRequired` error.

The error from `MarkFlagRequired` is silently discarded. While this rarely fails in practice, it can hide wiring mistakes if the flag name is changed.

🔧 Suggested fix

```diff
-	_ = errorsCmd.MarkFlagRequired("cluster-id")
+	if err := errorsCmd.MarkFlagRequired("cluster-id"); err != nil {
+		panic(fmt.Sprintf("failed to mark cluster-id as required: %v", err))
+	}
```

Or using Cobra's helper:

```diff
-	_ = errorsCmd.MarkFlagRequired("cluster-id")
+	cobra.CheckErr(errorsCmd.MarkFlagRequired("cluster-id"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/aws/cloudtrail/errors.go` at line 98, The call to errorsCmd.MarkFlagRequired("cluster-id") currently discards its error; change it to handle the returned error (e.g., capture the error from MarkFlagRequired and propagate or log it using your project's error handling, or use Cobra's helper like cobra.CheckErr) so wiring mistakes are not silently ignored—update the call site where errorsCmd.MarkFlagRequired("cluster-id") is invoked to check the returned err and handle it appropriately.

cmd/evidence/collect.go (1)
182-183: Handle `MarkFlagRequired` errors.

Both `MarkFlagRequired` calls silently discard errors. This can hide wiring mistakes if flag names change.

🔧 Suggested fix

```diff
-	_ = collectCmd.MarkFlagRequired("cluster-id")
-	_ = collectCmd.MarkFlagRequired("output")
+	cobra.CheckErr(collectCmd.MarkFlagRequired("cluster-id"))
+	cobra.CheckErr(collectCmd.MarkFlagRequired("output"))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 182 - 183, The MarkFlagRequired calls on collectCmd currently discard errors; update the two calls to capture their returned error and handle it instead of assigning to blank identifier — e.g., err := collectCmd.MarkFlagRequired("cluster-id"); if err != nil { cobra.CheckErr(err) } (or log.Fatalf/return the error from the enclosing init/setup function) and do the same for collectCmd.MarkFlagRequired("output"); this ensures wiring mistakes are surfaced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 350-356: The whoami function dereferences output.Arn and
output.Account without nil checks; add safeguards to prevent panics by verifying
output is non-nil and that output.Arn and output.Account are non-nil before
dereferencing. If any of those are nil, return a descriptive error (e.g., "empty
GetCallerIdentity response: missing Arn/Account") and preserve the existing
return types; otherwise safely dereference and return the values. Use the
sts.GetCallerIdentity response (output) and its Arn and Account fields as the
unique identifiers to locate where to add these checks.
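The nil-safety the prompt asks for can be sketched as below. The local `callerIdentity` struct mirrors the shape of the STS `GetCallerIdentity` response, whose `Arn` and `Account` fields are string pointers; it is a stand-in so the example stays self-contained.

```go
package main

import (
	"errors"
	"fmt"
)

// callerIdentity is a stand-in for the sts.GetCallerIdentityOutput shape.
type callerIdentity struct {
	Arn     *string
	Account *string
}

// identityFields guards every dereference, returning a descriptive error
// for a nil response or missing fields instead of panicking.
func identityFields(out *callerIdentity) (string, string, error) {
	if out == nil || out.Arn == nil || out.Account == nil {
		return "", "", errors.New("empty GetCallerIdentity response: missing Arn/Account")
	}
	return *out.Arn, *out.Account, nil
}

func main() {
	_, _, err := identityFields(&callerIdentity{})
	fmt.Println(err != nil)
}
```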
In `@cmd/cluster/diff.go`:
- Around line 262-298: compareOperators currently only checks Available,
Degraded, Progressing and Version on OperatorSnapshot, so changes to other
entries in the Conditions slice are missed; update compareOperators (and any
helper used to diff operators) to also diff the Conditions field: iterate the
beforeOp.Conditions and afterOp.Conditions (matching by Condition.Type or
creating maps keyed by Type), and for any condition present/removed or whose
Status/Reason/Message (or other relevant fields) changed, append an OperatorDiff
with Field set to e.g. "Conditions.<Type>.<Subfield>" (or "Conditions.<Type>"
for added/removed) and populate Before/After with the previous and current
values; ensure you handle added, removed and modified condition entries so all
condition-only changes surface in the diffs.
- Around line 110-125: In run(), before calling compareSnapshots, verify the two
loaded snapshots belong to the same cluster by comparing
beforeSnapshot.Metadata.ClusterID and afterSnapshot.Metadata.ClusterID; if they
differ, return a clear error (e.g. "snapshots belong to different clusters:
before=<id> after=<id>") from diffOptions.run so we fail fast instead of
producing a misleading diff. Ensure you reference the existing loadSnapshot
return values (beforeSnapshot, afterSnapshot) and perform the check immediately
after loading the after snapshot and before calling compareSnapshots.
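The fail-fast check above can be sketched as follows. The stand-in structs use the `Metadata.ClusterID` path named in the review comment, but their exact shape is an assumption about the real snapshot types.

```go
package main

import "fmt"

// Trimmed stand-ins for the snapshot types.
type SnapshotMetadata struct{ ClusterID string }
type ClusterSnapshot struct{ Metadata SnapshotMetadata }

// checkSameCluster fails fast before diffing when the two snapshots were
// taken from different clusters.
func checkSameCluster(before, after *ClusterSnapshot) error {
	if before.Metadata.ClusterID != after.Metadata.ClusterID {
		return fmt.Errorf("snapshots belong to different clusters: before=%s after=%s",
			before.Metadata.ClusterID, after.Metadata.ClusterID)
	}
	return nil
}

func main() {
	a := &ClusterSnapshot{Metadata: SnapshotMetadata{ClusterID: "abc123"}}
	b := &ClusterSnapshot{Metadata: SnapshotMetadata{ClusterID: "def456"}}
	fmt.Println(checkSameCluster(a, a) == nil, checkSameCluster(a, b) != nil)
}
```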
In `@cmd/cluster/snapshot.go`:
- Around line 160-195: The current logic swallows errors from core collectors
(o.captureNodes, o.captureNamespaces, o.captureClusterOperators) and writes
empty sections into snapshot, which leads to false diffs; change the behavior so
the snapshot creation fails fast on these core-section errors: when
o.captureNodes, o.captureNamespaces, or o.captureClusterOperators returns a
non-nil err, return that error from the enclosing snapshot creation function
(instead of only logging) so the snapshot is not persisted, or alternatively
populate an explicit snapshot-level error field (e.g., Snapshot.Errors["nodes"]
= err.Error()) and ensure downstream diffing ignores sections with recorded
errors; locate and update the block that assigns
snapshot.Nodes/snapshot.Namespaces/snapshot.Operators and the surrounding
function to implement one of these two fixes.
In `@docs/osdctl_cluster_diff.md`:
- Around line 18-27: Wrap the examples block in a bash fenced code block so the
comment lines are treated as shell instead of headings: add a starting "```bash"
before the "Examples:" line and ensure there's a matching closing "```" after
the usage block (so the "Examples:" line, the two "# Compare ..." lines and the
usage snippet are all inside the fenced block); update the block surrounding the
existing usage snippet (osdctl cluster diff ...) to be enclosed by the new
```bash fence so markdown renders correctly.
In `@docs/osdctl_cluster_snapshot.md`:
- Around line 18-30: The examples block currently isn't fenced and comment lines
like "# Capture cluster snapshot to a file" are rendered as Markdown headings;
wrap the entire examples section (from the first example comment through the
final "osdctl cluster snapshot [flags]" line) in a fenced code block with the
language tag bash (add "```bash" before the first example and "```" after the
last line) so the commands (including the literal "osdctl cluster snapshot
[flags]") render and copy/paste correctly.
In `@docs/README.md`:
- Around line 1107-1125: The generated Markdown code fences in
pkg/docgen/docgen.go are created without language specifiers (the usage and
flags outputs currently emit plain ```), which triggers MD040; update the two
places that emit the usage block and the flags block so they open with a
language tag (e.g., change the usage block fence from ``` to ```bash or ```text,
and change the flags block fence from ``` to ```text), ensuring both the
usage-generation site and the flags-generation site in docgen.go use the new
fenced form.
---
Duplicate comments:
In `@cmd/cluster/diff.go`:
- Around line 216-223: compareNodes and compareNamespaces only record Status
changes; update compareNodes and compareNamespaces to also detect and append
NodeDiff/NamespaceDiff entries for Roles, Version, Conditions, and Labels (for
nodes) and Labels (for namespaces) when values differ between
beforeNode/beforeNamespace and afterNode/afterNamespace. Locate the compareNodes
and compareNamespaces functions and, alongside the existing Status check, add
equality checks for Roles (slice/set comparison), Version (string), Conditions
(structured comparison), and Labels (map comparison), creating
NodeDiff/NamespaceDiff items with ChangeType "modified" and Before/After
formatted values; apply the same additional checks in the other similar compare
block later in the file where node/namespace comparisons are duplicated.
In `@cmd/cluster/snapshot.go`:
- Around line 115-116: The two calls to snapshotCmd.MarkFlagRequired currently
ignore returned errors; update the constructor to capture and handle these by
checking the error (e.g., err := snapshotCmd.MarkFlagRequired("cluster-id");
cobra.CheckErr(err) and similarly for "output") or propagate the error up,
ensuring both snapshotCmd.MarkFlagRequired invocations are not discarding their
returned error (reference snapshotCmd.MarkFlagRequired and use cobra.CheckErr or
return the err).
- Around line 63-69: The ResourceInfo struct only persists
name/namespace/kind/status and loses meaningful details for diffs; update
ResourceInfo to include either the full unstructured object (e.g., Raw
map[string]interface{} or JSON/YAML blob) or at minimum include
metadata.generation, metadata.resourceVersion, metadata.labels/annotations, and
the spec/template (or a computed hash of spec/template + important status
fields) so rollout/spec changes are detectable; locate the ResourceInfo type and
any code that serializes/deserializes snapshots and change them to capture and
persist these additional fields (or the full object) and adjust comparison logic
to use the new spec/template or hash fields when generating diffs.
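One way to make spec changes detectable without persisting every full object is the hash approach the comment mentions. This sketch assumes an unstructured map[string]interface{} spec rather than the real ResourceInfo plumbing, and relies on encoding/json sorting map keys to keep the digest deterministic:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// specHash returns a stable fingerprint of a resource's spec so a
// snapshot can detect spec changes without storing the whole object.
// encoding/json marshals map keys in sorted order, which makes the
// digest deterministic for unstructured (map-based) specs.
func specHash(spec map[string]interface{}) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	a, _ := specHash(map[string]interface{}{"replicas": 2, "image": "v1"})
	b, _ := specHash(map[string]interface{}{"image": "v1", "replicas": 2})
	c, _ := specHash(map[string]interface{}{"replicas": 3, "image": "v1"})
	fmt.Println(a == b) // same content, different key order: same hash
	fmt.Println(a == c) // changed replicas: different hash
}
```

A diff then only needs to compare two short strings per resource instead of two full objects.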
- Around line 121-205: run() currently resolves o.ClusterID via OCM but then
lets captureNodes(), captureNamespaces(), captureClusterOperators(), and
captureResources() shell out to oc using whatever context is active; fetch the
target cluster kubeconfig from OCM (or via an existing helper like
utils.GetClusterKubeconfig/GetClusterAdminKubeconfig) immediately in run(), save
it to a temp file or to o.Kubeconfig, and ensure each capture function
(captureNodes, captureNamespaces, captureClusterOperators, captureResources)
uses that kubeconfig (or the temp file path) when invoking oc (or pass it down
as an argument) rather than relying on the current oc context; update run() to
set/clean the temp kubeconfig and adjust the capture function signatures to
accept the kubeconfig if needed so all captured resources come from the same
cluster resolved by run().
In `@cmd/evidence/collect.go`:
- Around line 276-289: The events collection ignores the computed startTime from
the --since flag because startTime isn’t passed into collectKubernetesEvents;
update the call in the IncludeEvents block to pass the startTime and change the
collectKubernetesEvents signature to accept a time.Time (or pointer) parameter,
then filter returned events inside collectKubernetesEvents by their timestamps
(e.g., Event.LastTimestamp / Event.FirstTimestamp) to only include events after
startTime (alternatively use a field-selector when executing oc/kubectl inside
collectKubernetesEvents); ensure ClusterState.Events is still set with the
filtered slice.
- Around line 540-547: The collectCloudTrailData function currently returns an
empty &CloudTrailData{} which is indistinguishable from a successful empty
result; change collectCloudTrailData to return nil (and a nil error or a
specific "not implemented" error) when CloudTrail collection is not performed,
and update the caller sites that assign evidence.CloudTrailData to check for nil
before setting it (or propagate the error) so callers can differentiate "not
implemented" from "collected but empty"; alternatively, if you prefer to
implement collection now, wire in the real logic from
cmd/aws/cloudtrail/errors.go into collectCloudTrailData instead of returning an
empty struct.
- Around line 330-352: collectClusterState currently swallows errors from
collectNodes, collectOperators, and collectMachineConfigs and always returns nil
error; change it to propagate failures by checking each err and returning an
error if any sub-collector fails (e.g., return nil, err or aggregate errors into
one descriptive error using fmt.Errorf or a multi-error approach) while still
filling state with successful parts; update the function around collectNodes,
collectOperators, and collectMachineConfigs to return immediately on non-nil err
(or collect and combine errors) so callers of collectClusterState can detect
failures.
- Around line 354-358: The shelled-out oc commands (e.g., in collectNodes()) run
against the current kubeconfig context instead of the target cluster resolved
via OCM; update all exec.CommandContext usages (collectNodes,
collectClusterOperators, collectMachineConfigs, collectEvents, runMustGather or
similar) to bind to the target cluster by passing the cluster kubeconfig/context
obtained during cluster resolution: either add "--kubeconfig", or "--context
<name>" to the oc command args (or set KUBECONFIG in cmd.Env) using the
kubeconfig returned by your OCM/backplane resolver, or replace these shells with
a client-go implementation using the resolved credentials; ensure the same
resolver symbol that collects the cluster kubeconfig is used so each oc
invocation targets the intended cluster.
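Pinning each shelled-out oc call to an explicit kubeconfig might look like this; the helper name and paths are illustrative, and the command is only constructed here, never executed:

```go
package main

import (
	"fmt"
	"os/exec"
)

// ocCommand builds an oc invocation pinned to an explicit kubeconfig,
// so the call can never drift to whatever context the caller happens to
// have active.
func ocCommand(kubeconfigPath string, args ...string) *exec.Cmd {
	full := append([]string{"--kubeconfig", kubeconfigPath}, args...)
	cmd := exec.Command("oc", full...)
	// Setting KUBECONFIG as well guards any sub-processes oc spawns;
	// later entries in Env take precedence over inherited ones.
	cmd.Env = append(cmd.Environ(), "KUBECONFIG="+kubeconfigPath)
	return cmd
}

func main() {
	cmd := ocCommand("/tmp/target-cluster.kubeconfig", "get", "nodes", "-o", "json")
	fmt.Println(cmd.Args)
}
```

Each capture/collect function would receive the kubeconfig path resolved from OCM and route every invocation through a helper like this.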
In `@docs/osdctl_aws_cloudtrail_errors.md`:
- Around line 13-28: The examples block is not fenced with a bash code fence so
lines starting with # are being treated as Markdown headings; wrap the entire
examples snippet (the four example commands and the usage line ``osdctl aws
cloudtrail errors [flags]``) inside a fenced code block that specifies bash
(i.e., start with ```bash and end with ```), ensuring the existing `#` comment
prefixes remain inside the fenced block; update the block around the examples in
the docs/osdctl_aws_cloudtrail_errors.md file where the Examples section and the
usage line appear.
In `@docs/README.md`:
- Around line 1162-1174: The Markdown example blocks containing shell comments
(e.g., the groups showing "osdctl aws cloudtrail errors -C <cluster-id> ..." and
the similar "cluster diff", "cluster snapshot", and "evidence collect" example
groups) must be wrapped in fenced code blocks with a bash language tag so
leading `#` lines render as comments instead of headings; update each example
group by adding a starting "```bash" before the first commented line and a
closing "```" after the last command for the cloudtrail, cluster diff, cluster
snapshot and evidence collect example sections.
---
Nitpick comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Line 98: The call to errorsCmd.MarkFlagRequired("cluster-id") currently
discards its error; change it to handle the returned error (e.g., capture the
error from MarkFlagRequired and propagate or log it using your project's error
handling, or use Cobra's helper like cobra.CheckErr) so wiring mistakes are not
silently ignored—update the call site where
errorsCmd.MarkFlagRequired("cluster-id") is invoked to check the returned err
and handle it appropriately.
In `@cmd/evidence/collect.go`:
- Around line 182-183: The MarkFlagRequired calls on collectCmd currently
discard errors; update the two calls to capture their returned error and handle
it instead of assigning to blank identifier — e.g., err :=
collectCmd.MarkFlagRequired("cluster-id"); if err != nil { cobra.CheckErr(err) }
(or log.Fatalf/return the error from the enclosing init/setup function) and do
the same for collectCmd.MarkFlagRequired("output"); this ensures wiring mistakes
are surfaced.
In `@docs/osdctl_aws.md`:
- Around line 13-15: Update the fenced code blocks in the docs that show CLI
examples (e.g., the block containing "osdctl aws [flags]" and the other blocks
noted at lines 19-21 and 25-36) to include a language specifier such as "shell"
or "text" (e.g., change ``` to ```shell) so syntax highlighting and
accessibility are improved; ensure all CLI/example blocks in the file use the
same appropriate specifier.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 27eb9a3b-169f-4fdd-a724-6b2fbd4e26ab
📒 Files selected for processing (20)
- README.md
- cmd/aws/cloudtrail/cmd.go
- cmd/aws/cloudtrail/errors.go
- cmd/aws/cmd.go
- cmd/cluster/cmd.go
- cmd/cluster/diff.go
- cmd/cluster/snapshot.go
- cmd/cmd.go
- cmd/evidence/cmd.go
- cmd/evidence/collect.go
- docs/README.md
- docs/osdctl.md
- docs/osdctl_aws.md
- docs/osdctl_aws_cloudtrail.md
- docs/osdctl_aws_cloudtrail_errors.md
- docs/osdctl_cluster.md
- docs/osdctl_cluster_diff.md
- docs/osdctl_cluster_snapshot.md
- docs/osdctl_evidence.md
- docs/osdctl_evidence_collect.md
✅ Files skipped from review due to trivial changes (1)
- docs/osdctl_evidence_collect.md
🚧 Files skipped from review as they are similar to previous changes (1)
- cmd/cluster/cmd.go
Force-pushed 772997f to 7346956
Actionable comments posted: 9
♻️ Duplicate comments (7)
cmd/cluster/snapshot.go (3)
114-115: ⚠️ Potential issue | 🟡 Minor

Handle `MarkFlagRequired` errors instead of discarding them. These ignored returns will keep `errcheck` unhappy and can hide a broken flag registration if the flag name drifts.

Suggested fix:

- _ = snapshotCmd.MarkFlagRequired("cluster-id")
- _ = snapshotCmd.MarkFlagRequired("output")
+ cobra.CheckErr(snapshotCmd.MarkFlagRequired("cluster-id"))
+ cobra.CheckErr(snapshotCmd.MarkFlagRequired("output"))

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 114 - 115, The two calls to snapshotCmd.MarkFlagRequired("cluster-id") and snapshotCmd.MarkFlagRequired("output") currently discard returned errors; update the code to handle these errors instead of using blank identifier—capture the error from each MarkFlagRequired call and handle it (for example, return the error up the caller, or log/fatal via the command setup logger) so that failures to register required flags are surfaced and errcheck is satisfied.
64-69: ⚠️ Potential issue | 🟠 Major

Capture more than `status.phase` for extra resources. `ResourceInfo` still only stores `Status`, so `cluster diff` will miss most spec/rollout changes for resources like Deployments and Services and mostly degrade into add/remove detection.

Also applies to: 399-428
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 64 - 69, ResourceInfo currently only stores Status which misses spec/rollout changes; extend ResourceInfo to capture metadata and spec-summary used for diffs (e.g., add fields like Labels map[string]string, Annotations map[string]string, Generation/ObservedGeneration (int64), ResourceVersion string, and a SpecHash or SpecSummary string/json) and update the code paths that build/populate ResourceInfo (the functions that create snapshot ResourceInfo instances) to fill these new fields from object.Metadata and object.Spec/Status; also apply the same changes to the other ResourceInfo occurrence referenced in the duplicate region so cluster diff can detect spec/rollout changes.
120-200: ⚠️ Potential issue | 🔴 Critical

Bind every `oc` capture to the requested cluster. `run()` resolves the target cluster through OCM, but the capture paths still execute plain `oc` against whatever kubecontext the caller already has selected. That can produce a snapshot whose metadata says cluster B while nodes/operators/resources were collected from cluster A.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 120 - 200, The run() function resolves the target cluster but then calls captureNodes(), captureNamespaces(), captureClusterOperators(), and captureResources() which run plain oc against the caller's current kubecontext; bind every oc call to the resolved cluster by retrieving the cluster-specific kubeconfig/token (or API server URL and credentials) after GetClusterAnyStatus and passing it into the capture functions (or setting it on snapshotOptions) so those functions invoke oc/k8s client with the cluster-specific context; update the signatures of captureNodes, captureNamespaces, captureClusterOperators, captureResources (or have them read a new o.TargetKubeConfig / o.TargetContext set in run()) and ensure each function uses that kubeconfig/context when executing oc or creating clients.
cmd/evidence/collect.go (3)
255-269: ⚠️ Potential issue | 🟠 Major

This path never actually collects CloudTrail evidence. `collectCloudTrailData()` always returns `nil, nil`, but once AWS auth succeeds the caller prints a success message. That makes the command look complete while `evidence.yaml` contains no CloudTrail section at all.

Also applies to: 561-568
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 255 - 269, The caller prints "CloudTrail access verified" even when collectCloudTrailData returns nil (since collectCloudTrailData currently returns nil, nil), so fix either collectCloudTrailData to actually return collected data or update the caller in the block that calls CreateAWSV2Config and collectCloudTrailData: after cloudTrailData, err := o.collectCloudTrailData(startTime) treat a nil cloudTrailData as an error (log a warning and do not set evidence.CloudTrailData or print the success line), and only assign to evidence.CloudTrailData and print the success message when cloudTrailData is non-nil and err == nil; reference collectCloudTrailData, evidence.CloudTrailData, and osdCloud.CreateAWSV2Config to locate the affected code.
327-365: ⚠️ Potential issue | 🟠 Major

Partial collector failures still get serialized as successful evidence. If one of the core sub-collectors fails, this returns a partially populated `ClusterState` and the run path treats it as success. The saved bundle then can't distinguish "section was empty" from "section failed to collect".
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 327 - 365, The collectClusterState function currently treats partial sub-collector failures as success; change its behavior so any failure in collectNodes, collectOperators, or collectMachineConfigs causes collectClusterState to return an error rather than a partially populated ClusterState. Specifically, after collecting into state.Nodes, state.Operators, and state.MachineConfigs and accumulating warnings in warnings, if len(warnings) > 0 return nil and an error (e.g. fmt.Errorf("partial collection failures: %v", warnings)); keep the existing all-empty check logic or combine it into this single check and retain the warning printing logic if desired. Ensure you reference the warnings slice and the collectNodes/collectOperators/collectMachineConfigs callers to locate the change.
196-199: ⚠️ Potential issue | 🔴 Critical

Bind evidence collection to the requested cluster before running `oc`. You resolve the target cluster from OCM here, but the subsequent collectors and `oc adm must-gather` still execute against the caller's current kubecontext. That can label the bundle with the right `ClusterID` while collecting evidence from a different cluster.

Also applies to: 243-301
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 196 - 199, After resolving the target cluster with utils.GetClusterAnyStatus (cluster) and o.ClusterID, make sure to switch the active kube/oc context to that target cluster before invoking any collectors or running "oc adm must-gather" (and similarly in the later block around lines 243-301). Replace or augment the current executor/collector setup to use the target cluster's kubeconfig/REST config (via the existing helper or client factory in your codebase) so all subsequent collector functions and must-gather invocations run against the resolved cluster, and restore the original context if needed after collection.
cmd/cluster/diff.go (1)
204-237: ⚠️ Potential issue | 🟠 Major

Node diffs still miss condition-only and label-only changes. `NodeSnapshot` captures `Conditions` and `Labels` in cmd/cluster/snapshot.go:44-51, but this comparator only checks status/version/roles. Any change that leaves those three fields intact stays invisible in the diff.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/diff.go` around lines 204 - 237, The compareNodes function currently only compares Status, Version and Roles and therefore misses changes to NodeSnapshot.Conditions and NodeSnapshot.Labels; update compareNodes to also detect and report differences in Conditions and Labels (use deep comparisons such as reflect.DeepEqual or a deterministic serialization rather than fmt.Sprintf) by adding checks that append human-readable change strings (e.g., "Conditions: ... -> ..." and "Labels: ... -> ...") to the existing changes slice, and ensure the NodeDiff entries for added/modified nodes include the Conditions and Labels in their Before/After descriptions so condition-only or label-only updates are surfaced; reference the compareNodes function, NodeSnapshot struct, and NodeDiff usage when making the change.
🧹 Nitpick comments (4)
cmd/aws/cloudtrail/errors.go (1)
154-158: Preserve the shared AWS config when switching regions.

Creating the us-east-1 client from raw `cloudtrail.Options` drops any retryer, endpoint resolver, API options, or other settings already loaded into `cfg`. Prefer `cloudtrail.NewFromConfig(cfg, func(o *cloudtrail.Options) { o.Region = defaultRegion })` instead.

Suggested refactor:

- globalClient := cloudtrail.New(cloudtrail.Options{
-     Region: defaultRegion,
-     Credentials: cfg.Credentials,
-     HTTPClient: cfg.HTTPClient,
- })
+ globalClient := cloudtrail.NewFromConfig(cfg, func(o *cloudtrail.Options) {
+     o.Region = defaultRegion
+ })

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/aws/cloudtrail/errors.go` around lines 154 - 158, The code creates a new CloudTrail client with cloudtrail.New(cloudtrail.Options{...}) which discards shared settings on cfg; instead preserve cfg and only override the region by using cloudtrail.NewFromConfig(cfg, func(o *cloudtrail.Options) { o.Region = defaultRegion }), so update the creation of globalClient to call cloudtrail.NewFromConfig with cfg and the region override (referencing cfg, defaultRegion, cloudtrail.NewFromConfig, and globalClient) to retain retryer, endpoint resolver, API options, and other shared config.
docs/osdctl_cluster_snapshot.md (1)
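The reason NewFromConfig-style constructors preserve shared settings is the functional-options pattern. This miniature, SDK-free sketch (hypothetical Options/NewFromConfig stand-ins, not the AWS SDK types) shows why only the overridden field changes:

```go
package main

import "fmt"

// Options mimics, in miniature, an SDK client's option set: several
// fields come from a shared base config, and a caller should be able to
// override just one of them.
type Options struct {
	Region     string
	MaxRetries int
}

// NewFromConfig copies the base options first, then applies the
// overrides, which is why every non-overridden setting is retained.
func NewFromConfig(base Options, optFns ...func(*Options)) Options {
	opts := base // start from the shared config
	for _, fn := range optFns {
		fn(&opts)
	}
	return opts
}

func main() {
	base := Options{Region: "eu-west-1", MaxRetries: 5}
	global := NewFromConfig(base, func(o *Options) { o.Region = "us-east-1" })
	fmt.Println(global.Region, global.MaxRetries) // region overridden, retries kept
}
```

Constructing the client from a raw options literal, by contrast, starts from the zero value and keeps only the fields copied by hand.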
18-20: Add language tags to the fenced blocks. Using `bash` for examples and `text` for usage/options blocks will clear the markdownlint warnings and render these sections more cleanly.

Also applies to: 24-33, 37-43, 47-57
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_cluster_snapshot.md` around lines 18 - 20, The fenced code blocks in docs/osdctl_cluster_snapshot.md (e.g., the usage block showing "osdctl cluster snapshot [flags]" and the example/output sections) lack language tags; update each fenced block to include an appropriate language tag—use "text" for usage/options blocks (the block around "osdctl cluster snapshot [flags]") and "bash" for example command blocks—applying this change to the ranges noted (lines ~18, 24-33, 37-43, 47-57) so markdownlint warnings are resolved and rendering improves.
docs/osdctl_evidence_collect.md (1)
18-20: Add language tags to the fenced blocks. Using `bash` for the examples and `text` for usage/options keeps this page lint-clean and improves the rendered formatting.

Also applies to: 24-36, 40-49, 53-63
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_evidence_collect.md` around lines 18 - 20, The fenced code blocks showing examples and usage should include language tags: change the command/example blocks like the one containing "osdctl evidence collect [flags]" to use ```bash and change the usage/options blocks to use ```text; apply the same change to the other fenced blocks referenced (the examples in the later examples/usage sections) so all command snippets are ```bash and all usage/options blocks are ```text to satisfy linting and improve rendered formatting.
docs/osdctl_cluster_diff.md (1)
18-20: Add language tags to the fenced blocks. `bash` for the examples and `text` for usage/options keeps the generated docs lint-clean and improves formatting in rendered Markdown.

Also applies to: 24-30, 34-37, 41-52
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_cluster_diff.md` around lines 18 - 20, The fenced code blocks showing the command invocation (e.g. the line starting with "osdctl cluster diff <before.yaml> <after.yaml> [flags]") and the surrounding usage/options blocks need language tags; change the example blocks to start with ```bash and change usage/options blocks to start with ```text (apply the same change to the other fenced blocks noted in the comment ranges 24-30, 34-37, and 41-52) so the generated docs lint correctly and render with proper syntax highlighting.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 86-95: The command currently only honors the local --json boolean
(opts.OutputJSON) but ignores the inherited -o/--output string flag; update the
errors handling to detect the global output flag and treat "json" the same as
--json. Concretely, in the errors command path (the RunE that calls opts.run
and/or inside opts.run), read the command's inherited output flag (e.g.,
cmd.Flags().GetString("output") or
cmd.Root().PersistentFlags().GetString("output")) and set/override
opts.OutputJSON = true when that value == "json" (or otherwise support
json+pretty variations), then proceed with the existing JSON rendering code so
osdctl aws cloudtrail errors -o json behaves identically to --json.
In `@cmd/aws/cmd.go`:
- Around line 12-17: Update the help text in the AWS command definition: change
occurrences of "AWS related" to the hyphenated form "AWS-related" in both the
Short and Long string literals (the fields defined in the AWS command in
cmd/aws/cmd.go) so the generated docs use the corrected grammar.
In `@cmd/cluster/diff.go`:
- Around line 375-383: The NamespaceDiff creation currently only renders Status
in Before/After even when labelsChanged is true; update the logic around
labelsChanged/ statusChanged in the block that builds the NamespaceDiff (the
code using NamespaceDiff, Name, ChangeType, Before, After, and the
beforeNs/afterNs variables) so that Before and After include labels when labels
differ (e.g., include "Status: <status>; Labels: <labels>" or at minimum append
the labels fmt for beforeNs.Labels and afterNs.Labels when labelsChanged is
true), ensuring label-only modifications produce meaningful Before/After text.
- Around line 571-578: warnCaptureErrors currently writes capture warnings to
stdout which corrupts JSON output; update warnCaptureErrors to write all warning
output to stderr instead by switching fmt.Printf and fmt.Println calls to write
to os.Stderr (e.g., fmt.Fprintln/os.Stderr or fmt.Fprintf(os.Stderr,...)), and
add the necessary import for the os package; ensure the function signature and
logic remain the same and only the output stream is changed.
In `@cmd/evidence/collect.go`:
- Around line 179-180: Replace the ignored return values from
collectCmd.MarkFlagRequired with cmdutil.CheckErr(...) to propagate any errors;
specifically, wrap the two calls to collectCmd.MarkFlagRequired("cluster-id")
and collectCmd.MarkFlagRequired("output") in cmdutil.CheckErr so failures are
handled consistently with the codebase (same pattern used in
cmd/cluster/loggingcheck.go).
In `@docs/osdctl_aws_cloudtrail_errors.md`:
- Around line 13-57: The fenced code blocks in the "Examples", the initial usage
block, and the "Options" / "Options inherited from parent commands" sections are
missing language tags; update the three types of blocks so they are tagged for
proper linting and rendering: mark the usage/flags blocks as "text" (the
single-line usage ```osdctl aws cloudtrail errors [flags]``` and the multi-line
Options blocks), and mark the Examples code fence as "bash" (the examples
showing osdctl commands). Ensure each triple-backtick fence is changed to
```text or ```bash accordingly so MD040 warnings are resolved.
In `@docs/osdctl_aws_cloudtrail.md`:
- Around line 12-35: The markdown code fences in the osdctl_aws_cloudtrail.md
content are missing language annotations and trigger MD040; update each
triple-backtick fence that wraps CLI output and option blocks to use the "text"
language (replace ``` with ```text) so the fenced blocks are annotated (the
three blocks around "osdctl aws cloudtrail [flags]", the help block containing
"-h, --help", and the inherited options block).
In `@docs/osdctl_aws.md`:
- Around line 13-36: Update the three fenced code blocks that show the command
usage ("osdctl aws [flags]") and the Options / Options inherited from parent
commands lists to use language tag "text" instead of plain backticks; replace
the opening fences ``` with ```text for the command block and both flag/option
blocks so markdownlint MD040 stops flagging them and the blocks render as plain
text.
In `@docs/osdctl_evidence.md`:
- Around line 13-36: Update the three fenced code blocks in
docs/osdctl_evidence.md to include a language specifier (use "text") after the
opening backticks so markdownlint MD040 is satisfied; specifically, change the
blocks containing "osdctl evidence [flags]", the short Options block starting
with "-h, --help", and the larger "Options inherited from parent commands" block
to start with ```text instead of ``` (if these blocks are generated, adjust the
template/generator that emits the command and options blocks so it always emits
```text).
---
Duplicate comments:
In `@cmd/cluster/diff.go`:
- Around line 204-237: The compareNodes function currently only compares Status,
Version and Roles and therefore misses changes to NodeSnapshot.Conditions and
NodeSnapshot.Labels; update compareNodes to also detect and report differences
in Conditions and Labels (use deep comparisons such as reflect.DeepEqual or a
deterministic serialization rather than fmt.Sprintf) by adding checks that
append human-readable change strings (e.g., "Conditions: ... -> ..." and
"Labels: ... -> ...") to the existing changes slice, and ensure the NodeDiff
entries for added/modified nodes include the Conditions and Labels in their
Before/After descriptions so condition-only or label-only updates are surfaced;
reference the compareNodes function, NodeSnapshot struct, and NodeDiff usage
when making the change.
In `@cmd/cluster/snapshot.go`:
- Around line 114-115: The two calls to
snapshotCmd.MarkFlagRequired("cluster-id") and
snapshotCmd.MarkFlagRequired("output") currently discard returned errors; update
the code to handle these errors instead of using blank identifier—capture the
error from each MarkFlagRequired call and handle it (for example, return the
error up the caller, or log/fatal via the command setup logger) so that failures
to register required flags are surfaced and errcheck is satisfied.
- Around line 64-69: ResourceInfo currently only stores Status which misses
spec/rollout changes; extend ResourceInfo to capture metadata and spec-summary
used for diffs (e.g., add fields like Labels map[string]string, Annotations
map[string]string, Generation/ObservedGeneration (int64), ResourceVersion
string, and a SpecHash or SpecSummary string/json) and update the code paths
that build/populate ResourceInfo (the functions that create snapshot
ResourceInfo instances) to fill these new fields from object.Metadata and
object.Spec/Status; also apply the same changes to the other ResourceInfo
occurrence referenced in the duplicate region so cluster diff can detect
spec/rollout changes.
- Around line 120-200: The run() function resolves the target cluster but then
calls captureNodes(), captureNamespaces(), captureClusterOperators(), and
captureResources() which run plain oc against the caller's current kubecontext;
bind every oc call to the resolved cluster by retrieving the cluster-specific
kubeconfig/token (or API server URL and credentials) after GetClusterAnyStatus
and passing it into the capture functions (or setting it on snapshotOptions) so
those functions invoke oc/k8s client with the cluster-specific context; update
the signatures of captureNodes, captureNamespaces, captureClusterOperators,
captureResources (or have them read a new o.TargetKubeConfig / o.TargetContext
set in run()) and ensure each function uses that kubeconfig/context when
executing oc or creating clients.
In `@cmd/evidence/collect.go`:
- Around line 255-269: The caller prints "CloudTrail access verified" even when
collectCloudTrailData returns nil (since collectCloudTrailData currently returns
nil, nil), so fix either collectCloudTrailData to actually return collected data
or update the caller in the block that calls CreateAWSV2Config and
collectCloudTrailData: after cloudTrailData, err :=
o.collectCloudTrailData(startTime) treat a nil cloudTrailData as an error (log a
warning and do not set evidence.CloudTrailData or print the success line), and
only assign to evidence.CloudTrailData and print the success message when
cloudTrailData is non-nil and err == nil; reference collectCloudTrailData,
evidence.CloudTrailData, and osdCloud.CreateAWSV2Config to locate the affected
code.
- Around line 327-365: The collectClusterState function currently treats partial
sub-collector failures as success; change its behavior so any failure in
collectNodes, collectOperators, or collectMachineConfigs causes
collectClusterState to return an error rather than a partially populated
ClusterState. Specifically, after collecting into state.Nodes, state.Operators,
and state.MachineConfigs and accumulating warnings in warnings, if len(warnings)
> 0 return nil and an error (e.g. fmt.Errorf("partial collection failures: %v",
warnings)); keep the existing all-empty check logic or combine it into this
single check and retain the warning printing logic if desired. Ensure you
reference the warnings slice and the
collectNodes/collectOperators/collectMachineConfigs callers to locate the
change.
- Around line 196-199: After resolving the target cluster with
utils.GetClusterAnyStatus (cluster) and o.ClusterID, make sure to switch the
active kube/oc context to that target cluster before invoking any collectors or
running "oc adm must-gather" (and similarly in the later block around lines
243-301). Replace or augment the current executor/collector setup to use the
target cluster's kubeconfig/REST config (via the existing helper or client
factory in your codebase) so all subsequent collector functions and must-gather
invocations run against the resolved cluster, and restore the original context
if needed after collection.
---
Nitpick comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 154-158: The code creates a new CloudTrail client with
cloudtrail.New(cloudtrail.Options{...}) which discards shared settings on cfg;
instead preserve cfg and only override the region by using
cloudtrail.NewFromConfig(cfg, func(o *cloudtrail.Options) { o.Region =
defaultRegion }), so update the creation of globalClient to call
cloudtrail.NewFromConfig with cfg and the region override (referencing cfg,
defaultRegion, cloudtrail.NewFromConfig, and globalClient) to retain retryer,
endpoint resolver, API options, and other shared config.
In `@docs/osdctl_cluster_diff.md`:
- Around line 18-20: The fenced code blocks showing the command invocation (e.g.
the line starting with "osdctl cluster diff <before.yaml> <after.yaml> [flags]")
and the surrounding usage/options blocks need language tags; change the example
blocks to start with ```bash and change usage/options blocks to start with
```text (apply the same change to the other fenced blocks noted in the comment
ranges 24-30, 34-37, and 41-52) so the generated docs lint correctly and render
with proper syntax highlighting.
In `@docs/osdctl_cluster_snapshot.md`:
- Around line 18-20: The fenced code blocks in docs/osdctl_cluster_snapshot.md
(e.g., the usage block showing "osdctl cluster snapshot [flags]" and the
example/output sections) lack language tags; update each fenced block to include
an appropriate language tag—use "text" for usage/options blocks (the block
around "osdctl cluster snapshot [flags]") and "bash" for example command
blocks—applying this change to the ranges noted (lines ~18, 24-33, 37-43, 47-57)
so markdownlint warnings are resolved and rendering improves.
In `@docs/osdctl_evidence_collect.md`:
- Around line 18-20: The fenced code blocks showing examples and usage should
include language tags: change the command/example blocks like the one containing
"osdctl evidence collect [flags]" to use ```bash and change the usage/options
blocks to use ```text; apply the same change to the other fenced blocks
referenced (the examples in the later examples/usage sections) so all command
snippets are ```bash and all usage/options blocks are ```text to satisfy linting
and improve rendered formatting.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cea1a2b2-d822-4469-828c-551323ad9151
📒 Files selected for processing (20)
- README.md
- cmd/aws/cloudtrail/cmd.go
- cmd/aws/cloudtrail/errors.go
- cmd/aws/cmd.go
- cmd/cluster/cmd.go
- cmd/cluster/diff.go
- cmd/cluster/snapshot.go
- cmd/cmd.go
- cmd/evidence/cmd.go
- cmd/evidence/collect.go
- docs/README.md
- docs/osdctl.md
- docs/osdctl_aws.md
- docs/osdctl_aws_cloudtrail.md
- docs/osdctl_aws_cloudtrail_errors.md
- docs/osdctl_cluster.md
- docs/osdctl_cluster_diff.md
- docs/osdctl_cluster_snapshot.md
- docs/osdctl_evidence.md
- docs/osdctl_evidence_collect.md
🚧 Files skipped from review as they are similar to previous changes (3)
- cmd/cmd.go
- cmd/evidence/cmd.go
- README.md
9c6b3bb to cd08339
@MitaliBhalla: This pull request references SREP-3620 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.
Details
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Actionable comments posted: 2
♻️ Duplicate comments (10)
cmd/cluster/diff.go (1)
226-243: ⚠️ Potential issue | 🟠 Major — Node condition and label changes are still invisible.
`NodeSnapshot` also captures `Conditions` and `Labels`, but this comparator only checks `Status`, `Version`, and `Roles`. A node can change readiness/conditions or labels without ever being reported in `osdctl cluster diff`, which makes the snapshot comparison incomplete.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/diff.go` around lines 226-243, the comparator currently only compares Status, Version and Roles, so changes to NodeSnapshot.Conditions and NodeSnapshot.Labels are missed; update the modified-node detection (the block that inspects beforeNode and afterNode and appends to diffs) to also compare Conditions and Labels (use a stable/deep comparison like reflect.DeepEqual for the Conditions slice, and compare Labels map keys/values or fmt.Sprintf on a sorted representation) and include those fields in the changes list and in the Before/After formatted strings for the NodeDiff so readiness/condition and label changes are reported for the given name/beforeNode/afterNode.
docs/README.md (1)
1099-1185: ⚠️ Potential issue | 🟡 Minor — Add language specifiers to these new fenced blocks.
These generated fences still omit a language tag, so markdownlint will keep flagging MD040 here. Use `text` for the usage/flags blocks, and if `docs/README.md` is generated, fix the docgen template as well so regeneration stays lint-clean.
Also applies to: 1573-1605, 2150-2184, 2980-3045
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/README.md` around lines 1099-1185, the fenced code blocks under the headings "osdctl aws", "osdctl aws cloudtrail", and "osdctl aws cloudtrail errors" (and the other similar blocks noted) are missing a language specifier, which triggers MD040; update each triple-backtick fence that shows usage/flags output to use the language tag "text" and, if README.md is generated, update the doc generation template that emits these blocks so future regenerations include the tag by default; search for the templates that produce the usage/flags blocks (look for the headings "osdctl aws", "osdctl aws cloudtrail", "osdctl aws cloudtrail errors") and add the language specifier there.
cmd/aws/cloudtrail/errors.go (1)
354-359: ⚠️ Potential issue | 🟡 Minor — Guard the STS response before dereferencing.
In `aws-sdk-go-v2/service/sts` v1.41.6 these fields are pointers. A short or empty `GetCallerIdentity` response would panic here instead of surfacing a normal error.
🛡️ Suggested guard

```diff
 func whoami(stsClient *sts.Client) (accountArn string, accountId string, err error) {
 	output, err := stsClient.GetCallerIdentity(context.TODO(), &sts.GetCallerIdentityInput{})
 	if err != nil {
 		return "", "", err
 	}
+	if output == nil || output.Arn == nil || output.Account == nil {
+		return "", "", fmt.Errorf("empty GetCallerIdentity response")
+	}
 	return *output.Arn, *output.Account, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/aws/cloudtrail/errors.go` around lines 354-359, the whoami function dereferences output.Arn and output.Account without checking the STS response; modify whoami (which calls stsClient.GetCallerIdentity) to first verify output is non-nil and that output.Arn and output.Account are non-nil before returning them, and return a clear error if any of those fields are missing so we avoid a panic on short/empty responses.
docs/osdctl_aws_cloudtrail_errors.md (1)
13-57: ⚠️ Potential issue | 🟡 Minor — Add language tags to the generated code fences.
These blocks are still plain triple-backtick fences, so markdownlint MD040 remains red. `bash` fits the Examples block; the usage/options blocks should be `text`. If these pages are generated, please fix the generator/template once so the same regression in the other new command docs disappears too.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_aws_cloudtrail_errors.md` around lines 13-57, the code fences in the docs (the usage block "osdctl aws cloudtrail errors [flags]" and the Examples/Options blocks) lack language tags, triggering markdownlint MD040; update the generated markdown so the Examples fences use "bash" and the usage/options/inherited-options fences use "text" (or another appropriate language identifier), and update the generator/template that emits these pages so future command docs include the correct triple-backtick language tags (e.g., ensure the template that renders the Examples, Options, and inherited-options blocks sets the language attribute).
cmd/evidence/collect.go (3)
147-156: ⚠️ Potential issue | 🟠 Major — This command still doesn't collect any CloudTrail evidence.
`collectCloudTrailData()` is a stub that always returns `nil, nil`, so `evidence.yaml` never contains CloudTrail events even though the help text and runtime step say this command gathers them. Either wire the CloudTrail collector in here or rename this step/help to an access check so the UX matches the output.
Also applies to: 263-277, 569-576
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 147 - 156, The help text and runtime step claim CloudTrail events are collected but collectCloudTrailData() is currently a stub returning nil, so evidence.yaml never contains CloudTrail data; fix by wiring the actual CloudTrail collector into the collect command: replace the stub call to collectCloudTrailData() with the real collector invocation (or call the function that returns CloudTrail events) and ensure its returned events are merged into the evidence map written to evidence.yaml (same place where other collectors' outputs are aggregated), or alternatively update the Long help text and runtime step names to explicitly state this is only an access/permission check if you do not want to implement collection. Ensure you reference collectCloudTrailData(), the evidence aggregation logic that writes evidence.yaml, and any caller functions in cmd/evidence/collect.go so the CloudTrail output is included.
232-237: ⚠️ Potential issue | 🔴 Critical — Don't collect from an ambient `oc` context.
The only guard here is a best-effort warning from `verifyClusterContext()`, but every collector still shells out to plain `oc ...` afterward. That means `-C` selects the cluster for metadata only; the actual evidence can come from whichever kubeconfig context is active, and the inherited `--context`/`--kubeconfig` flags never reach the subprocesses.
Please centralize subprocess creation so every `oc`/`oc adm` call is bound to the resolved target cluster before collection starts.
Also applies to: 377-379, 434-436, 492-494, 522-524, 585-590, 693-703
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 232 - 237, The current verifyClusterContext(...) only warns but individual collectors still invoke plain subprocesses with "oc" which can use the ambient kubeconfig; centralize subprocess creation by adding a single helper (e.g., runOcCmd or newOcExecutor) in cmd/evidence/collect.go that always injects the resolved cluster context/kubeconfig (cluster.ID(), resolved kubeconfig path or --context/--kubeconfig flags) or sets KUBECONFIG in the env for each process; replace direct subprocess invocations in the collectors referenced (the blocks around verifyClusterContext, and the other occurrences you noted) to call that helper so every "oc" / "oc adm" invocation is bound to the target cluster before collection starts.
250-260: ⚠️ Potential issue | 🟠 Major — Make partial cluster-state failures visible.
If one sub-collector fails, `collectClusterState()` still returns a `nil` error and `run()` prints a success line with partial counts. Incomplete evidence needs to stay distinguishable from a clean collection.
Also applies to: 335-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/evidence/collect.go` around lines 250-260, collectClusterState() is hiding sub-collector failures so run() prints a success line even when the result is partial; change collectClusterState (and any helper sub-collector functions it calls) to return a non-nil error when any sub-collector fails (or return a sentinel PartialDataError) and propagate that error to the caller, then update the call site in run() (where o.collectClusterState() is invoked and evidence.ClusterState is set) to treat a non-nil error as a warning/partial result (log the error and still attach the partial clusterState) instead of printing a clean success; reference collectClusterState, o.collectClusterState(), evidence.ClusterState and the run() caller to find and update the code paths.
docs/osdctl_cluster_diff.md (1)
18-52: ⚠️ Potential issue | 🟡 Minor — Add language tags to the fenced blocks.
These anonymous fences will still trip MD040. Please mark the examples block as `bash` and the synopsis/options blocks as `text` so the generated docs render cleanly and lint passes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/osdctl_cluster_diff.md` around lines 18-52, update the fenced code blocks for the osdctl cluster diff documentation to include language tags so the linter passes: change the synopsis and options blocks (the blocks showing "osdctl cluster diff <before.yaml> <after.yaml> [flags]" and the Options sections) to use "text", and mark the Examples block's shell invocations (like "osdctl cluster diff before.yaml after.yaml" and the --json flag example) as "bash"; ensure each existing anonymous fence around those snippets is replaced with the appropriately tagged fence so the "osdctl cluster diff" docs render correctly.
cmd/cluster/snapshot.go (2)
65-71: ⚠️ Potential issue | 🟠 Major — `ResourceInfo` is too lossy for meaningful diffs.
For many resource types called out by this command (`MachineConfig`, `Deployment`, `Service`, etc.), `status.phase` is empty or unrelated to the actual rollout/spec change being validated. In those cases the snapshot only supports add/remove detection and misses the modifications users are collecting evidence for. Please store a stable digest or richer per-object summary alongside `Status` so `cluster diff` can detect real changes.
Also applies to: 400-429
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 65 - 71, ResourceInfo is too lossy for diffs; update the ResourceInfo struct (the type named ResourceInfo and its fields Name, Namespace, Kind, Status) to include a stable per-object digest or richer summary field (e.g., Digest or Summary) that captures a hash of the meaningful spec+metadata (or a small canonical summary) so cluster diff can detect modifications beyond add/remove; ensure wherever ResourceInfo instances are created (the snapshot collection code that builds these structs) you compute and populate that digest/summary from the object's spec and stable metadata so downstream diff logic can compare the new Digest/Summary in addition to Status.
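One way to make the snapshot diff-friendly, sketched under the assumption that the object's spec is available as a decoded map (`specDigest` is a hypothetical helper, not existing osdctl code): hash a canonical JSON encoding of the spec and store the digest alongside `Status`.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// specDigest returns a stable digest of an object's spec. encoding/json
// sorts map keys, so equal map-shaped specs always hash identically.
func specDigest(spec map[string]interface{}) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%x", sha256.Sum256(b)), nil
}

func main() {
	d1, _ := specDigest(map[string]interface{}{"replicas": 3, "image": "app:v1"})
	d2, _ := specDigest(map[string]interface{}{"image": "app:v1", "replicas": 3})
	d3, _ := specDigest(map[string]interface{}{"image": "app:v2", "replicas": 3})
	fmt.Println(d1 == d2, d1 == d3) // true false
}
```

With a `Digest` field populated this way, `cluster diff` can flag a modified object even when `status.phase` is empty on both sides.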
121-145: ⚠️ Potential issue | 🔴 Critical — Bind `oc` collection to the requested cluster.
`run()` resolves `--cluster-id` through OCM, but every collector still executes bare `oc` commands against the caller's current kubecontext. That can produce a snapshot whose metadata describes cluster B while nodes/namespaces/operators come from cluster A. Please establish the target kubeconfig/context up front and thread it through every `oc` invocation, or fail fast if the active context does not match.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cmd/cluster/snapshot.go` around lines 121 - 145, The run() function resolves --cluster-id via OCM but never binds the kubectl/oc context, causing collectors to run against the caller's current kubeconfig; update run() (in snapshotOptions) to resolve the cluster's API URL/cluster credentials immediately after GetClusterAnyStatus and then set or validate the active kubecontext for that cluster (for example by invoking a utility like utils.SetKubeContextForCluster(cluster) or utils.LoadClusterKubeconfig(cluster) and exporting KUBECONFIG) and propagate that kubeconfig/context into every collector call (or fail fast if the active context does not match the target); also update collector function signatures to accept a kubeconfig/context parameter or make them read from the newly set environment variable so all oc invocations target the requested cluster.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 87-93: The RunE handler currently only treats "json" specially and
silently ignores other values; update the anonymous RunE function (the block
using cmd.Flags().GetString("output"), opts.OutputJSON and returning opts.run())
to explicitly validate the output flag: if output == "json" set opts.OutputJSON
= true, else if output == "" proceed as before, but if output is any other
non-empty value (e.g., "yaml", "env") return a descriptive error (e.g.,
fmt.Errorf("unsupported --output value: %s", output)) so automation gets a clear
failure instead of falling back to human formatting.
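A minimal sketch of the suggested validation (the `validateOutput` helper is hypothetical; the real handler reads the flag via `cmd.Flags().GetString("output")`):

```go
package main

import "fmt"

// validateOutput rejects unsupported --output values instead of silently
// falling back to human-readable formatting.
func validateOutput(output string) (jsonOut bool, err error) {
	switch output {
	case "json":
		return true, nil
	case "":
		return false, nil // default: human-readable formatting
	default:
		return false, fmt.Errorf("unsupported --output value: %q (supported: json)", output)
	}
}

func main() {
	for _, v := range []string{"", "json", "yaml"} {
		j, err := validateOutput(v)
		fmt.Printf("%q -> json=%v err=%v\n", v, j, err)
	}
}
```

The RunE handler would set `opts.OutputJSON` from the boolean and return the error so automation gets a clear failure.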
In `@cmd/cluster/snapshot.go`:
- Around line 226-229: Replace all uses of exec.CommandContext(...).Output()
with exec.CommandContext(...).CombinedOutput() and include the command output in
the returned error; specifically, for each exec.CommandContext call that assigns
output, err (e.g., the "oc get nodes" invocation and the other three
exec.CommandContext usages), change to CombinedOutput() and wrap failures with a
formatted error that includes strings.TrimSpace(string(output)) (e.g., return
nil, fmt.Errorf("oc ... failed: %s", strings.TrimSpace(string(output)))). Ensure
you update each occurrence that currently uses .Output() so stderr is preserved
in diagnostics.
---
Duplicate comments:
In `@cmd/aws/cloudtrail/errors.go`:
- Around line 354-359: The whoami function dereferences output.Arn and
output.Account without checking the STS response; modify whoami (which calls
stsClient.GetCallerIdentity) to first verify output is non-nil and that
output.Arn and output.Account are non-nil before returning them, and return a
clear error if any of those fields are missing so we avoid a panic on
short/empty responses.
In `@cmd/cluster/diff.go`:
- Around line 226-243: The comparator currently only compares Status, Version
and Roles so changes to NodeSnapshot.Conditions and NodeSnapshot.Labels are
missed; update the modified-node detection (the block that inspects beforeNode
and afterNode and appends to diffs) to also compare Conditions and Labels (use a
stable/deep comparison like reflect.DeepEqual for Conditions slice and compare
Labels map keys/values or fmt.Sprintf on a sorted representation) and include
those fields in the changes list and in the Before/After formatted strings for
the NodeDiff so readiness/condition and label changes are reported for the given
name/beforeNode/afterNode.
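A trimmed sketch of the extended comparison (the `NodeSnapshot` fields here are a simplified stand-in for the real struct):

```go
package main

import (
	"fmt"
	"reflect"
)

// NodeSnapshot is a trimmed stand-in for the snapshot type under discussion.
type NodeSnapshot struct {
	Status     string
	Version    string
	Roles      []string
	Conditions []string
	Labels     map[string]string
}

// nodeChanges reports every changed field, including Conditions and Labels,
// so readiness and label drift are no longer invisible to the diff.
func nodeChanges(before, after NodeSnapshot) []string {
	var changes []string
	if before.Status != after.Status {
		changes = append(changes, fmt.Sprintf("status: %s -> %s", before.Status, after.Status))
	}
	if before.Version != after.Version {
		changes = append(changes, fmt.Sprintf("version: %s -> %s", before.Version, after.Version))
	}
	if !reflect.DeepEqual(before.Roles, after.Roles) {
		changes = append(changes, fmt.Sprintf("roles: %v -> %v", before.Roles, after.Roles))
	}
	if !reflect.DeepEqual(before.Conditions, after.Conditions) {
		changes = append(changes, fmt.Sprintf("conditions: %v -> %v", before.Conditions, after.Conditions))
	}
	if !reflect.DeepEqual(before.Labels, after.Labels) {
		changes = append(changes, fmt.Sprintf("labels: %v -> %v", before.Labels, after.Labels))
	}
	return changes
}

func main() {
	before := NodeSnapshot{Status: "Ready", Labels: map[string]string{"zone": "a"}}
	after := NodeSnapshot{Status: "Ready", Labels: map[string]string{"zone": "b"}}
	fmt.Println(nodeChanges(before, after))
}
```

`reflect.DeepEqual` treats `nil` and empty maps/slices as different, so the real implementation may want to normalize empties first.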
In `@cmd/cluster/snapshot.go`:
- Around line 65-71: ResourceInfo is too lossy for diffs; update the
ResourceInfo struct (the type named ResourceInfo and its fields Name, Namespace,
Kind, Status) to include a stable per-object digest or richer summary field
(e.g., Digest or Summary) that captures a hash of the meaningful spec+metadata
(or a small canonical summary) so cluster diff can detect modifications beyond
add/remove; ensure wherever ResourceInfo instances are created (the snapshot
collection code that builds these structs) you compute and populate that
digest/summary from the object's spec and stable metadata so downstream diff
logic can compare the new Digest/Summary in addition to Status.
- Around line 121-145: The run() function resolves --cluster-id via OCM but
never binds the kubectl/oc context, causing collectors to run against the
caller's current kubeconfig; update run() (in snapshotOptions) to resolve the
cluster's API URL/cluster credentials immediately after GetClusterAnyStatus and
then set or validate the active kubecontext for that cluster (for example by
invoking a utility like utils.SetKubeContextForCluster(cluster) or
utils.LoadClusterKubeconfig(cluster) and exporting KUBECONFIG) and propagate
that kubeconfig/context into every collector call (or fail fast if the active
context does not match the target); also update collector function signatures to
accept a kubeconfig/context parameter or make them read from the newly set
environment variable so all oc invocations target the requested cluster.
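One way to thread the resolved cluster through every invocation, assuming the target kubeconfig has already been written to a known path (the helper name and path below are hypothetical):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// newOcCmd is the single construction point for oc subprocesses: every
// command gets KUBECONFIG pointed at the target cluster's kubeconfig, so
// no collector can silently fall back to the ambient context.
func newOcCmd(kubeconfigPath string, args ...string) *exec.Cmd {
	cmd := exec.Command("oc", args...)
	cmd.Env = append(os.Environ(), "KUBECONFIG="+kubeconfigPath)
	return cmd
}

func main() {
	cmd := newOcCmd("/tmp/target-cluster.kubeconfig", "get", "nodes")
	fmt.Println(cmd.Args, cmd.Env[len(cmd.Env)-1])
}
```

Because the environment variable is appended last, it wins over any `KUBECONFIG` inherited from the caller's shell; collectors then build commands only through this helper.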
In `@cmd/evidence/collect.go`:
- Around line 147-156: The help text and runtime step claim CloudTrail events
are collected but collectCloudTrailData() is currently a stub returning nil, so
evidence.yaml never contains CloudTrail data; fix by wiring the actual
CloudTrail collector into the collect command: replace the stub call to
collectCloudTrailData() with the real collector invocation (or call the function
that returns CloudTrail events) and ensure its returned events are merged into
the evidence map written to evidence.yaml (same place where other collectors'
outputs are aggregated), or alternatively update the Long help text and runtime
step names to explicitly state this is only an access/permission check if you do
not want to implement collection. Ensure you reference collectCloudTrailData(),
the evidence aggregation logic that writes evidence.yaml, and any caller
functions in cmd/evidence/collect.go so the CloudTrail output is included.
- Around line 232-237: The current verifyClusterContext(...) only warns but
individual collectors still invoke plain subprocesses with "oc" which can use
the ambient kubeconfig; centralize subprocess creation by adding a single helper
(e.g., runOcCmd or newOcExecutor) in cmd/evidence/collect.go that always injects
the resolved cluster context/kubeconfig (cluster.ID(), resolved kubeconfig path
or --context/--kubeconfig flags) or sets KUBECONFIG in the env for each process;
replace direct subprocess invocations in the collectors referenced (the blocks
around verifyClusterContext, and the other occurrences you noted) to call that
helper so every "oc" / "oc adm" invocation is bound to the target cluster before
collection starts.
- Around line 250-260: collectClusterState() is hiding sub-collector failures so
run() prints a success line even when the result is partial; change
collectClusterState (and any helper sub-collector functions it calls) to return
a non-nil error when any sub-collector fails (or return a sentinel
PartialDataError) and propagate that error to the caller, then update the call
site in run() (where o.collectClusterState() is invoked and
evidence.ClusterState is set) to treat a non-nil error as a warning/partial
result (log the error and still attach the partial clusterState) instead of
printing a clean success; reference collectClusterState,
o.collectClusterState(), evidence.ClusterState and the run() caller to find and
update the code paths.
In `@docs/osdctl_aws_cloudtrail_errors.md`:
- Around line 13-57: The code fences in the docs (the usage block "osdctl aws
cloudtrail errors [flags]" and the Examples/Options blocks) lack language tags,
triggering markdownlint MD040; update the generated markdown so the Examples
fences use "bash" and the usage/options/inherited-options fences use "text" (or
another appropriate language identifier) and update the generator/template that
emits these pages so future command docs include the correct triple-backtick
language tags (e.g., ensure the template that renders the Examples, Options, and
inherited options blocks sets the language attribute).
In `@docs/osdctl_cluster_diff.md`:
- Around line 18-52: Update the fenced code blocks for the osdctl cluster diff
documentation to include language tags so the linter passes: change the synopsis
and options blocks (the blocks showing "osdctl cluster diff <before.yaml>
<after.yaml> [flags]" and the Options sections) to use ```text and mark the
Examples block's shells (the example invocations like "osdctl cluster diff
before.yaml after.yaml" and the --json flag example) as ```bash; ensure each
existing anonymous fence around those snippets is replaced with the
appropriately tagged fence so the "osdctl cluster diff" docs render correctly.
In `@docs/README.md`:
- Around line 1099-1185: The fenced code blocks under the headings "osdctl aws",
"osdctl aws cloudtrail", and "osdctl aws cloudtrail errors" (and the other
similar blocks noted) are missing a language specifier which triggers MD040;
update each triple-backtick fence that shows usage/flags output to use the
language tag "text" (e.g., ```text) and, if README.md is generated, update the
doc generation template that emits these blocks so future regenerations include
```text by default; search for the templates that produce the usage/flags blocks
(look for the headings "osdctl aws", "osdctl aws cloudtrail", "osdctl aws
cloudtrail errors") and add the language specifier there.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c58040ae-8f4b-48e8-b222-f67bbb3851fc
📒 Files selected for processing (20)
- README.md
- cmd/aws/cloudtrail/cmd.go
- cmd/aws/cloudtrail/errors.go
- cmd/aws/cmd.go
- cmd/cluster/cmd.go
- cmd/cluster/diff.go
- cmd/cluster/snapshot.go
- cmd/cmd.go
- cmd/evidence/cmd.go
- cmd/evidence/collect.go
- docs/README.md
- docs/osdctl.md
- docs/osdctl_aws.md
- docs/osdctl_aws_cloudtrail.md
- docs/osdctl_aws_cloudtrail_errors.md
- docs/osdctl_cluster.md
- docs/osdctl_cluster_diff.md
- docs/osdctl_cluster_snapshot.md
- docs/osdctl_evidence.md
- docs/osdctl_evidence_collect.md
✅ Files skipped from review due to trivial changes (1)
- README.md
🚧 Files skipped from review as they are similar to previous changes (3)
- docs/osdctl.md
- docs/osdctl_aws_cloudtrail.md
- docs/osdctl_cluster.md
/retest

1 similar comment

/retest
cmd/aws/cloudtrail/cmd.go (outdated)

```go
// NewCmdCloudtrail returns the cloudtrail command group under aws
func NewCmdCloudtrail() *cobra.Command {
	cloudtrailCmd := &cobra.Command{
		Use: "cloudtrail",
```
We already have osdctl cloudtrail commands
- `write-events` - fetches all CloudTrail write events with filtering
- `permission-denied-events` - only matches Client.UnauthorizedOperation errors
For feature testing/IAM validation, we need to catch a broader set of permission errors:
- AccessDenied
- UnauthorizedOperation / Client.UnauthorizedOperation
- Forbidden
- InvalidClientTokenId
- AuthFailure
- ExpiredToken
- SignatureDoesNotMatch
Plus additional features like --error-types for custom filtering, --json output, and HCP cluster awareness.
Options:
- Enhance `permission-denied-events` - add broader error patterns + new flags (keeps one command, but changes existing behavior)
- Add a new `osdctl cloudtrail errors` subcommand - broader error detection under the existing cloudtrail hierarchy (preserves existing behavior)
Which approach would you prefer?
- osdctl aws cloudtrail errors: surface AWS permission errors
- osdctl cluster snapshot: capture cluster state
- osdctl cluster diff: compare snapshots
- osdctl evidence collect: all-in-one collection
- Support ROSA Classic and HCP clusters
- Add README documentation
@MitaliBhalla: The following test failed, say
Full PR test history. Your PR dashboard.
Details
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Adds new osdctl commands to help SRE teams collect evidence during feature validation testing (IAM policies, operators, etc.), reducing manual toil and speeding up feature validation.
New Commands
- `osdctl aws cloudtrail errors` - Surface AWS permission errors from CloudTrail
- `osdctl cluster snapshot` - Capture point-in-time cluster state (nodes, operators, namespaces)
- `osdctl cluster diff` - Compare two snapshots to identify changes
- `osdctl evidence collect` - All-in-one evidence collection
Usage Examples
Surface CloudTrail permission errors:

```bash
osdctl aws cloudtrail errors -C <cluster-id> --since 1h
```

Capture cluster state before/after testing

Collect all evidence at once:

```bash
osdctl evidence collect -C <cluster-id> --output ./evidence/
```

Features

- Supports both ROSA Classic and ROSA HCP clusters
- Detects HCP clusters and provides relevant guidance
- JSON output option for programmatic use
- AWS Console links for CloudTrail events
Testing
Tested on ROSA Classic cluster mbhalla-osdctl (v4.21.4):
Jira
https://issues.redhat.com/browse/SREP-3620
Summary by CodeRabbit
New Features
Documentation