|
| 1 | +# Resolution Trace Data Collection Guide |
| 2 | + |
| 3 | +All instrumented log lines use the prefix `RESOLUTION_TRACE` with key=value pairs. |
| 4 | +Every line includes `phase=` and `event=BEGIN|DONE`. DONE lines include `duration_ms=`. |
| 5 | + |
| 6 | +## Collecting Logs |
| 7 | + |
| 8 | +```bash |
| 9 | +# From an OLM pod on an OpenShift cluster |
| 10 | +oc logs deploy/olm-operator -n openshift-operator-lifecycle-manager > olm.log |
| 11 | + |
| 12 | +# Extract only trace lines |
| 13 | +grep RESOLUTION_TRACE olm.log > traces.log |
| 14 | +``` |
| 15 | + |
| 16 | +## Phase Reference |
| 17 | + |
| 18 | +### Orchestration (step_resolver.go) |
| 19 | + |
| 20 | +| Phase | Type | Key Fields | |
| 21 | +|---|---|---| |
| 22 | +| `resolve_steps` | wall clock | `namespace`, `subscription_count` | |
| 23 | +| `list_subscriptions` | I/O (API) | `namespace`, `subscription_count` | |
| 24 | +| `list_operator_groups` | I/O (API) | `namespace`, `count` | |
| 25 | +| `core_resolve` | hybrid | `namespace`, `subscription_count`, `namespace_count`, `operator_count` | |
| 26 | +| `generate_steps` | compute | `namespace`, `operator_count`, `step_count`, `bundle_lookup_count` | |
| 27 | + |
| 28 | +### Core Resolution (resolver.go) |
| 29 | + |
| 30 | +| Phase | Type | Key Fields | |
| 31 | +|---|---|---| |
| 32 | +| `cache_namespaced` | I/O coord | `namespace`, `namespace_count` | |
| 33 | +| `existing_variables` | compute | `namespace`, `entry_count`, `variable_count` | |
| 34 | +| `subscription_variables` | compute | `namespace`, `subscription_count`, `total_variables` | |
| 35 | +| `single_sub_variables` | compute | `namespace`, `subscription`, `package`, `channel`, `catalog`, `has_current`, `variable_count` | |
| 36 | +| `add_invariants` | compute | `namespace`, `variable_count_before`, `variable_count_after`, `invariants_added` | |
| 37 | +| `cache_error_check` | I/O wait | `namespace` | |
| 38 | +| `sat_solve` | compute | `namespace`, `variable_count`, `subscription_count`, `solved_count` | |
| 39 | +| `extract_results` | compute | `namespace`, `operator_count` | |
| 40 | + |
| 41 | +### Per-Subscription Detail (resolver.go) |
| 42 | + |
| 43 | +| Phase | Type | Key Fields | |
| 44 | +|---|---|---| |
| 45 | +| `cache_find_entries` | compute | `catalog`, `entry_count` | |
| 46 | +| `sort_channel` | compute | `entry_count`, `sorted_count` | |
| 47 | +| `build_candidates` | compute | `sorted_bundle_count`, `candidate_count` | |
| 48 | +| `bundle_variables_dfs` | compute | `initial_stack_size`, `visited_count`, `variables_count`, `bundles_processed`, `dependency_predicate_count`, `total_dep_bundles` | |
| 49 | + |
| 50 | +### Cache (cache.go) |
| 51 | + |
| 52 | +| Phase | Type | Key Fields | |
| 53 | +|---|---|---| |
| 54 | +| `cache_populate` | I/O coord | `source_count`, `cache_hit_count`, `cache_miss_count` | |
| 55 | + |
| 56 | +### Catalog Snapshots (source_registry.go, source_csvs.go) |
| 57 | + |
| 58 | +| Phase | Type | Key Fields | |
| 59 | +|---|---|---| |
| 60 | +| `catalog_snapshot` | I/O (gRPC) | `catalog`, `bundle_count`, `package_count` | |
| 61 | +| `list_bundles` | I/O (gRPC) | `catalog`, `bundle_count` | |
| 62 | +| `get_packages` | I/O (gRPC) | `catalog`, `package_count`, `call_count`, `duration_ms` | |
| 63 | +| `csv_snapshot` | I/O (API) | `namespace`, `csv_count`, `subscription_count`, `entry_count` | |
| 64 | + |
| 65 | +### SAT Solver (solver/solve.go) |
| 66 | + |
| 67 | +| Phase | Type | Key Fields | |
| 68 | +|---|---|---| |
| 69 | +| `sat_add_constraints` | compute | `variable_count`, `constraint_count`, `literal_count` | |
| 70 | +| `sat_search` | compute | `anchor_count`, `assumption_count`, `final_assumption_count`, `outcome` | |
| 71 | +| `sat_minimize` | compute | `extras_count`, `excluded_count`, `cardinality_bound`, `result_count` | |
| 72 | + |
| 73 | +--- |
| 74 | + |
| 75 | +## Experiment 1: Resolution Latency vs Subscription Count |
| 76 | + |
| 77 | +**Goal:** How does total resolution time scale with the number of Subscriptions in a namespace? |
| 78 | + |
| 79 | +### Collect total resolution time per namespace |
| 80 | + |
| 81 | +```bash |
| 82 | +grep "phase=resolve_steps event=DONE" traces.log \ |
| 83 | + | grep -oP 'namespace=\S+ subscription_count=\d+ .*?duration_ms=\d+' \ |
| 84 | + | sort |
| 85 | +``` |
| 86 | + |
| 87 | +### Collect core_resolve time (excludes API listing overhead) |
| 88 | + |
| 89 | +```bash |
| 90 | +grep "phase=core_resolve event=DONE" traces.log \ |
| 91 | + | grep -oP 'namespace=\S+ subscription_count=\d+ .*?duration_ms=\d+' |
| 92 | +``` |
| 93 | + |
| 94 | +### Per-subscription breakdown within a namespace |
| 95 | + |
| 96 | +```bash |
| 97 | +NS="my-namespace" |
| 98 | +grep "phase=single_sub_variables event=DONE" traces.log \ |
| 99 | + | grep "namespace=$NS" \ |
| 100 | + | grep -oP 'subscription=\S+ package=\S+ catalog=\S+ variable_count=\d+ .*?duration_ms=\d+' |
| 101 | +``` |
| 102 | + |
| 103 | +### Subscription count vs SAT solve time |
| 104 | + |
| 105 | +```bash |
| 106 | +grep "phase=sat_solve event=DONE" traces.log \ |
| 107 | + | grep -oP 'namespace=\S+ variable_count=\d+ subscription_count=\d+ .*?duration_ms=\d+' |
| 108 | +``` |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## Experiment 2: Resolution Latency vs Catalog Size |
| 113 | + |
| 114 | +**Goal:** How does resolution time change based on which catalog is used and how many bundles it contains? |
| 115 | + |
| 116 | +### Catalog snapshot fetch times (gRPC round-trip cost) |
| 117 | + |
| 118 | +```bash |
| 119 | +grep "phase=catalog_snapshot event=DONE" traces.log \ |
| 120 | + | grep -oP 'catalog=\S+ bundle_count=\d+ package_count=\d+ .*?duration_ms=\d+' |
| 121 | +``` |
| 122 | + |
| 123 | +### gRPC ListBundles time per catalog |
| 124 | + |
| 125 | +```bash |
| 126 | +grep "phase=list_bundles event=DONE" traces.log \ |
| 127 | + | grep -oP 'catalog=\S+ bundle_count=\d+ .*?duration_ms=\d+' |
| 128 | +``` |
| 129 | + |
| 130 | +### gRPC GetPackage aggregate time per catalog |
| 131 | + |
| 132 | +```bash |
| 133 | +grep "phase=get_packages event=DONE" traces.log \ |
| 134 | + | grep -oP 'catalog=\S+ package_count=\d+ call_count=\d+ duration_ms=\d+' |
| 135 | +``` |
| 136 | + |
| 137 | +### Cache hit/miss rate (are snapshots being reused?) |
| 138 | + |
| 139 | +```bash |
| 140 | +grep "phase=cache_populate event=DONE" traces.log \ |
| 141 | + | grep -oP 'source_count=\d+ cache_hit_count=\d+ cache_miss_count=\d+ .*?duration_ms=\d+' |
| 142 | +``` |
| 143 | + |
| 144 | +### Per-subscription: which catalog and how many entries matched |
| 145 | + |
| 146 | +```bash |
| 147 | +grep "phase=cache_find_entries event=DONE" traces.log \ |
| 148 | + | grep -oP 'catalog=\S+ entry_count=\d+ .*?duration_ms=\d+' |
| 149 | +``` |
| 150 | + |
| 151 | +--- |
| 152 | + |
| 153 | +## Experiment 3: Dependency Graph Complexity |
| 154 | + |
| 155 | +**Goal:** How does the DFS dependency resolution scale with bundles and dependencies? |
| 156 | + |
| 157 | +### DFS work per resolution |
| 158 | + |
| 159 | +```bash |
| 160 | +grep "phase=bundle_variables_dfs event=DONE" traces.log \ |
| 161 | + | grep -oP 'bundles_processed=\d+ dependency_predicate_count=\d+ total_dep_bundles=\d+ visited_count=\d+ variables_count=\d+ .*?duration_ms=\d+' |
| 162 | +``` |
| 163 | + |
| 164 | +### Invariant growth (GVK/package conflict constraints) |
| 165 | + |
| 166 | +```bash |
| 167 | +grep "phase=add_invariants event=DONE" traces.log \ |
| 168 | + | grep -oP 'namespace=\S+ variable_count_before=\d+ variable_count_after=\d+ invariants_added=\d+ .*?duration_ms=\d+' |
| 169 | +``` |
| 170 | + |
| 171 | +--- |
| 172 | + |
| 173 | +## Experiment 4: SAT Solver Internals |
| 174 | + |
| 175 | +**Goal:** How does the SAT problem size affect solve time? |
| 176 | + |
| 177 | +### Problem size (variables, constraints, literals) |
| 178 | + |
| 179 | +```bash |
| 180 | +grep "phase=sat_add_constraints event=DONE" traces.log \ |
| 181 | + | grep -oP 'variable_count=\d+ constraint_count=\d+ literal_count=\d+ .*?duration_ms=\d+' |
| 182 | +``` |
| 183 | + |
| 184 | +### Search complexity (anchor count, backtracking) |
| 185 | + |
| 186 | +```bash |
| 187 | +grep "phase=sat_search event=DONE" traces.log \ |
| 188 | + | grep -oP 'outcome=\d+ anchor_count=\d+ final_assumption_count=\d+ .*?duration_ms=\d+' |
| 189 | +``` |
| 190 | + |
| 191 | +### Cardinality minimization cost |
| 192 | + |
| 193 | +```bash |
| 194 | +grep "phase=sat_minimize event=DONE" traces.log \ |
| 195 | + | grep -oP 'extras_count=\d+ excluded_count=\d+ cardinality_bound=\d+ result_count=\d+ .*?duration_ms=\d+' |
| 196 | +``` |
| 197 | + |
| 198 | +--- |
| 199 | + |
| 200 | +## Experiment 5: Installed Operators (Virtual Catalog) |
| 201 | + |
| 202 | +**Goal:** How does the number of already-installed CSVs affect resolution? |
| 203 | + |
| 204 | +### CSV snapshot size and cost |
| 205 | + |
| 206 | +```bash |
| 207 | +grep "phase=csv_snapshot event=DONE" traces.log \ |
| 208 | + | grep -oP 'namespace=\S+ csv_count=\d+ subscription_count=\d+ entry_count=\d+ .*?duration_ms=\d+' |
| 209 | +``` |
| 210 | + |
| 211 | +### Existing variables processing time |
| 212 | + |
| 213 | +```bash |
| 214 | +grep "phase=existing_variables event=DONE" traces.log \ |
| 215 | + | grep -oP 'namespace=\S+ entry_count=\d+ variable_count=\d+ .*?duration_ms=\d+' |
| 216 | +``` |
| 217 | + |
| 218 | +--- |
| 219 | + |
| 220 | +## Filtering by Namespace |
| 221 | + |
| 222 | +Every query above can be scoped to a single namespace: |
| 223 | + |
| 224 | +```bash |
| 225 | +NS="openshift-operators" |
| 226 | +grep "RESOLUTION_TRACE" traces.log | grep "namespace=$NS" |
| 227 | +``` |
| 228 | + |
| 229 | +For phases that use `catalog=` instead of `namespace=` (snapshot phases), the catalog |
| 230 | +field is formatted as `namespace/name`, so filter with: |
| 231 | + |
| 232 | +```bash |
| 233 | +grep "catalog=$NS/" traces.log |
| 234 | +``` |
| 235 | + |
| 236 | +--- |
| 237 | + |
| 238 | +## Correlation: Linking Phases to the Same Resolution Cycle |
| 239 | + |
| 240 | +The `resolve_steps` phase (in step_resolver.go) injects a `resolution_id=` field |
| 241 | +into the logrus logger. All phases emitted from **step_resolver.go** carry this |
| 242 | +field automatically. Phases from deeper layers (resolver.go, solver) do not carry |
| 243 | +`resolution_id` in the log message itself, but they appear between the `resolve_steps` |
| 244 | +BEGIN and DONE timestamps and share the same `namespace=` value. |
| 245 | + |
| 246 | +### Get all phases for a specific resolution cycle by resolution_id |
| 247 | + |
| 248 | +```bash |
| 249 | +ID="a1b2c3d4" |
| 250 | +grep "resolution_id=$ID" olm.log | grep RESOLUTION_TRACE |
| 251 | +``` |
| 252 | + |
| 253 | +### Correlate by timestamp window (for phases without resolution_id) |
| 254 | + |
| 255 | +```bash |
| 256 | +# 1. Find the resolve_steps BEGIN/DONE timestamps for a namespace |
| 257 | +grep "phase=resolve_steps" traces.log | grep "namespace=$NS" |
| 258 | + |
| 259 | +# 2. Use awk to filter traces within that time window |
| 260 | +# (all traces between a BEGIN and its matching DONE belong to one cycle) |
| 261 | +``` |
| 262 | + |
| 263 | +--- |
| 264 | + |
| 265 | +## Quick One-Liner: Full Pipeline Summary for a Namespace |
| 266 | + |
| 267 | +```bash |
| 268 | +NS="openshift-operators" |
| 269 | +echo "=== Resolution Timeline ===" |
| 270 | +grep "RESOLUTION_TRACE.*namespace=$NS.*event=DONE" traces.log \ |
| 271 | + | grep -oP 'phase=\S+ .*?duration_ms=\d+' \ |
| 272 | + | column -t |
| 273 | + |
| 274 | +echo "" |
| 275 | +echo "=== Catalog Snapshots ===" |
| 276 | +grep "phase=catalog_snapshot event=DONE" traces.log \ |
| 277 | + | grep -oP 'catalog=\S+ bundle_count=\d+ package_count=\d+ .*?duration_ms=\d+' \ |
| 278 | + | column -t |
| 279 | + |
| 280 | +echo "" |
| 281 | +echo "=== SAT Solver Detail ===" |
| 282 | +grep "phase=sat_" traces.log \ |
| 283 | + | grep "event=DONE" \ |
| 284 | + | grep -oP 'phase=\S+ .*' \ |
| 285 | + | column -t |
| 286 | +``` |
| 287 | + |
| 288 | +--- |
| 289 | + |
| 290 | +## Generating a CSV for Analysis |
| 291 | + |
| 292 | +```bash |
| 293 | +# Example: subscription count vs resolution time |
| 294 | +echo "namespace,subscription_count,duration_ms" > resolution_by_subs.csv |
| 295 | +grep "phase=resolve_steps event=DONE" traces.log \ |
| 296 | + | sed -E 's/.*namespace=(\S+).*subscription_count=([0-9]+).*duration_ms=([0-9]+).*/\1,\2,\3/' \ |
| 297 | + >> resolution_by_subs.csv |
| 298 | + |
| 299 | +# Example: catalog snapshot size vs fetch time |
| 300 | +echo "catalog,bundle_count,package_count,duration_ms" > catalog_snapshots.csv |
| 301 | +grep "phase=catalog_snapshot event=DONE" traces.log \ |
| 302 | + | sed -E 's/.*catalog=(\S+).*bundle_count=([0-9]+).*package_count=([0-9]+).*duration_ms=([0-9]+).*/\1,\2,\3,\4/' \ |
| 303 | + >> catalog_snapshots.csv |
| 304 | + |
| 305 | +# Example: SAT problem size vs solve time |
| 306 | +echo "namespace,variable_count,subscription_count,solved_count,duration_ms" > sat_solve.csv |
| 307 | +grep "phase=sat_solve event=DONE" traces.log \ |
| 308 | + | grep -v "error=true" \ |
| 309 | + | sed -E 's/.*namespace=(\S+).*variable_count=([0-9]+).*subscription_count=([0-9]+).*solved_count=([0-9]+).*duration_ms=([0-9]+).*/\1,\2,\3,\4,\5/' \ |
| 310 | + >> sat_solve.csv |
| 311 | +``` |
0 commit comments