Skip to content

docs: local preflight — storage account policy check investigation#7179

Closed
vhvb1989 wants to merge 13 commits intomainfrom
preflight/local-auth-policy-check
Closed

docs: local preflight — storage account policy check investigation#7179
vhvb1989 wants to merge 13 commits intomainfrom
preflight/local-auth-policy-check

Conversation

@vhvb1989
Copy link
Copy Markdown
Member

@vhvb1989 vhvb1989 commented Mar 18, 2026

Overview

Documents the investigation into detecting Azure Policy restrictions on storage
accounts before deployment. After evaluating multiple approaches, the check was
not implemented due to fundamental API limitations.

Changes

  • Move docs/design/local-preflight-validation.mddocs/local-preflight/README.md
  • Add "Investigated Checks" section to the README with a table of checks that
    were explored but not shipped
  • Add docs/local-preflight/storage-account-policy-check.md — detailed
    investigation writeup covering:
    • Client-side armpolicy parsing (sees management-group policies but cannot
      evaluate ARM expressions → false positives)
    • Server-side checkPolicyRestrictions API (evaluates all conditions but does
      not see management-group-inherited policies → misses real denials)
    • policyStates API (only evaluates existing resources, not hypothetical ones)

Why the check was not shipped

Enterprise deny policies for storage accounts almost always originate from
management groups. The two available approaches each fail in a different way:

Approach Sees MG policies Evaluates all conditions Result
Client-side armpolicy parsing ❌ (ARM expressions) False positives
Server-side checkPolicyRestrictions Misses real denials

See the investigation doc
for full details, including specific failure scenarios with example policies.

Relates to #7177

Copilot AI review requested due to automatic review settings March 18, 2026 00:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new local (client-side) preflight check in azd provision that warns when Azure Policy assignments on the target subscription are likely to deny deployments of resources that have local authentication enabled, helping users avoid late RequestDisallowedByPolicy failures.

Changes:

  • Introduces PolicyService (Azure Policy assignments/definitions enumeration + rule parsing) to detect deny policies related to disableLocalAuth / allowSharedKeyAccess.
  • Adds a Bicep preflight check that compares detected deny policies against bicep snapshot predicted resources and emits a warning when violations are found.
  • Registers the new service in the IoC container and adds the armpolicy SDK dependency + cspell overrides.

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
cli/azd/pkg/infra/provisioning/bicep/bicep_provider.go Registers and implements the new local preflight check that warns based on snapshot resources and detected deny policies.
cli/azd/pkg/azapi/policy_service.go New Azure Policy service to list assignments, fetch definitions (including initiatives), and parse policy rules for local-auth deny patterns.
cli/azd/pkg/azapi/policy_service_test.go Unit tests for effect resolution, rule parsing helpers, and resource property evaluation.
cli/azd/cmd/container.go Adds IoC registration for PolicyService.
cli/azd/go.mod Adds armpolicy dependency (currently marked indirect).
cli/azd/go.sum Adds checksums for the new armpolicy dependency.
cli/azd/.vscode/cspell.yaml Adds spelling overrides for policy-related identifiers/terms.

You can also share your feedback on Copilot code review. Take the survey.

Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review - PR #7179

What Looks Good

  • Clean integration with the existing preflight framework - follows the same PreflightCheckFn pattern as checkRoleAssignmentPermissions
  • Well-structured PolicyService with proper separation of concerns - ~90% of new code is in pkg/azapi/policy_service.go, reusable outside of Bicep
  • Excellent test coverage for policy rule parsing - 15+ test cases covering deny literals, parameterized effects, nested conditions, multi-type arrays, resource type inheritance, and edge cases
  • Definition caching prevents redundant API calls; management-group scope handling is a nice touch for enterprise subs
  • Modern Go patterns throughout - maps.Copy, slices.Sorted(maps.Keys(...)), t.Parallel(), strings.Cut
  • All 11 copilot review issues addressed thoroughly

Summary

Priority Count
Medium 3
Low 1
Total 4

Overall Assessment: Approve. Well-implemented feature with strong test coverage and clean integration. The main item worth considering is including policy names in the warning to make it more actionable.

Review performed with GitHub Copilot CLI

@vhvb1989 vhvb1989 force-pushed the preflight/local-auth-policy-check branch 4 times, most recently from c4fdb2c to feb6dfb Compare March 21, 2026 00:23
@spboyer
Copy link
Copy Markdown
Member

spboyer commented Mar 23, 2026

Telemetry data backing for expanding the preflight framework.

I ran a deep analysis on service.arm.deployment.failed from the ddazureclients Kusto cluster (March 2026, 23 days of data). The numbers:

  • 11,627 total ARM deployment failures in March, affecting 3,296 unique machines
  • 78% come from provision, 19% from up (which runs provision internally)
  • Users wait an average of 5.3 minutes (median 2.4 min, P90 12.2 min) before getting an ARM failure

The top ARM error code is Conflict at 67.2% (7,478 failures). The inner error codes break down as:

Inner Error Code Count Share
InvalidTemplateDeployment 2,252 25.4%
BadRequest 1,716 19.4%
ContainerAppOperationError 906 10.2%
RoleAssignmentExists 351 4.0%
InvalidResourceGroupLocation 350 3.9%
RequestConflict 278 3.1%
ResourceGroupNotFound 274 3.1%
DeploymentActive 199 2.2%
ProvisioningDisabled 177 2.0%

Your policy check addresses the ProvisioningDisabled bucket (177/mo). Two adjacent items that could fit the preflight framework:

  1. InvalidResourceGroupLocation (350/mo) -- preflight could validate that the target region supports all required resource types before deploying. The snapshot data already contains the resource types needed. This aligns with issue Enhance validation result/context: Suggestion field, parameters.json, location, docs links #7119 which mentions location validation.

  2. DeploymentActive (199/mo) -- preflight could check for in-progress deployments on the resource group via GET /deployments?$filter=provisioningState eq 'Running' and warn or wait. Users currently burn 5+ minutes before this fails.

Additionally, RoleAssignmentExists (351/mo) is a pure idempotency issue -- the role assignment already exists from a previous run. This should be treated as a non-fatal condition rather than a deployment failure. This supports the direction in issue #7115 ("never abort deployment on validation errors").

Data source: ddazureclients.kusto.windows.net / DevCli / RawEventsAppRequests, March 1-23, 2026.

@spboyer
Copy link
Copy Markdown
Member

spboyer commented Mar 23, 2026

Deep Dive: Policy Errors Beyond Storage Accounts

Follow-up to the previous telemetry comment. A deeper analysis of InvalidTemplateDeployment error chains (March 2026, 23 days) shows policy errors are a bigger opportunity than the storage-only scope of this PR:

RequestDisallowedByPolicy = 454 errors/month across 164 unique machines (9.6% of all InvalidTemplateDeployment errors).

The policy errors are spread across all service types, not just storage:

Service Host Policy Errors
containerapp 102
(unknown) 77
function 74
appservice 70
containerapp-dotnet (multi) ~50

The checkPolicyRestrictions API pattern in this PR is architecturally sound and works for any resource type — the only limitation is the current filter to Microsoft.Storage/storageAccounts.

Suggestion: Merge this PR as-is (storage-only is still valuable), then follow up with a PR that extends the resource type filter to iterate over all resource types found in the Bicep snapshot. This would cover the remaining ~350 non-storage policy errors per month.

Retry behavior context: 65% of machines that hit InvalidTemplateDeployment retry and fail again. For policy errors specifically, retrying can never work without a policy exemption or configuration change — better to catch it in seconds at preflight than waste 2.5 minutes on average per deployment attempt.

Copy link
Copy Markdown
Member

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good structure overall — the parallel fetch pattern in local_preflight.go, the definition/set-definition caching, and the fail-gracefully design are solid choices. The PolicyService unit tests cover the key parsing scenarios well.

PR description mismatch: The description and files-changed table say the code uses armpolicyinsights.PolicyRestrictionsClient.CheckAtSubscriptionScope (server-side evaluation), but the implementation uses armpolicy for client-side policy rule parsing. Please update the description to match the actual implementation.

Existing unresolved feedback: @wbreza's notes about including policy names in the warning message and adding unit tests for checkStorageAccountPolicy integration logic remain open and worth addressing.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Documents the outcome of the local preflight investigation into detecting Azure Policy restrictions that would deny storage account deployments when local authentication (shared key access) is enabled, and records why this check was not implemented due to API limitations (management-group policy visibility vs. condition evaluation).

Changes:

  • Added a detailed investigation writeup for a storage account policy preflight check and why it was not shipped.
  • Updated the local preflight documentation to include an “Investigated Checks (Not Implemented)” section with a link to the new writeup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
cli/azd/docs/local-preflight/storage-account-policy-check.md New investigation summary documenting multiple approaches (client-side parsing, server-side checkPolicyRestrictions, policyStates) and why none are viable for hypothetical-resource preflight.
cli/azd/docs/local-preflight/README.md Adds an “Investigated Checks (Not Implemented)” section and links to the storage account policy check investigation.

jongio
jongio previously requested changes Mar 31, 2026
Copy link
Copy Markdown
Member

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs: local preflight - storage account policy check investigation by @vhvb1989

Summary

What: Moves the local preflight design doc to docs/local-preflight/README.md, adds a "not implemented" table, and adds a 219-line investigation doc explaining why a storage account policy check wasn't shipped.

Why: Documents three approaches to detecting Azure Policy restrictions on storage accounts before deployment - client-side parsing (false positives), server-side checkPolicyRestrictions (misses MG policies), and policyStates (existing resources only). Thorough investigation with clear reasoning for why the feature was abandoned.

Assessment: The investigation is well-structured and the conclusion is well-supported. However, the checkPolicyRestrictions testing used api-version 2022-03-01 - the 2024-10-01 version added a management group scope endpoint that could change the conclusion. The example policy JSON has a logic issue, and one Future Considerations bullet has an inaccuracy about /validate.

Findings

Category High Medium
API 1 0
Logic 0 1
Documentation 0 1
Total 1 2

Key Findings

  1. [HIGH] API: checkPolicyRestrictions was tested against 2022-03-01 only - the 2024-10-01 API version adds a management group scope endpoint
  2. [MEDIUM] Logic: Example policy JSON conditions are self-contradictory
  3. [MEDIUM] Documentation: /validate doesn't evaluate Azure Policy deny effects

What's Done Well

  • Thorough investigation with three distinct approaches, each with clear evidence for why it fails
  • Excellent use of concrete examples (real policy patterns, specific API responses)
  • Good separation into a dedicated docs/local-preflight/ directory
  • The file rename preserves all existing relative links correctly

3 inline comments below.

Copy link
Copy Markdown
Contributor

@wbreza wbreza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-Review — PR #7179 (scope changed from code to docs since Mar 18 approval)

docs: local preflight — storage account policy check investigation by @vhvb1989

Summary

What: Documents an investigation into detecting Azure Policy restrictions on storage accounts before deployment. After evaluating three approaches (client-side parsing, server-side checkPolicyRestrictions, and policyStates), the check was not shipped due to API limitations.

Why: Records institutional knowledge so future engineers don't repeat the same investigation.

Assessment: Well-structured investigation with a sound core argument. Three technical accuracy issues should be addressed before merge — all previously identified by @jongio. My previous approval (Mar 18) was on the code implementation, which has been entirely removed. Re-reviewing against the current documentation-only changeset.

Prior Findings Verification

All 3 of @jongio's CHANGES_REQUESTED findings verified and agreed:

# Priority Finding Status
1 High checkPolicyRestrictions tested against 2022-03-01 only — 2024-10-01 adds MG scope endpoint ⌛ Unresolved
2 Medium Example policy JSON conditions are self-contradictory ⌛ Unresolved
3 Medium /validate doesn't evaluate Azure Policy deny effects ⌛ Unresolved

New Finding

# Priority Finding
4 Low Conclusion table — policyStates row shows ✅✅❌ which could mislead at a glance without a footnote

Details on finding 4: The policyStates row shows green checks for "Sees MG policies" and "Evaluates all conditions" alongside a red X for "Suitable". Without reading the text, a reader scanning the table might wonder why something with two green checks was rejected. Consider adding a fourth column or a footnote: "Existing resources only — cannot evaluate hypothetical resources from Bicep snapshot."

What's Done Well

  • Excellent investigation structure — clear goal, three distinct approaches with evidence, well-reasoned conclusion
  • Good directory reorganization (docs/design/docs/local-preflight/)
  • "Investigated Checks (Not Implemented)" table pattern is great for institutional knowledge — should be reused
  • Core argument is sound despite the example JSON issue
  • The false positive analysis with concrete subscription examples (contoso-dev vs contoso-prod) makes the trade-off tangible

Review performed with GitHub Copilot CLI

vhvb1989 and others added 11 commits April 2, 2026 17:00
Add a local preflight check that detects Azure Policy assignments which
deny resources with local authentication enabled (disableLocalAuth != true).

The check:
- Lists policy assignments on the target subscription via armpolicy SDK
- Fetches policy definitions and inspects policyRule for disableLocalAuth
  field conditions with deny effect
- Handles parameterized effects, policy sets (initiatives), and nested
  allOf/anyOf conditions
- Cross-references affected resource types against the Bicep snapshot
- Reports a warning (not error) when template resources would be blocked

Also handles storage accounts which use allowSharedKeyAccess (inverted
logic) instead of disableLocalAuth.

Fixes #7177

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix extractParameterReference whitespace bug: use trimmed string for
  both prefix check and inner extraction
- Remove localAuthEnabled from isLocalAuthField to avoid false positives
  (ResourceHasLocalAuthDisabled doesn't handle it)
- Handle multiple resource types in policy 'in' array (was only taking
  first entry)
- Thread resolved resourceType through recursive findInCompoundCondition
  calls so nested conditions inherit parent's resource type
- Use []LocalAuthDenyPolicy per resource type to avoid overwriting when
  multiple deny policies target the same type
- Add in-memory cache for policy definition/set definition fetches to
  avoid duplicate API calls across assignments
- Improve warning message to mention both disableLocalAuth and
  allowSharedKeyAccess for storage accounts
- Add tests for whitespace parameter extraction, multi-type 'in' arrays,
  and nested condition resource type inheritance
- Run go mod tidy to move armpolicy to direct dependency

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds log.Printf statements at key decision points so the flow can be
traced with --debug to diagnose why the check might not fire:
- Number of policy assignments found
- Policy rule parse failures
- Number of deny policies found
- Per-resource type matching and localAuthDisabled status
- Effect type mismatches

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The subscription-level API lists inherited policy assignments from
management groups, but the definition IDs reference management group
scope (e.g. /providers/Microsoft.Management/managementGroups/{mgId}/...).

Previously, getPolicyDefinitionByID and getPolicySetDefinitionByID only
handled built-in and subscription-scoped definitions, silently failing
to fetch management-group-scoped ones. This caused the check to miss
policies like 'Storage Accounts - Safe Secrets Standard' which are
typically assigned at the management group level.

Fix: detect management group scope from the definition ID and use
GetAtManagementGroup SDK method. Also add diagnostic logging throughout
the check pipeline to aid debugging.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Remove subscription ID (redundant, already shown in context)
- Remove unresolved ARM expression resource names (not helpful)
- Aggregate by resource type instead of per-resource
- Deduplicate policy names
- Shorter, cleaner message format

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the custom client-side policy rule parsing engine with the
server-side checkPolicyRestrictions API (Microsoft.PolicyInsights).
The check is now scoped to storage accounts and calls the API in
parallel for each account.

Key changes:
- PolicyService now uses armpolicyinsights.PolicyRestrictionsClient
  instead of manually fetching/parsing policy assignments & definitions
- checkLocalAuthPolicy renamed to checkStorageAccountPolicy, filters
  snapshot resources to Microsoft.Storage/storageAccounts only
- Parallel API calls per storage account via sync.WaitGroup.Go
- Warning messages now include policy reasons from the server response
- Replaced armpolicy SDK dependency with armpolicyinsights
- Rewrote tests with mock HTTP transport for the new API

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… improve policy loop

- Revert .vscode/launch.json: restore parameterized ${input:cliArgs} and
  remove developer-specific cwd path
- Remove cli/azd/docs/design/extension-flag-handling.md: unrelated to this PR
- Add context cancellation check at top of policy assignment iteration loop
- Remove now-unused isCustomPolicyDefinition function

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the storage account local auth policy check due to false positives
that cannot be resolved with current Azure APIs:

- Client-side armpolicy parsing sees management-group policies but cannot
  evaluate ARM expressions in policy conditions (subscription tags, region
  opt-in gates), causing false warnings on subscriptions where the policy
  does not actually apply.
- Server-side checkPolicyRestrictions API correctly evaluates all conditions
  but does not see management-group-inherited policies, missing real denials.

Move local preflight docs to cli/azd/docs/local-preflight/ and add a
detailed investigation writeup documenting both approaches, the specific
failure modes, and why neither approach is viable today.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Fix self-contradictory example policy: change contains() condition from
  equals "false" to equals "true" so opt-in logic is consistent
- Add API version annotation (2022-03-01) to checkPolicyRestrictions example
- Add Limitation column to conclusion table for clarity (policyStates row)
- Fix /validate claim: ARM /validate and /whatIf don't evaluate deny effects
- Add 2024-10-01 MG-scope endpoint to Future Considerations as most
  promising next step

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vhvb1989 vhvb1989 force-pushed the preflight/local-auth-policy-check branch from f553db2 to de81ee9 Compare April 2, 2026 17:02
Tested checkPolicyRestrictions with api-version=2024-10-01 against a
subscription with a tenant-root-MG deny policy (Storage Accounts - Safe
Secrets Standard). Findings:

- Subscription-scope 2024-10-01: still does not see MG-inherited policies
- MG-scope endpoint: only supports 'type' pending field, rejects
  resourceDetails — cannot evaluate property-level restrictions
- includeAuditEffect parameter: no change in results

Updated Approach 2 section with detailed 2024-10-01 follow-up, added
per-endpoint comparison table, and revised conclusion table and Future
Considerations to reflect that the newer API does not resolve the
limitation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@vhvb1989 vhvb1989 force-pushed the preflight/local-auth-policy-check branch from da1d2cc to 0618e1b Compare April 2, 2026 17:36
@vhvb1989 vhvb1989 requested a review from jongio April 3, 2026 23:46
This PR should only contain documentation changes (investigation doc and
README update). Restore local_preflight.go to match main — the
DiagnosticID, PreflightCheck struct, and AddCheck signature belong to a
separate code change.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Member

@jongio jongio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid investigation writeup. Two minor wording issues:

  • storage-account-policy-check.md:229,247 - Untested for MG scope is misleading since testing was done (returned empty)
  • storage-account-policy-check.md:4 - date says March 2026 but April follow-up section exists

Not blocking.

@microsoft-github-policy-service microsoft-github-policy-service bot added the no-recent-activity identity issues with no activity label Apr 14, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 14, 2026

🔗 Linked Issue Required

Thanks for the contribution! Please link a GitHub issue to this PR by adding Fixes #123 to the description or using the sidebar.
No issue yet? Feel free to create one!

@microsoft-github-policy-service microsoft-github-policy-service bot removed the no-recent-activity identity issues with no activity label Apr 14, 2026
@vhvb1989 vhvb1989 closed this Apr 14, 2026
auto-merge was automatically disabled April 14, 2026 01:23

Pull request was closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants