fix(azure-ai-ml): add subscription_id and resource_group to Azure storage datastores#46067

Open
fmabroukmsft wants to merge 1 commit into Azure:main from fmabroukmsft:fix/ml-datastore-subscription-resourcegroup

@fmabroukmsft
Member

Problem

When creating Azure storage datastores via the azure-ai-ml SDK, the resulting datastore is missing subscriptionId and resourceGroup in its properties. This causes downstream operations like sharing data assets to registries to fail with a 400 error.

Datastores created via the Azure ML Studio UI correctly populate these fields.

Root Cause

The AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeGen2Datastore entity classes do not include subscription_id and resource_group in their constructors, _to_rest_object() serialization, or _from_rest_object() deserialization — even though the underlying REST models (RestAzureBlobDatastore, etc.) fully support these optional fields.

Fix

Entity classes (entities/_datastore/azure_storage.py)

  • Added subscription_id: Optional[str] = None and resource_group: Optional[str] = None to all three constructors
  • Pass them through in _to_rest_object()
  • Read them back in _from_rest_object() for round-trip fidelity
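The three bullets above can be sketched with stand-in classes. This is a minimal illustration, not the SDK's code: RestBlobSketch, BlobDatastoreSketch, to_rest, and from_rest are hypothetical names, and only the two new optional fields and their None defaults come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RestBlobSketch:
    """Hypothetical stand-in for a REST model with optional scope fields."""

    account_name: str
    container_name: str
    subscription_id: Optional[str] = None
    resource_group: Optional[str] = None


class BlobDatastoreSketch:
    """Hypothetical stand-in for a storage datastore entity (not the SDK class)."""

    def __init__(
        self,
        *,
        name: str,
        account_name: str,
        container_name: str,
        subscription_id: Optional[str] = None,
        resource_group: Optional[str] = None,
    ) -> None:
        self.name = name
        self.account_name = account_name
        self.container_name = container_name
        # New optional fields; leaving them as None keeps the old behaviour.
        self.subscription_id = subscription_id
        self.resource_group = resource_group

    def to_rest(self) -> RestBlobSketch:
        # Serialization: forward the scope fields instead of dropping them.
        return RestBlobSketch(
            account_name=self.account_name,
            container_name=self.container_name,
            subscription_id=self.subscription_id,
            resource_group=self.resource_group,
        )

    @classmethod
    def from_rest(cls, name: str, rest: RestBlobSketch) -> "BlobDatastoreSketch":
        # Deserialization: read the scope fields back for round-trip fidelity.
        return cls(
            name=name,
            account_name=rest.account_name,
            container_name=rest.container_name,
            subscription_id=rest.subscription_id,
            resource_group=rest.resource_group,
        )
```

A caller that never passes the new arguments gets None for both, which is the property the backwards-compatibility claim relies on.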

YAML schema (_schema/_datastore/azure_storage.py)

  • Added subscription_id and resource_group as optional string fields to AzureStorageSchema
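What "optional string field" means on load can be illustrated without the schema library. The sketch below is a stdlib-only stand-in, not the Marshmallow schema itself: load_storage_fields and the two tuples are hypothetical, the field names are the string fields of AzureStorageSchema shown in this PR, and the tags and id fields are left out for brevity.

```python
from typing import Any, Dict

# String fields of AzureStorageSchema as shown in this PR; the loader is hypothetical.
REQUIRED_STR_FIELDS = ("name", "account_name")
OPTIONAL_STR_FIELDS = (
    "endpoint",
    "protocol",
    "description",
    "subscription_id",
    "resource_group",
)


def load_storage_fields(raw: Dict[str, Any]) -> Dict[str, str]:
    """Validate a parsed YAML mapping, keeping only the known string fields."""
    for key in REQUIRED_STR_FIELDS:
        if key not in raw:
            raise ValueError(f"missing required field: {key}")

    loaded: Dict[str, str] = {}
    for key in REQUIRED_STR_FIELDS + OPTIONAL_STR_FIELDS:
        if key in raw:
            if not isinstance(raw[key], str):
                raise ValueError(f"field {key} must be a string")
            loaded[key] = raw[key]
    # Optional fields that were not supplied are simply absent from the result.
    return loaded
```

A YAML file written before this change, with neither key present, loads exactly as it did before.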

Backwards Compatibility

Both fields are optional and default to None. Existing code that does not pass them continues to work identically.

Example YAML (now supported)

$schema: https://azuremlschemas.azureedge.net/latest/azureBlob.schema.json
type: azure_blob
name: my_datastore
account_name: mystorageaccount
container_name: mycontainer
subscription_id: 8f338f6e-4fce-44ae-969c-fc7d8fda030e
resource_group: my-resource-group

Testing

  • Manually verified via az ml datastore create with a companion CLI extension fix
  • Created datastore now shows subscriptionId and resourceGroup populated in ARM API response (previously null)

Related

…rage datastores

The AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeGen2Datastore
entity classes were not including subscription_id and resource_group in
their _to_rest_object() serialization, even though the REST models accept
these optional fields.

When these fields are missing, the created datastore lacks ARM scope,
which breaks downstream operations such as sharing data assets to a
registry (400 error). Datastores created via the UI correctly populate
these fields.

Changes:
- Added subscription_id and resource_group parameters to all three Azure
  storage datastore constructors
- Pass these fields through in _to_rest_object() to the REST models
- Read them back in _from_rest_object() for round-trip fidelity
- Added fields to AzureStorageSchema so they can be specified in YAML

The fields are optional and default to None, so this is fully backwards
compatible — existing code that does not pass them continues to work.

Related ICM: 716428613

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR updates azure-ai-ml datastore entities and YAML schema to support persisting the Azure Storage account ARM scope (subscription_id and resource_group) so datastore creation/round-tripping includes these fields and downstream operations don’t fail due to missing scope.

Changes:

  • Added optional subscription_id and resource_group parameters to Azure storage datastore entity constructors.
  • Plumbed the new fields through REST serialization (_to_rest_object) and deserialization (_from_rest_object) for round-trip fidelity.
  • Extended the YAML schema to accept subscription_id and resource_group on Azure storage datastore definitions.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| sdk/ml/azure-ai-ml/azure/ai/ml/entities/_datastore/azure_storage.py | Adds the new scope fields to Azure storage datastore entities and passes them through REST conversion. |
| sdk/ml/azure-ai-ml/azure/ai/ml/_schema/_datastore/azure_storage.py | Extends Marshmallow schema to load/dump subscription_id and resource_group from YAML. |

Comment on lines 84 to +88
self.account_name = account_name
self.endpoint = endpoint
self.protocol = protocol
self.subscription_id = subscription_id
self.resource_group = resource_group

Copilot AI Apr 1, 2026

AzureFileDatastore now has subscription_id/resource_group fields that affect datastore identity/scope, but __eq__ doesn't compare them. As a result, two datastores with different ARM scope can compare equal, which can hide regressions in round-trip tests. Include subscription_id and resource_group in the __eq__ comparison (alongside endpoint/protocol).
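One way to fold the scope fields into equality is sketched below. FileDatastoreSketch and its _identity helper are hypothetical stand-ins rather than the SDK class, and the real entity may compare additional attributes.

```python
from typing import Optional, Tuple


class FileDatastoreSketch:
    """Hypothetical stand-in for a file datastore entity (not the SDK class)."""

    def __init__(
        self,
        *,
        name: str,
        account_name: str,
        endpoint: Optional[str] = None,
        protocol: Optional[str] = None,
        subscription_id: Optional[str] = None,
        resource_group: Optional[str] = None,
    ) -> None:
        self.name = name
        self.account_name = account_name
        self.endpoint = endpoint
        self.protocol = protocol
        self.subscription_id = subscription_id
        self.resource_group = resource_group

    def _identity(self) -> Tuple[Optional[str], ...]:
        # Every field that should distinguish two datastores, scope included.
        return (
            self.name,
            self.account_name,
            self.endpoint,
            self.protocol,
            self.subscription_id,
            self.resource_group,
        )

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, FileDatastoreSketch):
            return NotImplemented
        return self._identity() == other._identity()

    # Defining __eq__ without __hash__ makes instances unhashable in Python 3.
```

With this shape, two objects that differ only in subscription_id or resource_group no longer compare equal, so a round-trip assertion would catch a dropped scope field.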

self.protocol = protocol
self.subscription_id = subscription_id
self.resource_group = resource_group

Copilot AI Apr 1, 2026

AzureBlobDatastore now stores subscription_id/resource_group, but __eq__ does not include these fields. That means objects with different storage ARM scope will compare equal, and round-trip equality assertions won't validate these new properties. Update __eq__ to compare subscription_id and resource_group as well.

Suggested change
def __eq__(self, other: object) -> bool:
    if not isinstance(other, AzureBlobDatastore):
        return NotImplemented
    return (
        super().__eq__(other)
        and self.subscription_id == other.subscription_id
        and self.resource_group == other.resource_group
    )

Comment on lines 312 to +317
self.account_name = account_name
self.filesystem = filesystem
self.endpoint = endpoint
self.protocol = protocol
self.subscription_id = subscription_id
self.resource_group = resource_group

Copilot AI Apr 1, 2026

AzureDataLakeGen2Datastore now has subscription_id/resource_group, but __eq__ doesn't compare them, so differing ARM scope won't be reflected in equality checks. Add subscription_id and resource_group to the __eq__ comparison for correctness and to enable reliable round-trip assertions.

Comment on lines 24 to +33
class AzureStorageSchema(PathAwareSchema):
    name = fields.Str(required=True)
    id = fields.Str(dump_only=True)
    account_name = fields.Str(required=True)
    endpoint = fields.Str()
    protocol = fields.Str()
    description = fields.Str()
    tags = fields.Dict(keys=fields.Str(), values=fields.Str())
    subscription_id = fields.Str()
    resource_group = fields.Str()

Copilot AI Apr 1, 2026

New behavior adds subscription_id/resource_group to datastore YAML loading and REST serialization, but existing unit tests (e.g., tests/datastore/unittests/test_datastore_schema.py) don’t assert these fields on dump/load or _to_rest_object/_from_rest_object. Add/extend a test config that includes subscription_id and resource_group and assert they round-trip into the REST properties and back into the entity.
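The shape of such a round-trip test might look like the sketch below. It does not touch the SDK: to_rest_properties and from_rest_properties are hypothetical helpers standing in for _to_rest_object and _from_rest_object, and the only details taken from this PR are the snake_case entity names and the camelCase property names subscriptionId and resourceGroup.

```python
import unittest
from typing import Dict, Optional

Fields = Dict[str, Optional[str]]


def to_rest_properties(entity: Fields) -> Fields:
    """Hypothetical conversion from entity fields to camelCase REST properties."""
    return {
        "accountName": entity["account_name"],
        "subscriptionId": entity.get("subscription_id"),
        "resourceGroup": entity.get("resource_group"),
    }


def from_rest_properties(props: Fields) -> Fields:
    """Hypothetical inverse conversion back to entity field names."""
    return {
        "account_name": props["accountName"],
        "subscription_id": props.get("subscriptionId"),
        "resource_group": props.get("resourceGroup"),
    }


class ScopeRoundTripTest(unittest.TestCase):
    def test_scope_round_trips(self) -> None:
        entity = {
            "account_name": "mystorageaccount",
            "subscription_id": "8f338f6e-4fce-44ae-969c-fc7d8fda030e",
            "resource_group": "my-resource-group",
        }
        props = to_rest_properties(entity)
        # The scope must reach the REST properties...
        self.assertEqual(props["subscriptionId"], entity["subscription_id"])
        self.assertEqual(props["resourceGroup"], entity["resource_group"])
        # ...and come back unchanged on deserialization.
        self.assertEqual(from_rest_properties(props), entity)

    def test_scope_defaults_to_none(self) -> None:
        props = to_rest_properties({"account_name": "mystorageaccount"})
        self.assertIsNone(props["subscriptionId"])
        self.assertIsNone(props["resourceGroup"])
```

A real test would build the entity from a YAML config that includes both fields and apply the same assertions to the actual REST object.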
