fix(azure-ai-ml): add subscription_id and resource_group to Azure storage datastores #46067
…rage datastores

The AzureBlobDatastore, AzureFileDatastore, and AzureDataLakeGen2Datastore entity classes were not including subscription_id and resource_group in their _to_rest_object() serialization, even though the REST models accept these optional fields. When these fields are missing, the created datastore lacks ARM scope, which breaks downstream operations such as sharing data assets to a registry (400 error). Datastores created via the UI correctly populate these fields.

Changes:
- Added subscription_id and resource_group parameters to all three Azure storage datastore constructors
- Pass these fields through in _to_rest_object() to the REST models
- Read them back in _from_rest_object() for round-trip fidelity
- Added fields to AzureStorageSchema so they can be specified in YAML

The fields are optional and default to None, so this is fully backwards compatible: existing code that does not pass them continues to work.

Related ICM: 716428613

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This PR updates azure-ai-ml datastore entities and YAML schema to support persisting the Azure Storage account ARM scope (subscription_id and resource_group) so datastore creation/round-tripping includes these fields and downstream operations don’t fail due to missing scope.
Changes:
- Added optional `subscription_id` and `resource_group` parameters to Azure storage datastore entity constructors.
- Plumbed the new fields through REST serialization (`_to_rest_object`) and deserialization (`_from_rest_object`) for round-trip fidelity.
- Extended the YAML schema to accept `subscription_id` and `resource_group` on Azure storage datastore definitions.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sdk/ml/azure-ai-ml/azure/ai/ml/entities/_datastore/azure_storage.py | Adds the new scope fields to Azure storage datastore entities and passes them through REST conversion. |
| sdk/ml/azure-ai-ml/azure/ai/ml/_schema/_datastore/azure_storage.py | Extends Marshmallow schema to load/dump subscription_id and resource_group from YAML. |
| self.account_name = account_name | ||
| self.endpoint = endpoint | ||
| self.protocol = protocol | ||
| self.subscription_id = subscription_id | ||
| self.resource_group = resource_group |
AzureFileDatastore now has subscription_id/resource_group fields that affect datastore identity/scope, but `__eq__` doesn’t compare them. This can cause two datastores with different ARM scope to compare equal and can hide regressions in round-trip tests. Include subscription_id and resource_group in the `__eq__` comparison (alongside endpoint/protocol).
| self.protocol = protocol | ||
| self.subscription_id = subscription_id | ||
| self.resource_group = resource_group | ||
|
|
AzureBlobDatastore now stores subscription_id/resource_group, but `__eq__` does not include these fields. That means objects with different storage ARM scope will compare equal, and round-trip equality assertions won’t validate these new properties. Update `__eq__` to compare subscription_id and resource_group as well.
| def __eq__(self, other: object) -> bool: | |
| if not isinstance(other, AzureBlobDatastore): | |
| return NotImplemented | |
| return ( | |
| super().__eq__(other) | |
| and self.subscription_id == other.subscription_id | |
| and self.resource_group == other.resource_group | |
| ) |
| self.account_name = account_name | ||
| self.filesystem = filesystem | ||
| self.endpoint = endpoint | ||
| self.protocol = protocol | ||
| self.subscription_id = subscription_id | ||
| self.resource_group = resource_group |
AzureDataLakeGen2Datastore now has subscription_id/resource_group but `__eq__` doesn’t compare them, so differing ARM scope won’t be reflected in equality checks. Add subscription_id and resource_group to the `__eq__` comparison for correctness and to enable reliable round-trip assertions.
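The equality concern raised in these comments can be illustrated with a small stand-in class. This is only a sketch: `SketchFileDatastore` is a hypothetical class, not the SDK entity, and the real `__eq__` implementations in `azure_storage.py` compare more than the attributes shown here.

```python
from typing import Optional


class SketchFileDatastore:
    """Hypothetical stand-in for a storage datastore entity; not the SDK class."""

    def __init__(
        self,
        name: str,
        account_name: str,
        endpoint: Optional[str] = None,
        protocol: Optional[str] = None,
        subscription_id: Optional[str] = None,
        resource_group: Optional[str] = None,
    ) -> None:
        self.name = name
        self.account_name = account_name
        self.endpoint = endpoint
        self.protocol = protocol
        self.subscription_id = subscription_id
        self.resource_group = resource_group

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, SketchFileDatastore):
            return NotImplemented
        return (
            self.name == other.name
            and self.account_name == other.account_name
            and self.endpoint == other.endpoint
            and self.protocol == other.protocol
            # The ARM scope fields take part in equality, so two datastores
            # that differ only in scope no longer compare equal.
            and self.subscription_id == other.subscription_id
            and self.resource_group == other.resource_group
        )


scope_a = SketchFileDatastore("ds", "acct", subscription_id="sub-a", resource_group="rg")
scope_b = SketchFileDatastore("ds", "acct", subscription_id="sub-b", resource_group="rg")
print(scope_a == scope_b)
```

Without the last two comparisons, `scope_a` and `scope_b` would compare equal even though they point at different subscriptions.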
| class AzureStorageSchema(PathAwareSchema): | ||
| name = fields.Str(required=True) | ||
| id = fields.Str(dump_only=True) | ||
| account_name = fields.Str(required=True) | ||
| endpoint = fields.Str() | ||
| protocol = fields.Str() | ||
| description = fields.Str() | ||
| tags = fields.Dict(keys=fields.Str(), values=fields.Str()) | ||
| subscription_id = fields.Str() | ||
| resource_group = fields.Str() |
New behavior adds subscription_id/resource_group to datastore YAML loading and REST serialization, but existing unit tests (e.g., tests/datastore/unittests/test_datastore_schema.py) don’t assert these fields on dump/load or _to_rest_object/_from_rest_object. Add/extend a test config that includes subscription_id and resource_group and assert they round-trip into the REST properties and back into the entity.
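One possible shape for such a round-trip test is sketched below. `FakeRestDatastore` and `FakeDatastore` are hypothetical stand-ins so the example runs without the SDK; a real test would use the actual entity classes and REST models, and the `_from_rest_object` signature here is simplified for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeRestDatastore:
    """Hypothetical stand-in for a REST model such as RestAzureBlobDatastore."""

    account_name: str
    subscription_id: Optional[str] = None
    resource_group: Optional[str] = None


class FakeDatastore:
    """Hypothetical stand-in for an entity class; mirrors only the plumbing under test."""

    def __init__(
        self,
        name: str,
        account_name: str,
        subscription_id: Optional[str] = None,
        resource_group: Optional[str] = None,
    ) -> None:
        self.name = name
        self.account_name = account_name
        self.subscription_id = subscription_id
        self.resource_group = resource_group

    def _to_rest_object(self) -> FakeRestDatastore:
        # Pass the scope fields through to the REST representation.
        return FakeRestDatastore(
            account_name=self.account_name,
            subscription_id=self.subscription_id,
            resource_group=self.resource_group,
        )

    @classmethod
    def _from_rest_object(cls, name: str, rest: FakeRestDatastore) -> "FakeDatastore":
        # Read the scope fields back when rebuilding the entity.
        return cls(
            name=name,
            account_name=rest.account_name,
            subscription_id=rest.subscription_id,
            resource_group=rest.resource_group,
        )


def test_scope_fields_round_trip() -> None:
    original = FakeDatastore("ds", "acct", subscription_id="sub-123", resource_group="rg-storage")
    rest = original._to_rest_object()
    assert rest.subscription_id == "sub-123"
    assert rest.resource_group == "rg-storage"
    restored = FakeDatastore._from_rest_object("ds", rest)
    assert restored.subscription_id == original.subscription_id
    assert restored.resource_group == original.resource_group


def test_scope_fields_default_to_none() -> None:
    # Callers that omit the new fields should still serialize cleanly.
    rest = FakeDatastore("ds", "acct")._to_rest_object()
    assert rest.subscription_id is None
    assert rest.resource_group is None
```

The second test covers the backwards-compatible path, where neither field is supplied.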
Problem
When creating Azure storage datastores via the `azure-ai-ml` SDK, the resulting datastore is missing `subscriptionId` and `resourceGroup` in its properties. This causes downstream operations like sharing data assets to registries to fail with a 400 error.

Datastores created via the Azure ML Studio UI correctly populate these fields.
Root Cause
The `AzureBlobDatastore`, `AzureFileDatastore`, and `AzureDataLakeGen2Datastore` entity classes do not include `subscription_id` and `resource_group` in their constructors, `_to_rest_object()` serialization, or `_from_rest_object()` deserialization, even though the underlying REST models (`RestAzureBlobDatastore`, etc.) fully support these optional fields.

Fix
Entity classes (`entities/_datastore/azure_storage.py`)
- Added `subscription_id: Optional[str] = None` and `resource_group: Optional[str] = None` to all three constructors
- Pass both fields through in `_to_rest_object()`
- Read them back in `_from_rest_object()` for round-trip fidelity

YAML schema (`_schema/_datastore/azure_storage.py`)
- Added `subscription_id` and `resource_group` as optional string fields to `AzureStorageSchema`

Backwards Compatibility
Both fields are optional and default to `None`. Existing code that does not pass them continues to work identically.

Example YAML (now supported)
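A minimal sketch of a blob datastore definition that sets the two new fields, assuming the usual `type`, `account_name`, and `container_name` keys of an `azure_blob` datastore. Every name and ID below is a placeholder.

```yaml
# Illustrative values only; replace each placeholder with your own.
name: my_blob_datastore
type: azure_blob
account_name: mystorageaccount
container_name: my-container
# New optional fields: the ARM scope of the storage account.
subscription_id: 00000000-0000-0000-0000-000000000000
resource_group: my-storage-rg
```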
Testing
- `az ml datastore create` with a companion CLI extension fix
- `subscriptionId` and `resourceGroup` populated in ARM API response (previously `null`)

Related