UN-2297 [FEAT] Support AWS S3 storage authentication via IRSA#1852
UN-2297 [FEAT] Support AWS S3 storage authentication via IRSA#1852kirtimanmishrazipstack wants to merge 17 commits intomainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Summary by CodeRabbitBug Fixes
Chores
WalkthroughChanged Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…ort-AWS-S3-storage-IAM
…ort-AWS-S3-storage-IAM
…ort-AWS-S3-storage-IAM
Greptile SummaryThis PR adds support for AWS IAM Roles for Service Accounts (IRSA) authentication when connecting to S3/Minio storage, enabling credential-chain-based auth (e.g. on EKS) without requiring explicit access key/secret pairs. Key changes:
Confidence Score: 4/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[file_storage_init called] --> B{provider?}
B -- LOCAL --> C[Set auto_mkdir=True]
B -- MINIO --> D[Set protocol = S3]
B -- S3 / MINIO --> E[Strip empty-string values\nfrom storage_config]
C --> F[fsspec.filesystem]
D --> E
E --> G{region_name\nin config?}
G -- Yes --> H[Pop region_name\nfrom storage_config]
H --> I[Merge into client_kwargs]
I --> F
G -- No --> F
F --> J{Success?}
J -- Yes --> K[Return AbstractFileSystem]
J -- KeyError --> L[Log error + raise FileStorageError]
J -- Exception --> L
Prompt To Fix All With AIThis is a comment left during a code review.
Path: unstract/sdk1/tests/file_storage/__init__.py
Line: 1
Comment:
**No tests added for new IRSA logic**
The `__init__.py` was added to create the test package structure, but no actual test cases were written for the new IRSA-related behaviour. The key paths that would benefit from test coverage are:
- Empty-string credential values are stripped before calling `fsspec.filesystem()` (e.g. passing `key=""` results in `key` being absent from the final kwargs).
- `region_name` is moved from the top-level kwargs into `client_kwargs` for both `S3` and `MINIO` providers.
- A caller that passes no `region_name` at all does not get a `client_kwargs` entry injected.
- A caller that already provides `client_kwargs` without a `region_name` key has it merged correctly.
Given that this change directly affects credential-chain resolution on production EKS clusters, having at least a lightweight unit-test suite here would make the behaviour explicit and prevent regressions.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: unstract/sdk1/src/unstract/sdk1/file_storage/helper.py
Line: 45-46
Comment:
**`region_name` falsy check skips valid non-empty values**
Line 46 uses a truthiness check (`if region_name:`) rather than an explicit `is not None` guard:
```python
region_name = storage_config.pop("region_name", None)
if region_name: # <-- falsy check
```
There are two subtle consequences:
1. **Empty string after stripping** — `region_name=""` would already have been removed by the dict comprehension above, so `pop` returns `None`, and the guard correctly skips. This path is fine.
2. **Caller explicitly passes `region_name=None`** — `pop` returns `None`; the falsy check skips moving it into `client_kwargs`. The key is also gone from `storage_config` (because `pop` removed it). The net effect is that an explicitly-passed `None` is silently discarded rather than forwarded to `fsspec`, which could mask caller mistakes.
More importantly, using `is not None` makes the intent unambiguous:
```suggestion
region_name = storage_config.pop("region_name", None)
if region_name is not None:
```
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: "Merge branch 'UN-229..." |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
unstract/sdk1/src/unstract/sdk1/file_storage/helper.py (1)
13-14: Type annotationobjectfor**kwargsvalues is overly broad.With
**kwargssyntax, the annotation describes the type of each value, not the dict itself. Usingobjectis less precise thanAny. Since this is variadic keyword arguments and you're calling dict methods like.update(),.items(),.pop()onstorage_config, the correct annotation would be**storage_config: Any.Suggested fix
+from typing import Any + class FileStorageHelper: `@staticmethod` def file_storage_init( - provider: FileStorageProvider, **storage_config: object + provider: FileStorageProvider, **storage_config: Any ) -> AbstractFileSystem:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@unstract/sdk1/src/unstract/sdk1/file_storage/helper.py` around lines 13 - 14, The type annotation for the variadic kwargs in function file_storage_init is too broad (object) and should be changed to Any to reflect that each value can be any type and since the function calls dict methods like storage_config.update(), storage_config.items(), and storage_config.pop(); update the signature to def file_storage_init(provider: FileStorageProvider, **storage_config: Any) and import Any from typing to match actual usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@unstract/sdk1/src/unstract/sdk1/file_storage/helper.py`:
- Around line 82-86: The error log inside local_file_system_init incorrectly
references FileStorageProvider.GCS.value; update the logger.error call used in
local_file_system_init to reference the correct provider
(FileStorageProvider.LOCAL.value) and preserve the existing error variable (e)
and message formatting so failures initializing the LOCAL file system are logged
accurately.
---
Nitpick comments:
In `@unstract/sdk1/src/unstract/sdk1/file_storage/helper.py`:
- Around line 13-14: The type annotation for the variadic kwargs in function
file_storage_init is too broad (object) and should be changed to Any to reflect
that each value can be any type and since the function calls dict methods like
storage_config.update(), storage_config.items(), and storage_config.pop();
update the signature to def file_storage_init(provider: FileStorageProvider,
**storage_config: Any) and import Any from typing to match actual usage.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2ca84777-daed-4bd5-914a-ffe004b75c0d
📒 Files selected for processing (2)
unstract/sdk1/src/unstract/sdk1/file_storage/helper.pyunstract/sdk1/tests/file_storage/__init__.py
…ack/unstract into UN-2297-Support-AWS-S3-storage-IAM
…ort-AWS-S3-storage-IAM
…ack/unstract into UN-2297-Support-AWS-S3-storage-IAM
Test ResultsSummary
Runner Tests - Full Report
SDK1 Tests - Full Report
|
|



What
region_nameintoclient_kwargsfor s3fs compatibility%formatting in logging functionsWhy
How
unstract/sdk1/src/unstract/sdk1/file_storage/helper.py:fsspec.filesystem(). This lets boto3 fall through to the default credential chain (e.g., IRSA, instance profile).region_namefrom a top-level kwarg intoclient_kwargs, which is where s3fs expects it.%formatting.Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
region_namemove intoclient_kwargsis the correct s3fs API usage and does not change behavior for existing configurations.Database Migrations
Env Config
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
Screenshots
Checklist
I have read and understood the Contribution Guidelines.