Conversation
This PR will trigger a minor release when merged.
ramboz
left a comment
A few nitpicking comments, but looks good otherwise
solaris007
left a comment
Thanks for this, @rpapani - really solid work on a non-trivial piece of infrastructure. A few things I especially liked:
- The credential stripping from the remote URL after standard repo clone - nice defensive practice
- `mkdtempSync` for unique temp paths + the prefix validation in `cleanup()` to prevent accidental deletion of arbitrary paths
- Error sanitization for Bearer tokens and basic-auth URLs in git output
- `GIT_TERMINAL_PROMPT=0` to prevent Lambda hangs
- Thorough test suite covering both BYOG and standard flows
The main concerns are around the token lifecycle (the authorization_code grant + no expiry tracking combo could leave the client in an unrecoverable state) and credential exposure during push/pull for standard repos. See the inline comments for details.
One broader question: now that we have a Cloud Manager client that can talk to the CM Repo API, could we leverage this to also enrich our site detection? I'm thinking things like - does a site actually exist in CM, what's its configuration, which programs/repos are associated with it? That kind of context could really help with discovery and onboarding flows. Would be great to hear if that's on the radar or if the CM API surface supports it.
```js
client_id: clientId,
client_secret: clientSecret,
code: clientCode,
grant_type: 'authorization_code',
```
Two things here that I'd like to understand:
- `grant_type: 'authorization_code'` with a static `code` from env vars - in standard OAuth 2.0, authorization codes are single-use. If the cached token expires and we try to re-exchange the same code, IMS should reject it and the client can't recover. If this is an IMS-specific reusable code pattern (I've seen some Adobe service-to-service configs do this), could you add a comment explaining that? Otherwise, should this be `client_credentials`?
- The `expires_in` from the IMS response is ignored - the token gets cached forever on the instance. For warm Lambdas that can live for hours, we'd eventually use a stale token. Combined with #1, this means one expired token = dead client. Something like this would help:
```js
if (this.accessToken && Date.now() < this.tokenExpiresAt) {
  return this.accessToken;
}
```
grant_type: 'authorization_code' with a static code from env vars - in standard OAuth 2.0, authorization codes are single-use.
I'm able to use the same access_token for multiple API calls to CM, so I'm not sure this is single-use. The API response gives
```json
{
  "access_token": "eyJhbGciOiJSU...o4eSRY6a0Kg",
  "refresh_token": "eyJhbGciOiJSU..._doaZSAOICPn8pRjDgjhPs_Q",
  "token_type": "bearer",
  "expires_in": 86399
}
```
So I assumed this token is valid for the whole day; since a Lambda's lifetime is 15 min, I used the same grant_type the other clients in spacecat-shared are using.

Just realized that in cm-client I'm making the IMS call directly instead of leveraging the IMS client like in brand-client, though the IMS client also uses the same authorization_code.

To be consistent with the other clients, I'll change cm-client to use the IMS client - please let me know if I should avoid using authorization_code indirectly through the IMS client.
The expires_in from the IMS response is ignored - the token gets cached forever on the instance
good point, will fix
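For that fix, something along these lines might work - a sketch only, with hypothetical field names, a pluggable `fetchToken` standing in for the actual IMS call, and a 60s refresh skew so a token never expires mid-request:

```javascript
// Sketch: cache the IMS token together with its expiry so warm Lambdas
// refresh instead of reusing a stale token. Response field names follow
// the IMS payload shown above (access_token, expires_in).
class TokenCache {
  constructor(fetchToken) {
    this.fetchToken = fetchToken; // async () => ({ access_token, expires_in })
    this.accessToken = null;
    this.tokenExpiresAt = 0;
  }

  async getAccessToken(now = Date.now()) {
    const skewMs = 60 * 1000; // refresh a minute early
    if (this.accessToken && now < this.tokenExpiresAt - skewMs) {
      return this.accessToken;
    }
    const { access_token: token, expires_in: expiresIn } = await this.fetchToken();
    this.accessToken = token;
    this.tokenExpiresAt = now + expiresIn * 1000;
    return token;
  }
}
```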
Thanks for clarifying, @rpapani - and good catch on the existing IMS client.
So I dug into this a bit more. The authorization_code with a static code is indeed an Adobe IMS convention for service-to-service integrations - it's not a standard OAuth single-use code, it's more like a long-lived service credential that IMS lets you exchange repeatedly. All existing SpaceCat clients use this same pattern, so you're in good company.
Switching to use spacecat-shared-ims-client is definitely the right call - centralizes the IMS integration and keeps things consistent with brand-client and others. One thing to be aware of though: the IMS client has the same token expiry gap (caches the token but never checks expires_in). For a 15-min Lambda with a 24h token that's fine in practice, but ideally we'd fix that in the IMS client itself so all consumers benefit. That can be a separate effort though - no need to block this PR on it.
One broader heads-up worth flagging for the team: Adobe's legacy JWT/Service Account credentials have reached end of life as of June 30, 2025, with a hard March 1, 2026 deadline for forced auto-conversion (migration guide, JWT deprecation notice). The recommended server-to-server pattern going forward is grant_type: client_credentials via /ims/token/v3 (implementation guide) - which the IMS client already supports via getServiceAccessTokenV3(). The v4 authorization_code endpoint may be tied to the legacy credential type. Might be worth checking whether the CM Repo Service TA supports OAuth Server-to-Server credentials, and if so, using getServiceAccessTokenV3() instead. That way we're future-proof and don't have to scramble when the deadline hits.
tl;dr:
- Yes, switch to IMS client - good idea
- The `authorization_code` pattern is fine for now, it's an established Adobe IMS thing
- Consider using `getServiceAccessTokenV3()` (`client_credentials`) if the TA supports it - the v4 path may stop working after March 2026
- Token expiry tracking can be fixed in the IMS client as a separate PR
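For reference, the `client_credentials` request shape for `/ims/token/v3` would look roughly like this - a sketch of the request builder only (the host is Adobe's usual IMS endpoint; the scope value and function name here are placeholders):

```javascript
// Sketch: build a form-encoded OAuth Server-to-Server token request for
// Adobe IMS /ims/token/v3. Parameter names follow standard OAuth 2.0
// client_credentials; scope and helper name are illustrative.
function buildV3TokenRequest({ clientId, clientSecret, scope }) {
  const body = new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret,
    scope,
  });
  return {
    url: 'https://ims-na1.adobelogin.com/ims/token/v3',
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(),
  };
}
```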
Consider using getServiceAccessTokenV3() (client_credentials) if the TA supports it - the v4 path may stop working after March 2026
when I'm using POST /ims/token/v4 with grant_type: client_credentials, it's failing with

```json
{
  "error": "unauthorized_client"
}
```
Can I use v4 for now and once I get input from CM team, will switch to oauth?
That unauthorized_client error makes total sense - it means the CM Repo Service Technical Account is provisioned as a legacy "Service Account (JWT)" credential in Adobe Developer Console, not as an "OAuth Server-to-Server" credential. Only the OAuth S2S credential type supports client_credentials.
Using authorization_code via ImsClient (which is what getServiceAccessToken() does) is perfectly fine for now - that's what all other SpaceCat clients use. No need to change anything for this PR.
When you hear back from the CM team, the path forward would be:
- Someone with Dev Console access adds an "OAuth Server-to-Server" credential to the CM Repo Service integration
- Switch from `getServiceAccessToken()` to `getServiceAccessTokenV3()` (which uses `/ims/token/v3` with `client_credentials`)
- Configure the `IMS_SCOPE` env var (ImsClient picks it up for the v3 flow)
But that's a separate effort entirely. The authorization_code flow isn't tied to the JWT deprecation timeline (it's a different credential type), so there's no urgency here.
```js
this.log.info(`Cloning CM repository: program=${programId}, repo=${repositoryId}, type=${repoType}`);

const args = await this.#buildAuthGitArgs('clone', programId, repositoryId, { imsOrgId, repoType, repoUrl });
this.#execGit([...args, clonePath]);
```
If the clone throws here (network timeout, auth failure, etc.), the mkdtempSync-created directory at clonePath gets leaked. applyPatch handles this nicely with try/finally below - would be great to do the same here. Lambda's /tmp is 512MB by default, and with retries those orphaned dirs could pile up.
In applyPatch, the tmp file is a patch file, so I clean it up in a finally block; but in the case of clone, the tmp directory holds the cloned repo, so it shouldn't be cleaned until the client has used the cloned path, either for applying a patch, zipping, etc. I have cleanup(clonePath), which consumers can call to clean this up.

I can clean the tmp directory when there's an exception if you want it cleaned, but it's in ephemeral storage, so I'm not sure that's really important.
Lambda's /tmp is 512MB by default, and with retries those orphaned dirs could pile up.
Good point, I've thought about this. I've seen some repos at 460 MB, so we'll run into issues with repos that size. We don't seem to have a way to configure this in hedy, so I'm planning to open a pull request in hedy to support it and then update our workers with that configuration. I'm thinking of using 1-2 GB for the ephemeral storage - would that be fine?
To clarify - I'm not suggesting cleaning up the happy path. You're absolutely right that the clone dir is the output and the consumer owns its lifecycle via cleanup(clonePath). That design is solid.
The suggestion is specifically about the error path: if #execGit throws during the clone itself, the mkdtempSync-created directory is already on disk but contains only a partial or empty clone that no caller will ever use. Since the method throws, the caller never gets clonePath back and has no way to call cleanup() on it. unzipRepository already handles this pattern nicely - it cleans up extractPath in its catch block. Something like:
```js
try {
  this.#execGit([...args, clonePath]);
} catch (error) {
  rmSync(clonePath, { recursive: true, force: true });
  throw error;
}
```

On ephemeral storage - good call on the hedy PR. For sizing, think about peak concurrent usage: clone (~460 MB) + .git history, and if unzipRepository runs in the same invocation (zip on disk + extracted tree simultaneously), that can spike to ~1 GB easily. I'd recommend 3 GB as a starting point - comfortable headroom without being wasteful. AWS Lambda supports up to 10 GB, so there's room to grow. Monitoring TmpStorageUsed in CloudWatch would give you the data to right-size it.
Also worth considering - could we reduce clone size with git flags for flows that don't need full history? The import flow needs .git history for downstream, but the autofix flow (clone -> branch -> patch -> push -> PR) likely doesn't. Options like --depth 1 --single-branch for autofix clones could cut that 460 MB down dramatically and ease the /tmp pressure. Even for the import flow, --single-branch --no-tags could help if downstream only needs one branch's history.
could we reduce clone size with git flags for flows that don't need full history?
git history is required for our learning agent to understand whether the suggestions are taken as-is or updated, what code changes contributed to page performance, customer coding style, etc. Most of the time we need the production branch; not sure if the other branches are required atm. Can I do this optimization in a separate ticket?
I think we can be more granular with that requirement.
We can probably limit the full git history to the main branch and only for the onboarding phase to build the project knowledge and best practices once.
For regular fixes, we can reuse the learned best practices as-is and potentially just update them every 6 months or so. We could plan for the usual smaller storage and only use a larger one for those "edge cases", if that helps keep COGS lower.
Totally makes sense - full history for the learning agent is a legit requirement. Commit-level analysis of whether suggestions were adopted, coding style patterns, performance impact - you need that history for all of that.
Separate ticket sounds good, no need to block this PR.
One thought for that ticket though - even keeping full history, --single-branch could be a quick win if you typically only need the production branch. It fetches the complete history for that branch but skips remote tracking refs for all the others. Paired with --no-tags, it could meaningfully reduce transfer size without losing any commit history on the branch you care about.
And since these flags are additive, you could configure them per-flow: full clone for the learning agent, --depth 1 --single-branch for autofix (where you just need a working copy to apply a patch and push), etc.
```js
async push(clonePath, programId, repositoryId, {
  imsOrgId, repoType, repoUrl, ref,
} = {}) {
  const pushArgs = await this.#buildAuthGitArgs('push', programId, repositoryId, { imsOrgId, repoType, repoUrl });
```
Nice job stripping creds from the remote URL after clone (line 294) - but for push and pull, #buildAuthGitArgs re-embeds user:token in the URL and doesn't clean up afterward. The creds end up in /proc/PID/cmdline while git runs.
Any reason not to use the same -c http.extraheader approach with Authorization: Basic <base64> for standard repos too? That would make all repo types consistent and avoid creds in the URL entirely.
Any reason not to use the same -c http.extraheader approach with Authorization: Basic for standard repos too? That would make all repo types consistent and avoid creds in the URL entirely.
Great suggestion - I didn't think of this. Tried it locally and it worked great; will implement.
but for push and pull, #buildAuthGitArgs re-embeds user:token in the URL and doesn't clean up afterward. The creds end up in /proc/PID/cmdline while git runs.
even with -c extraheader, this problem is still there, right? To avoid it entirely, should I use the GIT_CONFIG_COUNT approach (like below), or is there a better approach?
```
GIT_CONFIG_COUNT=2
GIT_CONFIG_KEY_0=http.extraheader
GIT_CONFIG_VALUE_0=Authorization: Bearer TOKEN
GIT_CONFIG_KEY_1=http.extraheader
GIT_CONFIG_VALUE_1=x-api-key: my-key
```
Great instinct - you're right that -c http.extraheader still exposes credentials in /proc/PID/cmdline (and ps output), so from a cmdline perspective it's the same exposure as URL-embedded creds.
GIT_CONFIG_COUNT is exactly the mechanism designed to solve this. It was added in Git 2.31.0 (commit d8d77153) specifically for this reason - the commit message even calls out the ps(1) credential leakage from -c args. With env vars, credentials live in /proc/PID/environ which is restricted to same-UID/root (needs CAP_SYS_PTRACE), while /proc/PID/cmdline is world-readable. That's a meaningful step up.
One thing to verify first though: which git version does our Lambda layer provide? The public lambci/git-lambda-layer ships Git 2.29.0, and GIT_CONFIG_COUNT needs 2.31+. If we're on 2.29, the env vars would be silently ignored and auth would just fail. Worth checking git --version in a Lambda exec to confirm.
For this PR, I think the -c http.extraheader approach for standard repos (which you've already agreed to) is the right move - it's a clear improvement over URL-embedded creds, works with any git version, and makes all repo types consistent. The /proc/PID/cmdline exposure is a real but low-severity concern in Lambda's threat model: no multi-tenant process namespace, and the execution is ephemeral.
If we want to close that gap later, there are two options:
- `GIT_CONFIG_COUNT` - cleanest, no file I/O, but needs git >= 2.31
- Temp config file with `include.path` - works with any git version, similar to what GitHub Actions v6 does
Either way, that's a follow-up - the -c extraheader approach is solid for this PR.
solaris007
left a comment
Nice work on the update, @rpapani - you addressed every item from the first review round. The ImsClient integration, extraheader-based auth for both repo types, timeout handling, sanitization, and clone error cleanup all look solid. Good stuff.
One non-blocking thought for a follow-up:
In-memory zipRepository - archiver currently buffers the entire ZIP into memory via Buffer.concat(chunks). For a 460 MB repo with .git history, the compressed buffer plus the archiver working set could push past Lambda's default 1 GB memory limit (especially if the clone is also still on disk at that point). Might be worth considering a stream-to-S3 approach or writing the zip to /tmp first and then uploading, so memory stays flat regardless of repo size. Not a blocker for this PR - just something to keep an eye on once you're dealing with the larger repos in prod.
@solaris007, @ramboz here's the hedy PR - adobe/helix-deploy#890, it'll be great if you can provide the feedback on that.
For ASO auto-fix purposes, we need the authorUrl (for content opportunities) and the code repo (for code opportunities) used in the prod environment. If we know the program id from the customer domain, then we can obtain the environment, author URL, and code repo/branch using CM APIs; the manual process is documented in this wiki. We're working with the CM team to get access to CM APIs for our
…path of the code for the site
🎉 This PR is included in version @adobe/spacecat-shared-cloud-manager-client-v1.0.0 🎉 The release is available on:

Your semantic-release bot 📦🚀
## [@adobe/spacecat-shared-data-access-v3.2.0](https://github.com/adobe/spacecat-shared/compare/@adobe/spacecat-shared-data-access-v3.1.0...@adobe/spacecat-shared-data-access-v3.2.0) (2026-02-21)

### Features

* cloud manager client ([#1335](#1335)) ([2e4b013](2e4b013))
🎉 This PR is included in version @adobe/spacecat-shared-data-access-v3.2.0 🎉 The release is available on:

Your semantic-release bot 📦🚀
`spacecat-shared-cloud-manager-client` — Shared client for Adobe Cloud Manager: git operations (clone, zip, branch, apply patch from S3, push) and CM Repo REST API (create pull request). Supports BYOG (GitHub, Bitbucket, GitLab, Azure DevOps) and standard (non-BYOG) repos. Used by code-import and autofix flows.

Requirements
API
Security notes:
- Temp directories are created with `mkdtempSync` under the OS temp directory, producing unique, unpredictable paths safe from symlink attacks and concurrent-run collisions.
- After cloning, `git remote set-url origin <repoUrl>` strips basic-auth credentials from the stored remote, so `git remote -v` never exposes secrets.
- Temporary patch files are cleaned up in a `finally` block.
- Bearer tokens in git output are sanitized to `[REDACTED]`, and basic-auth credentials in URLs are masked with `***`.

Validation Status (In Progress)
Example Code config for test site in dev