
feat: add Azure Functions API with caching proxy#101

Open
trieloff wants to merge 13 commits into main from feature/function-app-api

Conversation

@trieloff
Contributor

Summary

Adds an Azure Functions backend API that proxies and caches responses from Google Apps Script, reducing load and improving response times.

Changes

  • Azure Functions API (api/ folder)

    • getCurrentIncident endpoint that proxies Google Apps Script
    • 2-minute cache with stale fallback on errors
    • Returns X-Cache: HIT|MISS|STALE headers
  • GitHub Actions Deployment

    • Auto-deploy to production (aem-status-api) on main
    • Auto-deploy to staging (aem-status-api-staging) on feature branches
    • GitHub deployment tracking for visibility
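The caching behaviour listed under "Azure Functions API" above (2-minute TTL, stale fallback on errors, an `X-Cache` status of HIT, MISS or STALE) could look roughly like the sketch below. This is not the code from `getCurrentIncident.js`; `createCachedLoader`, `loadUpstream` and the injected `now` clock are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper (not the PR's implementation): wraps an async loader
// with a TTL cache and a stale-on-error fallback.
// `loadUpstream` and `now` are injected so the logic can run without network access.
function createCachedLoader(loadUpstream, { ttlMs = 2 * 60 * 1000, now = Date.now } = {}) {
  let cached = null; // { body, storedAt } after the first successful upstream call

  return async function load() {
    const age = cached ? now() - cached.storedAt : Infinity;
    if (cached && age < ttlMs) {
      // Fresh entry: answer from memory.
      return { body: cached.body, cacheStatus: 'HIT', ageSeconds: Math.floor(age / 1000) };
    }
    try {
      const body = await loadUpstream();
      cached = { body, storedAt: now() };
      return { body, cacheStatus: 'MISS', ageSeconds: 0 };
    } catch (err) {
      // Upstream failed: serve the expired entry if there is one, otherwise rethrow.
      if (cached) {
        return { body: cached.body, cacheStatus: 'STALE', ageSeconds: Math.floor(age / 1000) };
      }
      throw err;
    }
  };
}
```

A handler would copy `cacheStatus` into the `X-Cache` header and `ageSeconds` into `Age`. Note that an in-memory cache like this only lives as long as the function instance that holds it.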

Endpoints

Secrets Required

  • AZURE_FUNCTIONAPP_PUBLISH_PROFILE - Production publish profile
  • AZURE_FUNCTIONAPP_PUBLISH_PROFILE_STAGING - Staging publish profile

- Add getCurrentIncident endpoint that proxies Google Apps Script
- Cache responses for 2 minutes with stale fallback on errors
- Add GitHub Actions workflow for automated deployment

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
- Deploy to aem-status-api-staging for feature branches
- Deploy to aem-status-api (production) only on main

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
- Track API deployments (api-production, api-staging environments)
- Track Static Web App deployments (production environment)
- Show deployment status and environment URLs in GitHub UI

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces an Azure Functions backend API that serves as a caching proxy for Google Apps Script, reducing load on the upstream service and improving response times through a 2-minute cache with stale-on-error fallback.

Key Changes:

  • Adds Azure Functions API with getCurrentIncident endpoint that proxies Google Apps Script
  • Implements 2-minute cache with stale fallback mechanism and cache status headers
  • Adds GitHub Actions workflow for automated deployment to production and staging environments

Reviewed changes

Copilot reviewed 6 out of 7 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| api/src/functions/getCurrentIncident.js | Core function implementing caching proxy with error handling |
| api/package.json | Package configuration for Azure Functions with ES modules |
| api/package-lock.json | Lockfile with @azure/functions v4.9.0 and dependencies |
| api/host.json | Azure Functions host configuration with logging settings |
| .github/workflows/deploy-functions.yml | Deployment workflow for production/staging environments |
| .github/workflows/azure-static-web-apps.yml | Added deployment tracking for static web app |
Files not reviewed (1)
  • api/package-lock.json: Language not supported


Comment thread api/src/functions/getCurrentIncident.js
Comment thread api/src/functions/getCurrentIncident.js Outdated
Comment thread api/src/functions/getCurrentIncident.js
Comment thread api/src/functions/getCurrentIncident.js
Comment on lines +11 to +14
app.http('getCurrentIncident', {
  methods: ['GET'],
  authLevel: 'anonymous',
  handler: async (request, context) => {

Copilot AI Nov 28, 2025


The getCurrentIncident function lacks test coverage. Given that the repository uses the Node.js native test runner (see test/details.test.js), consider adding tests for: cache hit/miss scenarios, stale cache fallback on errors, error handling when the fetch fails and the cache is empty, and proper cache expiration.

Contributor Author


We have post-deploy integration tests (api/test/post-deploy.test.js) that verify the API works end-to-end after each deployment. Unit tests with mocking would require additional dependencies. The integration tests cover the critical paths: status codes, headers, and response structure.
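For illustration, a check of the kind this reply describes (status code, headers, response structure) might be sketched as follows. This is not the content of `api/test/post-deploy.test.js`; `checkIncidentResponse` is a hypothetical helper, and since the exact response structure is not shown in this thread, the sketch only asserts that the body is a JSON object. It relies on the global `Response` available in Node.js 18 and later.

```javascript
// Hypothetical checker (not the PR's test file): inspects a fetch Response
// and returns a list of human-readable problems; an empty list means it passed.
async function checkIncidentResponse(res) {
  const problems = [];
  if (res.status !== 200) problems.push(`unexpected status ${res.status}`);

  const type = res.headers.get('content-type') || '';
  if (!type.includes('application/json')) problems.push(`unexpected content-type "${type}"`);

  // The PR description lists HIT, MISS and STALE as the possible X-Cache values.
  const cache = res.headers.get('x-cache');
  if (!['HIT', 'MISS', 'STALE'].includes(cache)) problems.push(`unexpected X-Cache "${cache}"`);

  try {
    const body = await res.json();
    // Assumption: the payload is a JSON object; its fields are not checked here.
    if (body === null || typeof body !== 'object') problems.push('body is not a JSON object');
  } catch {
    problems.push('body is not valid JSON');
  }
  return problems;
}
```

In a post-deploy test this would be called as `await checkIncidentResponse(await fetch(url))`, with `url` pointing at the deployed endpoint, and the test would fail if the returned list is not empty.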

Comment thread api/src/functions/getCurrentIncident.js
Comment thread api/package-lock.json
Comment thread .github/workflows/deploy-functions.yml
Comment thread api/src/functions/getCurrentIncident.js
- Verify 200 status, JSON content-type, CORS headers
- Check X-Cache and Age headers
- Validate response structure
- Uses Node's built-in test runner with describe/it

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Replaces slow Google Apps Script endpoint with cached Azure Functions proxy.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
- Support GOOGLE_SCRIPT_URL env var with fallback default
- Add 10-second fetch timeout with AbortController
- Validate response structure before caching
- Include timeout/upstream_error reason in 502 responses
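A minimal sketch of the timeout and validation steps named in this commit message, assuming a global `fetch` (Node.js 18 or later). `fetchJsonWithTimeout` and the injectable `fetchImpl` are hypothetical names, and the shape check is a placeholder because the real validation rules are not shown in this thread.

```javascript
// Hypothetical helper (not the PR's code): fetch JSON with a timeout and tag
// every failure with a `reason` of either 'timeout' or 'upstream_error'.
async function fetchJsonWithTimeout(url, { timeoutMs = 10000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  // Abort the request if the upstream has not answered within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(url, { signal: controller.signal });
    if (!res.ok) throw new Error(`upstream responded with ${res.status}`);
    const body = await res.json();
    // Placeholder validation: only accept JSON objects before they reach the cache.
    if (body === null || typeof body !== 'object') throw new Error('unexpected response shape');
    return body;
  } catch (err) {
    const reason = controller.signal.aborted ? 'timeout' : 'upstream_error';
    throw Object.assign(new Error(err.message), { reason });
  } finally {
    clearTimeout(timer);
  }
}
```

Callers can then branch on `err.reason` when building the error response.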

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Google typically responds in 1.5-2.2s but may be slower during cold starts.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Fail fast if env var is not configured to avoid obscure debugging issues.

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
502 Bad Gateway for upstream errors, 504 Gateway Timeout for timeouts.
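The status mapping in this commit message could be expressed as a small helper like the one below. `errorResponse` is a hypothetical name and the body fields are assumptions; the returned object uses the `status` and `jsonBody` properties that an `@azure/functions` v4 HTTP handler can return.

```javascript
// Hypothetical helper (not the PR's code): turn a failure reason into an HTTP
// response object. Timeouts map to 504 Gateway Timeout, everything else to 502.
function errorResponse(reason) {
  const status = reason === 'timeout' ? 504 : 502;
  return {
    status,
    // Assumed body shape: the thread only says the reason is included.
    jsonBody: { error: 'Failed to fetch incident data', reason },
  };
}
```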

Co-authored-by: factory-droid[bot] <138933559+factory-droid[bot]@users.noreply.github.com>

Signed-off-by: Lars Trieloff <lars@trieloff.net>
@rofe
Collaborator

rofe commented Nov 30, 2025

I'm not exactly clear on what problem we are trying to solve with this... I can see that load on the Google API could be reduced, but the end user experience would not change that much:

  • The first visit to www.aemstatus.net would still trigger an initial call to Google Apps (Azure cache miss), which would now even take slightly longer due to the roundtrip through the Azure function
  • www.aemstatus.net currently refreshes the current incident every 30s, which would now be faster but also obsolete due to the 2 minute cache, so for this interval to stay useful, it should be increased to match the Azure cache TTL
  • If www.aemstatus.net refreshes the current incident every 2 minutes instead of 30s, it will always be an Azure cache miss and trigger a Google Apps call, making the Azure function obsolete
  • The only scenarios where an end user would benefit from a quicker response (and the load on the Google API would be reduced) are:
    • When more than 2 users visit www.aemstatus.net in a 2 minute window: User A gets a cache miss in Azure, triggers a call to Google Apps and waits. User B, C, etc benefit from the Azure cache. But do we have traffic data from www.aemstatus.net that makes this a realistic scenario?
    • When they navigate away from the homepage (i.e. to read a postmortem) and come back within 2 minutes. But I would consider this an edge case that wouldn't warrant a complex setup...
  • Also, what happens if multiple users trigger calls to the Azure function while it is still waiting for Google Apps to respond? Does it deliver stale cache (if available)?
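On the last question: the thread does not say how the function behaves while an upstream call is pending. One common way to handle it is to let concurrent callers share a single in-flight request, sketched below. `coalesce` and `loadUpstream` are hypothetical names, and nothing here is confirmed to be part of the PR.

```javascript
// Hypothetical wrapper (not confirmed to be in the PR): while one upstream call
// is pending, every additional caller receives the same promise instead of
// triggering another call to the upstream.
function coalesce(loadUpstream) {
  let inFlight = null;
  return function load() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(() => loadUpstream())
        // Clear the slot on success and on failure so the next caller retries.
        .finally(() => { inFlight = null; });
    }
    return inFlight;
  };
}
```

With this in place, users B and C arriving during user A's cache miss would wait for the same Google Apps response rather than each starting their own request. Serving a stale entry immediately instead would be a separate design choice.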

trieloff pushed a commit that referenced this pull request Dec 1, 2025
- Add @adobe/helix-rum-js dependency for traffic measurement
- Include RUM standalone script in all HTML pages (index, details, postmortem, what, when)
- Use ot.aem.live as the RUM collection endpoint
- Addresses request for traffic data in PR #101 comment

This enables basic traffic measurement and user interaction tracking
on the AEM status page to provide insights into page usage patterns.

Signed-off-by: Lars Trieloff <lars@trieloff.net>
trieloff pushed a commit that referenced this pull request Dec 1, 2025
- Include RUM standalone script in all HTML pages (index, details, postmortem, what, when)
- Use ot.aem.live as the RUM collection endpoint
- Addresses request for traffic data in PR #101 comment

This enables basic traffic measurement and user interaction tracking
on the AEM status page to provide insights into page usage patterns.

The RUM library is loaded directly from the CDN, no npm dependency needed.

Signed-off-by: Lars Trieloff <lars@trieloff.net>
@rofe
Collaborator

rofe commented Jan 12, 2026

@trieloff How about this for a simple, intermediate solution to address the delayed loading of the current incident data:
On page load, and just once, we show a placeholder between the services and last 30 days sections. It contains a loading/spinner graphic and text saying "Hold on while we are checking if there is an ongoing incident...".

4 participants