feat(auth): Implement secure httpOnly cookie authentication for Okta #1920

Open
arjunp99 wants to merge 3 commits into data-dot-all:main from arjunp99:feature/secure-httponly-cookie-auth

Contributor

arjunp99 commented Mar 3, 2026

Summary

Replace localStorage token storage with httpOnly cookies for Okta authentication, so that tokens cannot be read by injected scripts (XSS token theft). This implements a custom PKCE flow and leaves the existing Cognito/Amplify behavior unchanged.

Security Improvements

  • Tokens stored in httpOnly cookies (not accessible via JavaScript - prevents XSS token theft)
  • SameSite=Lax mitigates CSRF while still allowing the top-level OAuth redirect back from Okta
  • Secure flag ensures HTTPS-only transmission
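
For illustration, the three cookie attributes above can be produced with Python's standard `http.cookies` module. This is a minimal sketch, not the code from `backend/auth_handler.py`: the function name `build_auth_cookie`, the `Path=/` scope, and the one-hour `Max-Age` are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def build_auth_cookie(name: str, value: str, max_age: int = 3600) -> str:
    """Return a Set-Cookie header value carrying the hardening flags."""
    cookie = SimpleCookie()
    cookie[name] = value
    morsel = cookie[name]
    morsel["httponly"] = True    # not readable from JavaScript
    morsel["secure"] = True      # only sent over HTTPS
    morsel["samesite"] = "Lax"   # withheld from cross-site subrequests
    morsel["path"] = "/"         # assumption: cookie is valid for the whole site
    morsel["max-age"] = max_age  # assumption: hypothetical lifetime in seconds
    return morsel.OutputString()


header = build_auth_cookie("access_token", "example-token-value")
print(header)
```

The returned string is what would go into a `Set-Cookie` response header, one header per token.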

Changes

Frontend

  • frontend/src/utils/pkce.js - PKCE utility for secure OAuth code exchange
  • frontend/src/authentication/views/Callback.js - OAuth callback handler
  • frontend/src/authentication/contexts/GenericAuthContext.js - Cookie-based auth for Okta
  • frontend/src/services/hooks/useClient.js - Relative URLs + credentials for cookies
  • frontend/src/routes.js - Added /callback route

Backend

  • backend/auth_handler.py - Token exchange, userinfo, logout endpoints
  • deploy/stacks/lambda_api.py - Auth handler Lambda + API routes
  • deploy/stacks/cloudfront.py - Proxy /auth/*, /graphql/*, /search/* to API Gateway
  • deploy/custom_resources/custom_authorizer/custom_authorizer_lambda.py - Read tokens from Cookie header

How It Works

  1. User clicks login → redirected to Okta with PKCE challenge
  2. Okta redirects back to /callback with authorization code
  3. Frontend calls /auth/token-exchange with code + PKCE verifier
  4. Backend exchanges code for tokens, sets httpOnly cookies
  5. All subsequent API calls include cookies automatically (same-origin via CloudFront proxy)
  6. Custom authorizer reads access_token from Cookie header
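
The PKCE challenge in step 1 and the verifier in step 3 are related by the S256 transform from RFC 7636. The PR implements this in `frontend/src/utils/pkce.js`; the sketch below shows the same computation in Python purely as an illustration, and the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def _b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as RFC 7636 requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def code_challenge(verifier: str) -> str:
    """S256 challenge: BASE64URL(SHA256(ASCII(code_verifier)))."""
    return _b64url(hashlib.sha256(verifier.encode("ascii")).digest())


def generate_pkce() -> tuple:
    """Return (code_verifier, code_challenge)."""
    # 32 random bytes encode to a 43-character verifier (allowed range: 43-128)
    verifier = _b64url(secrets.token_bytes(32))
    return verifier, code_challenge(verifier)


verifier, challenge = generate_pkce()
print(len(verifier), len(challenge))
```

The challenge is sent to the identity provider on the initial redirect, while the verifier stays with the client until the token exchange, so an intercepted authorization code alone is not enough to obtain tokens.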

Backward Compatibility

  • Cognito users: No changes - continues using Amplify with Authorization header

arjunp99 added 3 commits March 3, 2026 11:00
Replace localStorage token storage with httpOnly cookies to prevent XSS
attacks. Implements custom PKCE flow for Okta authentication while
maintaining existing Cognito/Amplify behavior unchanged.

Changes:
- Add PKCE utility for secure OAuth code exchange
- Add Callback view for handling OAuth redirects
- Add backend auth_handler for token exchange endpoints
- Update GenericAuthContext with cookie-based auth for Okta
- Update useClient to work without Authorization header for Okta
- Configure CloudFront to proxy /auth/*, /graphql/*, /search/* paths
- Update Lambda API with auth endpoints and CORS for cookies
- Update custom authorizer to read tokens from Cookie header

Security improvements:
- Tokens stored in httpOnly cookies (not accessible via JavaScript)
- SameSite=Lax prevents CSRF while allowing OAuth redirects
- Secure flag ensures HTTPS-only transmission

Security improvements:
- Add structured logging with sanitized error messages
- Remove hardcoded CloudFront URL fallback (requires proper config)
- Move SimpleCookie import to module level for better performance

Frontend enhancements:
- Add 30-second timeout to token exchange requests
- Fix useEffect dependency array in useClient hook
- Implement OAuth callback handler with PKCE validation

Infrastructure updates:
- Configure auth handler Lambda for cookie-based authentication
- Add API Gateway routes for token exchange, logout, and userinfo
- Improve CloudFront URL parsing documentation

All changes pass Ruff linting and formatting checks.

def logout_handler(event):
    """Clear all auth cookies"""
    cookies = []
    for cookie_name in ['access_token', 'id_token', 'refresh_token']:
Collaborator

Should we also log out from Okta? Here we delete the cookies, but that does not log the user out of Okta. Should we call the Okta logout endpoint so that Okta ends the session as well?

Contributor Author

Yes, good point! I'll implement Okta logout by redirecting to Okta's /v1/logout endpoint from the frontend after clearing cookies. This fully ends the Okta session so the user must re-authenticate on next login.

The flow will be:

  1. Frontend calls /auth/logout to clear cookies
  2. Frontend redirects to {okta_url}/v1/logout?id_token_hint=...&post_logout_redirect_uri=...

This is handled on the frontend side since it requires a browser redirect.
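
As a rough illustration of the redirect URL in the second step, here is how the query string could be assembled. The PR builds this in the frontend; the Python below only shows the URL shape, and the function name and example domains are hypothetical.

```python
from urllib.parse import urlencode


def build_okta_logout_url(okta_url: str, id_token: str, post_logout_redirect_uri: str) -> str:
    """Assemble {okta_url}/v1/logout with URL-encoded query parameters."""
    query = urlencode(
        {
            "id_token_hint": id_token,
            "post_logout_redirect_uri": post_logout_redirect_uri,
        }
    )
    return f"{okta_url.rstrip('/')}/v1/logout?{query}"


# Hypothetical values, for illustration only
url = build_okta_logout_url(
    "https://example.okta.com/oauth2/default/",
    "abc",
    "https://app.example.com/",
)
print(url)
```

Encoding the redirect URI matters because it contains `:` and `/`, which would otherwise be ambiguous inside a query string.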

    elif path == '/auth/userinfo' and method == 'GET':
        return userinfo_handler(event)
    else:
        return error_response(404, 'Not Found', event)
Collaborator

Instead of 'Not Found', can we return something more descriptive, such as 'Incorrect route for authentication'?

Contributor Author

Changed to: 'Auth endpoint not found. Valid routes: /auth/token-exchange, /auth/logout, /auth/userinfo'

def userinfo_handler(event):
    """Return user info from id_token cookie"""
    try:
        cookie_header = event.get('headers', {}).get('Cookie') or event.get('headers', {}).get('cookie', '')
Collaborator

Do the header names change between browsers? I am asking because you look up both the Cookie and the cookie key.

Contributor Author

Yes, HTTP header names can arrive with different casing depending on the proxy/gateway configuration. Header names are case-insensitive in HTTP; API Gateway sometimes normalizes them to lowercase (cookie), while the conventional spelling is Cookie. Checking both ensures we handle these cases reliably.
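
An alternative to checking two spellings is a case-insensitive lookup, which also covers any other casing a proxy might produce. This is a minimal sketch, assuming the Lambda proxy event exposes headers as a plain dict; `get_header` is a hypothetical helper, not something in the PR.

```python
def get_header(event: dict, name: str, default: str = "") -> str:
    """Return a header value from a Lambda proxy event, ignoring name casing."""
    headers = event.get("headers") or {}  # 'headers' may be missing or None
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return default


print(get_header({"headers": {"COOKIE": "access_token=abc"}}, "Cookie"))
```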

            return error_response(401, 'Invalid token format', event)

        payload = parts[1]
        padding = 4 - len(payload) % 4
Collaborator

Can you please add a comment documenting what this code is doing?

Contributor Author

Added comments on this logic.

        # JWT format: header.payload.signature (base64url encoded)
        payload = parts[1]

        # Base64 requires padding to be multiple of 4 characters
        # URL-safe base64 in JWTs often omits padding, so we add it back
        padding = 4 - len(payload) % 4
        if padding != 4:
            payload += '=' * padding
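
To see the padding step in context, here is a self-contained sketch that decodes a JWT payload segment end to end. It is an illustration rather than the PR's handler: `decode_jwt_payload` and the sample claims are hypothetical, the padding is computed with the equivalent expression `-len(payload) % 4`, and the signature is deliberately not verified.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # Restore the '=' padding that base64url-encoded JWT segments omit
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: bytes) -> str:
    """Helper for the demo: base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Build a throwaway, unsigned token for the demo
claims = {"sub": "user@example.com", "name": "Example User"}
token = ".".join(
    [_b64url(b'{"alg":"none"}'), _b64url(json.dumps(claims).encode()), "signature"]
)
print(decode_jwt_payload(token))
```

Because nothing here checks the signature, output like this is only suitable for display purposes and should not be used for authorization decisions.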

            ),
        }

    except Exception as e:
Collaborator

If base64.urlsafe_b64decode, or any other package you are using, raises specific exceptions, please catch the important ones explicitly.

Contributor Author

arjunp99, Mar 5, 2026

Added specific exception handling:

  • binascii.Error, ValueError - for base64 decode failures
  • json.JSONDecodeError - for invalid JSON in JWT payload
  • Generic Exception kept as fallback for unexpected errors

All errors are logged with details but return generic messages to clients.
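
A sketch of how that layering might look, assuming a helper that returns either the claims or a generic client-facing message. The function name, log messages, and error strings are hypothetical. One detail worth noting: `json.JSONDecodeError` is a subclass of `ValueError`, so its `except` clause has to come first to be reachable.

```python
import base64
import binascii
import json
import logging

logger = logging.getLogger(__name__)


def parse_payload_segment(segment: str):
    """Return (claims, error_message); exactly one of the two is None."""
    try:
        padded = segment + "=" * (-len(segment) % 4)
        raw = base64.urlsafe_b64decode(padded)
        return json.loads(raw), None
    except json.JSONDecodeError:
        # Listed before ValueError because JSONDecodeError subclasses it
        logger.warning("JWT payload is not valid JSON")
        return None, "Invalid token"
    except (binascii.Error, ValueError):
        logger.warning("JWT payload is not valid base64url")
        return None, "Invalid token"
    except Exception:
        # Fallback: log the full traceback, return a generic message
        logger.exception("Unexpected error while decoding JWT payload")
        return None, "Authentication failed"


print(parse_payload_segment("a"))
```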

)

# Add API Gateway behaviors for cookie-based authentication (when using custom_auth)
if custom_auth and backend_region:
Collaborator

What is the backend_region check for?

Contributor Author

The backend_region check is redundant - looking at pipeline.py, it's always set via backend_region=target_env.get('region', self.region), so it will never be None. Removed the unnecessary check.

cloudfront_distribution.add_behavior(
    path_pattern='/auth/*',
    origin=api_gateway_origin,
    cache_policy=cloudfront.CachePolicy.CACHING_DISABLED,
Collaborator

Is it typical for auth endpoints to have caching disabled?

Contributor Author

Yes, this is standard practice. Auth endpoints should never be cached - each request is unique (one-time auth codes, session-specific cookies, user-specific data). Caching would return stale data and break login/logout.


# Add behavior for /graphql/* routes
cloudfront_distribution.add_behavior(
    path_pattern='/graphql/*',
Collaborator

We didn't have any CloudFront distribution behaviour for graphql or search before. What is the benefit of adding it here?

Contributor Author

This is required for httpOnly cookies to work. With SameSite=Lax, browsers do not attach cookies to cross-site fetch/XHR requests. Before, tokens were in localStorage and sent via the Authorization header, which works cross-origin. Now, with httpOnly cookies, the browser won't send them to a different site (the API Gateway domain). Routing through CloudFront puts the frontend and the API on the same origin, so cookies are sent automatically.
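
To make "same origin" concrete: two URLs share an origin when scheme, host, and port all match. The sketch below compares origins with the standard library; the helper names and the example domains are hypothetical.

```python
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


frontend = "https://app.example.com/console"
# Proxied through the frontend's own distribution: same origin
print(same_origin(frontend, "https://app.example.com/graphql/api"))
# Direct call to a separate API domain: different origin
print(same_origin(frontend, "https://abc123.execute-api.eu-west-1.amazonaws.com/prod/graphql/api"))
```

With the relative `/graphql/api` path the request resolves against the frontend's own origin, which is why no cross-origin cookie handling is needed.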

api_handler_env['frontend_domain_url'] = f'https://{custom_domain.get("hosted_zone_name", None)}'
if custom_auth:
    api_handler_env['custom_auth'] = custom_auth.get('provider', None)
    api_handler_env['custom_auth_url'] = custom_auth.get('url', None)
Collaborator

Are these needed?

vpc=vpc,
security_groups=[auth_handler_sg],
memory_size=512 if prod_sizing else 256,
timeout=Duration.seconds(30),
Collaborator

Does this set the timeout for the AWS Lambda function?


# Initialize Klayers
klayers = Klayers(self, python_version=PYTHON_LAMBDA_RUNTIME, region=self.region)
runtime = _lambda.Runtime.PYTHON_3_12
Collaborator

Why are we not using the config-defined Python runtime, PYTHON_LAMBDA_RUNTIME?

handler=self.authorizer_fn,
identity_sources=[apigw.IdentitySource.header('Authorization')],
# Empty identity_sources allows Lambda to be invoked without specific headers
# This enables cookie-based auth where tokens come from Cookie header
Collaborator

nit: the comment could be extended, for example:
# This enables cookie-based auth, where tokens come from the Cookie header, as well as auth via the Authorization header
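
A sketch of what supporting both sources could look like inside an authorizer. This is not the code from `custom_authorizer_lambda.py`: the function name is hypothetical, and it assumes the Authorization header carries a `Bearer`-prefixed token, which the PR does not confirm.

```python
from http.cookies import SimpleCookie
from typing import Optional


def extract_access_token(headers: dict) -> Optional[str]:
    """Prefer the Authorization header, then fall back to the access_token cookie."""
    lowered = {k.lower(): v for k, v in (headers or {}).items()}

    # Assumption: header-based clients send "Bearer <token>"
    auth = lowered.get("authorization", "")
    if auth.lower().startswith("bearer "):
        return auth[len("bearer "):].strip()

    # Cookie-based clients: parse the Cookie header
    cookie = SimpleCookie()
    cookie.load(lowered.get("cookie", ""))
    morsel = cookie.get("access_token")
    return morsel.value if morsel is not None else None


print(extract_access_token({"cookie": "access_token=tok2; id_token=x"}))
```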

if custom_domain and custom_domain.get('hosted_zone_name'):
    cors_origin = f'https://{custom_domain.get("hosted_zone_name")}'
else:
    cors_origin = ''  # Must be configured via custom_domain in cdk.json
Collaborator

If no custom domain is present, should we default to the domain URL provided by CloudFront and use it here?

Collaborator

Can you include this file in index.js and then import it from there in frontend/src/authentication/contexts/GenericAuthContext.js?


// Use relative URL for custom auth (CloudFront proxy), otherwise use env var
const graphqlUri = CUSTOM_AUTH
  ? '/graphql/api'
Collaborator

I don't understand why we need the CloudFront URL instead of the API Gateway URL. Can you explain why we created the CloudFront distribution?

  signInWithRedirect,
  signOut
} from 'aws-amplify/auth';
import { generatePKCE, generateState } from '../../utils/pkce';
Collaborator

After this change (https://github.com/data-dot-all/dataall/pull/1920/changes#r2886370803), you can import it like this:
import { generatePKCE, generateState } from 'utils/pkce';

      requestInfo: null
    }
  });
  await logout();
Collaborator

Can you let me know whether you tested the ReAuth flow?

Collaborator

TejasRGitHub left a comment

Hey @arjunp99, the changes you made look very solid. I have added comments, some for clarification and some cosmetic, but overall everything looks solid.

There are a few things which I think are missing:

  1. What happens when the user's token expires? In the current implementation, the web app automatically resets to the login page; an internal event is set when the token reaches expiration. If possible, we should mimic that.
  2. The logout flow currently only clears the cookies. It should also log out from Okta, if such an endpoint is available, so that the tokens are invalidated in Okta when the user logs out.

TejasRGitHub requested a review from petrkalos March 4, 2026 22:59