AWS & Distributed Systems Architect focused on event-driven systems, high-scale telemetry and IoT pipelines, and pragmatic engineering tradeoffs (scalability, failure modes, observability, and cost).
I enjoy building practical tooling and exploring how systems behave under real-world load, concurrency, and failure scenarios.
- Event-Driven Architectures (EDA)
- AWS IoT platforms and telemetry pipelines
- Serverless performance & cost optimization
- Distributed system reliability and observability
- Distributed systems design
- System behavior under load and concurrency
- Observability and production diagnostics
- Performance engineering and reproducible load testing
- Failure-mode analysis in cloud systems
Most of my public projects focus on practical tooling for testing, investigating, and operating distributed systems in AWS environments.
AWS-native incident investigation workflow built around deterministic workers and bounded AI.
Uses Step Functions to orchestrate evidence collection across metrics, logs, and traces. GenAI serves as an advisory layer that compares competing hypotheses, interprets cross-source evidence, and flags what is still missing.
Generic AWS workload generator built on ECS Fargate with pluggable scenarios (IoT, SQS).
Designed for controlled load generation and analysis of system behavior under stress.
Repeatable k6 load testing framework running on ECS Fargate, generating traffic from a consistent cloud environment instead of developer laptops.
- Load generation and reproducible performance testing environments
- System behavior under high concurrency and burst traffic
- Event-driven system reliability patterns
- When a few noisy devices took down the system: lessons from a production investigation
A production incident investigation from an IoT canary OTA rollout that exposed how a small subset of noisy devices, hot database paths, app-triggered follow-up API calls, and retry behavior combined into system-wide contention.
- Turning Noisy AWS IoT Presence (connect/disconnect events) into Reliable Connectivity State
How to turn noisy IoT connect/disconnect streams into a bounded, reliable connectivity signal using rate limiting, back-pressure, and derived state instead of treating raw presence events as truth.
- How do you find the cost vs performance sweet spot of an AWS Lambda function?
Exploring Lambda memory tuning impact on cost and latency, validated with AWS Lambda Power Tuning and controlled load tests.
- AWS IoT Services Deprecation: From Managed Pipelines to Modular Cloud Architectures
What the retirement of IoT Analytics and IoT Events signals about AWS’s direction for modern IoT architectures.
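
The Lambda cost vs performance question above comes down to simple arithmetic: billed cost scales with memory multiplied by duration, so more memory only pays off when it shortens the run enough. Here is a minimal sketch of that comparison. The price constants are assumed x86 on-demand list prices and should be checked against current AWS pricing. The `Measurement` type, the helper names, and the duration figures are hypothetical and stand in for numbers a real load test would produce.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed x86 on-demand prices; verify against current AWS pricing for your region.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000


@dataclass(frozen=True)
class Measurement:
    """One memory setting and its average duration (hypothetical test result)."""
    memory_mb: int
    avg_duration_ms: float


def cost_per_million(m: Measurement) -> float:
    """Estimated USD cost of one million invocations, ignoring the free tier."""
    gb_seconds = (m.memory_mb / 1024) * (m.avg_duration_ms / 1000)
    per_invocation = gb_seconds * PRICE_PER_GB_SECOND + PRICE_PER_REQUEST
    return per_invocation * 1_000_000


def cheapest_within_latency(
    measurements: Iterable[Measurement], max_duration_ms: float
) -> Optional[Measurement]:
    """Return the cheapest setting that meets the latency budget, or None."""
    eligible = [m for m in measurements if m.avg_duration_ms <= max_duration_ms]
    if not eligible:
        return None
    return min(eligible, key=cost_per_million)


# Illustrative numbers only; real values come from your own measurements.
MEASUREMENTS = [
    Measurement(128, 900.0),
    Measurement(256, 420.0),
    Measurement(512, 200.0),
    Measurement(1024, 110.0),
    Measurement(2048, 95.0),
]

if __name__ == "__main__":
    for m in MEASUREMENTS:
        print(f"{m.memory_mb:>5} MB  {m.avg_duration_ms:>6.1f} ms  "
              f"${cost_per_million(m):.4f} per 1M invocations")
    print("Pick for a 250 ms budget:", cheapest_within_latency(MEASUREMENTS, 250.0))
```

With these made-up durations, the cheapest setting is not the smallest one, which is the usual reason to measure instead of guessing. Averages hide tail latency, so a real comparison would also look at percentiles under load.
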
- Practical takeaways for reducing AWS Lambda cost
- Going beyond Power Tuning: validating cost vs latency with controlled load tests (k6 on Fargate)
- How did we reduce AWS cloud costs by ~50%? Practical actions + tools
- Most production incidents aren’t hard to fix — they’re hard to understand fast enough
A monitoring maturity journey and lessons from improving incident response.
- Sometimes the best architecture improvement is removing an unnecessary path
Simplifying a production IoT progress-update flow by replacing a heavier event-processing path with a direct IoT Rule → Device Shadow pattern, reducing latency, cost, and operational complexity.
- Golden AWS IoT insights for reliability, scale, and cost-efficiency
- Migrating on-prem to AWS is a maturity journey
Business goals first, cloud as leverage.