Oded Keren kerenoded

Hi, I'm Oded 👋

AWS & Distributed Systems Architect focused on event-driven systems, high-scale telemetry and IoT pipelines, and pragmatic engineering tradeoffs (scalability, failure modes, observability, and cost).

I enjoy building practical tooling and exploring how systems behave under real-world load, concurrency, and failure scenarios.

Focus Areas

Event-Driven Architectures (EDA)
AWS IoT platforms and telemetry pipelines
Serverless performance & cost optimization
Distributed system reliability and observability

Architecture Interests

Distributed systems design
System behavior under load and concurrency
Observability and production diagnostics
Performance engineering and reproducible load testing
Failure-mode analysis in cloud systems

Open Source Projects

Most of my public projects focus on practical tooling for testing, investigating, and operating distributed systems in AWS environments.

AWS Incident Investigator

AWS-native incident investigation workflow built around deterministic workers and bounded AI.
Uses Step Functions to orchestrate evidence collection across metrics, logs, and traces, while GenAI serves as an advisory layer to compare competing hypotheses, interpret cross-source evidence, and surface missing evidence.

AWS Fargate Workload Runner

Generic AWS workload generator built on ECS Fargate with pluggable scenarios (IoT, SQS).
Designed for controlled load generation and analysis of system behavior under stress.

k6 Fargate Runner

Repeatable k6 load testing framework running on ECS Fargate, generating traffic from a consistent cloud environment instead of developer laptops.

Currently Exploring

Load generation and reproducible performance testing environments
System behavior under high concurrency and burst traffic
Event-driven system reliability patterns

Writing

Featured articles (long-form)

When a few noisy devices took down the system: lessons from a production investigation
An investigation of a production incident during an IoT canary OTA rollout, which exposed how a small subset of noisy devices, hot database paths, app-triggered follow-up API calls, and retry behavior combined into system-wide contention.
Turning Noisy AWS IoT Presence (connect/disconnect events) into Reliable Connectivity State
How to turn noisy IoT connect/disconnect streams into a bounded, reliable connectivity signal using rate limiting, back-pressure, and derived state instead of treating raw presence events as truth.
How do you find the cost vs performance sweet spot of an AWS Lambda function?
Explores how Lambda memory tuning affects cost and latency, validated with AWS Lambda Power Tuning and controlled load tests.
AWS IoT Services Deprecation: From Managed Pipelines to Modular Cloud Architectures
What the retirement of IoT Analytics and IoT Events signals about AWS’s direction for modern IoT architectures.

Selected posts

Serverless / performance / cost

Architecture / networking / cost tradeoffs

Private subnet connectivity: NAT vs Gateway Endpoint vs Interface Endpoint

Observability / production readiness

Most production incidents aren’t hard to fix — they’re hard to understand fast enough
A monitoring maturity journey and lessons from improving incident response.

IoT / event-driven architecture

Sometimes the best architecture improvement is removing an unnecessary path
Simplifying a production IoT progress-update flow by replacing a heavier event-processing path with a direct IoT Rule → Device Shadow pattern, reducing latency, cost, and operational complexity.
Load-test your EDA system without permanent infrastructure
Golden AWS IoT insights for reliability, scale, and cost-efficiency

Cloud strategy / delivery

Migrating on-prem to AWS is a maturity journey
Business goals first, cloud as leverage.
Analysis paralysis vs bias for action

Notes from the field

Amazon AgentCore: production AI agents at scale (AWS Israel event takeaways)

Connect

LinkedIn
