Merged
32 commits
f349462
Add alarms
francisco-videira-nhs Feb 10, 2026
bc7febe
Add alarms dir and fix anomaly metric
francisco-videira-nhs Feb 10, 2026
fb61f97
add alarms readme
francisco-videira-nhs Feb 10, 2026
f00f941
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 10, 2026
e6cd9b5
fix return data
francisco-videira-nhs Feb 10, 2026
7f45981
Fix some values and increase cert expiry from 14 to 30
francisco-videira-nhs Feb 11, 2026
00a58df
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 12, 2026
a5ffa22
Make sqs msg age alarms same period; some tf lint
francisco-videira-nhs Feb 12, 2026
c962260
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 12, 2026
9cae234
Fix expiry unit tests
francisco-videira-nhs Feb 12, 2026
19315df
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 17, 2026
dff2437
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 19, 2026
efa830e
Split into files
francisco-videira-nhs Feb 19, 2026
224a3a4
Split alarm modules into files
francisco-videira-nhs Feb 20, 2026
fba88fd
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 20, 2026
f42a638
Fix tf
francisco-videira-nhs Feb 20, 2026
7dee2f5
revert me
francisco-videira-nhs Feb 20, 2026
5cf233d
Fix dirs
francisco-videira-nhs Feb 20, 2026
0b44f74
fix path
francisco-videira-nhs Feb 20, 2026
4b18cde
fix new resources
francisco-videira-nhs Feb 20, 2026
b4455c6
Fix for each
francisco-videira-nhs Feb 20, 2026
9e235ed
Fix patch tests
francisco-videira-nhs Feb 24, 2026
f7b9ab9
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 24, 2026
bfb41a4
Add wait to patch tests
francisco-videira-nhs Feb 25, 2026
75ae0c5
Add wait status to post tests
francisco-videira-nhs Feb 25, 2026
e095434
Merge remote-tracking branch 'origin/main' into feature/CCM-12860
francisco-videira-nhs Feb 25, 2026
94e1e91
Add new; fix error logs alarm
francisco-videira-nhs Feb 25, 2026
578fd4f
Fix letter queue alarm
francisco-videira-nhs Feb 26, 2026
f35a34b
Peer review comp tests
francisco-videira-nhs Feb 26, 2026
f3370af
Peer review platform; inline api gw alarms; name convention
francisco-videira-nhs Feb 26, 2026
be9acee
Add optional alarm trigger for PR env
francisco-videira-nhs Feb 26, 2026
5d56cbf
Move alarm toggle to account level
francisco-videira-nhs Feb 26, 2026
7 changes: 7 additions & 0 deletions infrastructure/terraform/components/api/README.md
@@ -17,6 +17,7 @@ No requirements.
| <a name="input_core_environment"></a> [core\_environment](#input\_core\_environment) | Environment of Core | `string` | `"prod"` | no |
| <a name="input_default_tags"></a> [default\_tags](#input\_default\_tags) | A map of default tags to apply to all taggable resources within the component | `map(string)` | `{}` | no |
| <a name="input_disable_gateway_execute_endpoint"></a> [disable\_gateway\_execute\_endpoint](#input\_disable\_gateway\_execute\_endpoint) | Disable the execution endpoint for the API Gateway | `bool` | `true` | no |
| <a name="input_enable_alarms"></a> [enable\_alarms](#input\_enable\_alarms) | Enable CloudWatch alarms for this deployed environment | `bool` | `true` | no |
| <a name="input_enable_api_data_trace"></a> [enable\_api\_data\_trace](#input\_enable\_api\_data\_trace) | Enable API Gateway data trace logging | `bool` | `false` | no |
| <a name="input_enable_backups"></a> [enable\_backups](#input\_enable\_backups) | Enable backups | `bool` | `false` | no |
| <a name="input_enable_event_cache"></a> [enable\_event\_cache](#input\_enable\_event\_cache) | Enable caching of events to an S3 bucket | `bool` | `true` | no |
@@ -46,6 +47,10 @@ No requirements.
| <a name="module_amendment_event_transformer"></a> [amendment\_event\_transformer](#module\_amendment\_event\_transformer) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_amendments_queue"></a> [amendments\_queue](#module\_amendments\_queue) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.24/terraform-sqs.zip | n/a |
| <a name="module_authorizer_lambda"></a> [authorizer\_lambda](#module\_authorizer\_lambda) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_ddb_alarms_letter_queue"></a> [ddb\_alarms\_letter\_queue](#module\_ddb\_alarms\_letter\_queue) | ../../modules/alarms-ddb | n/a |
| <a name="module_ddb_alarms_letters"></a> [ddb\_alarms\_letters](#module\_ddb\_alarms\_letters) | ../../modules/alarms-ddb | n/a |
| <a name="module_ddb_alarms_mi"></a> [ddb\_alarms\_mi](#module\_ddb\_alarms\_mi) | ../../modules/alarms-ddb | n/a |
| <a name="module_ddb_alarms_suppliers"></a> [ddb\_alarms\_suppliers](#module\_ddb\_alarms\_suppliers) | ../../modules/alarms-ddb | n/a |
| <a name="module_domain_truststore"></a> [domain\_truststore](#module\_domain\_truststore) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
| <a name="module_eventpub"></a> [eventpub](#module\_eventpub) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.31/terraform-eventpub.zip | n/a |
| <a name="module_eventsub"></a> [eventsub](#module\_eventsub) | ../../modules/eventsub | n/a |
@@ -54,6 +59,7 @@ No requirements.
| <a name="module_get_letters"></a> [get\_letters](#module\_get\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_get_status"></a> [get\_status](#module\_get\_status) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_kms"></a> [kms](#module\_kms) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-kms.zip | n/a |
| <a name="module_lambda_alarms"></a> [lambda\_alarms](#module\_lambda\_alarms) | ../../modules/alarms-lambda | n/a |
| <a name="module_letter_status_updates_queue"></a> [letter\_status\_updates\_queue](#module\_letter\_status\_updates\_queue) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.24/terraform-sqs.zip | n/a |
| <a name="module_letter_updates_transformer"></a> [letter\_updates\_transformer](#module\_letter\_updates\_transformer) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_logging_bucket"></a> [logging\_bucket](#module\_logging\_bucket) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
@@ -62,6 +68,7 @@ No requirements.
| <a name="module_post_letters"></a> [post\_letters](#module\_post\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_post_mi"></a> [post\_mi](#module\_post\_mi) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
| <a name="module_s3bucket_test_letters"></a> [s3bucket\_test\_letters](#module\_s3bucket\_test\_letters) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-s3bucket.zip | n/a |
| <a name="module_sqs_alarms"></a> [sqs\_alarms](#module\_sqs\_alarms) | ../../modules/alarms-sqs | n/a |
| <a name="module_sqs_letter_updates"></a> [sqs\_letter\_updates](#module\_sqs\_letter\_updates) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-sqs.zip | n/a |
| <a name="module_sqs_supplier_allocator"></a> [sqs\_supplier\_allocator](#module\_sqs\_supplier\_allocator) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.26/terraform-sqs.zip | n/a |
| <a name="module_supplier_allocator"></a> [supplier\_allocator](#module\_supplier\_allocator) | https://github.com/NHSDigital/nhs-notify-shared-modules/releases/download/v2.0.29/terraform-lambda.zip | n/a |
@@ -0,0 +1,24 @@
resource "aws_cloudwatch_metric_alarm" "apigw_five_xx" {
  count = local.alarms_enabled ? 1 : 0

  alarm_name        = "${local.csi}-apigw-5xx"
  alarm_description = "RELIABILITY: API Gateway 5xx responses"

  namespace   = "AWS/ApiGateway"
  metric_name = "5XXError"
  statistic   = "Sum"
  period      = 60

  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = local.apigw_alarm_dimensions

  actions_enabled           = false
  alarm_actions             = []
  ok_actions                = []
  insufficient_data_actions = []
  tags                      = local.default_tags
}
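The alarm above is created with `actions_enabled = false` and empty action lists, so a state change is visible in CloudWatch but notifies no one. A minimal sketch of what switching notifications on could look like, assuming an SNS topic that is not part of this change (`aws_sns_topic.alarms` and the `-notifying` alarm name are hypothetical):

```hcl
# Hypothetical SNS topic; nothing in this change creates one.
resource "aws_sns_topic" "alarms" {
  name = "${local.csi}-alarms"
}

# Illustrative variant of the 5xx alarm with actions switched on.
resource "aws_cloudwatch_metric_alarm" "apigw_five_xx_notifying" {
  count = local.alarms_enabled ? 1 : 0

  alarm_name        = "${local.csi}-apigw-5xx-notifying"
  alarm_description = "RELIABILITY: API Gateway 5xx responses"

  namespace   = "AWS/ApiGateway"
  metric_name = "5XXError"
  statistic   = "Sum"
  period      = 60

  evaluation_periods  = 1
  threshold           = 0
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = local.apigw_alarm_dimensions

  # Publish to the hypothetical topic on ALARM and on return to OK.
  actions_enabled = true
  alarm_actions   = [aws_sns_topic.alarms.arn]
  ok_actions      = [aws_sns_topic.alarms.arn]
  tags            = local.default_tags
}
```

Whether notifications belong in this component or at a higher level is a design decision the diff does not settle; the sketch only shows which arguments would change.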
@@ -0,0 +1,36 @@
resource "aws_cloudwatch_metric_alarm" "apigw_latency_anomaly" {
  count = local.alarms_enabled ? 1 : 0

  alarm_name          = "${local.csi}-apigw-latency-anomaly"
  alarm_description   = "RELIABILITY: API Gateway latency anomaly"
  comparison_operator = "GreaterThanUpperThreshold"
  evaluation_periods  = 5
  datapoints_to_alarm = 3
  threshold_metric_id = "ad1"
  treat_missing_data  = "notBreaching"

  actions_enabled           = false
  alarm_actions             = []
  ok_actions                = []
  insufficient_data_actions = []
  tags                      = local.default_tags

  metric_query {
    id = "m1"
    metric {
      metric_name = "Latency"
      namespace   = "AWS/ApiGateway"
      stat        = "Average"
      period      = 60
      dimensions  = local.apigw_alarm_dimensions
    }
    return_data = true
  }

  metric_query {
    id          = "ad1"
    expression  = "ANOMALY_DETECTION_BAND(m1, 2)"
    label       = "Latency (expected)"
    return_data = true
  }
}
@@ -0,0 +1,24 @@
resource "aws_cloudwatch_metric_alarm" "apigw_latency_threshold" {
  count = local.alarms_enabled ? 1 : 0

  alarm_name        = "${local.csi}-apigw-latency-threshold"
  alarm_description = "RELIABILITY: API Gateway latency above threshold"

  namespace   = "AWS/ApiGateway"
  metric_name = "Latency"
  statistic   = "Average"
  period      = 60

  evaluation_periods  = 5
  threshold           = 29000
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = local.apigw_alarm_dimensions

  actions_enabled           = false
  alarm_actions             = []
  ok_actions                = []
  insufficient_data_actions = []
  tags                      = local.default_tags
}
32 changes: 32 additions & 0 deletions infrastructure/terraform/components/api/locals_alarms.tf
@@ -0,0 +1,32 @@
locals {
  alarms_enabled = var.enable_alarms

  apigw_alarm_dimensions = {
    ApiName = aws_api_gateway_rest_api.main.name
    Stage   = aws_api_gateway_stage.main.stage_name
  }

  lambda_alarm_targets = {
    authorizer_lambda           = module.authorizer_lambda.function_name
    get_letter                  = module.get_letter.function_name
    get_letters                 = module.get_letters.function_name
    get_letter_data             = module.get_letter_data.function_name
    get_status                  = module.get_status.function_name
    patch_letter                = module.patch_letter.function_name
    post_letters                = module.post_letters.function_name
    post_mi                     = module.post_mi.function_name
    update_letter_queue         = module.update_letter_queue.function_name
    upsert_letter               = module.upsert_letter.function_name
    amendment_event_transformer = module.amendment_event_transformer.function_name
    letter_updates_transformer  = module.letter_updates_transformer.function_name
    mi_updates_transformer      = module.mi_updates_transformer.function_name
    supplier_allocator          = module.supplier_allocator.function_name
  }

  sqs_alarm_targets = {
    sqs_letter_updates          = module.sqs_letter_updates.sqs_queue_name
    amendments_queue            = module.amendments_queue.sqs_queue_name
    letter_status_updates_queue = module.letter_status_updates_queue.sqs_queue_name
    sqs_supplier_allocator      = module.sqs_supplier_allocator.sqs_queue_name
  }
}
@@ -36,7 +36,7 @@ module "authorizer_lambda" {

   lambda_env_vars = {
     CLOUDWATCH_NAMESPACE                     = "/aws/api-gateway/supplier/alarms",
-    CLIENT_CERTIFICATE_EXPIRATION_ALERT_DAYS = 14,
+    CLIENT_CERTIFICATE_EXPIRATION_ALERT_DAYS = 30,
     APIM_SUPPLIER_ID_HEADER                  = "NHSD-Supplier-ID",
     SUPPLIERS_TABLE_NAME                     = aws_dynamodb_table.suppliers.name
   }
@@ -0,0 +1,7 @@
module "ddb_alarms_letter_queue" {
  count        = local.alarms_enabled ? 1 : 0
  source       = "../../modules/alarms-ddb"
  alarm_prefix = local.csi
  table_name   = aws_dynamodb_table.letter_queue.name
  tags         = local.default_tags
}
@@ -0,0 +1,7 @@
module "ddb_alarms_letters" {
  count        = local.alarms_enabled ? 1 : 0
  source       = "../../modules/alarms-ddb"
  alarm_prefix = local.csi
  table_name   = aws_dynamodb_table.letters.name
  tags         = local.default_tags
}
@@ -0,0 +1,7 @@
module "ddb_alarms_mi" {
  count        = local.alarms_enabled ? 1 : 0
  source       = "../../modules/alarms-ddb"
  alarm_prefix = local.csi
  table_name   = aws_dynamodb_table.mi.name
  tags         = local.default_tags
}
@@ -0,0 +1,7 @@
module "ddb_alarms_suppliers" {
  count        = local.alarms_enabled ? 1 : 0
  source       = "../../modules/alarms-ddb"
  alarm_prefix = local.csi
  table_name   = aws_dynamodb_table.suppliers.name
  tags         = local.default_tags
}
@@ -0,0 +1,9 @@
module "lambda_alarms" {
  for_each = local.alarms_enabled ? local.lambda_alarm_targets : {}
  source   = "../../modules/alarms-lambda"

  alarm_prefix   = local.csi
  function_name  = each.value
  log_group_name = "/aws/lambda/${each.value}"
  tags           = local.default_tags
}
@@ -0,0 +1,9 @@
module "sqs_alarms" {
  for_each = local.alarms_enabled ? local.sqs_alarm_targets : {}
  source   = "../../modules/alarms-sqs"

  alarm_prefix   = local.csi
  queue_name     = each.value
  dlq_queue_name = replace(each.value, "-queue", "-dlq")
  tags           = local.default_tags
}
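The `dlq_queue_name` above is derived purely by string substitution. A small sketch of how Terraform's built-in `replace` behaves here, using a made-up queue name (the `example_*` locals and the queue name are hypothetical; whether the shared SQS module actually names its dead-letter queues this way is not visible in this diff):

```hcl
locals {
  # Hypothetical queue name, for illustration only.
  example_queue_name = "example-env-letter-updates-queue"

  # replace() substitutes every occurrence of the substring "-queue".
  # Evaluates to "example-env-letter-updates-dlq".
  example_dlq_name = replace(local.example_queue_name, "-queue", "-dlq")

  # If the name contains no "-queue", replace() returns it unchanged,
  # so the DLQ alarm would end up pointing at the main queue's name.
  example_unchanged = replace("example-env-letter-updates", "-queue", "-dlq")
}

output "example_dlq_name" {
  value = local.example_dlq_name
}
```

If the SQS module exposes the dead-letter queue name as an output, passing that output directly would avoid relying on the naming convention.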
6 changes: 6 additions & 0 deletions infrastructure/terraform/components/api/variables.tf
@@ -193,3 +193,9 @@ variable "enable_api_data_trace" {
  description = "Enable API Gateway data trace logging"
  default     = false
}

variable "enable_alarms" {
  type        = bool
  description = "Enable CloudWatch alarms for this deployed environment"
  default     = true
}
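Because `enable_alarms` defaults to `true`, an environment gets alarms unless it opts out. A sketch of opting out for a short-lived environment, assuming variables are supplied through a tfvars file (the file name is hypothetical, and the commit history suggests the toggle may ultimately be set at account level instead):

```hcl
# pr-environment.tfvars (hypothetical file name)
#
# Sets local.alarms_enabled to false, so each alarm resource or module call in
# this component that is gated on it resolves to count = 0 or an empty for_each.
enable_alarms = false
```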
29 changes: 29 additions & 0 deletions infrastructure/terraform/modules/alarms-ddb/README.md
@@ -0,0 +1,29 @@
<!-- BEGIN_TF_DOCS -->
<!-- markdownlint-disable -->
<!-- vale off -->

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.9.0 |
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alarm_prefix"></a> [alarm\_prefix](#input\_alarm\_prefix) | n/a | `string` | n/a | yes |
| <a name="input_evaluation_periods"></a> [evaluation\_periods](#input\_evaluation\_periods) | n/a | `number` | `1` | no |
| <a name="input_period_seconds"></a> [period\_seconds](#input\_period\_seconds) | n/a | `number` | `60` | no |
| <a name="input_read_throttle_threshold"></a> [read\_throttle\_threshold](#input\_read\_throttle\_threshold) | n/a | `number` | `0` | no |
| <a name="input_table_name"></a> [table\_name](#input\_table\_name) | n/a | `string` | n/a | yes |
| <a name="input_tags"></a> [tags](#input\_tags) | n/a | `map(string)` | `{}` | no |
| <a name="input_write_throttle_threshold"></a> [write\_throttle\_threshold](#input\_write\_throttle\_threshold) | n/a | `number` | `0` | no |
## Modules

No modules.
## Outputs

No outputs.
<!-- vale on -->
<!-- markdownlint-enable -->
<!-- END_TF_DOCS -->
@@ -0,0 +1,22 @@
resource "aws_cloudwatch_metric_alarm" "read_throttle" {
  alarm_name        = "${var.alarm_prefix}-ddb-${var.table_name}-read-throttle"
  alarm_description = "RELIABILITY: DynamoDB read throttling"

  namespace   = "AWS/DynamoDB"
  metric_name = "ReadThrottleEvents"
  statistic   = "Sum"
  period      = var.period_seconds

  evaluation_periods  = var.evaluation_periods
  threshold           = var.read_throttle_threshold
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = { TableName = var.table_name }

  actions_enabled           = false
  alarm_actions             = []
  ok_actions                = []
  insufficient_data_actions = []
  tags                      = var.tags
}
@@ -0,0 +1,22 @@
resource "aws_cloudwatch_metric_alarm" "write_throttle" {
  alarm_name        = "${var.alarm_prefix}-ddb-${var.table_name}-write-throttle"
  alarm_description = "RELIABILITY: DynamoDB write throttling"

  namespace   = "AWS/DynamoDB"
  metric_name = "WriteThrottleEvents"
  statistic   = "Sum"
  period      = var.period_seconds

  evaluation_periods  = var.evaluation_periods
  threshold           = var.write_throttle_threshold
  comparison_operator = "GreaterThanThreshold"
  treat_missing_data  = "notBreaching"

  dimensions = { TableName = var.table_name }

  actions_enabled           = false
  alarm_actions             = []
  ok_actions                = []
  insufficient_data_actions = []
  tags                      = var.tags
}
32 changes: 32 additions & 0 deletions infrastructure/terraform/modules/alarms-ddb/variables.tf
@@ -0,0 +1,32 @@
variable "alarm_prefix" {
  type = string
}

variable "table_name" {
  type = string
}

variable "tags" {
  type    = map(string)
  default = {}
}

variable "period_seconds" {
  type    = number
  default = 60
}

variable "evaluation_periods" {
  type    = number
  default = 1
}

variable "read_throttle_threshold" {
  type    = number
  default = 0
}

variable "write_throttle_threshold" {
  type    = number
  default = 0
}
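With the defaults above, a single throttle event in any 60-second period puts the alarm into ALARM. A sketch of a less sensitive call to the module, using only the inputs declared in this file (the module label and the chosen numbers are hypothetical, not values from this change):

```hcl
# Illustrative only: intended in place of the default call for the same table,
# since both calls would otherwise produce identical alarm names.
module "ddb_alarms_letters_relaxed" {
  source = "../../modules/alarms-ddb"

  alarm_prefix = local.csi
  table_name   = aws_dynamodb_table.letters.name

  # Alarm only when more than 5 throttle events occur in each of
  # three consecutive 5-minute periods.
  period_seconds           = 300
  evaluation_periods       = 3
  read_throttle_threshold  = 5
  write_throttle_threshold = 5

  tags = local.default_tags
}
```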
9 changes: 9 additions & 0 deletions infrastructure/terraform/modules/alarms-ddb/versions.tf
@@ -0,0 +1,9 @@

terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
  required_version = ">= 1.9.0"
}
36 changes: 36 additions & 0 deletions infrastructure/terraform/modules/alarms-lambda/README.md
@@ -0,0 +1,36 @@
<!-- BEGIN_TF_DOCS -->
<!-- markdownlint-disable -->
<!-- vale off -->

## Requirements

| Name | Version |
|------|---------|
| <a name="requirement_terraform"></a> [terraform](#requirement\_terraform) | >= 1.9.0 |
## Inputs

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|:--------:|
| <a name="input_alarm_prefix"></a> [alarm\_prefix](#input\_alarm\_prefix) | n/a | `string` | n/a | yes |
| <a name="input_enable_error_log_metric"></a> [enable\_error\_log\_metric](#input\_enable\_error\_log\_metric) | n/a | `bool` | `true` | no |
| <a name="input_error_log_evaluation_periods"></a> [error\_log\_evaluation\_periods](#input\_error\_log\_evaluation\_periods) | n/a | `number` | `1` | no |
| <a name="input_error_log_metric_filter_pattern"></a> [error\_log\_metric\_filter\_pattern](#input\_error\_log\_metric\_filter\_pattern) | n/a | `string` | `"{ $.level = \"ERROR\" || $.level = \"FATAL\" }"` | no |
| <a name="input_error_log_metric_name_prefix"></a> [error\_log\_metric\_name\_prefix](#input\_error\_log\_metric\_name\_prefix) | n/a | `string` | `"LambdaErrorLogs-"` | no |
| <a name="input_error_log_metric_namespace"></a> [error\_log\_metric\_namespace](#input\_error\_log\_metric\_namespace) | n/a | `string` | `"Custom/LambdaErrorLogs"` | no |
| <a name="input_error_log_threshold"></a> [error\_log\_threshold](#input\_error\_log\_threshold) | n/a | `number` | `0` | no |
| <a name="input_errors_threshold"></a> [errors\_threshold](#input\_errors\_threshold) | n/a | `number` | `0` | no |
| <a name="input_evaluation_periods"></a> [evaluation\_periods](#input\_evaluation\_periods) | n/a | `number` | `1` | no |
| <a name="input_function_name"></a> [function\_name](#input\_function\_name) | n/a | `string` | n/a | yes |
| <a name="input_log_group_name"></a> [log\_group\_name](#input\_log\_group\_name) | n/a | `string` | `""` | no |
| <a name="input_period_seconds"></a> [period\_seconds](#input\_period\_seconds) | n/a | `number` | `300` | no |
| <a name="input_tags"></a> [tags](#input\_tags) | n/a | `map(string)` | `{}` | no |
| <a name="input_throttles_threshold"></a> [throttles\_threshold](#input\_throttles\_threshold) | n/a | `number` | `0` | no |
## Modules

No modules.
## Outputs

No outputs.
<!-- vale on -->
<!-- markdownlint-enable -->
<!-- END_TF_DOCS -->
@@ -0,0 +1,12 @@
resource "aws_cloudwatch_log_metric_filter" "error_logs" {
  count          = var.enable_error_log_metric ? 1 : 0
  name           = "${var.alarm_prefix}-lambda-${var.function_name}-error-logs"
  log_group_name = var.log_group_name
  pattern        = var.error_log_metric_filter_pattern

  metric_transformation {
    name      = "${var.error_log_metric_name_prefix}${var.function_name}"
    namespace = var.error_log_metric_namespace
    value     = "1"
  }
}
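The metric filter above emits a value of `1` for each log event matching the pattern, which by default is a JSON log line whose `level` is `ERROR` or `FATAL`. The alarm that consumes this metric is not visible in this excerpt; the following is a sketch of what such an alarm could look like, built from the module inputs listed in the README (the resource label and the `-example` name suffix are hypothetical):

```hcl
# Illustrative alarm on the custom metric produced by the filter above.
resource "aws_cloudwatch_metric_alarm" "error_logs_example" {
  count = var.enable_error_log_metric ? 1 : 0

  alarm_name        = "${var.alarm_prefix}-lambda-${var.function_name}-error-logs-example"
  alarm_description = "RELIABILITY: Lambda error-level log lines"

  # Must match the namespace and name used in metric_transformation.
  namespace   = var.error_log_metric_namespace
  metric_name = "${var.error_log_metric_name_prefix}${var.function_name}"
  statistic   = "Sum"
  period      = var.period_seconds

  evaluation_periods  = var.error_log_evaluation_periods
  threshold           = var.error_log_threshold
  comparison_operator = "GreaterThanThreshold"

  # The filter publishes nothing when no line matches, so missing data is
  # treated as healthy, in line with the other alarms in this change.
  treat_missing_data = "notBreaching"

  actions_enabled = false
  tags            = var.tags
}
```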