Commit 3fcc647

Migrate infrastructure from MongoDB EC2 to managed services with complete Terraform configuration
- Replace MongoDB EC2 instance with managed DocumentDB cluster
- Add ElastiCache Redis cluster for caching and session storage
- Implement complete Terraform infrastructure as code with modular design
- Update domain configuration from prettyclear.com to sandbox-prettyclear.com
- Add comprehensive infrastructure documentation and Q&A guide
- Configure multi-environment support with backend configs for dev/staging/prod
- Add monitoring, security groups, and networking modules
- Update ECS environment variables to include Redis URI and DocumentDB endpoint
1 parent ce33511 commit 3fcc647

19 files changed

Lines changed: 1836 additions & 29 deletions

docs/Infrastructure.md

Lines changed: 113 additions & 29 deletions
@@ -4,7 +4,7 @@

 * **AWS Region**: `us-east-1`
 * **Environments**: `dev`, `staging`, `prod`
-* **Domain**: `sf-website-<env>.prettyclear.com`
+* **Domain**: `sf-website-<env>.sandbox-prettyclear.com`
 * **Structure**: Modular Terraform setup for multi-environment support
 * **Resource Tags** (applied to all resources):

@@ -158,7 +158,8 @@
 * ECR (container image repository)
 * ALB (for ingress)
 * CloudWatch (for logs & metrics)
-* MongoDB EC2 instance (database)
+* DocumentDB Cluster (database)
+* ElastiCache Redis (caching)
 * S3 Attachments Bucket
 * **Service Name**: `sf-website`
 * **Container Image**: Built from project Dockerfile and stored in ECR
@@ -167,7 +168,8 @@
 * **Auto-scaling**: Based on CPU usage (target: 70%)
 * **Environment Variables**:
 * `NODE_ENV=production`
-* `APOS_MONGODB_URI=mongodb://<mongodb-hostname>:27017/apostrophe`
+* `APOS_MONGODB_URI=mongodb://<documentdb-cluster-endpoint>:27017/apostrophe`
+* `REDIS_URI=redis://<elasticache-cluster-endpoint>:6379`
 * `SESSION_SECRET=<from parameter store>`
 * `APOS_S3_BUCKET=sf-website-s3-attachments-<env>`
 * `APOS_S3_REGION=us-east-1`
@@ -198,7 +200,7 @@
 * ACM (for SSL certificates)
 * **Type**: HTTPS-only
 * **SSL**: Via AWS ACM
-* **Domain**: `sf-website-<env>.prettyclear.com`
+* **Domain**: `sf-website-<env>.sandbox-prettyclear.com`

 ---

@@ -219,7 +221,7 @@
 * ECS Cluster (via APOS_CDN_URL environment variable)
 * **Origin**: S3 bucket `sf-website-s3-attachments-<env>`
 * **Access**: Origin access identity (OAI) to restrict direct S3 access
-* **Custom domain**: `sf-website-media-<env>.prettyclear.com`
+* **Custom domain**: `sf-website-media-<env>.sandbox-prettyclear.com`
 * **SSL Certificate**: Managed through AWS ACM
 * **Cache Behavior**:
 * Default TTL: 86400 seconds (1 day)
@@ -247,51 +249,133 @@
 * **Resource Integration**:
 * ECS Apostrophe Cluster
 * ALB
+* DocumentDB Cluster
+* ElastiCache Redis
 * Slack (for alerts)
 * **Features**:
 * ECS logs and detailed metrics
 * ALB metrics (e.g., 5xx, latency)
+* DocumentDB cluster and instance metrics
+* ElastiCache Redis performance metrics
 * CloudWatch alarms for key metrics
 * **Alerts**: Sent to Slack
 * **Log retention**: 90 days

 ---

-### 📄 MongoDB on EC2
+### 🔴 Amazon ElastiCache (Redis)

-* **MongoDB**:
-* **Instance Name**: `sf-website-mongodb-<env>`
-* **Purpose**: Primary data store for ApostropheCMS
+* **ElastiCache Redis Cluster**:
+* **Cluster Name**: `sf-website-redis-<env>`
+* **Purpose**: Managed Redis service for session storage and application caching
 * **Resource Tags**:
-* `Name: sf-website-mongodb-<env>`
+* `Name: sf-website-redis-<env>`
 * `Project: Website`
 * `CostCenter: Website`
 * `Environment: <environment>`
 * `Owner: peter.ovchyn`
 * **Resource Integration**:
 * ECS Apostrophe Cluster
-* AWS Backup service
 * CloudWatch (for monitoring)
-* Parameter Store (for credentials)
-* **Instance Type**: t3.medium (2 vCPU, 4GB RAM)
-* **Storage**: 100GB gp3 EBS volume with 3000 IOPS
-* **AMI**: Amazon Linux 2
-* **Deployment**: Single EC2 instance in private subnet
+* Cache Subnet Group (for networking)
+* **Engine Version**: Redis 7.0 (latest stable)
+* **Node Configuration**:
+* **Node Type**: `cache.t3.micro` (1 vCPU, 0.5GB RAM) for dev/staging
+* **Node Type**: `cache.t3.small` (2 vCPU, 1.5GB RAM) for production
+* **Number of Nodes**: 1 (single node for simplicity)
+* **Port**: 6379 (Redis standard)
+* **Deployment**:
+* Deployed in private subnets
+* Cache Subnet Group spans both availability zones
 * **Security**:
-* No public IP assigned
-* Security group allows ingress only from ECS service security group on port 27017
-* SSH access via Session Manager (no direct SSH allowed)
-* **Authentication**: Username/password authentication enabled
-* Credentials stored in AWS Parameter Store
+* VPC security group restricting access to ECS service only
+* No public access
+* Transit encryption enabled
+* Auth token enabled for authentication
+* **Authentication**:
+* Auth token stored in AWS Parameter Store
 * Referenced in ECS task environment variables
 * **Backup Strategy**:
-* Daily automated snapshots of EBS volume
-* Retention period: 7 daily, 4 weekly
-* Snapshot automation via AWS Backup service
+* **Automatic Backups**:
+* Daily snapshots enabled
+* Retention period: 5 days
+* Backup window: 02:00-03:00 UTC
+* **Monitoring**:
+* CloudWatch metrics for cluster performance
+* CloudWatch alarms for:
+* CPU utilization > 80%
+* Memory usage > 80%
+* Connection count thresholds
+* Cache hit ratio < 80%
+* **High Availability**:
+* Automatic failover enabled
+* Multi-AZ deployment for production environment
+* Automatic minor version updates during maintenance window
+* **Network Configuration**:
+* **Cache Subnet Group**: `sf-website-redis-subnet-group-<env>`
+* **Security Group**: `sf-website-redis-sg-<env>`
+* **Endpoint**: Primary endpoint for read/write operations
+
+---
+
+### 📄 Amazon DocumentDB
+
+* **DocumentDB Cluster**:
+* **Cluster Name**: `sf-website-documentdb-<env>`
+* **Purpose**: Managed MongoDB-compatible database service for ApostropheCMS
+* **Resource Tags**:
+* `Name: sf-website-documentdb-<env>`
+* `Project: Website`
+* `CostCenter: Website`
+* `Environment: <environment>`
+* `Owner: peter.ovchyn`
+* **Resource Integration**:
+* ECS Apostrophe Cluster
+* CloudWatch (for monitoring)
+* Parameter Store (for credentials)
+* DB Subnet Group (for networking)
+* **Engine Version**: 4.0.0 (MongoDB compatible)
+* **Cluster Configuration**:
+* **Primary Instance**: `db.t3.medium` (2 vCPU, 4GB RAM)
+* **Replica Instances**: 1 replica for high availability
+* **Storage**: Encrypted with AWS managed keys
+* **Port**: 27017 (MongoDB standard)
+* **Deployment**:
+* Multi-AZ deployment across private subnets
+* DB Subnet Group spans both availability zones
+* **Security**:
+* VPC security group restricting access to ECS service only
+* TLS encryption in transit required
+* No public access
+* Authentication required
+* **Authentication**:
+* Master username/password stored in AWS Parameter Store
+* Referenced in ECS task environment variables via Parameter Store
+* Database: `apostrophe`
+* **Backup Strategy**:
+* **Automated Backups**:
+* Backup retention period: 7 days
+* Backup window: 03:00-04:00 UTC
+* Point-in-time recovery enabled
+* **Manual Snapshots**: Available for major releases
 * **Monitoring**:
-* CloudWatch agent for system metrics
-* Custom MongoDB metrics published to CloudWatch
-* Alerts for disk usage, connections, and query performance
+* CloudWatch metrics for cluster and instance performance
+* Enhanced monitoring enabled (60-second granularity)
+* CloudWatch alarms for:
+* CPU utilization > 80%
+* Database connections > 80% of max
+* Free storage < 20%
+* Read/Write latency thresholds
+* **Parameter Group**:
+* Custom parameter group for performance optimization
+* TLS enforcement enabled
+* Audit logging enabled for security compliance
 * **High Availability**:
-* Configured for future upgrade to a replica set
-* Placeholder DNS record for future replica nodes
+* Multi-AZ replica instance for automatic failover
+* Cross-AZ backup replication
+* Automatic minor version updates during maintenance window
+* **Network Configuration**:
+* **DB Subnet Group**: `sf-website-documentdb-subnet-group-<env>`
+* **Security Group**: `sf-website-documentdb-sg-<env>`
+* **Endpoint**: Cluster endpoint for write operations
+* **Reader Endpoint**: Available for read-only operations
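
Reviewer note: the documented `APOS_MONGODB_URI` and `REDIS_URI` values carry no TLS or credential options, although the same file says TLS is required for DocumentDB and that transit encryption and an auth token are enabled for Redis. The Python sketch below shows one way the final connection strings might be assembled. It is an illustration under assumptions, not the commit's implementation: the function names, the `global-bundle.pem` CA file name, and the example hostnames are hypothetical, and the option set follows commonly documented DocumentDB connection strings, so it should be checked against the driver in use.

```python
from urllib.parse import quote, urlencode


def documentdb_uri(endpoint, username, password, database="apostrophe",
                   port=27017, ca_file="global-bundle.pem"):
    """Build a MongoDB-style URI for a DocumentDB cluster that enforces TLS.

    ca_file is an assumed local path to the Amazon CA bundle.
    """
    options = urlencode({
        "tls": "true",                       # TLS in transit is required
        "tlsCAFile": ca_file,                # CA bundle used to verify the server
        "replicaSet": "rs0",                 # DocumentDB clusters present as rs0
        "readPreference": "secondaryPreferred",
        "retryWrites": "false",              # DocumentDB lacks retryable writes
    })
    # Percent-encode credentials so characters such as '@' or '/' stay valid.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"mongodb://{user}:{secret}@{endpoint}:{port}/{database}?{options}"


def redis_uri(endpoint, auth_token, port=6379):
    """Build a Redis URI for a TLS-enabled endpoint protected by an auth token.

    The 'rediss' scheme is the client-side convention for Redis over TLS.
    """
    return f"rediss://:{quote(auth_token, safe='')}@{endpoint}:{port}"
```

In this arrangement the credentials would still come from Parameter Store at deploy time; only the URI assembly is shown here.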

docs/infrastructureQNA.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+# Infrastructure Q&A for Terraform Implementation
+
+## Questions and Answers
+
+### Q1: Certificate ARNs
+**Question**: What are the actual ARN values for your existing SSL certificates?
+- Main app certificates: `sf-website-{env}.sandbox-prettyclear.com`
+- Media certificates: `sf-website-media-{env}.sandbox-prettyclear.com`
+
+**Answer**: Wildcard certificate `*.sandbox-prettyclear.com` covers all subdomains
+**ARN**: `arn:aws:acm:us-east-1:548271326349:certificate/7e11016f-f90e-4800-972d-622bf1a82948`
+
+---
+
+### Q2: Route 53 Hosted Zone ID
+**Question**: What's the hosted zone ID for `sandbox-prettyclear.com`?
+
+**Answer**: [Skipped for now - will address later]
+
+---
+
+### Q3: Parameter Store Secrets
+**Question**: Should I generate these automatically or do you have specific values?
+- DocumentDB master username/password
+- SESSION_SECRET
+- Any other app secrets?
+
+**Answer**:
+- **DocumentDB master username/password**: Store in tfvars files
+- **SESSION_SECRET**: User will provide specific value in tfvars
+- **Other secrets**: Based on docker-compose.yml:
+- **REDIS_URI**: Will be auto-generated (ElastiCache endpoint)
+- **BASE_URL**: Will be auto-generated from ALB domain
+- **SERVICE_ACCOUNT_PRIVATE_KEY**: User will provide if using Google Cloud Storage
+- **NODE_ENV**: Will be set to 'production'
+
+---
+
+### Q4: Deployment Scope
+**Question**: Should I create Terraform to deploy all three environments at once, or one environment at a time (which one first)?
+
+**Answer**: Terraform script should create 1 environment at a time. Environment should be specified via tfvars file.
+
+---
+
+### Q5: Remote State
+**Question**: Do you want S3 backend for Terraform state storage?
+
+**Answer**: Yes, use S3 bucket for Terraform state storage with DynamoDB for state locking.
+
+---
+
+### Q6: CI/CD Integration
+**Question**: Do you need IAM roles for GitHub Actions to deploy?
+
+**Answer**: Yes, include all 3:
+- IAM role that GitHub Actions can assume
+- Permissions for Terraform operations (creating/updating resources)
+- ECR permissions for pushing Docker images
+
+---
+
+### Q7: CloudWatch Alerts
+**Question**: For notifications, do you have Slack webhook URLs, or should I create SNS topics instead?
+
+**Answer**: Slack webhook URLs - should be provided in tfvars file
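
Reviewer note on Q7: CloudWatch alarm actions do not call arbitrary webhooks directly, so delivering alerts to a Slack webhook URL usually involves a small relay (for example, a function subscribed to the alarm notifications). The Python sketch below shows what such a relay might do with the webhook URL supplied through tfvars. It is a minimal illustration under assumptions: the function names, the message layout, and the example alarm name are hypothetical and do not come from the commit. The only external contract relied on is that Slack incoming webhooks accept a JSON body with a `text` field.

```python
import json
import urllib.request


def build_slack_payload(alarm_name, new_state, reason, environment):
    """Format an alarm state change as a Slack incoming-webhook payload."""
    text = f"[{environment}] {alarm_name} is {new_state}: {reason}"
    return {"text": text}


def post_to_slack(webhook_url, payload, timeout=5):
    """POST the payload to the webhook and return the HTTP status code.

    webhook_url is assumed to be the per-environment value from tfvars.
    """
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping payload construction separate from the HTTP call lets the message format be checked without network access.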
