diff --git a/docs/DEPLOYMENT_GUIDE.md b/docs/DEPLOYMENT_GUIDE.md index 5f16e13..bb943ea 100644 --- a/docs/DEPLOYMENT_GUIDE.md +++ b/docs/DEPLOYMENT_GUIDE.md @@ -40,9 +40,10 @@ See [Custom Data Commons: When do I need a custom instance?](https://docs.dataco This solution provisions a complete data exploration platform: - **Data Commons Accelerator Web Application**: Interactive interface for data exploration and visualization +- **GKE Cluster**: A new GKE cluster is created automatically +- **VPC and networking**: Network, subnet, Cloud Router, and Cloud NAT are set up automatically - **CloudSQL MySQL Database**: Persistent storage for datasets and metadata (with optional high availability) - **Cloud Storage Bucket**: Scalable storage for custom data imports -- **Kubernetes Workload**: Application deployed to your existing GKE cluster (not created by this solution) with Workload Identity authentication - **Service Account**: Secure identity for accessing cloud resources No additional infrastructure setup is required—everything integrates with your existing GCP project. 
@@ -57,12 +58,14 @@ This Marketplace solution deploys the following GCP resources: | Component | Description | |-----------|-------------| -| **GKE Workload** | Data Commons application pods running in your existing cluster (namespace matches your deployment name) | -| **CloudSQL MySQL** | Managed database with private IP (via VPC Private Service Access) for dataset storage | +| **GKE Cluster** | A new GKE cluster is created for you during deployment | +| **VPC and networking** | Network, subnet, Cloud Router, and Cloud NAT — all created automatically | +| **GKE Workload** | Data Commons application pods deployed to the cluster (namespace matches your deployment name) | +| **CloudSQL MySQL** | Managed database with private connectivity for dataset storage | | **GCS Bucket** | Cloud Storage for custom data imports | -| **Service Account** | Workload Identity-enabled SA for secure access to CloudSQL, GCS, and Maps API | -| **db-init Job** | One-time Kubernetes Job that initializes the database schema | -| **db-sync CronJob** | Recurring job that syncs custom data from GCS to CloudSQL (every 3 hours) | +| **Service Account** | Secure identity for accessing CloudSQL, GCS, and Maps API | +| **db-init Job** | One-time job that initializes the database schema | +| **db-sync CronJob** | Recurring job that syncs custom data from Cloud Storage to the database (every 3 hours) | For details on Data Commons architecture and how the application works internally, see [Custom Data Commons documentation](https://docs.datacommons.org/custom_dc/). @@ -88,9 +91,9 @@ User Browser GCS Bucket (Custom Data) **Deployment Workflow:** 1. You fill out the GCP Marketplace form with your preferences -2. Infrastructure Manager uses Terraform to provision CloudSQL, create GCS bucket, and bind service account -3. Helm deploys the Data Commons application to your GKE cluster -4. All resources are linked via Workload Identity and VPC Private Service Access +2. 
Infrastructure Manager uses Terraform to provision the cluster, database, Cloud Storage bucket, and service account +3. The Data Commons application is deployed to your cluster +4. All resources are linked via Workload Identity and private database connectivity --- @@ -100,13 +103,13 @@ Before deploying Data Commons Accelerator, ensure you have the following: ### Infrastructure Requirements -| Requirement | Details | -|-------------|---------| -| **GKE Cluster** | Kubernetes 1.27+, Standard or Autopilot cluster | -| **Workload Identity** | Must be enabled (default on GKE 1.27+) | -| **VPC Network** | VPC with Private Service Access configured (PSA could be created via Marketplace form if it's not existing) | +No existing infrastructure is required. The solution creates all necessary resources automatically, including: -**GKE Cluster:** If you need to create a cluster first, use the [GCP Kubernetes Engine creation form](https://console.cloud.google.com/kubernetes/add?). +- GKE cluster +- VPC network with subnet, Cloud Router, and Cloud NAT +- CloudSQL MySQL database with private connectivity +- Cloud Storage bucket +- Service accounts and IAM bindings ### Required IAM Roles @@ -129,12 +132,13 @@ The deployment automatically creates a Service Account with the following roles. 
| Role | Purpose | |------|---------| -| roles/container.developer | Application workload management on GKE | +| roles/container.admin | GKE cluster and workload management | | roles/storage.admin | Read/write access to the GCS data bucket | | roles/cloudsql.admin | Database instance management | | roles/config.agent | Infrastructure Manager operations | | roles/iam.infrastructureAdmin | Infrastructure resource management | | roles/iam.serviceAccountAdmin | Service account lifecycle management | +| roles/iam.serviceAccountUser | Act as service accounts for Workload Identity binding | | roles/serviceusage.serviceUsageAdmin | API enablement | | roles/serviceusage.apiKeysAdmin | API key management | | roles/resourcemanager.projectIamAdmin | IAM binding management | @@ -178,32 +182,14 @@ This section walks through deploying Data Commons Accelerator via GCP Marketplac ### Step 2: Complete the Deployment Configuration Form -The Marketplace will open a deployment configuration form organized into several sections: **Basic** (deployment name and project), **GKE** (cluster details), **CloudSQL** (database settings), **Cloud Storage** (bucket configuration), **API** (Data Commons API key), and **Application** (pod replicas and resource sizing). +The Marketplace will open a deployment configuration form. Enter your **Deployment Name** and select a **GCP Region** at the top, then configure **Application Settings** (resource tier and sample) and provide your **API Keys** (Data Commons API key). A new GKE cluster is created automatically. -Each field has built-in tooltips with detailed guidance—hover over or click the help icon next to any field for clarification. The form validates your inputs and shows clear error messages if anything is incorrect. +> [!TIP] +> Each field has built-in tooltips with detailed guidance—hover over or click the help icon next to any field for clarification. The form validates your inputs and shows clear error messages if anything is incorrect. 
For detailed descriptions of every form field, valid values, and tips, see [Marketplace Fields Reference](MARKETPLACE_FIELDS.md). -**Before you start, gather these from Prerequisites:** -- Your **GKE cluster name and location** -- Your **Data Commons API key** - -#### Private Service Access (PSA) - -The CloudSQL section of the form asks how to configure Private Service Access for database connectivity: - -| Option | When to Use | Configuration | -|--------|------------|----------------| -| **Create New PSA** | First deployment, no existing PSA | The form will create a /20 PSA range automatically | -| **Use Existing PSA** | PSA already configured, multiple deployments in same VPC | Provide your existing PSA range name | - -**To find your existing PSA range:** - -```bash -gcloud compute addresses list --global \ - --filter="purpose=VPC_PEERING AND network=YOUR_VPC_NAME" \ - --format="table(name,address,prefixLength,network)" -``` +**Before you start, have your Data Commons API key ready** (required for all deployments). ### Step 3: Review and Deploy @@ -213,14 +199,20 @@ Once you've completed all sections: 2. **Accept the terms** by checking the Terms checkbox 3. **Click the Deploy button** -Deployment takes approximately **10–15 minutes**. A progress indicator will appear. **Do not close the browser tab** during deployment. +Deployment takes approximately **20–30 minutes**. A progress indicator will appear. + +> [!WARNING] +> **Do not close the browser tab** during deployment. Closing it may interrupt the provisioning process. When the status shows **"Active"**, your deployment is complete. Proceed to the next section for accessing your application. ### Step 4: Access Your Deployment -All deployment outputs—resource names, connection strings, and commands—are available in: -**Infrastructure Manager** > **Deployments** > your deployment > **Outputs** tab. 
+> [!TIP] +> After deployment completes, useful commands and resource information are available in the deployment details: +> [Infrastructure Manager Deployments](https://console.cloud.google.com/infra-manager/deployments) > your deployment > **Outputs** tab. +> +> The Outputs tab contains ready-to-use commands for connecting to your cluster, port-forwarding, uploading data, and viewing logs — you can copy and run them directly. #### Quick Access via Cloud Shell (Recommended) @@ -230,9 +222,9 @@ The easiest way to access your deployment—no local tools needed: 2. Click **Run in Cloud Shell** 3. Run the port-forward command: ```bash - until kubectl port-forward -n NAMESPACE svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done + until kubectl port-forward -n [NAMESPACE] svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done ``` - (Replace `NAMESPACE` with your deployment name — the namespace matches your deployment name) + Replace `[NAMESPACE]` with your deployment name — the namespace matches your deployment name. 4. In the Cloud Shell toolbar, click **Web Preview** > **Preview on port 8080** #### Local Access via kubectl @@ -241,13 +233,13 @@ If you have `gcloud` and `kubectl` installed locally: 1. Configure kubectl: ```bash - gcloud container clusters get-credentials CLUSTER --location=LOCATION --project=PROJECT + gcloud container clusters get-credentials [CLUSTER_NAME] --location=[LOCATION] --project=[PROJECT_ID] ``` 2. Port-forward: ```bash - until kubectl port-forward -n NAMESPACE svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done + until kubectl port-forward -n [NAMESPACE] svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done ``` -3. Open http://localhost:8080 in your browser +3. 
Open http://localhost:8080 in your browser #### Production Access @@ -282,9 +274,9 @@ For additional resources, refer to the official Data Commons documentation: 4. Review the Terraform execution log for provisioning errors **Common deployment errors:** -- **"GKE cluster not found"** — Verify cluster name and project match + - **"Insufficient permissions"** — Check [Required IAM Roles](#required-iam-roles) -- **"PSA not configured"** — See [PSA Issues](#private-service-access-issues) below +- **"PSA not configured"** — See [Private Connectivity Issues](#private-connectivity-issues) below ### Pod Status and Logs @@ -293,25 +285,26 @@ For additional resources, refer to the official Data Commons documentation: **Quick diagnostics from Cloud Shell:** ```bash -kubectl get pods -n NAMESPACE -kubectl describe pod POD_NAME -n NAMESPACE -kubectl logs -n NAMESPACE -l app.kubernetes.io/name=datacommons +kubectl get pods -n [NAMESPACE] +kubectl describe pod [POD_NAME] -n [NAMESPACE] +kubectl logs -n [NAMESPACE] -l app.kubernetes.io/name=datacommons ``` **Common pod issues:** + - **Pending** — Cluster needs more capacity -- **CrashLoopBackOff** — Check logs; often CloudSQL still initializing (wait 2–3 min) -- **ImagePullBackOff** — Verify `dc_api_key` is correct +- **CrashLoopBackOff** — Check logs; often the database is still initializing (wait 2–3 min) +- **ImagePullBackOff** — Verify your Data Commons API key is correct -### Private Service Access Issues +### Private Connectivity Issues ```bash -# Check existing PSA ranges +# Check existing private connectivity ranges gcloud compute addresses list --global --filter="purpose=VPC_PEERING" ``` -- **"Couldn't find free blocks"** — Use `psa_range_configuration: "create_16"` for more IPs -- **"Peering already exists"** — Use `psa_range_configuration: "existing"` with your existing range name +- **"Couldn't find free blocks"** — Select the /16 range option for more IP addresses +- **"Peering already exists"** — Select "Use my existing range" and enter 
your existing range name ### Port-Forward Connection Refused @@ -321,27 +314,26 @@ gcloud compute addresses list --global --filter="purpose=VPC_PEERING" E0206 portforward.go:424 "Unhandled Error" err="an error occurred forwarding 8080 -> 8080: connection refused" ``` -**Cause:** The port-forward connection drops when the application receives too many concurrent requests — for example, opening the `/explore` page which loads many data widgets simultaneously. It can also occur during pod startup while the application is initializing. +> [!NOTE] +> This is expected behavior, not a critical error. The connection drops when the application receives too many concurrent requests — for example, opening the `/explore` page which loads many data widgets simultaneously. It can also occur during pod startup while the application is initializing. **Fix:** + 1. If using the auto-retry loop (`until kubectl port-forward ...`), it will reconnect automatically 2. If running a single port-forward, simply re-run the command -3. If the error persists, check pod status: `kubectl get pods -n NAMESPACE` — ensure the pod is `Running` with `1/1` Ready - -### Error Loading GKE Cluster Location - -**Error:** The Marketplace form shows "Error loading GKE Cluster Location" when selecting a cluster. - -**Fix:** Refresh the browser page. This is a transient UI loading error. Your previously entered values may need to be re-entered. +3. If the error persists, check pod status: `kubectl get pods -n [NAMESPACE]` — ensure the pod is `Running` with `1/1` Ready --- ## Deleting Your Deployment -If you no longer need the Data Commons Accelerator, delete the deployment to stop incurring costs. +> [!IMPORTANT] +> Delete your deployment when no longer needed to stop incurring costs for the database, Kubernetes workloads, and Cloud Storage. 1. Go to [Google Cloud Console](https://console.cloud.google.com) 2. Search for "Solution deployments" 3. Find your deployment and click the **three-dot menu** (⋮) 4. 
Click **Delete** 5. Confirm the deletion + +**What gets deleted:** All resources provisioned by this deployment — the database, Cloud Storage bucket, Kubernetes workloads, service account, and IAM bindings. If the deployment created a new GKE cluster and network, those are deleted as well. diff --git a/docs/MARKETPLACE_FIELDS.md b/docs/MARKETPLACE_FIELDS.md index 3fd7c46..39bf769 100644 --- a/docs/MARKETPLACE_FIELDS.md +++ b/docs/MARKETPLACE_FIELDS.md @@ -4,387 +4,56 @@ ## Form Overview -The Marketplace deployment form is organized into five sections. Fields are listed below in the order they appear in the form. +The deployment form has **5 fields** across **2 sections**. A new GKE cluster is created automatically in your selected region. --- -## 1. Kubernetes Cluster +## Deployment Name -Configure your existing GKE cluster for Data Commons Accelerator deployment. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **GKE Cluster Location** | Location selector | Yes | — | Region or zone where your GKE cluster runs | -| **GKE Cluster Name** | Cluster selector | Yes | — | Your existing GKE cluster | - -### GKE Cluster Location - -**Description:** Select the region or zone where your existing GKE cluster is located. The Marketplace UI defaults to showing the "Regional" tab. - -**Details:** -- This must match the actual location of your GKE cluster -- CloudSQL will be automatically deployed in the same region -- Choose a region close to your users to minimize latency - -**Validation:** Must be a valid GCP region or zone. - ---- - -### GKE Cluster Name - -**Description:** Select your existing GKE cluster from the dropdown. The list shows only clusters in the location you selected above. 
- -**Details:** -- You must have an existing GKE cluster before deploying -- The cluster must have sufficient resources for Data Commons workloads -- Workload Identity must be enabled on the cluster for GCP service integration - -**Validation:** Must be an existing GKE cluster in the selected location. - ---- - -## 2. CloudSQL Database - -MySQL database configuration for Data Commons Accelerator metadata and state. CloudSQL is automatically deployed in the same region as your GKE cluster with private IP connectivity. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **CloudSQL Instance Tier** | Dropdown | Yes | `db-n1-standard-1` | Machine type for CloudSQL instance | -| **CloudSQL Disk Size (GB)** | Number | Yes | `20` | Initial disk size in GB | -| **Enable High Availability** | Boolean | Yes | `false` | Enable CloudSQL HA for production | -| **Private Service Access Configuration** | Dropdown | Yes | `create_20` | PSA range configuration | -| **Existing PSA Range Name** | String | Conditional | — | Name of existing PSA IP range | - -### CloudSQL Instance Tier - -**Description:** Machine tier for the CloudSQL MySQL instance with private IP connectivity. Choose based on workload requirements. 
- -**Default:** `db-n1-standard-1` (3.75GB RAM, 1 vCPU) - -**Valid Options:** - -| Option | Value | RAM | vCPU | Use Case | -|--------|-------|-----|------|----------| -| Micro - 0.6GB RAM (dev/test only) | `db-f1-micro` | 0.6GB | Shared | Development/testing only | -| Small - 1.7GB RAM, 1 vCPU | `db-g1-small` | 1.7GB | 1 | Small dev/test environments | -| Standard-1 - 3.75GB RAM, 1 vCPU | `db-n1-standard-1` | 3.75GB | 1 | Small production workloads | -| Standard-2 - 7.5GB RAM, 2 vCPU (recommended) | `db-n1-standard-2` | 7.5GB | 2 | **Recommended for production** | -| Standard-4 - 15GB RAM, 4 vCPU | `db-n1-standard-4` | 15GB | 4 | High-traffic production | -| Standard-8 - 30GB RAM, 8 vCPU | `db-n1-standard-8` | 30GB | 8 | Very high-traffic (100+ users) | - -**Recommendations:** -- **Dev/Test:** `db-f1-micro` or `db-g1-small` -- **Production:** `db-n1-standard-2` or higher -- **High-traffic:** `db-n1-standard-4` or `db-n1-standard-8` (100+ concurrent users) - -**Note:** Micro and Small tiers are NOT recommended for production use. +| Field | Default | Description | +|-------|---------|-------------| +| **Deployment Name** | — | A unique name for this deployment (2-18 characters). Used to generate resource names for the cluster, database, and storage. | --- -### CloudSQL Disk Size (GB) - -**Description:** Initial disk size in GB for the CloudSQL instance. CloudSQL will automatically grow the disk if needed. - -**Type:** Number -**Default:** `20` -**Minimum:** `10` (MySQL requirement) +## GCP Region -**Details:** -- Storage auto-grows when needed, so start conservatively -- You can adjust this later if needed -- Storage costs scale with size - -**Validation:** Must be a positive integer >= 10. 
-**Regex Pattern:** `^[1-9][0-9]*$` - -**Recommendations:** -- **Dev/Test:** 10-20 GB -- **Production:** 20-50 GB (depending on dataset size) -- **Large datasets:** 100+ GB +| Field | Default | Description | +|-------|---------|-------------| +| **GCP Region** | us-central1 | The GCP region where the cluster and all cloud resources will be created. | --- -### Enable High Availability - -**Description:** Enable CloudSQL high availability for production workloads. Creates a standby instance in a different zone for automatic failover. - -**Type:** Boolean -**Default:** `false` -**Recommended for production:** `true` - -**Details:** -- **Enabled:** Creates a standby replica in a different zone within the same region -- **Disabled:** Single instance (no automatic failover) -- HA provides automatic failover with minimal downtime -- HA increases cost (standby instance + synchronous replication) - -**Use Cases:** -- **Enable (true):** Production workloads requiring high availability and automatic failover -- **Disable (false):** Development/testing environments, cost-sensitive non-critical workloads - ---- - -### Private Service Access Configuration - -**Description:** Choose how to configure Private Service Access (PSA) for CloudSQL private IP connectivity. 
- -**Type:** Dropdown -**Default:** `create_20` (Create new /20 range) -**Required:** Yes - -**Valid Options:** - -| Option | Value | IP Count | Use Case | -|--------|-------|----------|----------| -| Create new /20 range (4,096 IPs - recommended) | `create_20` | 4,096 | **Recommended** - Standard production | -| Create new /24 range (256 IPs - dev/test) | `create_24` | 256 | Development/testing | -| Create new /16 range (65,536 IPs - large deployments) | `create_16` | 65,536 | Very large multi-deployment environments | -| Use my existing PSA range (enter name below) | `existing` | — | VPCs with existing PSA configuration | - -**⚠️ CRITICAL WARNING:** - -"Create new" options should **ONLY** be used on VPCs with **NO existing PSA configuration**. - -Creating a new range on a VPC with existing PSA will **REPLACE all existing reserved peering ranges**, which may disrupt connectivity for other services using Private Service Access (CloudSQL, Cloud Composer, etc.). - -**When to use each option:** - -1. **`create_20` (default)** - Use for first PSA deployment on a VPC, standard production workloads -2. **`create_24`** - Use for dev/test on a dedicated VPC with no other PSA services -3. **`create_16`** - Use for large environments with many CloudSQL instances planned -4. **`existing`** - **Use this if your VPC already has PSA configured** (requires "Existing PSA Range Name" below) - -**How to check if your VPC has existing PSA:** -```bash -gcloud compute addresses list --global --filter="purpose=VPC_PEERING" --format="value(name)" -``` - -If this returns results, select **"Use my existing PSA range"** and enter the name below. - ---- - -### Existing PSA Range Name - -**Description:** Name of your existing Private Service Access IP range. Required when you selected "Use my existing PSA range" above. 
- -**Type:** String -**Default:** — (empty) -**Required:** Only if `psa_range_configuration = existing` -**Placeholder:** `google-managed-services-default` - -**Details:** -- This field is ignored unless you selected "Use my existing PSA range" above -- Enter the exact name of your existing PSA IP allocation -- Common names: `google-managed-services-default`, `cloudsql-private-ip-range` - -**How to find your PSA range name:** -```bash -gcloud compute addresses list --global \ - --filter="purpose=VPC_PEERING" \ - --format="value(name)" -``` - -**Example values:** -- `google-managed-services-default` -- `cloudsql-private-ip-range` -- `private-service-access` - ---- +## Section 0: Application Settings -## 3. Cloud Storage - -GCS bucket configuration for Data Commons Accelerator data and custom datasets. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **GCS Bucket Location** | Dropdown | Yes | `US` | Storage location for your bucket | -| **GCS Storage Class** | Dropdown | Yes | `STANDARD` | Storage class (cost vs access speed) | - -### GCS Bucket Location - -**Description:** Storage location for your Data Commons bucket. Choose between multi-region (higher availability) or single-region (lower latency and cost). 
- -**Type:** Dropdown -**Default:** `US` (Multi-region) -**Required:** Yes - -**Valid Options:** - -**Multi-Region:** -| Option | Value | Description | -|--------|-------|-------------| -| US (Multi-region) | `US` | United States multi-region (highest availability) | -| EU (Multi-region) | `EU` | European Union multi-region | -| ASIA (Multi-region) | `ASIA` | Asia-Pacific multi-region | - -**Single-Region (Americas):** -| Option | Value | Description | -|--------|-------|-------------| -| us-central1 | `us-central1` | Iowa, USA | -| us-east1 | `us-east1` | South Carolina, USA | -| us-west1 | `us-west1` | Oregon, USA | - -**Single-Region (Europe):** -| Option | Value | Description | -|--------|-------|-------------| -| europe-west1 | `europe-west1` | Belgium | -| europe-west3 | `europe-west3` | Frankfurt, Germany | - -**Single-Region (Asia):** -| Option | Value | Description | -|--------|-------|-------------| -| asia-southeast1 | `asia-southeast1` | Singapore | -| asia-northeast1 | `asia-northeast1` | Tokyo, Japan | - -**💡 TIP:** Choose a location that matches your GKE cluster region to minimize latency and data transfer costs. - -**Examples:** -- GKE cluster in `us-central1` → choose `US` or `us-central1` -- GKE cluster in `europe-west3` → choose `EU` or `europe-west1` -- GKE cluster in `asia-southeast1` → choose `ASIA` or `asia-southeast1` - -**Cost & Performance Trade-offs:** -- **Multi-region:** Higher availability, higher cost, geo-redundant -- **Single-region:** Lower cost, lower latency within region, single-region redundancy - ---- - -### GCS Storage Class - -**Description:** Storage class determines cost and access speed. Choose based on how frequently you'll access your data. 
- -**Type:** Dropdown -**Default:** `STANDARD` -**Required:** Yes - -**Valid Options:** - -| Option | Value | Use Case | Access Pattern | -|--------|-------|----------|----------------| -| Standard - Frequent access | `STANDARD` | Production workloads | Frequent/real-time access | -| Nearline - Monthly access | `NEARLINE` | Infrequent access | ~1x/month | -| Coldline - Quarterly access | `COLDLINE` | Archival data | ~1x/quarter | -| Archive - Long-term storage | `ARCHIVE` | Long-term archival | Rarely accessed | - -**Recommendations:** -- **`STANDARD`** - Recommended for most Data Commons deployments (active datasets) -- **`NEARLINE`** - For backup datasets accessed occasionally -- **`COLDLINE`** - For compliance/archival datasets -- **`ARCHIVE`** - For long-term retention with rare access - -**Note:** Lower-tier classes have retrieval costs and minimum storage durations (e.g., Nearline = 30 days minimum). - ---- - -## 4. API Keys - -API keys required for Data Commons integration. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **Data Commons API Key** | String | Yes | — | API key for Data Commons API access | - -### Data Commons API Key - -**Description:** API key for accessing Data Commons APIs. Required for the application to function. - -**Type:** String (sensitive) -**Required:** Yes -**Placeholder:** `...` - -**Where to get it:** -Follow the instructions at: [https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key](https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key) - -**Format:** -Alphanumeric string with underscores and hyphens. - -**Security:** -- Stored as Kubernetes Secret in your cluster -- Rotatable after deployment by updating the secret - ---- - -## 5. Application Settings - -Data Commons Accelerator application configuration and resource allocation. 
- -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **Resource Tier** | Dropdown | Yes | `medium` | Resource allocation for pods | -| **Domain Template** | Dropdown | Yes | — | Pre-built domain configuration | -| **Application Replicas** | Number | Yes | `1` | Number of application replicas | +| Field | Default | Description | +|-------|---------|-------------| +| **Resource Tier** | Medium | Controls how much CPU and memory the application gets, and the size of the database | +| **Samples** | Health | Pre-built configuration optimized for your domain | ### Resource Tier -**Description:** Resource allocation tier for Data Commons Accelerator pods. Determines CPU and memory limits. +| Option | Memory | CPU | Replicas | Database size | High availability | +|--------|--------|-----|----------|---------------|-------------------| +| Small | 2 GB | 1 core | 1 | Standard | No | +| Medium (recommended) | 4 GB | 2 cores | 2 | Standard | No | +| Large | 8 GB | 4 cores | 3 | Large | Yes | -**Type:** Dropdown -**Default:** `medium` (recommended) -**Required:** Yes +### Samples -**Valid Options:** +| Option | Best for | +|--------|----------| +| Health | Health and epidemiology data | +| Education | School, enrollment, and outcomes data | +| Energy | Energy consumption and generation data | +| Custom | Custom data configuration with no pre-built datasets | -| Option | Value | RAM | CPU | Use Case | -|--------|-------|-----|-----|----------| -| Small - 2Gi RAM, 1 CPU | `small` | 2Gi | 1 | Development, small datasets | -| Medium - 4Gi RAM, 2 CPU (recommended) | `medium` | 4Gi | 2 | **Recommended for production** | -| Large - 8Gi RAM, 4 CPU | `large` | 8Gi | 4 | Large datasets, high concurrency | - -**Recommendations:** -- **`small`** - Dev/test only, small datasets -- **`medium`** - Standard production deployments (recommended starting point) -- **`large`** - High-traffic production, large datasets (>100GB), many 
concurrent users - -**Note:** Ensure your GKE cluster has sufficient capacity for the selected tier × number of replicas. +You can customize the sample after deployment. --- -### Domain Template - -**Description:** Select a pre-built Data Commons configuration optimized for a specific domain. Each domain includes curated datasets, statistical variables, and visualizations tailored to that subject area. Choose the domain that best matches your use case. - -**Type:** Dropdown -**Required:** Yes - -**Valid Options:** - -| Option | Value | Description | -|--------|-------|-------------| -| Education (education related data) | `education` | Pre-configured for education-related datasets (schools, enrollment, outcomes) | -| Health (health related data) | `health` | Pre-configured for health-related datasets (epidemiology, healthcare) | -| Energy (energy related data) | `energy` | Pre-configured for energy-related datasets (consumption, generation, emissions) | - -**Note:** You can customize any template after deployment. The template just provides a starting point. - ---- - -### Application Replicas (Advanced) - -**Description:** Number of Data Commons Accelerator application replicas for high availability and load distribution. 
- -**Type:** Number -**Default:** `1` -**Required:** Yes -**Level:** 1 (Advanced) - -**Valid Range:** 1-10 - -**Details:** -- **1 replica** - Single instance (no HA, suitable for dev/test) -- **2-3 replicas** - High availability with load balancing (recommended for production) -- **4+ replicas** - High-traffic production with automatic scaling - -**Recommendations:** -- **Dev/Test:** 1 replica -- **Production:** 2-3 replicas (for HA and rolling updates) -- **High-traffic:** 4+ replicas - -**Capacity Planning:** -- Total resource usage = resource_tier × app_replicas -- Example: `medium` tier (4Gi RAM, 2 CPU) × 3 replicas = 12Gi RAM, 6 CPU total -- Ensure your GKE cluster has sufficient capacity +## Section 1: API Keys -**Note:** Multiple replicas provide: -- High availability (if one pod fails, others continue serving) -- Load distribution across pods -- Zero-downtime rolling updates +| Field | Default | Description | +|-------|---------|-------------| +| **Data Commons API Key** | — | Required for the application to access Data Commons data. Get yours at [docs.datacommons.org](https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key). | diff --git a/docs/USER_GUIDE.md b/docs/USER_GUIDE.md index 1a8a174..d8d98d2 100644 --- a/docs/USER_GUIDE.md +++ b/docs/USER_GUIDE.md @@ -2,200 +2,267 @@ This guide explains how to access, configure, and use your Custom Data Commons instance deployed via the Google Cloud Marketplace. -# Getting Started +--- -To configure the User Interface of the landing page, upload your company logo and private data you need to login as custom Data Commons Administrator. This is applicable to all domains. +## Table of Contents -## Retrieve Administrator Credentials +1. [Getting Started](#getting-started) +2. [Data Commons for Education](#data-commons-for-education) +3. [Data Commons for Health](#data-commons-for-health) +4. [Data Commons for Energy](#data-commons-for-energy) +5. 
[Data Commons for Custom](#data-commons-for-custom) +6. [Known Limitations](#known-limitations) +7. [Request Support](#request-support) + +--- + +## Getting Started + +To configure the landing page, upload your company logo, and manage private data, you need to log in as the Data Commons Administrator. The steps below apply to all samples (Education, Health, Energy, Custom). + +> [!TIP] +> For deployment and initial setup instructions, see the [Deployment Guide](DEPLOYMENT_GUIDE.md). + +### Retrieve Administrator Credentials The application administrator password is not provided in the deployment outputs for security reasons. To retrieve your initial credentials: +> [!TIP] +> These commands are available pre-populated with your deployment values in the [Infrastructure Manager Deployments](https://console.cloud.google.com/infra-manager/deployments) > your deployment > **Outputs** tab. You can copy and run them directly. + 1. **Connect to your cluster** via Cloud Shell: - ```bash - gcloud container clusters get-credentials [CLUSTER_NAME] --region [REGION] + ```bash + gcloud container clusters get-credentials [CLUSTER_NAME] --region [REGION] ``` -2. **Run the secret retrieval command**: +2. **Run the secret retrieval command**: ```bash - kubectl get secret [DEPLOYMENT_NAME] -n [NAMESPACE] -o json | jq -r '.data | to_entries[] | "\\(.key): \\(.value | @base64d)"' + echo 'Admin Username:' && kubectl get secret datacommons -n [NAMESPACE] -o jsonpath='{.data.ADMIN_PANEL_USERNAME}' | base64 -d && echo && echo 'Admin Password:' && kubectl get secret datacommons -n [NAMESPACE] -o jsonpath='{.data.ADMIN_PANEL_PASSWORD}' | base64 -d && echo ``` -## Administrator Log In + Replace `[CLUSTER_NAME]`, `[REGION]`, and `[NAMESPACE]` with your deployment values. The namespace matches your deployment name. -1. Navigate to your application URL. -e.g. `https://education.example.com/`. -2. 
To access the **Admin Panel**, append `/admin` to the URL: -e.g.`https://education.example.com/admin/` -3. Enter the username and password generated in the previous step. -4. Depending on your choice during deployment you will be logged in as an admin for one of the custom data Commons templates for different domains (education, health, energy etc). +### Administrator Log In -# Data Commons for Education +1. Navigate to your application URL (e.g., `https://education.example.com/`) +2. To access the **Admin Panel**, append `/admin` to the URL (e.g., `https://education.example.com/admin/`) +3. Enter the username and password retrieved in the previous step +4. You will be logged in as an administrator for the sample selected during deployment (Education, Health, Energy, or Custom) -***Template: Student Recruitment Intelligence Center*** +### Upload Custom Data -## Overview +To populate the dashboard with your custom data: -The Education template combines your private applicant data with public demographic trends to help universities identify high-potential recruitment regions. +1. See [Prepare and load your own data](https://docs.datacommons.org/custom_dc/custom_data.html). +2. Ensure your data matches the required schema for your selected sample. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. +3. Log in and navigate to the **Admin Panel**. +4. Go to **Data & Files** tab. +5. Locate the **Data Upload** section. +6. Click **Choose File**, select your CSV, and click **Upload**. + - *Success*: You will see a "Rows successfully uploaded" message. + - *Error*: The system will indicate specific line/column issues. + +> [!NOTE] +> Large CSV files may take a few moments to process. The dashboard refreshes automatically after upload. + +> [!TIP] +> **Trigger data sync immediately** — After uploading your CSV, the data is synced to the database by a CronJob (`datacommons-db-sync`) that runs every 3 hours. 
To avoid waiting, trigger it manually: +> +> **Via GKE Console (recommended):** +> 1. Go to **Kubernetes Engine** > **Workloads** +> 2. Filter by your namespace (matches your deployment name) +> 3. Click the **datacommons-db-sync** CronJob +> 4. Click **Run now** in the top toolbar +> 5. Monitor progress in the **Events** and **Logs** tabs +> +> **Via kubectl:** +> ```bash +> kubectl create job --from=cronjob/datacommons-db-sync manual-sync -n [NAMESPACE] +> kubectl logs -n [NAMESPACE] -l job-name=manual-sync -f +> ``` -## For Administrators +### Customize User Interface -### Prepare Custom Data +1. In the Admin Panel, navigate to **Theme Settings**. +2. **Organization Branding**: + - **Name:** Update the organization name displayed in the top bar. + - **Logo:** Upload a PNG image. -To populate the dashboard with your university's private data: +3. **Dashboard Text**: + - **Header Text:** Edit the main title (e.g., "Student Recruitment Intelligence Center"). + - **Hero Description:** Update the subtitle describing the purpose of your custom Data Commons. -1. See [Prepare and load your own data](https://docs.datacommons.org/custom_dc/custom_data.html). -2. Ensure your data matches the required schema for Education template. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. +4. Click **Save Changes**. Updates are applied immediately. -### Upload Custom Data +### Data Security -To populate the dashboard with your university's private data: +Your data resides within a Google Cloud Storage bucket inside your dedicated GCP environment project. This setup allows you to control who can upload and manage the data that will be used for subsequent analytics. -1. Log in and navigate to the **Admin Panel**. -2. Go to **Data & Files** tab. -3. Locate **Applicant Data Upload** section. -4. Click **Choose File**, select your CSV, and click **Upload**. - *Success*: You will see a "Rows successfully uploaded" message. 
- *Error*: The system will indicate specific line/column issues. +> [!NOTE] +> Your data is kept private and is not shared with the public Data Commons. Data mixing and processing occur only on your deployed Custom Data Commons instance. -### Customize User Interface +--- -1. In the Admin Panel, navigate to **Theme Settings**. -2. **University Branding**: +## Data Commons for Education -* **Name:** Update the university name displayed in the top bar. -* **Logo:** Upload a PNG image. +***Template: Student Recruitment Intelligence Center*** -3. **Dashboard Text**: +### Education Overview -* **Header Text:** Edit the main title (e.g., "Student Recruitment Intelligence Center"). -* **Hero Description:** Update the subtitle describing the purpose of your custom Data Commons. +The Education template combines your private applicant data with public demographic trends to help universities identify high-potential recruitment regions. -4. Click **Save Changes**. Updates are applied immediately. +### Education: For Administrators -### Data Security +Your CSV data must match the Education template schema. Download a sample CSV from the **Data & Files** tab in the Admin Panel. -Your uploaded CSV data is stored securely within your Google Cloud SQL instance. It is combined with public Data Commons data only at the visualization layer and is not shared externally. +For upload instructions and UI customization, see [Getting Started](#getting-started). -## Data commons for Data Analysts & Researchers +### Education: For Data Analysts & Researchers -### Explore Recruitment Metrics +#### Explore Recruitment Metrics The dashboard provides a high-level view of your recruitment landscape: -* **Total Applicants:** Aggregated count for the target year. -* **Avg Opportunity Score:** A calculated metric indicating regional potential. -* **High Opportunity Markets:** Count of regions exceeding your target criteria. 
-* **Avg Household Income:** Public demographic data correlated with your target regions. +- **Total Applicants:** Aggregated count for the target year. +- **Avg Opportunity Score:** A calculated metric indicating regional potential. +- **High Opportunity Markets:** Count of regions exceeding your target criteria. +- **Avg Household Income:** Public demographic data correlated with your target regions. -### Interactive Maps +#### Interactive Maps The **Recruitment Potential by State** map visualizes where your applicants are coming from versus high-opportunity areas. -* **Hover:** Hover over a state to see specific applicant counts and opportunity scores. +- **Hover:** Hover over a state to see specific applicant counts and opportunity scores. + +#### Filtering & Deep Dives -### Filtering & Deep Dives +- **Filters:** Use the dropdowns at the top (e.g., Target Year) to filter all widgets on the page. +- **Standard Tools:** Click "Explore in Timeline Tool" on specific widgets to analyze the data using standard Data Commons graphing tools. -Filters: Use the dropdowns at the top (e.g., Target Year) to filter all widgets on the page. -Standard Tools: Click "Explore in Timeline Tool" on specific widgets to analyze the data using standard Data Commons graphing tools. +--- -# Data Commons for Health +## Data Commons for Health ***Template: Population Health & City Comparison*** -## Overview +### Health Overview The Health template allows organizations to compare specific health metrics (e.g., obesity, diabetes, smoking) across different cities, blending local private data with public CDC data. -## For Administrators - -### Upload & Configuration - -Follow the standard upload procedure outlined in the ***getting-started*** section. +### Health: For Administrators -* **Data Requirement:** Ensure your CSV contains city-level health metrics formatted according to the template schema. 
+Your CSV must contain city-level health metrics formatted according to the Health template schema. -### Customize Branding +For upload instructions and UI customization, see [Getting Started](#getting-started). -Follow the standard Theme Settings instructions to update the Organization Name and Logo. +### Health: For Data Analysts & Researchers -## For Data Analysts & Researchers - -### Compare Cities +#### Compare Cities The primary feature of this dashboard is the City Comparator. 1. Locate the "Compare Cities" section at the top. -2. The default city (e.g., Boston, MA) is selected. -3. Click **+ Add City** to select up to 4 additional cities. +2. The default city (e.g., Boston, MA) is selected. +3. Click **+ Add City** to select up to 4 additional cities. 4. The dashboard will update to show side-by-side metrics. -### Key Metrics Indicators +> [!TIP] +> The dashboard updates in real-time as you add or remove cities from the comparison. + +#### Key Metrics Indicators View cards displaying current percentages for: -* Obesity -* Smoking -* Physical Health -* Diabetes -* High Blood Pressure +- Obesity +- Smoking +- Physical Health +- Diabetes +- High Blood Pressure + +#### Visual Comparison & Distribution -### Visual Comparison & Distribution +- **Bar Charts:** Compare the selected cities against each other across multiple health categories (e.g., "People Vaccinated," "People Who Are Sick"). +- **Trend Lines:** View the "Health Issue Distribution" over time (2020–2024) to identify rising or falling trends. -* **Bar Charts:** Compare the selected cities against each other across multiple health categories (e.g., "People Vaccinated," "People Who Are Sick"). -* **Trend Lines:** View the "Health Issue Distribution" over time (2020–2024) to identify rising or falling trends. 
+--- -# Data Commons for Energy +## Data Commons for Energy ***Template: Methane Insights & Asset Risk*** -## Overview +### Energy Overview The Energy template focuses on environmental monitoring, specifically correlating private asset locations with public methane plume data to identify high-risk leaks and community impact. -## For Administrators +### Energy: For Administrators -### Upload Asset Data +> [!IMPORTANT] +> Your CSV must contain latitude/longitude coordinates for each asset. See the sample file in the Admin Panel for the required format. -* Navigate to **Admin Panel > Data & Files.** -* Upload your **Asset Locations CSV.** This file must contain coordinates (latitude/longitude) of your infrastructure. +For upload instructions and UI customization, see [Getting Started](#getting-started). -## For Data Analysts & Researchers +### Energy: For Data Analysts & Researchers -### Risk Overview (KPI Cards) +#### Risk Overview (KPI Cards) -* **Total Plume-Asset Intersections:** Percentage of assets currently intersecting with detected methane plumes. -* **Assets Near Communities:** Count of assets within a specific radius of populated areas. -* **High-Risk Issues:** Critical alerts detected in the last 30 days. +- **Total Plume-Asset Intersections:** Percentage of assets currently intersecting with detected methane plumes. +- **Assets Near Communities:** Count of assets within a specific radius of populated areas. +- **High-Risk Issues:** Critical alerts detected in the last 30 days. -### Methane Map Chart +#### Methane Map Chart This interactive map layers three datasets: -* **Methane Plumes (Public):** Satellite detection data. -* **Asset Density (Private):** Your uploaded infrastructure. -* **Community Risk Index (Public):** Census data indicating vulnerable populations. +- **Methane Plumes (Public):** Satellite detection data. +- **Asset Density (Private):** Your uploaded infrastructure. 
+- **Community Risk Index (Public):** Census data indicating vulnerable populations. -### Detailed Intersections Table +#### Detailed Intersections Table Review specific leak events in the table at the bottom of the dashboard: -* **Risk Score:** Low / Medium / High / Critical. -* **Leak Event ID:** Unique identifier for the detection. -* **Suspected Asset:** The specific asset ID linked to the leak. -* **Vulnerability Level:** Demographic risk score of the nearby community. -* **Action Status:** Current operational status (e.g., "Normal Operations"). +- **Risk Score:** Low / Medium / High / Critical. +- **Leak Event ID:** Unique identifier for the detection. +- **Suspected Asset:** The specific asset ID linked to the leak. +- **Vulnerability Level:** Demographic risk score of the nearby community. +- **Action Status:** Current operational status (e.g., "Normal Operations"). + +--- + +## Data Commons for Custom + +***Template: Custom Configuration*** + +### Custom Overview + +The Custom sample provides a blank-slate Data Commons instance with no pre-loaded domain-specific datasets, statistical variables, or visualizations. Use this option when your use case doesn't align with the Education, Health, or Energy samples, or when you want to build a fully custom configuration from scratch. + +### Custom: For Administrators + +Upload your own data following the generic CSV schema. See [Getting Started](#getting-started) for upload instructions and UI customization. + +### Custom: For Data Analysts & Researchers + +The Custom instance starts with an empty dashboard. After your administrator uploads data, the dashboard will populate based on the uploaded dataset structure. Use the standard Data Commons exploration and visualization tools to analyze your data. + +--- + +## Known Limitations + +- **Data Sync:** Dashboard data refreshes automatically after upload, but large CSVs may take a few moments to process. 
+- **Browser Support:** For best performance, use the latest version of Chrome. -# Known Limitations +> [!TIP] +> For troubleshooting deployment issues, port-forwarding errors, or pod status problems, see the [Deployment Guide: Troubleshooting](DEPLOYMENT_GUIDE.md#troubleshooting). -* **Data Sync:** Dashboard data refreshes automatically after upload, but large CSVs may take a few moments to process. -* **Browser Support:** For best performance, use the latest version of Chrome. +--- -# Request Support +## Request Support -If you encounter issues not covered in this guide: +If you encounter issues not covered in this guide: 1. Check the deployment logs in your Google Cloud Console. -2. Contact your organization's system administrator. -3. To report bugs, request new features [Get Data Commons support](https://docs.datacommons.org/support.html). +2. Contact your organization's system administrator. +3. To report bugs or request new features, see [Get Data Commons support](https://docs.datacommons.org/support.html). \ No newline at end of file diff --git a/mp-pkg/charts/datacommons/.helmignore b/mp-pkg/charts/datacommons/.helmignore new file mode 100644 index 0000000..266489d --- /dev/null +++ b/mp-pkg/charts/datacommons/.helmignore @@ -0,0 +1,35 @@ +.DS_Store +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ + +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ + +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ + +# Testing +tests/ +test/ +ci/ + +# Documentation +README.md +CHANGELOG.md +LICENSE + +# CI/CD +.gitlab-ci.yml +Makefile diff --git a/mp-pkg/charts/datacommons/Chart.yaml b/mp-pkg/charts/datacommons/Chart.yaml new file mode 100644 index 0000000..9ae914b --- /dev/null +++ b/mp-pkg/charts/datacommons/Chart.yaml @@ -0,0 +1,34 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v2 +name: datacommons +description: Custom Data Commons Accelerator deployment +type: application +version: 3.4.3 +appVersion: "v3.4.3" + +# GCP Marketplace metadata +annotations: + marketplace.cloud.google.com/deploy-info: '{}' + +# Chart metadata +icon: https://datacommons.org/images/dc-logo.svg +keywords: + - datacommons + - data + - visualization + - gcp + - kubernetes + - marketplace diff --git a/mp-pkg/charts/datacommons/crds/app-crd.yaml b/mp-pkg/charts/datacommons/crds/app-crd.yaml new file mode 100644 index 0000000..8e08c7d --- /dev/null +++ b/mp-pkg/charts/datacommons/crds/app-crd.yaml @@ -0,0 +1,532 @@ +# Copyright 2020 The Kubernetes Authors. 
+# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + api-approved.kubernetes.io: https://github.com/kubernetes-sigs/application/pull/2 + controller-gen.kubebuilder.io/version: v0.4.0 + creationTimestamp: null + name: applications.app.k8s.io +spec: + group: app.k8s.io + names: + categories: + - all + kind: Application + listKind: ApplicationList + plural: applications + shortNames: + - app + singular: application + scope: Namespaced + versions: + - additionalPrinterColumns: + - description: The type of the application + jsonPath: .spec.descriptor.type + name: Type + type: string + - description: The creation date + jsonPath: .spec.descriptor.version + name: Version + type: string + - description: The application object owns the matched resources + jsonPath: .spec.addOwnerRef + name: Owner + type: boolean + - description: Numbers of components ready + jsonPath: .status.componentsReady + name: Ready + type: string + - description: The creation date + jsonPath: .metadata.creationTimestamp + name: Age + type: date + name: v1beta1 + schema: + openAPIV3Schema: + description: Application is the Schema for the applications API + properties: + apiVersion: + description: 'APIVersion defines the versioned schema of this representation + of an object. Servers should convert recognized schemas to the latest + internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' + type: string + kind: + description: 'Kind is a string value representing the REST resource this + object represents. Servers may infer this from the endpoint the client + submits requests to. Cannot be updated. In CamelCase. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + metadata: + type: object + spec: + description: ApplicationSpec defines the specification for an Application. + properties: + addOwnerRef: + description: AddOwnerRef objects - flag to indicate if we need to + add OwnerRefs to matching objects Matching is done by using Selector + to query all ComponentGroupKinds + type: boolean + assemblyPhase: + description: AssemblyPhase represents the current phase of the application's + assembly. An empty value is equivalent to "Succeeded". + type: string + componentKinds: + description: ComponentGroupKinds is a list of Kinds for Application's + components (e.g. Deployments, Pods, Services, CRDs). It can be used + in conjunction with the Application's Selector to list or watch + the Applications components. + items: + description: GroupKind specifies a Group and a Kind, but does not + force a version. This is useful for identifying concepts during + lookup stages without having partially valid types + properties: + group: + type: string + kind: + type: string + required: + - group + - kind + type: object + type: array + descriptor: + description: Descriptor regroups information and metadata about an + application. + properties: + description: + description: Description is a brief string description of the + Application. + type: string + icons: + description: Icons is an optional list of icons for an application. + Icon information includes the source, size, and mime type. + items: + description: ImageSpec contains information about an image used + as an icon. + properties: + size: + description: (optional) The size of the image in pixels + (e.g., 25x25). + type: string + src: + description: The source for image represented as either + an absolute URL to the image or a Data URL containing + the image. Data URLs are defined in RFC 2397. 
+ type: string + type: + description: (optional) The MIME type of the image (e.g., + "image/png"). + type: string + required: + - src + type: object + type: array + keywords: + description: Keywords is an optional list of key words associated + with the application (e.g. MySQL, RDBMS, database). + items: + type: string + type: array + links: + description: Links are a list of descriptive URLs intended to + be used to surface additional documentation, dashboards, etc. + items: + description: Link contains information about a URL to surface + documentation, dashboards, etc. + properties: + description: + description: Description is human readable content explaining + the purpose of the link. + type: string + url: + description: Url typically points at a website address. + type: string + type: object + type: array + maintainers: + description: Maintainers is an optional list of maintainers of + the application. The maintainers in this list maintain the + source code, images, and package for the application. + items: + description: ContactData contains information about an individual + or organization. + properties: + email: + description: Email is the email address. + type: string + name: + description: Name is the descriptive name. + type: string + url: + description: Url could typically be a website address. + type: string + type: object + type: array + notes: + description: Notes contain human readable snippets intended + as a quick start for the users of the Application. CommonMark + markdown syntax may be used for rich text representation. + type: string + owners: + description: Owners is an optional list of the owners of the installed + application. The owners of the application should be contacted + in the event of a planned or unplanned disruption affecting + the application. + items: + description: ContactData contains information about an individual + or organization. + properties: + email: + description: Email is the email address.
+ type: string + name: + description: Name is the descriptive name. + type: string + url: + description: Url could typically be a website address. + type: string + type: object + type: array + type: + description: Type is the type of the application (e.g. WordPress, + MySQL, Cassandra). + type: string + version: + description: Version is an optional version indicator for the + Application. + type: string + type: object + info: + description: Info contains human readable key,value pairs for the + Application. + items: + description: InfoItem is a human readable key,value pair containing + important information about how to access the Application. + properties: + name: + description: Name is a human readable title for this piece of + information. + type: string + type: + description: Type of the value for this InfoItem. + type: string + value: + description: Value is human readable content. + type: string + valueFrom: + description: ValueFrom defines a reference to derive the value + from another source. + properties: + configMapKeyRef: + description: Selects a key of a ConfigMap. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. + For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + key: + description: The key to select. 
+ type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + ingressRef: + description: Select an Ingress. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. + For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + host: + description: The optional host to select. + type: string + kind: + description: 'Kind of the referent. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + path: + description: The optional HTTP path. + type: string + protocol: + description: Protocol for the ingress + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + secretKeyRef: + description: Selects a key of a Secret. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. + For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + key: + description: The key to select. + type: string + kind: + description: 'Kind of the referent. 
More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + serviceRef: + description: Select a Service. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. + For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. 
More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + path: + description: The optional HTTP path. + type: string + port: + description: The optional port to select. + format: int32 + type: integer + protocol: + description: Protocol for the service + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + type: + description: Type of source. + type: string + type: object + type: object + type: array + selector: + description: 'Selector is a label query over kinds that created by + the application. It must match the component objects'' labels. More + info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors' + properties: + matchExpressions: + description: matchExpressions is a list of label selector requirements. + The requirements are ANDed. + items: + description: A label selector requirement is a selector that + contains values, a key, and an operator that relates the key + and values. + properties: + key: + description: key is the label key that the selector applies + to. + type: string + operator: + description: operator represents a key's relationship to + a set of values. Valid operators are In, NotIn, Exists + and DoesNotExist. + type: string + values: + description: values is an array of string values. If the + operator is In or NotIn, the values array must be non-empty. 
+ If the operator is Exists or DoesNotExist, the values + array must be empty. This array is replaced during a strategic + merge patch. + items: + type: string + type: array + required: + - key + - operator + type: object + type: array + matchLabels: + additionalProperties: + type: string + description: matchLabels is a map of {key,value} pairs. A single + {key,value} in the matchLabels map is equivalent to an element + of matchExpressions, whose key field is "key", the operator + is "In", and the values array contains only "value". The requirements + are ANDed. + type: object + type: object + type: object + status: + description: ApplicationStatus defines controller's the observed state + of Application + properties: + components: + description: Object status array for all matching objects + items: + description: ObjectStatus is a generic status holder for objects + properties: + group: + description: Object group + type: string + kind: + description: Kind of object + type: string + link: + description: Link to object + type: string + name: + description: Name of object + type: string + status: + description: 'Status. Values: InProgress, Ready, Unknown' + type: string + type: object + type: array + componentsReady: + description: 'ComponentsReady: status of the components in the format + ready/total' + type: string + conditions: + description: Conditions represents the latest state of the object + items: + description: Condition describes the state of an object at a certain + point. + properties: + lastTransitionTime: + description: Last time the condition transitioned from one status + to another. + format: date-time + type: string + lastUpdateTime: + description: Last time the condition was probed + format: date-time + type: string + message: + description: A human readable message indicating details about + the transition. + type: string + reason: + description: The reason for the condition's last transition. 
+ type: string + status: + description: Status of the condition, one of True, False, Unknown. + type: string + type: + description: Type of condition. + type: string + required: + - status + - type + type: object + type: array + observedGeneration: + description: ObservedGeneration is the most recent generation observed. + It corresponds to the Object's generation, which is updated on mutation + by the API Server. + format: int64 + type: integer + type: object + type: object + served: true + storage: true + subresources: + status: {} +status: + acceptedNames: + kind: "" + plural: "" + conditions: [] + storedVersions: [] \ No newline at end of file diff --git a/mp-pkg/charts/datacommons/templates/NOTES.txt b/mp-pkg/charts/datacommons/templates/NOTES.txt new file mode 100644 index 0000000..00d6c86 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/NOTES.txt @@ -0,0 +1,85 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +********************************************************************** +* * +* Data Commons has been successfully deployed! * +* * +********************************************************************** + +Your Data Commons application is now running in the {{ .Release.Namespace }} namespace. + +## Quick Status Check + +To verify your deployment: + + kubectl get pods -n {{ .Release.Namespace }} + kubectl get service {{ include "datacommons.fullname" . 
}} -n {{ .Release.Namespace }} + kubectl get application {{ .Release.Name }} -n {{ .Release.Namespace }} + +## Accessing the Application + +The Data Commons service is running on ClusterIP. To access it: + +### Option 1: Port Forward (for testing) + + until kubectl port-forward svc/{{ include "datacommons.fullname" . }} 8080:8080 -n {{ .Release.Namespace }}; do + echo "Port-forward crashed. Respawning..." >&2 + sleep 1 + done + +Then open http://localhost:8080 in your browser. + +### Option 2: Quick Access via Cloud Shell (recommended) + +For the fastest way to access Data Commons: + +1. Go to GKE Console: https://console.cloud.google.com/kubernetes/list +2. Click your cluster name +3. Click the three-dot menu (more options) next to your cluster +4. Select "Connect" and click "Run in Cloud Shell" +5. Once Cloud Shell connects, run: + + until kubectl port-forward svc/{{ include "datacommons.fullname" . }} 8080:8080 -n {{ .Release.Namespace }}; do + echo "Port-forward crashed. Respawning..." >&2 + sleep 1 + done + +6. Click "Web Preview" (top-right toolbar) > "Preview on port 8080" + +### Option 3: Configure Ingress (recommended for production) + +This chart does NOT create ingress resources. Create your own Ingress or Gateway: +See more: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps + +## Monitoring + +View application logs: + + kubectl logs -l app=datacommons -n {{ .Release.Namespace }} --tail=100 -f + +Monitor pod health: + + kubectl describe pod -l app=datacommons -n {{ .Release.Namespace }} + +## Troubleshooting + +If pods are not starting: +1. Check events: kubectl describe pod -l app=datacommons -n {{ .Release.Namespace }} +2. Verify secrets exist: kubectl get secret {{ .Values.existingSecret }} -n {{ .Release.Namespace }} +3. Check Workload Identity: kubectl get serviceaccount {{ include "datacommons.serviceAccountName" . 
}} -n {{ .Release.Namespace }} -o yaml + +For more information, see: https://docs.datacommons.org/ + +********************************************************************** diff --git a/mp-pkg/charts/datacommons/templates/_helpers.tpl b/mp-pkg/charts/datacommons/templates/_helpers.tpl new file mode 100644 index 0000000..20a1ac6 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/_helpers.tpl @@ -0,0 +1,104 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{/* +Expand the name of the chart. +*/}} +{{- define "datacommons.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +*/}} +{{- define "datacommons.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "datacommons.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "datacommons.labels" -}} +helm.sh/chart: {{ include "datacommons.chart" . }} +{{ include "datacommons.selectorLabels" . 
}} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +app.kubernetes.io/part-of: datacommons +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "datacommons.selectorLabels" -}} +app.kubernetes.io/name: {{ include "datacommons.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +app: datacommons +{{- end }} + +{{/* +Service Account name +*/}} +{{- define "datacommons.serviceAccountName" -}} +{{- if .Values.serviceAccount.create }} +{{- default "datacommons-ksa" .Values.serviceAccount.name }} +{{- else }} +{{- default "default" .Values.serviceAccount.name }} +{{- end }} +{{- end }} + +{{/* +GCS output directory +*/}} +{{- define "datacommons.gcsOutputDir" -}} +{{- $bucket := .Values.config.gcs.bucket }} +{{- $prefix := .Values.config.gcs.pathPrefix }} +{{- if $prefix }} +{{- printf "%s/%s/output" $bucket $prefix }} +{{- else }} +{{- printf "%s/output" $bucket }} +{{- end }} +{{- end }} + +{{/* +GCS input directory +*/}} +{{- define "datacommons.gcsInputDir" -}} +{{- $bucket := .Values.config.gcs.bucket }} +{{- $prefix := .Values.config.gcs.pathPrefix }} +{{- if $prefix }} +{{- printf "%s/%s/input" $bucket $prefix }} +{{- else }} +{{- printf "%s/input" $bucket }} +{{- end }} +{{- end }} + +{{/* +Namespace for resources +*/}} +{{- define "datacommons.namespace" -}} +{{- .Release.Namespace }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/application.yaml b/mp-pkg/charts/datacommons/templates/application.yaml new file mode 100644 index 0000000..8882cd5 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/application.yaml @@ -0,0 +1,92 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: app.k8s.io/v1beta1 +kind: Application +metadata: + name: {{ .Release.Name }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + kubernetes-engine.cloud.google.com/icon: >- + data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjgiIGhlaWdodD0iMjgiIHZpZXdCb3g9IjAgMCAyOCAyOCIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGcgaWQ9InVwZGF0ZWQtZGMtbG9nby5zdmciIGNsaXAtcGF0aD0idXJsKCNjbGlwMF81MTI5XzIyNTA4KSI+CjxnIGlkPSJkYy1sb2dvLnN2ZyBmaWxsIiBjbGlwLXBhdGg9InVybCgjY2xpcDFfNTEyOV8yMjUwOCkiPgo8ZyBpZD0iZGMtbG9nby5zdmciIGNsaXAtcGF0aD0idXJsKCNjbGlwMl81MTI5XzIyNTA4KSI+CjxnIGlkPSJHcm91cCI+CjxwYXRoIGlkPSJWZWN0b3IiIGQ9Ik0yNCAwSDRDMS43OTA4NiAwIDAgMS43OTA4NiAwIDRWMjRDMCAyNi4yMDkxIDEuNzkwODYgMjggNCAyOEgyNEMyNi4yMDkxIDI4IDI4IDI2LjIwOTEgMjggMjRWNEMyOCAxLjc5MDg2IDI2LjIwOTEgMCAyNCAwWiIgZmlsbD0iIzA3NTFCMyIvPgo8L2c+CjwvZz4KPC9nPgo8cGF0aCBpZD0iREMiIGQ9Ik0xOS42MTMyIDkuMDE4MzlDMjAuMTI1NCA5LjAxODQyIDIwLjYwMjkgOS4wODE4NyAyMS4wNDQ5IDkuMjEwNzdDMjEuNDg1OCA5LjMzOTM4IDIxLjg5MTEgOS41MjM5MyAyMi4yNTk3IDkuNzYzNTFDMjIuNjI3MiAxMC4wMDI0IDIyLjk2MjEgMTAuMjkxNiAyMy4yNjQ2IDEwLjYzMDdMMjMuMzI5IDEwLjcwMkwyMy4yNTk3IDEwLjc2OTRMMjIuMjIwNiAxMS43ODIxTDIyLjE0MjUgMTEuODU3M0wyMi4wNzQyIDExLjc3MzNDMjEuODYzNiAxMS41MTg4IDIxLjYzNTggMTEuMzA5NSAyMS4zOTI1IDExLjE0NDRMMjEuMzkwNiAxMS4xNDM0QzIxLjE0OCAxMC45NzAyIDIwLjg3OTQgMTAuODM5NyAyMC41ODM5IDEwLjc1MjhIMjAuNTgyOUMyMC4yOTY3IDEwLjY2NiAxOS45Nzc2IDEwLjYyMTkgMTkuNjI1OSAxMC42MjE5QzE4Ljk5NDYgMTAuNjIyIDE4LjQxNzcgMTAuNzcxMiAxNy44OTM1IDExLjA2ODJIMTcuODkyNUMxNy4zNjg4IDExLjM1NjIgMTYuOTQ4NyAxMS43NjE0IDE
2LjYzMzcgMTIuMjg2TDE2LjYzNDcgMTIuMjg2OUMxNi4zMzAzIDEyLjgwODkgMTYuMTc1NyAxMy40MzQyIDE2LjE3NTcgMTQuMTY2OEMxNi4xNzU3IDE0Ljg4OTMgMTYuMzI5NiAxNS41MTQ3IDE2LjYzMzcgMTYuMDQ1N0MxNi45NDg5IDE2LjU3MTEgMTcuMzY4OSAxNi45ODE4IDE3Ljg5MzUgMTcuMjc5MUgxNy44OTI1QzE4LjQxNjggMTcuNTY3NSAxOC45OTQxIDE3LjcxMTcgMTkuNjI1OSAxNy43MTE4QzIwLjE5NjcgMTcuNzExOCAyMC42OTg0IDE3LjU5MzMgMjEuMTMxOCAxNy4zNTkyTDIxLjI5NjggMTcuMjY2NEMyMS42NzY1IDE3LjA0MDIgMjIuMDE3IDE2Ljc0NjYgMjIuMzE3MyAxNi4zODQ2TDIyLjM4NTcgMTYuMzAxNkwyMi40NjI4IDE2LjM3NjhMMjMuNTMwMiAxNy40MDIyTDIzLjU5NzYgMTcuNDY3NkwyMy41MzcgMTcuNTM4OUMyMy4yMzM3IDE3Ljg5NzQgMjIuODc5MiAxOC4yMTAzIDIyLjQ3NTUgMTguNDc2NEMyMi4wNzE1IDE4Ljc0MjYgMjEuNjMwOCAxOC45NDg5IDIxLjE1NDIgMTkuMDk1NUMyMC42NzU5IDE5LjI0MjcgMjAuMTYyIDE5LjMxNjIgMTkuNjEzMiAxOS4zMTYyQzE4Ljg5MDMgMTkuMzE2MiAxOC4yMTU2IDE5LjE4NzcgMTcuNTkwOCAxOC45MzA1TDE3LjU4ODggMTguOTI5NUMxNi45NjYgMTguNjYzOCAxNi40MjAyIDE4LjMwMTUgMTUuOTUyMSAxNy44NDI2TDE1Ljk1MDEgMTcuODQxNkMxNS40OTEyIDE3LjM3MzUgMTUuMTI4OSAxNi44MjY5IDE0Ljg2MzIgMTYuMjAzOUwxNC44NjIyIDE2LjIwM1YxNi4yMDJDMTQuNjA1MyAxNS41Njg1IDE0LjQ3NzUgMTQuODg5NyAxNC40Nzc1IDE0LjE2NjhDMTQuNDc3NSAxMy40MzUgMTQuNjA1OCAxMi43NTU3IDE0Ljg2MzIgMTIuMTMwN1YxMi4xMjk3QzE1LjEyOSAxMS41MDY2IDE1LjQ5MTggMTAuOTY1MSAxNS45NTExIDEwLjUwNTdDMTYuMzYxMSAxMC4wOTU3IDE2LjgzMTEgOS43NjYzNyAxNy4zNjAzIDkuNTE4MzlMMTcuNTkwOCA5LjQxNjgzQzE4LjIxNTUgOS4xNTA2NCAxOC44OTAzIDkuMDE4MzkgMTkuNjEzMiA5LjAxODM5Wk04LjE5NTI1IDkuMjM0MjFDOS4yMjQ4NCA5LjIzNDI1IDEwLjExNDggOS40NDM2MyAxMC44NjEzIDkuODY3MDJDMTEuNjE1NCAxMC4yODA5IDEyLjE5NjIgMTAuODYxOCAxMi42MDE1IDExLjYwNzNDMTMuMDE1OCAxMi4zNTMgMTMuMjIyNiAxMy4yMDcyIDEzLjIyMjYgMTQuMTY2OEMxMy4yMjI2IDE1LjEyNjUgMTMuMDE2NyAxNS45ODA3IDEyLjYwMjUgMTYuNzI2NEwxMi42MDE1IDE2LjcyNTRDMTIuMTk2NCAxNy40NzEgMTEuNjE3IDE4LjA1NjQgMTAuODYzMiAxOC40NzkzTDEwLjg2MjIgMTguNDgwM0MxMC4xMTU2IDE4Ljg5NSA5LjIyNTI0IDE5LjEwMDQgOC4xOTUyNSAxOS4xMDA0SDUuMDAzODVWOS4yMzQyMUg4LjE5NTI1Wk02LjY4ODQyIDE3LjQ5NTlIOC4xNTQyNEM4Ljg3MDM2IDE3LjQ5NTkgOS40Nzg4MiAxNy4zNjMgOS45ODMzNCAxNy4xMDI0QzEwLjQ4OCAxNi44MzI2IDEwLjg3MDMgMTYuNDUxNyAxMS4xMzA4IDE1Ljk1NjlDMTE
uMzkxNyAxNS40NjExIDExLjUyNDQgMTQuODY1MiAxMS41MjQ0IDE0LjE2NjhDMTEuNTI0MyAxMy40Njg2IDExLjM5MTYgMTIuODczNCAxMS4xMzA4IDEyLjM3NzhDMTAuODcwNSAxMS44ODMyIDEwLjQ4OTQgMTEuNTA1NyA5Ljk4NTI5IDExLjI0NUg5Ljk4MzM0QzkuNDc4ODggMTAuOTc1NCA4Ljg3MDQyIDEwLjgzNzcgOC4xNTQyNCAxMC44Mzc3SDYuNjg4NDJWMTcuNDk1OVoiIGZpbGw9IndoaXRlIiBzdHJva2U9IndoaXRlIiBzdHJva2Utd2lkdGg9IjAuMiIvPgo8L2c+CjxkZWZzPgo8Y2xpcFBhdGggaWQ9ImNsaXAwXzUxMjlfMjI1MDgiPgo8cmVjdCB3aWR0aD0iMjgiIGhlaWdodD0iMjgiIGZpbGw9IndoaXRlIi8+CjwvY2xpcFBhdGg+CjxjbGlwUGF0aCBpZD0iY2xpcDFfNTEyOV8yMjUwOCI+CjxyZWN0IHdpZHRoPSIyOCIgaGVpZ2h0PSIyOCIgZmlsbD0id2hpdGUiLz4KPC9jbGlwUGF0aD4KPGNsaXBQYXRoIGlkPSJjbGlwMl81MTI5XzIyNTA4Ij4KPHJlY3Qgd2lkdGg9IjI4IiBoZWlnaHQ9IjI4IiBmaWxsPSJ3aGl0ZSIvPgo8L2NsaXBQYXRoPgo8L2RlZnM+Cjwvc3ZnPgo= + marketplace.cloud.google.com/deploy-info: '{}' +spec: + descriptor: + type: "Data Commons Accelerator" + version: "{{ .Chart.AppVersion }}" + description: |- + Data Commons Accelerator - a ready-to-deploy instance of Custom Data Commons + for exploring and visualizing structured data on GKE. + + **Features:** + - Interactive data exploration and visualization + - Integration with Google Cloud SQL for data storage + - Scalable Kubernetes-native architecture + - Support for custom data imports from GCS + - Pre-built samples (Education, Health, Energy, Custom) + links: + - description: "Data Commons Documentation" + url: "https://docs.datacommons.org/" + - description: "Source Code" + url: "https://github.com/datacommonsorg/website" + - description: "User Guide" + url: "https://docs.datacommons.org/custom_dc/" + notes: |- + # Getting Started + + Your Data Commons application has been deployed successfully! + + ## Next Steps + + 1. **Configure Ingress**: Create an Ingress or Gateway API to expose the service: + See [Exposing Apps](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps) for more details. + ``` + kubectl get service {{ include "datacommons.fullname" . }} -n {{ .Release.Namespace }} + ``` + + 2. 
**Verify Database**: Check that the database initialization completed: + ``` + kubectl get jobs -n {{ .Release.Namespace }} + ``` + + 3. **Monitor Pods**: Ensure all pods are running: + ``` + kubectl get pods -n {{ .Release.Namespace }} + ``` + + 4. **View Logs**: Check application logs: + ``` + kubectl logs -l app.kubernetes.io/name=datacommons -n {{ .Release.Namespace }} + ``` + selector: + matchLabels: + {{- include "datacommons.selectorLabels" . | nindent 6 }} + componentKinds: + - group: apps/v1 + kind: Deployment + - group: v1 + kind: Service + - group: v1 + kind: ConfigMap + - group: v1 + kind: ServiceAccount + - group: batch/v1 + kind: Job + {{- if .Values.dbSync.enabled }} + - group: batch/v1 + kind: CronJob + {{- end }} + addOwnerRef: false diff --git a/mp-pkg/charts/datacommons/templates/configmap.yaml b/mp-pkg/charts/datacommons/templates/configmap.yaml new file mode 100644 index 0000000..cf619a3 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/configmap.yaml @@ -0,0 +1,82 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "datacommons.fullname" . }}-config + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . 
| nindent 4 }} + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-10" + "helm.sh/hook-delete-policy": before-hook-creation +data: + # Data Commons API Configuration + DC_API_ROOT: {{ .Values.config.dcApiRoot | quote }} + WEBSITE_MIXER_API_ROOT: "http://localhost:8081" + + # CloudSQL Configuration + {{- if .Values.config.cloudsql.enabled }} + USE_CLOUDSQL: "true" + CLOUDSQL_INSTANCE: {{ .Values.config.cloudsql.instance | quote }} + CLOUDSQL_USE_PRIVATE_IP: {{ ne (toString .Values.config.cloudsql.usePrivateIP) "false" | quote }} + DB_NAME: {{ .Values.config.cloudsql.database | quote }} + DB_USER: {{ .Values.config.cloudsql.user | quote }} + {{- end }} + + # GCS Configuration + {{- if .Values.config.gcs.bucket }} + OUTPUT_DIR: {{ include "datacommons.gcsOutputDir" . | quote }} + INPUT_DIR: {{ include "datacommons.gcsInputDir" . | quote }} + {{- end }} + + # Natural Language / Model Configuration + ENABLE_MODEL: {{ .Values.config.enableNaturalLanguage | quote }} + {{- if not .Values.config.enableNaturalLanguage }} + NL_DISASTER_CONFIG: "" + NL_FULFILLMENT_CONFIG: "" + {{- end }} + + # GCP Project + {{- if .Values.global.projectId }} + GCP_PROJECT_ID: {{ .Values.global.projectId | quote }} + {{- end }} + + # Sample + FLASK_ENV: {{ .Values.config.flaskEnv | quote }} + + # Mixer Settings + {{- if .Values.config.gomaxprocs }} + GOMAXPROCS: {{ .Values.config.gomaxprocs | quote }} + {{- end }} + {{- if .Values.config.maxConnections }} + MAX_CONNECTIONS: {{ .Values.config.maxConnections | quote }} + {{- end }} + + # Debug Settings + {{- if .Values.config.debug }} + DEBUG: "true" + {{- end }} + {{- if .Values.config.enableAdmin }} + ENABLE_ADMIN: "true" + {{- end }} + + {{- with .Values.config.extraEnv }} + # Additional environment variables + {{- range $key, $value := . 
}} + {{ $key }}: {{ $value | quote }} + {{- end }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/db-init-job.yaml b/mp-pkg/charts/datacommons/templates/db-init-job.yaml new file mode 100644 index 0000000..5ff766f --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/db-init-job.yaml @@ -0,0 +1,76 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.dbInit.enabled }} +apiVersion: batch/v1 +kind: Job +metadata: + name: {{ include "datacommons.fullname" . }}-db-init + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + app.kubernetes.io/component: db-init + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-5" + "helm.sh/hook-delete-policy": before-hook-creation +spec: + ttlSecondsAfterFinished: {{ .Values.dbInit.ttlSecondsAfterFinished | default 3600 }} + backoffLimit: {{ .Values.dbInit.backoffLimit | default 3 }} + activeDeadlineSeconds: {{ .Values.dbInit.activeDeadlineSeconds | default 120 }} + template: + metadata: + labels: + {{- include "datacommons.selectorLabels" . | nindent 8 }} + app.kubernetes.io/component: db-init + annotations: + # Checksum to ensure fresh config on each run + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . 
| sha256sum }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + restartPolicy: Never + + containers: + - name: db-init + image: "{{ .Values.dbInit.image.repository }}:{{ .Values.dbInit.image.tag }}" + imagePullPolicy: {{ .Values.dbInit.image.pullPolicy | default "IfNotPresent" }} + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . }} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + # Additional job-specific environment variables + env: + # Data run mode + - name: DATA_RUN_MODE + value: {{ .Values.dbInit.mode | quote }} + # Input/Output directories + - name: INPUT_DIR + value: {{ include "datacommons.gcsInputDir" . | quote }} + - name: OUTPUT_DIR + value: {{ include "datacommons.gcsOutputDir" . | quote }} + + resources: + {{- toYaml .Values.dbInit.resources | nindent 12 }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml b/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml new file mode 100644 index 0000000..5a1a962 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml @@ -0,0 +1,77 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.dbSync.enabled }} +apiVersion: batch/v1 +kind: CronJob +metadata: + name: {{ include "datacommons.fullname" . }}-db-sync + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + app.kubernetes.io/component: db-sync +spec: + schedule: {{ .Values.dbSync.schedule | default "0 */3 * * *" | quote }} + successfulJobsHistoryLimit: {{ .Values.dbSync.successfulJobsHistoryLimit | default 3 }} + failedJobsHistoryLimit: {{ .Values.dbSync.failedJobsHistoryLimit | default 1 }} + concurrencyPolicy: {{ .Values.dbSync.concurrencyPolicy | default "Forbid" }} + jobTemplate: + spec: + ttlSecondsAfterFinished: {{ .Values.dbSync.ttlSecondsAfterFinished | default 3600 }} + backoffLimit: {{ .Values.dbSync.backoffLimit | default 3 }} + template: + metadata: + labels: + {{- include "datacommons.selectorLabels" . | nindent 12 }} + app.kubernetes.io/component: db-sync + annotations: + # Checksum to ensure fresh config on each run + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + restartPolicy: Never + + containers: + - name: db-sync + image: "{{ .Values.dbSync.image.repository }}:{{ .Values.dbSync.image.tag }}" + imagePullPolicy: {{ .Values.dbSync.image.pullPolicy | default "IfNotPresent" }} + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . 
}} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + # Additional job-specific environment variables + env: + # Data run mode + - name: DATA_RUN_MODE + value: {{ .Values.dbSync.mode | quote }} + # Input/Output directories + - name: INPUT_DIR + value: {{ include "datacommons.gcsInputDir" . | quote }} + - name: OUTPUT_DIR + value: {{ include "datacommons.gcsOutputDir" . | quote }} + + resources: + {{- toYaml .Values.dbSync.resources | nindent 16 }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/deployment.yaml b/mp-pkg/charts/datacommons/templates/deployment.yaml new file mode 100644 index 0000000..b93c02e --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/deployment.yaml @@ -0,0 +1,130 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} +spec: + replicas: {{ .Values.deployment.replicas }} + selector: + matchLabels: + {{- include "datacommons.selectorLabels" . | nindent 6 }} + app.kubernetes.io/component: website + template: + metadata: + labels: + {{- include "datacommons.labels" . | nindent 8 }} + app.kubernetes.io/component: website + {{- with .Values.deployment.podLabels }} + {{- toYaml . 
| nindent 8 }} + {{- end }} + annotations: + # Force pod restart when ConfigMap or Secret changes + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }} + {{- with .Values.deployment.podAnnotations }} + {{- toYaml . | nindent 8 }} + {{- end }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + + {{- with .Values.deployment.securityContext }} + securityContext: + {{- toYaml . | nindent 8 }} + {{- end }} + + containers: + - name: website + image: "{{ .Values.deployment.image.repository }}:{{ .Values.deployment.image.tag }}" + imagePullPolicy: {{ .Values.deployment.image.pullPolicy }} + + ports: + - name: http + containerPort: {{ .Values.service.targetPort }} + protocol: TCP + + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . }} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + {{- with .Values.deployment.resources }} + resources: + {{- toYaml . 
| nindent 12 }} + {{- end }} + + {{- if .Values.deployment.probes.startup.enabled }} + startupProbe: + httpGet: + path: /healthz + port: http + initialDelaySeconds: {{ .Values.deployment.probes.startup.initialDelaySeconds }} + periodSeconds: {{ .Values.deployment.probes.startup.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.startup.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.startup.failureThreshold }} + {{- end }} + + {{- if .Values.deployment.probes.readiness.enabled }} + readinessProbe: + httpGet: + path: /healthz + port: http + periodSeconds: {{ .Values.deployment.probes.readiness.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.readiness.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.readiness.failureThreshold }} + {{- end }} + + {{- if .Values.deployment.probes.liveness.enabled }} + livenessProbe: + httpGet: + path: /healthz + port: http + initialDelaySeconds: {{ .Values.deployment.probes.liveness.initialDelaySeconds }} + periodSeconds: {{ .Values.deployment.probes.liveness.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.liveness.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.liveness.failureThreshold }} + {{- end }} + + {{- with .Values.deployment.containerSecurityContext }} + securityContext: + {{- toYaml . | nindent 12 }} + {{- end }} + + {{- with .Values.deployment.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + + {{- with .Values.deployment.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + + {{- with .Values.deployment.tolerations }} + tolerations: + {{- toYaml . 
| nindent 8 }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/secret.yaml b/mp-pkg/charts/datacommons/templates/secret.yaml new file mode 100644 index 0000000..76db52c --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/secret.yaml @@ -0,0 +1,37 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: Secret +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ include "datacommons.namespace" . }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + "helm.sh/hook": pre-install + "helm.sh/hook-weight": "-10" +type: Opaque +data: + # Admin Panel Credentials + {{- $existingSecret := lookup "v1" "Secret" (include "datacommons.namespace" .) (include "datacommons.fullname" .) 
}} + {{- if $existingSecret }} + # Preserve existing admin credentials on upgrade + ADMIN_PANEL_USERNAME: {{ index $existingSecret.data "ADMIN_PANEL_USERNAME" }} + ADMIN_PANEL_PASSWORD: {{ index $existingSecret.data "ADMIN_PANEL_PASSWORD" }} + {{- else }} + # Generate new admin credentials on first install + ADMIN_PANEL_USERNAME: {{ "admin" | b64enc | quote }} + ADMIN_PANEL_PASSWORD: {{ randAlphaNum 12 | b64enc | quote }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/service.yaml b/mp-pkg/charts/datacommons/templates/service.yaml new file mode 100644 index 0000000..855e443 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/service.yaml @@ -0,0 +1,35 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: Service +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + {{- with .Values.service.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + type: {{ .Values.service.type }} + ports: + - port: {{ .Values.service.port }} + targetPort: {{ .Values.service.targetPort }} + protocol: TCP + name: http + selector: + {{- include "datacommons.selectorLabels" . 
| nindent 4 }} + app.kubernetes.io/component: website diff --git a/mp-pkg/charts/datacommons/templates/serviceaccount.yaml b/mp-pkg/charts/datacommons/templates/serviceaccount.yaml new file mode 100644 index 0000000..5f450fa --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/serviceaccount.yaml @@ -0,0 +1,33 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.serviceAccount.create }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "datacommons.serviceAccountName" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-10" + "helm.sh/hook-delete-policy": before-hook-creation + {{- if .Values.serviceAccount.gcpServiceAccountEmail }} + iam.gke.io/gcp-service-account: {{ .Values.serviceAccount.gcpServiceAccountEmail }} + {{- end }} + {{- with .Values.serviceAccount.annotations }} + {{- toYaml . | nindent 4 }} + {{- end }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/values.yaml b/mp-pkg/charts/datacommons/values.yaml new file mode 100644 index 0000000..d5485a3 --- /dev/null +++ b/mp-pkg/charts/datacommons/values.yaml @@ -0,0 +1,221 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Global Settings +# ============================================ +global: + # GCP Project ID (injected by Terraform) + projectId: "" + +# ============================================ +# Deployment Configuration +# ============================================ +deployment: + replicas: 1 + + # Container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Resource limits + resources: + requests: + memory: "4Gi" + cpu: "2" + limits: + memory: "4Gi" + cpu: "2" + + # Health probes + probes: + startup: + enabled: true + initialDelaySeconds: 60 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 30 + readiness: + enabled: true + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + liveness: + enabled: true + initialDelaySeconds: 120 + periodSeconds: 30 + timeoutSeconds: 5 + failureThreshold: 3 + + # Pod security context + securityContext: + fsGroup: 1000 + + # Container security context + containerSecurityContext: {} + + # Node selector + nodeSelector: {} + + # Tolerations + tolerations: [] + + # Affinity + affinity: {} + + # Additional pod annotations + podAnnotations: {} + + # Additional pod labels + podLabels: {} + +# ============================================ +# Database Initialization Job +# ============================================ +dbInit: + enabled: true + + # Data container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Run mode: "schemaupdate" for schema creation only + mode: 
"schemaupdate" + + # Job settings + ttlSecondsAfterFinished: 3600 + backoffLimit: 3 + activeDeadlineSeconds: 600 + + # Resource limits + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "2" + memory: "4Gi" + +# ============================================ +# Database Sync CronJob +# ============================================ +dbSync: + enabled: true + + # Cron schedule (default: every 3 hours) + schedule: "0 */3 * * *" + + # Data container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Run mode: "customdc" for data import. But should be left empty to enable it. + mode: "" + + # CronJob settings + concurrencyPolicy: "Forbid" + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 1 + ttlSecondsAfterFinished: 3600 + backoffLimit: 3 + + # Resource limits + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "2" + memory: "4Gi" + +# ============================================ +# Application Configuration +# ============================================ +config: + # Enable Natural Language features + enableNaturalLanguage: true + + # Enable periodic data sync + enableDataSync: true + + # Sample (FLASK_ENV) + flaskEnv: "custom" + + # Data Commons API root + dcApiRoot: "https://api.datacommons.org" + + # CloudSQL configuration (injected by Terraform) + cloudsql: + enabled: true + # Instance connection name: project:region:instance + instance: "" + # Database name + database: "datacommons" + # Database user + user: "datacommons" + # Use private IP connection + usePrivateIP: true + + # GCS configuration (injected by Terraform) + gcs: + # Base bucket URL + bucket: "" + # Optional path prefix within bucket + pathPrefix: "" + + # Mixer settings + gomaxprocs: 50 + maxConnections: 100 + + # Debug settings + debug: false + enableAdmin: false + + # Additional environment variables + extraEnv: {} + +# ============================================ +# Service Account (Workload Identity) +# 
============================================ +serviceAccount: + create: true + # K8s ServiceAccount name + name: "datacommons-ksa" + # GCP Service Account email (injected by Terraform) + gcpServiceAccountEmail: "" + # Additional annotations + annotations: {} + +# ============================================ +# Secrets Configuration +# ============================================ +# Secrets are created by Terraform and referenced here +# The chart does NOT create secrets - only references them +existingSecret: "datacommons-secrets" + + +# ============================================ +# Service Configuration +# ============================================ +service: + type: ClusterIP + port: 8080 + targetPort: 8080 + annotations: {} diff --git a/mp-pkg/terraform/README.md b/mp-pkg/terraform/README.md new file mode 100644 index 0000000..f291d9f --- /dev/null +++ b/mp-pkg/terraform/README.md @@ -0,0 +1,152 @@ +# Data Commons Accelerator - Terraform Infrastructure + +This Terraform configuration deploys Data Commons Accelerator to Google Kubernetes Engine (GKE) with CloudSQL MySQL and Cloud Storage backends. It manages Private Service Access for database connectivity, Workload Identity for secure GCP service integration, and deploys the application via Helm. 
+ +## Enterprise Flexibility Features + +This solution is designed for enterprise environments with maximum flexibility: + +### Resource Naming with Random Suffixes + +By default, resources use auto-generated names with random suffixes to prevent collisions: +- CloudSQL instance: `{deployment-name}-db-{random-suffix}` +- GCS bucket: `{deployment-name}-data-{random-suffix}` +- Service account: `{deployment-name}-sa-{random-suffix}` + +### Name Overrides (Optional) + +For environments requiring specific resource names (compliance, naming conventions): +- `cloudsql_instance_name_override` - Specify exact CloudSQL instance name +- `gcs_bucket_name_override` - Specify exact GCS bucket name +- `service_account_name_override` - Specify exact service account name + +### Pre-existing Resources Support + +- **Namespace**: Set `create_namespace = false` if namespace already exists +- **APIs**: Handles pre-enabled APIs idempotently (no errors if already enabled) + +### Example: Enterprise Deployment with Existing Resources + +```hcl +# Use existing namespace +create_namespace = false +namespace = "datacommons-prod" + +# Override resource names to match enterprise conventions +cloudsql_instance_name_override = "dc-mysql-prod-001" +gcs_bucket_name_override = "company-dc-data-prod" +service_account_name_override = "svc-datacommons-prod" +``` + +## Requirements + +- GCP project with required APIs enabled +- Existing GKE cluster (VPC-native, Workload Identity enabled) — only required when deploying to an existing cluster +- Terraform >= 1.5.7 +- Google Cloud Provider >= 7.0.0 +- Kubernetes Provider >= 2.20 +- Helm Provider >= 2.12 + +### Deployment Service Account IAM Roles + +The deployment service account (created by GCP Marketplace and used by Infrastructure Manager to run Terraform) must have the following IAM roles assigned at the project level: + +| Role | Purpose | +|------|---------| +| `roles/container.developer` | Deploy to GKE cluster (Helm releases, namespaces, secrets) 
| +| `roles/cloudsql.admin` | Create and manage CloudSQL MySQL instances | +| `roles/storage.admin` | Create and manage GCS buckets | +| `roles/iam.serviceAccountAdmin` | Create and manage GCP service accounts (Workload Identity) | +| `roles/compute.networkAdmin` | Manage VPC networking and Private Service Access | +| `roles/serviceusage.serviceUsageAdmin` | Enable required GCP APIs | +| `roles/serviceusage.apiKeysAdmin` | Create Google Maps API keys | +| `roles/resourcemanager.projectIamAdmin` | Assign IAM roles to service accounts at project level | + +These roles are configured in the GCP Marketplace Producer Portal during solution setup. Additionally, the **Infrastructure Manager Agent** service account requires `roles/resourcemanager.projectIamAdmin` to function. + +## What Gets Deployed + +- CloudSQL MySQL instance with private IP +- Cloud Storage bucket for data artifacts +- Workload Identity binding (GKE service account ↔ GCP service account) +- Google Maps API keys (if needed) +- Kubernetes secrets for database credentials +- Helm release of Data Commons application + +## Private Service Access + +CloudSQL uses private IP connectivity via Private Service Access (PSA). A /20 IP range is automatically allocated and a PSA connection is created during deployment. If the VPC already has PSA configured, the existing ranges are preserved alongside the new one. 
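+
+The range-preservation behaviour described above can be sketched as follows. This is a minimal illustration rather than the module's full logic: the values assigned to `existing_psa_range_names` are made up, and `new_psa_range_name` is a hypothetical local standing in for the name of the newly allocated /20 range.
+
+```hcl
+locals {
+  # Hypothetical: names of PSA ranges already reserved on the VPC.
+  existing_psa_range_names = ["legacy-psa-range"]
+
+  # Hypothetical: name of the /20 range allocated by this deployment.
+  new_psa_range_name = "my-deployment-psa-a1b2"
+
+  # concat() appends the new range to the existing ones;
+  # distinct() drops the duplicate if the new range is already listed.
+  reserved_peering_ranges = distinct(concat(
+    local.existing_psa_range_names,
+    [local.new_psa_range_name]
+  ))
+}
+
+output "reserved_peering_ranges" {
+  value = local.reserved_peering_ranges
+}
+```
+
+Passing the merged list to the connection, rather than only the new range, is what keeps an existing peering configuration from being replaced.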
+ +## Configuration + +### Terraform Requirements + +| Name | Version | +|------|---------| +| terraform | >= 1.5.7 | +| google | >= 7.0.0, < 8.0.0 | +| google-beta | >= 7.0.0, < 8.0.0 | +| helm | ~> 2.12.0 | +| kubernetes | >= 2.20.0 | +| random | ~> 3.6.0 | + +### Providers + +| Name | Version | +|------|---------| +| google | 7.15.0 | +| helm | 2.12.1 | +| kubernetes | 3.0.1 | +| random | 3.6.3 | + +### Modules + +| Name | Source | +|------|--------| +| cloudsql | ./modules/cloudsql | +| gcs_bucket | ./modules/gcs-bucket | +| k8s_secrets | ./modules/k8s-secrets | +| maps_api_keys | ./modules/maps-api-keys | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| goog_cm_deployment_name | Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions) | string | n/a | yes | +| project_id | GCP project ID where Data Commons Accelerator will be deployed | string | n/a | yes | +| create_new_cluster | Create a new GKE cluster with VPC networking (VPC, subnet, Cloud Router, Cloud NAT, and PSA). When false, deploy to an existing cluster specified by gke_cluster_name. | bool | `true` | no | +| region | GCP region for the new GKE cluster and networking resources. Only used when create_new_cluster is true. | string | `"us-central1"` | no | +| gke_cluster_name | Name of the existing GKE cluster to deploy to. Only used when create_new_cluster is false. | string | `""` | no | +| gke_cluster_location | Location (region or zone) of the existing GKE cluster. The GCP region for CloudSQL and other resources is derived from this value. Only used when create_new_cluster is false. | string | `""` | no | +| namespace | Kubernetes namespace for Data Commons Accelerator deployment. Defaults to the deployment name if not provided. | string | `""` | no | +| create_namespace | Create new Kubernetes namespace. Set to false if namespace already exists. 
| bool | `true` | no | +| cloudsql_instance_name_override | Override CloudSQL instance name (uses generated name if not specified) | string | `""` | no | +| gcs_bucket_name_override | Override GCS bucket name (uses generated name if not specified) | string | `""` | no | +| service_account_name_override | Override service account name (uses generated name if not specified) | string | `""` | no | +| cdc_services_image_repo | Container image repository for CDC Services (populated by GCP Marketplace) | string | n/a | yes | +| cdc_services_image_tag | Container image tag for CDC Services (populated by GCP Marketplace) | string | n/a | yes | +| data_image_repo | Container image repository for Data service (populated by GCP Marketplace) | string | n/a | yes | +| data_image_tag | Container image tag for Data service (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_repo | Helm chart repository URL (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_version | Helm chart version (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_name | Helm chart name (populated by GCP Marketplace) | string | `"datacommons"` | no | +| app_replicas | Number of replicas for the Data Commons Accelerator application deployment | number | `1` | no | +| resource_tier | Resource allocation tier for the application (small, medium, large). Also controls CloudSQL machine tier and high availability. 
| string | `"medium"` | no | +| flask_env | Data Commons sample (health, education, energy, custom) | string | `"health"` | no | +| dc_api_key | Data Commons API key for accessing Data Commons APIs | string | n/a | yes | +| enable_natural_language | Enable natural language query features | bool | `true` | no | +| enable_data_sync | Enable automatic synchronization of custom data from GCS bucket to CloudSQL database | bool | `true` | no | + +## Outputs + +| Name | Description | +|------|-------------| +| namespace | Kubernetes namespace where DataCommons is deployed | +| gcs\_bucket\_url | GCS bucket URL (gs://\) | +| kubectl\_configure | Command to configure kubectl for your GKE cluster | +| verify\_pods | Command to verify Data Commons pods are running | +| port\_forward | Port-forward command to access Data Commons locally (with auto-retry) | +| cloud\_shell\_access | Cloud Shell quick access instructions for Data Commons | +| upload\_data | Command to upload custom data to GCS bucket | +| view\_logs | Command to view application logs | +| retrieve\_admin\_credentials | Commands to retrieve admin panel credentials (username and password) | diff --git a/mp-pkg/terraform/api-keys.tf b/mp-pkg/terraform/api-keys.tf new file mode 100644 index 0000000..f1d956f --- /dev/null +++ b/mp-pkg/terraform/api-keys.tf @@ -0,0 +1,37 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Google Maps API Keys +# ============================================ +module "maps_api_keys" { + source = "./modules/maps-api-keys" + + project_id = var.project_id + name_prefix = "${local.deployment_name}-maps" + api_targets = [ + "maps-backend.googleapis.com", + "places-backend.googleapis.com" + ] + + labels = merge( + local.common_labels, + { + component = "api-keys" + purpose = "maps-integration" + } + ) + + depends_on = [google_project_service.apis["apikeys.googleapis.com"]] +} diff --git a/mp-pkg/terraform/cloudsql.tf b/mp-pkg/terraform/cloudsql.tf new file mode 100644 index 0000000..e4a0c35 --- /dev/null +++ b/mp-pkg/terraform/cloudsql.tf @@ -0,0 +1,131 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Private Service Access (PSA) - Auto-Detection +# ============================================ + +# Query the Service Networking API to detect existing PSA connections on the VPC. +# Only active in BYO mode — in create mode, the new VPC has no existing PSA. +data "http" "psa_connections" { + count = var.create_new_cluster ? 
0 : 1 + + url = "https://servicenetworking.googleapis.com/v1/services/servicenetworking.googleapis.com/connections?network=projects/${data.google_project.current.number}/global/networks/${local.vpc_network_name}" + + request_headers = { + Authorization = "Bearer ${data.google_client_config.default.access_token}" + Accept = "application/json" + } + + depends_on = [ + google_project_service.apis["servicenetworking.googleapis.com"], + ] + + lifecycle { + postcondition { + # Accept 200 (success) as well as 403/404 (no access or no connections) — all are non-fatal. + # Any of these statuses means we can safely fall through to the create path. + condition = contains([200, 403, 404], self.status_code) + error_message = "Service Networking API returned unexpected status ${self.status_code}. Expected 200, 403, or 404." + } + } +} + +# Allocate a /20 IP range for Private Service Access. +# Always created; in BYO mode existing ranges are preserved via concat below. +resource "google_compute_global_address" "cloudsql_private_ip" { + count = local.create_psa_range ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-psa-${local.resource_suffix}" + purpose = "VPC_PEERING" + address_type = "INTERNAL" + prefix_length = local.psa_range_prefix_length + network = local.vpc_network_self_link + project = var.project_id + + labels = merge( + local.common_labels, + { + component = "networking" + purpose = "cloudsql-psa" + } + ) + + depends_on = [ + google_project_service.apis["compute.googleapis.com"], + google_project_service.apis["servicenetworking.googleapis.com"], + ] +} + +# Create or update the Private Service Access connection. +# Always created; reserved_peering_ranges includes BOTH any existing ranges AND the new +# range to prevent destructive replacement of existing peering configuration. +resource "google_service_networking_connection" "cloudsql_private_vpc_connection" { + count = local.create_psa_connection ? 
1 : 0 + + provider = google + + network = local.vpc_network_self_link + service = "servicenetworking.googleapis.com" + reserved_peering_ranges = distinct(concat( + local.existing_psa_range_names, + [google_compute_global_address.cloudsql_private_ip[0].name] + )) + + # Prevent deletion of VPC peering connection before CloudSQL instance is removed + deletion_policy = "ABANDON" + + # Handle case where connection already exists with different ranges + update_on_creation_fail = true + + depends_on = [ + google_project_service.apis["compute.googleapis.com"], + google_project_service.apis["servicenetworking.googleapis.com"], + google_compute_global_address.cloudsql_private_ip[0], + ] +} + +# ============================================ +# CloudSQL MySQL Instance +# ============================================ + +module "cloudsql" { + source = "./modules/cloudsql" + + project_id = var.project_id + region = local.cloudsql_region + instance_name = local.cloudsql_instance_name + tier = local.cloudsql_tier + disk_size = local.cloudsql_disk_size + availability_type = local.cloudsql_availability_type + network_self_link = local.vpc_network_self_link + allocated_ip_range = local.psa_range_name + database_name = "datacommons" + user_name = "datacommons" + + labels = merge( + local.common_labels, + { + component = "database" + tier = replace(local.cloudsql_tier, "db-", "") + } + ) + + depends_on = [ + google_service_networking_connection.cloudsql_private_vpc_connection, + data.http.psa_connections, + ] +} \ No newline at end of file diff --git a/mp-pkg/terraform/gcs.tf b/mp-pkg/terraform/gcs.tf new file mode 100644 index 0000000..84ea3ce --- /dev/null +++ b/mp-pkg/terraform/gcs.tf @@ -0,0 +1,71 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket for DataCommons Data +# ============================================ +module "gcs_bucket" { + source = "./modules/gcs-bucket" + + project_id = var.project_id + bucket_name = local.gcs_bucket_name + location = local.gcs_location + storage_class = "STANDARD" + force_destroy = true + lifecycle_rules = [] + + # IAM Members - empty during plan to avoid for_each dependency issues + # IAM will be granted separately via google_storage_bucket_iam_member + iam_members = [] + + labels = merge( + local.common_labels, + { + component = "storage" + purpose = "datacommons-data" + } + ) + + depends_on = [google_project_service.apis["storage.googleapis.com"]] +} + +# Grant workload service account access to the bucket +# Using separate resource to avoid for_each dependency issue during plan +resource "google_storage_bucket_iam_member" "datacommons_workload_storage_admin" { + bucket = module.gcs_bucket.bucket_name + role = "roles/storage.objectAdmin" + member = "serviceAccount:${google_service_account.datacommons_workload.email}" + + depends_on = [ + module.gcs_bucket, + google_service_account.datacommons_workload + ] +} + +# Create default directory structure required by DataCommons application +resource "google_storage_bucket_object" "input_dir" { + bucket = module.gcs_bucket.bucket_name + name = "input/" + content = " " + + depends_on = [module.gcs_bucket] +} + +resource "google_storage_bucket_object" "output_dir" { + bucket = module.gcs_bucket.bucket_name + name = "output/" + content = " " + + depends_on = 
[module.gcs_bucket] +} diff --git a/mp-pkg/terraform/gke.tf b/mp-pkg/terraform/gke.tf new file mode 100644 index 0000000..0a98e8d --- /dev/null +++ b/mp-pkg/terraform/gke.tf @@ -0,0 +1,109 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GKE Node Service Account +# ============================================ + +resource "google_service_account" "gke_node" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + account_id = local.gke_node_sa_name + display_name = "GKE Node SA (${local.deployment_name})" + description = "Service account for GKE node pools" + project = var.project_id + + depends_on = [google_project_service.apis["iam.googleapis.com"]] +} + +resource "google_project_iam_member" "gke_node_default_sa" { + count = var.create_new_cluster ? 1 : 0 + project = var.project_id + role = "roles/container.defaultNodeServiceAccount" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +resource "google_project_iam_member" "gke_node_metric_writer" { + count = var.create_new_cluster ? 1 : 0 + project = var.project_id + role = "roles/monitoring.metricWriter" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +resource "google_project_iam_member" "gke_node_resource_metadata_writer" { + count = var.create_new_cluster ? 
1 : 0 + project = var.project_id + role = "roles/stackdriver.resourceMetadata.writer" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +# ============================================ +# GKE Cluster +# ============================================ + +resource "google_container_cluster" "autopilot" { + count = var.create_new_cluster ? 1 : 0 + + provider = google-beta + + name = "${local.deployment_name}-gke" + location = local.region + project = var.project_id + + enable_autopilot = true + + network = google_compute_network.vpc[0].name + subnetwork = google_compute_subnetwork.primary[0].name + + ip_allocation_policy { + cluster_secondary_range_name = "pods" + services_secondary_range_name = "services" + } + + private_cluster_config { + enable_private_nodes = true + enable_private_endpoint = false + master_ipv4_cidr_block = "172.16.0.0/28" + + master_global_access_config { + enabled = true + } + } + + release_channel { + channel = "STABLE" + } + + workload_identity_config { + workload_pool = "${var.project_id}.svc.id.goog" + } + + deletion_protection = false + + cluster_autoscaling { + auto_provisioning_defaults { + service_account = google_service_account.gke_node[0].email + } + } + + depends_on = [ + google_project_service.apis, + google_compute_subnetwork.primary, + google_project_iam_member.gke_node_default_sa, + google_project_iam_member.gke_node_metric_writer, + google_project_iam_member.gke_node_resource_metadata_writer, + ] +} diff --git a/mp-pkg/terraform/helm.tf b/mp-pkg/terraform/helm.tf new file mode 100644 index 0000000..3e1eb5f --- /dev/null +++ b/mp-pkg/terraform/helm.tf @@ -0,0 +1,210 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Helm Release +# ============================================ +resource "helm_release" "datacommons" { + name = "datacommons" + repository = var.helm_chart_repo + chart = var.helm_chart_name + version = var.helm_chart_version + namespace = local.namespace_name + + wait = true + wait_for_jobs = true + timeout = 900 + + # ============================================ + # Container Images (Marketplace-populated) + # ============================================ + + # CDC Services image + set { + name = "deployment.image.repository" + value = var.cdc_services_image_repo + } + + set { + name = "deployment.image.tag" + value = var.cdc_services_image_tag + } + + # Data service image (db-init) + set { + name = "dbInit.image.repository" + value = var.data_image_repo + } + + set { + name = "dbInit.image.tag" + value = var.data_image_tag + } + + # Data service image (db-sync) + set { + name = "dbSync.image.repository" + value = var.data_image_repo + } + + set { + name = "dbSync.image.tag" + value = var.data_image_tag + } + + # ============================================ + # Application Configuration + # ============================================ + + set { + name = "deployment.replicas" + value = local.tier_preset.replicas + } + + set { + name = "config.enableNaturalLanguage" + value = var.enable_natural_language + } + + set { + name = "config.enableDataSync" + value = var.enable_data_sync + } + + set { + name = "config.flaskEnv" + value = var.flask_env + } + + set { + name = "dbInit.activeDeadlineSeconds" + value = 
"900" + } + + # ============================================ + # Resource Allocation + # ============================================ + + set { + name = "deployment.resources.limits.memory" + value = local.tier_preset.memory + } + + set { + name = "deployment.resources.limits.cpu" + value = local.tier_preset.cpu + } + + set { + name = "deployment.resources.requests.memory" + value = local.tier_preset.memory + } + + set { + name = "deployment.resources.requests.cpu" + value = local.tier_preset.cpu + } + + # ============================================ + # CloudSQL Configuration + # ============================================ + + set { + name = "config.cloudsql.enabled" + value = "true" + } + + set { + name = "config.cloudsql.instance" + value = module.cloudsql.instance_connection_name + } + + set { + name = "config.cloudsql.database" + value = module.cloudsql.database_name + } + + set { + name = "config.cloudsql.user" + value = module.cloudsql.user_name + } + + set { + name = "config.cloudsql.usePrivateIP" + value = "true" + } + + # ============================================ + # GCS Bucket Configuration + # ============================================ + + set { + name = "config.gcs.bucket" + value = module.gcs_bucket.bucket_url + } + + # ============================================ + # Workload Identity + # ============================================ + + set { + name = "serviceAccount.gcpServiceAccountEmail" + value = google_service_account.datacommons_workload.email + } + + set { + name = "serviceAccount.name" + value = "datacommons-ksa" + } + + set { + name = "serviceAccount.create" + value = "true" + } + + set { + name = "serviceAccount.annotations.iam\\.gke\\.io/gcp-service-account" + value = google_service_account.datacommons_workload.email + } + + # ============================================ + # Secrets Configuration + # ============================================ + + # Use existing Kubernetes secret created by Terraform + set { + name = 
"existingSecret" + value = "datacommons-secrets" + } + + depends_on = [ + kubernetes_namespace_v1.datacommons, + data.kubernetes_namespace_v1.existing, + module.cloudsql, + module.gcs_bucket, + module.maps_api_keys, + module.k8s_secrets, + google_service_account.datacommons_workload, + google_project_iam_member.datacommons_cloudsql_client, + google_storage_bucket_iam_member.datacommons_workload_storage_admin, + google_storage_bucket_object.input_dir, + google_storage_bucket_object.output_dir, + google_service_account_iam_member.datacommons_workload_identity_user, + google_service_networking_connection.cloudsql_private_vpc_connection, + data.google_container_cluster.gke, + data.google_compute_network.vpc, + google_container_cluster.autopilot, + google_compute_router_nat.nat, + ] +} diff --git a/mp-pkg/terraform/locals.tf b/mp-pkg/terraform/locals.tf new file mode 100644 index 0000000..1b428c2 --- /dev/null +++ b/mp-pkg/terraform/locals.tf @@ -0,0 +1,164 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Local Values +# ============================================ + +locals { + # ============================================ + # Common Labels + # ============================================ + # Applied to all resources + common_labels = { + project = "datacommons" + solution = "datacommons-marketplace" + } + + # ============================================ + # Resource Presets + # ============================================ + # Pre-configured resource allocations for different deployment tiers + # small: Development/testing workloads + # medium: Production workloads with moderate traffic + # large: Production workloads with high traffic + resource_presets = { + small = { + memory = "2Gi" + cpu = "1" + replicas = 1 + cloudsql_tier = "db-n1-standard-1" + cloudsql_disk = 30 + cloudsql_ha = false + } + medium = { + memory = "4Gi" + cpu = "2" + replicas = 2 + cloudsql_tier = "db-n1-standard-2" + cloudsql_disk = 50 + cloudsql_ha = false + } + large = { + memory = "8Gi" + cpu = "4" + replicas = 3 + cloudsql_tier = "db-n1-standard-4" + cloudsql_disk = 100 + cloudsql_ha = true + } + } + + # ============================================ + # Computed Values + # ============================================ + # Derive GCP region from cluster location. + # In create mode: use var.region directly. + # In BYO mode: derive from gke_cluster_location (strip zone suffix if present). + # "europe-west3" → "europe-west3", "europe-west3-a" → "europe-west3" + region = var.create_new_cluster ? var.region : join("-", slice(split("-", var.gke_cluster_location), 0, 2)) + cloudsql_region = local.region + + # Auto-derive GCS location from region + # Maps region prefix to nearest GCS multi-region + gcs_location = ( + startswith(local.region, "us-") || startswith(local.region, "northamerica-") ? "US" : + startswith(local.region, "europe-") ? "EU" : + startswith(local.region, "asia-") || startswith(local.region, "australia-") ? 
"ASIA" : + "US" # Default fallback + ) + + # All tier-derived values accessed from resource_presets + tier_preset = local.resource_presets[var.resource_tier] + cloudsql_tier = local.tier_preset.cloudsql_tier + cloudsql_ha_enabled = local.tier_preset.cloudsql_ha + cloudsql_availability_type = local.cloudsql_ha_enabled ? "REGIONAL" : "ZONAL" + cloudsql_disk_size = local.tier_preset.cloudsql_disk + + # Resource naming with deployment name for collision avoidance + deployment_name = var.goog_cm_deployment_name + resource_suffix = random_id.suffix.hex + + # ============================================ + # Resource Names (with override support) + # ============================================ + # CloudSQL instance name - use override or computed with random suffix + cloudsql_instance_name = var.cloudsql_instance_name_override != "" ? var.cloudsql_instance_name_override : "${local.deployment_name}-db-${local.resource_suffix}" + + # GCS bucket name - use override or computed with random suffix + gcs_bucket_name_computed = "${local.deployment_name}-data-${local.resource_suffix}" + gcs_bucket_name = var.gcs_bucket_name_override != "" ? var.gcs_bucket_name_override : local.gcs_bucket_name_computed + + # Service account name - use override or computed with random suffix + service_account_name = var.service_account_name_override != "" ? var.service_account_name_override : "${local.deployment_name}-sa-${local.resource_suffix}" + + # GKE node service account - only created for new clusters + gke_node_sa_name = "${substr(local.deployment_name, 0, 14)}-gke-${local.resource_suffix}" + + # ============================================ + # Namespace Derivation + # ============================================ + # Derive namespace from deployment name if not explicitly provided. + # This prevents namespace collisions when users deploy the same solution twice. + namespace = var.namespace != "" ? 
var.namespace : var.goog_cm_deployment_name + + # ============================================ + # Namespace Reference (conditional creation) + # ============================================ + # Use created namespace or existing namespace based on create_namespace flag + namespace_name = var.create_namespace ? kubernetes_namespace_v1.datacommons[0].metadata[0].name : data.kubernetes_namespace_v1.existing[0].metadata[0].name + + # ============================================ + # VPC Network (conditional: created vs. discovered) + # ============================================ + # In create mode: use the new VPC created in vpc.tf. + # In BYO mode: use the VPC discovered from the existing GKE cluster. + vpc_network_name = var.create_new_cluster ? google_compute_network.vpc[0].name : data.google_compute_network.vpc[0].name + vpc_network_self_link = var.create_new_cluster ? google_compute_network.vpc[0].self_link : data.google_compute_network.vpc[0].self_link + + # ============================================ + # Cluster Configuration (dual-mode) + # ============================================ + # In create mode: use the new Autopilot cluster created in gke.tf. + # In BYO mode: use the existing cluster data source from main.tf. + cluster_endpoint = var.create_new_cluster ? google_container_cluster.autopilot[0].endpoint : data.google_container_cluster.gke[0].endpoint + cluster_ca_cert = var.create_new_cluster ? google_container_cluster.autopilot[0].master_auth[0].cluster_ca_certificate : data.google_container_cluster.gke[0].master_auth[0].cluster_ca_certificate + cluster_name = var.create_new_cluster ? google_container_cluster.autopilot[0].name : var.gke_cluster_name + cluster_location = var.create_new_cluster ? 
google_container_cluster.autopilot[0].location : var.gke_cluster_location + + # ============================================ + # PSA Auto-Detection Configuration + # ============================================ + # In create mode: always create new PSA (new VPC has no existing PSA). + # In BYO mode: query the Service Networking API to detect existing connections. + + # Parse the HTTP response from the Service Networking API + psa_api_response = var.create_new_cluster ? {} : try(jsondecode(data.http.psa_connections[0].response_body), {}) + existing_psa_connections = try(local.psa_api_response.connections, []) + psa_already_exists = length(local.existing_psa_connections) > 0 + existing_psa_range_names = local.psa_already_exists ? local.existing_psa_connections[0].reservedPeeringRanges : [] + + # Always create new PSA range and connection in both modes. + # In BYO mode, reserved_peering_ranges (in cloudsql.tf) uses distinct(concat()) + # to preserve any existing ranges alongside the new one. + # This eliminates the Terraform 1.5.7 plan-time count limitation entirely. + create_psa_range = true + create_psa_connection = true + + # PSA range prefix length: /20 for all auto-created ranges + psa_range_prefix_length = 20 + + # Always use the newly created range for CloudSQL allocated_ip_range + psa_range_name = google_compute_global_address.cloudsql_private_ip[0].name +} diff --git a/mp-pkg/terraform/main.tf b/mp-pkg/terraform/main.tf new file mode 100644 index 0000000..318b81b --- /dev/null +++ b/mp-pkg/terraform/main.tf @@ -0,0 +1,131 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Provider Configurations +# ============================================ + +# Google Cloud provider +provider "google" { + project = var.project_id + region = local.region +} + +# Google Cloud Beta provider +provider "google-beta" { + project = var.project_id + region = local.region +} + +# Kubernetes provider - configured from cluster locals (works in both create and BYO modes) +provider "kubernetes" { + host = "https://${local.cluster_endpoint}" + token = data.google_client_config.default.access_token + cluster_ca_certificate = base64decode(local.cluster_ca_cert) +} + +# Helm provider - configured from cluster locals (works in both create and BYO modes) +provider "helm" { + kubernetes { + host = "https://${local.cluster_endpoint}" + token = data.google_client_config.default.access_token + cluster_ca_certificate = base64decode(local.cluster_ca_cert) + } +} + +# ============================================ +# Data Sources +# ============================================ +# Get current GCP client configuration for authentication +data "google_client_config" "default" {} + +# Get current GCP project details +data "google_project" "current" { + project_id = var.project_id +} + +# ============================================ +# GCP API Services +# ============================================ + +locals { + # List of all required GCP APIs + required_apis = toset([ + "cloudresourcemanager.googleapis.com", # Project-level resource management + "compute.googleapis.com", # VPC networking and compute resources + 
"container.googleapis.com", # GKE cluster management + "sqladmin.googleapis.com", # CloudSQL MySQL database + "storage.googleapis.com", # GCS bucket management + "places-backend.googleapis.com", # Maps API + "maps-backend.googleapis.com", # Maps API + "apikeys.googleapis.com", # Maps API key generation + "serviceusage.googleapis.com", # API usage and quotas monitoring + "servicenetworking.googleapis.com", # Private Service Access (CloudSQL) + "iam.googleapis.com", # Service account and IAM management + ]) +} + +resource "google_project_service" "apis" { + for_each = local.required_apis + + project = var.project_id + service = each.value + disable_on_destroy = false + disable_dependent_services = false + + # Prevent race conditions during parallel API enablement + timeouts { + create = "30m" + update = "30m" + } +} + +# ============================================ +# Resource Naming Utilities +# ============================================ + +resource "random_id" "suffix" { + byte_length = 4 + + keepers = { + project_id = var.project_id + } +} + +# ============================================ +# GKE Cluster and VPC Network Discovery (BYO mode only) +# ============================================ +# Only active when create_new_cluster = false. +# In create mode, cluster and VPC are managed resources in gke.tf / vpc.tf. +data "google_container_cluster" "gke" { + count = var.create_new_cluster ? 0 : 1 + name = var.gke_cluster_name + location = var.gke_cluster_location + project = var.project_id + + depends_on = [ + google_project_service.apis["container.googleapis.com"] + ] +} + +# Get full VPC network details for Private Service Access (BYO mode only) +data "google_compute_network" "vpc" { + count = var.create_new_cluster ? 
0 : 1 + name = element(split("/", data.google_container_cluster.gke[0].network), length(split("/", data.google_container_cluster.gke[0].network)) - 1) + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} diff --git a/mp-pkg/terraform/marketplace_test.tfvars b/mp-pkg/terraform/marketplace_test.tfvars new file mode 100644 index 0000000..5fcdb10 --- /dev/null +++ b/mp-pkg/terraform/marketplace_test.tfvars @@ -0,0 +1,31 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The marketplace_test.tfvars file is used to validate the Terraform template. +# Marketplace will validate your product with this file as its `--var-file` +# argument. 
+# +# Do not include the following variables in marketplace_test.tfvars, as they +# will be provided by Marketplace: +# +# - project_id +# - helm_chart_repo +# - helm_chart_name +# - helm_chart_version +# - Any variables declared in schema.yaml + +goog_cm_deployment_name = "datacommons-test" +create_new_cluster = true +region = "us-central1" +dc_api_key = "test-api-key-placeholder" diff --git a/mp-pkg/terraform/metadata.display.yaml b/mp-pkg/terraform/metadata.display.yaml new file mode 100644 index 0000000..57917f5 --- /dev/null +++ b/mp-pkg/terraform/metadata.display.yaml @@ -0,0 +1,166 @@ +apiVersion: blueprints.cloud.google.com/v1alpha1 +kind: BlueprintMetadata +metadata: + name: datacommons-marketplace-display + annotations: + config.kubernetes.io/local-config: "true" +spec: + info: + title: Data Commons Accelerator + source: {} + ui: + input: + variables: + goog_cm_deployment_name: + name: goog_cm_deployment_name + title: Deployment Name + tooltip: Name for this deployment (2-18 characters) + placeholder: dc + validation: ^[a-z][a-z0-9-]{0,16}[a-z0-9]$ + create_new_cluster: + name: create_new_cluster + title: Create New GKE Cluster + invisible: true + # To enable Create/Existing cluster toggle, uncomment below and remove invisible: + # tooltip: Choose whether to create a new GKE Autopilot cluster or use an existing one. + # xGoogleProperty: + # type: ET_CREATE_RESOURCE + region: + name: region + title: GCP Region + tooltip: GCP region where the cluster and cloud resources will be created. 
+ xGoogleProperty: + type: ET_GCE_REGION + gke_cluster_name: + name: gke_cluster_name + title: GKE Cluster Name + invisible: true + # To enable cluster picker, uncomment below and remove invisible: + # tooltip: Select your existing GKE cluster for deployment + # section: cluster + # xGoogleProperty: + # type: ET_GKE_CLUSTER + # gkeCluster: + # locationVariable: gke_cluster_location + # clusterCreationVariable: create_new_cluster + resource_tier: + name: resource_tier + title: Resource Tier + tooltip: | + Resource allocation for the deployment. Controls both application pod sizing and database tier. + Small: App 2Gi/1CPU, DB Standard-1 (dev/test) + Medium: App 4Gi/2CPU, DB Standard-2 + Large: App 8Gi/4CPU, DB Standard-4 + section: application + enumValueLabels: + - label: Small - 2Gi RAM, 1 CPU + value: small + - label: Medium - 4Gi RAM, 2 CPU (recommended) + value: medium + - label: Large - 8Gi RAM, 4 CPU + value: large + flask_env: + name: flask_env + title: Samples + tooltip: | + Select a pre-built Data Commons configuration optimized for a specific domain. + Each sample includes curated datasets, statistical variables, and visualizations tailored to that subject area. + Select the sample that best matches your use case. + placeholder: Choose from the list + section: application + enumValueLabels: + - label: Education + value: education + - label: Health + value: health + - label: Energy + value: energy + - label: Custom + value: custom + dc_api_key: + name: dc_api_key + title: Data Commons API Key + tooltip: | + API key for Data Commons API access. + Get yours at: https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key + placeholder: "..." 
+ section: api + validation: ^[A-Za-z0-9_-]+$ + gke_cluster_location: + name: gke_cluster_location + title: GKE Cluster Location + invisible: true + # To enable location picker, uncomment below and remove invisible: + # xGoogleProperty: + # type: ET_GCE_LOCATION + project_id: + name: project_id + title: Project ID + invisible: true + create_namespace: + name: create_namespace + title: Create Namespace + invisible: true + namespace: + name: namespace + title: Kubernetes Namespace + invisible: true + app_replicas: + name: app_replicas + title: Application Replicas + invisible: true + enable_natural_language: + name: enable_natural_language + title: Enable Natural Language Queries + invisible: true + enable_data_sync: + name: enable_data_sync + title: Enable Custom Data Sync + invisible: true + cloudsql_instance_name_override: + name: cloudsql_instance_name_override + title: CloudSQL Instance Name Override + invisible: true + gcs_bucket_name_override: + name: gcs_bucket_name_override + title: GCS Bucket Name Override + invisible: true + service_account_name_override: + name: service_account_name_override + title: Service Account Name Override + invisible: true + cdc_services_image_repo: + name: cdc_services_image_repo + title: CDC Services Image Repository + invisible: true + cdc_services_image_tag: + name: cdc_services_image_tag + title: CDC Services Image Tag + invisible: true + data_image_repo: + name: data_image_repo + title: Data Image Repository + invisible: true + data_image_tag: + name: data_image_tag + title: Data Image Tag + invisible: true + helm_chart_repo: + name: helm_chart_repo + title: Helm Chart Repository + invisible: true + helm_chart_name: + name: helm_chart_name + title: Helm Chart Name + invisible: true + helm_chart_version: + name: helm_chart_version + title: Helm Chart Version + invisible: true + sections: + - name: application + title: Application Settings + tooltip: Data Commons Accelerator application configuration and resource allocation + 
- name: api + title: API Keys + tooltip: API keys required for Data Commons and Google Maps integration diff --git a/mp-pkg/terraform/metadata.yaml b/mp-pkg/terraform/metadata.yaml new file mode 100644 index 0000000..4f663bb --- /dev/null +++ b/mp-pkg/terraform/metadata.yaml @@ -0,0 +1,141 @@ +apiVersion: blueprints.cloud.google.com/v1alpha1 +kind: BlueprintMetadata +metadata: + name: datacommons-marketplace + annotations: + config.kubernetes.io/local-config: "true" +spec: + info: + title: Data Commons Accelerator + source: {} + actuationTool: + flavor: Terraform + version: ">= 1.5.7" + description: {} + content: + subBlueprints: + - name: cloudsql + location: modules/cloudsql + - name: gcs-bucket + location: modules/gcs-bucket + - name: k8s-secrets + location: modules/k8s-secrets + - name: maps-api-keys + location: modules/maps-api-keys + interfaces: + variables: + - name: goog_cm_deployment_name + description: Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions) + varType: string + required: true + - name: project_id + description: GCP project ID where Data Commons Accelerator will be deployed + varType: string + required: true + - name: create_new_cluster + description: Create a new GKE cluster with VPC networking. + varType: bool + defaultValue: true + - name: region + description: GCP region for new cluster and resources (e.g., us-central1). Only used when create_new_cluster is true. + varType: string + defaultValue: us-central1 + - name: gke_cluster_name + description: Name of an existing GKE cluster. Only used when create_new_cluster is false. + varType: string + defaultValue: "" + - name: gke_cluster_location + description: Location (region or zone) of the existing GKE cluster. Only used when create_new_cluster is false. + varType: string + defaultValue: "" + - name: namespace + description: Kubernetes namespace for Data Commons Accelerator deployment. 
Defaults to the deployment name (goog_cm_deployment_name) if not provided. + varType: string + defaultValue: "" + - name: create_namespace + description: Create new Kubernetes namespace. Set to false if namespace already exists in the cluster. + varType: bool + defaultValue: true + - name: cloudsql_instance_name_override + description: Override CloudSQL instance name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: gcs_bucket_name_override + description: Override GCS bucket name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: service_account_name_override + description: Override GCP service account name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: dc_api_key + description: Data Commons API key for accessing Data Commons APIs + varType: string + required: true + - name: app_replicas + description: Number of replicas for the Data Commons Accelerator application deployment + varType: number + defaultValue: 1 + - name: resource_tier + description: Resource allocation tier controlling application pod resources and CloudSQL database sizing (small, medium, large) + varType: string + defaultValue: medium + - name: enable_natural_language + description: Enable natural language query features + varType: bool + defaultValue: true + - name: enable_data_sync + description: Enable automatic synchronization of custom data from GCS bucket to CloudSQL database + varType: bool + defaultValue: true + - name: flask_env + description: Data Commons sample (pre-built configurations for specific domains) + varType: string + defaultValue: health + - name: cdc_services_image_repo + description: Container image repository for CDC Services (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: cdc_services_image_tag + description: Container image tag for CDC Services (populated by GCP Marketplace) + 
varType: string + defaultValue: "" + - name: data_image_repo + description: Container image repository for Data service (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: data_image_tag + description: Container image tag for Data service (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: helm_chart_repo + description: Helm chart repository URL (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: helm_chart_name + description: Helm chart name (populated by GCP Marketplace) + varType: string + defaultValue: datacommons + - name: helm_chart_version + description: Helm chart version (populated by GCP Marketplace) + varType: string + defaultValue: "" + outputs: + - name: cloud_shell_access + description: "Cloud Shell quick access: GKE Console > cluster > Connect > Run in Cloud Shell, then run the port-forward command" + - name: gcs_bucket_url + description: GCS bucket URL (gs://) + - name: kubectl_configure + description: Command to configure kubectl for your GKE cluster + - name: namespace + description: Kubernetes namespace where Data Commons is deployed + - name: port_forward + description: Port-forward command to access Data Commons locally (with auto-retry) + - name: retrieve_admin_credentials + description: Commands to retrieve admin panel credentials (username and password) + - name: upload_data + description: Command to upload custom data to GCS bucket + - name: verify_pods + description: Command to verify Data Commons pods are running + - name: view_logs + description: Command to view application logs diff --git a/mp-pkg/terraform/modules/cloudsql/main.tf b/mp-pkg/terraform/modules/cloudsql/main.tf new file mode 100644 index 0000000..fd2e20c --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/main.tf @@ -0,0 +1,127 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# CloudSQL MySQL Instance +# ============================================ +resource "google_sql_database_instance" "instance" { + project = var.project_id + name = var.instance_name + region = var.region + database_version = var.database_version + + settings { + tier = var.tier + availability_type = var.availability_type + + # Storage configuration + disk_size = var.disk_size + disk_type = var.disk_type + disk_autoresize = var.disk_autoresize + disk_autoresize_limit = var.disk_autoresize_limit + + # Backup configuration + backup_configuration { + enabled = true + start_time = "03:00" + binary_log_enabled = var.point_in_time_recovery_enabled + location = var.backup_location + point_in_time_recovery_enabled = var.point_in_time_recovery_enabled + transaction_log_retention_days = 7 + backup_retention_settings { + retained_backups = 7 + retention_unit = "COUNT" + } + } + + # Network configuration - Private IP only + ip_configuration { + ipv4_enabled = false + private_network = var.network_self_link + ssl_mode = "ENCRYPTED_ONLY" + allocated_ip_range = var.allocated_ip_range + enable_private_path_for_google_cloud_services = true + } + + # Maintenance window + maintenance_window { + day = 7 # Sunday + hour = 3 # 3 AM + update_track = "stable" + } + + # Database flags (MySQL specific) + database_flags { + name = "local_infile" + value = "off" + } + + # Query Insights + insights_config { + query_insights_enabled = true + query_plans_per_minute = 5 + query_string_length = 1024 + record_application_tags = true + 
record_client_address = true + } + + # User labels + user_labels = var.labels + } + + deletion_protection = var.deletion_protection + + # Prevent disk size downgrade after autoresize + lifecycle { + ignore_changes = [ + settings[0].disk_size + ] + } +} + +# ============================================ +# Database Creation +# ============================================ +resource "google_sql_database" "database" { + project = var.project_id + instance = google_sql_database_instance.instance.name + name = var.database_name + charset = "utf8mb4" + collation = "utf8mb4_unicode_ci" +} + +# ============================================ +# Password Management +# ============================================ +# Generate random password +resource "random_password" "db_password" { + length = 16 + special = true + # Ensure password meets MySQL requirements + min_lower = 1 + min_upper = 1 + min_numeric = 1 + min_special = 1 +} + +# Create database user with generated password +resource "google_sql_user" "user" { + project = var.project_id + instance = google_sql_database_instance.instance.name + name = var.user_name + host = "%" + password = random_password.db_password.result + + depends_on = [google_sql_database_instance.instance] +} diff --git a/mp-pkg/terraform/modules/cloudsql/outputs.tf b/mp-pkg/terraform/modules/cloudsql/outputs.tf new file mode 100644 index 0000000..5184ee1 --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/outputs.tf @@ -0,0 +1,107 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Outputs +# ============================================ + +# ============================================ +# Instance Information +# ============================================ +output "instance_name" { + description = "CloudSQL instance name" + value = google_sql_database_instance.instance.name +} + +output "instance_connection_name" { + description = "CloudSQL instance connection name (project:region:instance) for Cloud SQL Auth Proxy" + value = google_sql_database_instance.instance.connection_name +} + +output "instance_self_link" { + description = "CloudSQL instance self link" + value = google_sql_database_instance.instance.self_link +} + +output "instance_service_account_email" { + description = "CloudSQL instance service account email" + value = google_sql_database_instance.instance.service_account_email_address +} + +# ============================================ +# Network Information +# ============================================ +output "private_ip_address" { + description = "Private IP address for direct connection from GKE (within VPC)" + value = google_sql_database_instance.instance.private_ip_address +} + +# ============================================ +# Database Information +# ============================================ +output "database_name" { + description = "Name of the created database" + value = var.database_name +} + +# ============================================ +# User Credentials +# ============================================ +output "user_name" { + description = "Database user name" + value = var.user_name +} + +output "user_password" { + description = "Database user password (auto-generated)" + value = random_password.db_password.result + sensitive = true +} + +# ============================================ +# Connection Information +# 
============================================ +output "connection_string" { + description = "MySQL connection string for direct private IP connection" + value = "mysql://${var.user_name}@${google_sql_database_instance.instance.private_ip_address}:3306/${var.database_name}" + sensitive = true +} + +output "jdbc_connection_string" { + description = "JDBC connection string for applications" + value = "jdbc:mysql://${google_sql_database_instance.instance.private_ip_address}:3306/${var.database_name}?useSSL=true&requireSSL=true" +} + +# ============================================ +# Kubernetes Configuration +# ============================================ +output "k8s_env_vars" { + description = "Environment variables for Kubernetes deployments" + value = { + CLOUDSQL_INSTANCE = google_sql_database_instance.instance.connection_name + DB_HOST = google_sql_database_instance.instance.private_ip_address + DB_PORT = "3306" + DB_NAME = var.database_name + DB_USER = var.user_name + USE_CLOUDSQL = "true" + } +} + +output "k8s_secret_data" { + description = "Secret data for Kubernetes Secret resource (use with k8s-secrets module)" + value = { + DB_PASSWORD = random_password.db_password.result + } + sensitive = true +} diff --git a/mp-pkg/terraform/modules/cloudsql/variables.tf b/mp-pkg/terraform/modules/cloudsql/variables.tf new file mode 100644 index 0000000..8fb8259 --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/variables.tf @@ -0,0 +1,170 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "region" { + description = "The GCP region for the CloudSQL instance" + type = string +} + +variable "instance_name" { + description = "CloudSQL instance name (must be unique within project)" + type = string +} + +variable "network_self_link" { + description = "VPC network self link for private IP connection (from GKE cluster VPC)" + type = string +} + +variable "allocated_ip_range" { + description = "Name of the allocated IP range for Private Service Access (PSA)" + type = string +} + +# ============================================ +# Instance Configuration +# ============================================ +variable "database_version" { + description = "MySQL version (MYSQL_8_0, MYSQL_8_0_36, etc.)" + type = string + default = "MYSQL_8_0" +} + +variable "tier" { + description = "Machine tier (e.g., db-f1-micro, db-n1-standard-1, db-custom-2-7680)" + type = string + default = "db-n1-standard-1" +} + +variable "zone" { + description = "Primary zone for the CloudSQL instance" + type = string + default = null +} + +variable "secondary_zone" { + description = "Secondary zone for high availability (required if availability_type is REGIONAL)" + type = string + default = null +} + +variable "availability_type" { + description = "Availability type: ZONAL or REGIONAL (REGIONAL provides high availability)" + type = string + default = "ZONAL" + + validation { + condition = contains(["ZONAL", "REGIONAL"], var.availability_type) + error_message = "Availability type must be either ZONAL or REGIONAL." 
+ } +} + +# ============================================ +# Storage Configuration +# ============================================ +variable "disk_size" { + description = "Disk size in GB (minimum 10 GB for MySQL)" + type = number + default = 20 + + validation { + condition = var.disk_size >= 10 + error_message = "Disk size must be at least 10 GB for MySQL." + } +} + +variable "disk_type" { + description = "Disk type: PD_SSD (recommended for production) or PD_HDD" + type = string + default = "PD_SSD" + + validation { + condition = contains(["PD_SSD", "PD_HDD"], var.disk_type) + error_message = "Disk type must be either PD_SSD or PD_HDD." + } +} + +variable "disk_autoresize" { + description = "Enable automatic disk size increase" + type = bool + default = true +} + +variable "disk_autoresize_limit" { + description = "Maximum disk size in GB for autoresize (0 = unlimited)" + type = number + default = 0 +} + +# ============================================ +# Backup Configuration +# ============================================ +variable "backup_location" { + description = "Backup location (defaults to instance region if not specified)" + type = string + default = null +} + +variable "point_in_time_recovery_enabled" { + description = "Enable point-in-time recovery (requires binary logging)" + type = bool + default = false +} + +# ============================================ +# Database and User Configuration +# ============================================ +variable "database_name" { + description = "Name of the database to create" + type = string + default = "datacommons" +} + +variable "user_name" { + description = "Name of the database user to create" + type = string + default = "datacommons" +} + +# ============================================ +# Security Configuration +# ============================================ +variable "deletion_protection" { + description = "Terraform deletion protection (prevents accidental terraform destroy)" + type = bool + default = false 
+} + +variable "deletion_protection_enabled" { + description = "GCP deletion protection (prevents deletion via console/API)" + type = bool + default = false +} + +# ============================================ +# Labels +# ============================================ +variable "labels" { + description = "Resource labels/tags for organization and cost tracking" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/cloudsql/versions.tf b/mp-pkg/terraform/modules/cloudsql/versions.tf new file mode 100644 index 0000000..efad6fe --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/versions.tf @@ -0,0 +1,32 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + google-beta = { + source = "hashicorp/google-beta" + version = ">= 7.0.0, < 8.0.0" + } + random = { + source = "hashicorp/random" + version = ">= 3.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/gcs-bucket/main.tf b/mp-pkg/terraform/modules/gcs-bucket/main.tf new file mode 100644 index 0000000..5d087a6 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/main.tf @@ -0,0 +1,92 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket Resource +# Creates the Cloud Storage bucket with security and lifecycle settings +# ============================================ +resource "google_storage_bucket" "bucket" { + name = var.bucket_name + project = var.project_id + location = var.location + storage_class = var.storage_class + force_destroy = var.force_destroy + labels = var.labels + + # Security: Uniform bucket-level access (replaces legacy object ACLs) + # This enforces IAM-only access control, improving security posture + uniform_bucket_level_access = true + + # Versioning: Protect against accidental overwrites and deletions + # Conditionally enabled based on var.versioning_enabled + dynamic "versioning" { + for_each = var.versioning_enabled ? [1] : [] + content { + enabled = true + } + } + + # Encryption: Customer-managed encryption keys (CMEK) via Cloud KMS + # If encryption_key_name is null, Google-managed keys are used automatically + dynamic "encryption" { + for_each = var.encryption_key_name != null ? 
[1] : [] + content { + default_kms_key_name = var.encryption_key_name + } + } + + # Lifecycle Rules: Automated object lifecycle management + # Supports transitions, deletions, and versioning actions + dynamic "lifecycle_rule" { + for_each = var.lifecycle_rules + content { + # Action block: What to do with matching objects + action { + type = lifecycle_rule.value.action.type + storage_class = lookup(lifecycle_rule.value.action, "storage_class", null) + } + + # Condition block: When to apply this rule + condition { + age = lookup(lifecycle_rule.value.condition, "age", null) + created_before = lookup(lifecycle_rule.value.condition, "created_before", null) + with_state = lookup(lifecycle_rule.value.condition, "with_state", null) + matches_storage_class = lookup(lifecycle_rule.value.condition, "matches_storage_class", null) + matches_prefix = lookup(lifecycle_rule.value.condition, "matches_prefix", null) + matches_suffix = lookup(lifecycle_rule.value.condition, "matches_suffix", null) + num_newer_versions = lookup(lifecycle_rule.value.condition, "num_newer_versions", null) + custom_time_before = lookup(lifecycle_rule.value.condition, "custom_time_before", null) + days_since_custom_time = lookup(lifecycle_rule.value.condition, "days_since_custom_time", null) + days_since_noncurrent_time = lookup(lifecycle_rule.value.condition, "days_since_noncurrent_time", null) + noncurrent_time_before = lookup(lifecycle_rule.value.condition, "noncurrent_time_before", null) + } + } + } +} + +# ============================================ +# IAM Bindings +# Grant access to bucket using IAM roles +# Each member is bound independently for safe incremental additions +# ============================================ +resource "google_storage_bucket_iam_member" "members" { + for_each = { + for idx, member in var.iam_members : + "${member.role}-${member.member}" => member + } + + bucket = google_storage_bucket.bucket.name + role = each.value.role + member = each.value.member +} diff --git 
a/mp-pkg/terraform/modules/gcs-bucket/outputs.tf b/mp-pkg/terraform/modules/gcs-bucket/outputs.tf new file mode 100644 index 0000000..dc03fa6 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/outputs.tf @@ -0,0 +1,37 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket Outputs +# ============================================ + +output "bucket_name" { + description = "Name of the created GCS bucket" + value = google_storage_bucket.bucket.name +} + +output "bucket_url" { + description = "GCS bucket URL (gs://)" + value = google_storage_bucket.bucket.url +} + +output "bucket_self_link" { + description = "GCS bucket self link (full resource URI)" + value = google_storage_bucket.bucket.self_link +} + +output "bucket" { + description = "Full bucket resource object" + value = google_storage_bucket.bucket +} diff --git a/mp-pkg/terraform/modules/gcs-bucket/variables.tf b/mp-pkg/terraform/modules/gcs-bucket/variables.tf new file mode 100644 index 0000000..9c84356 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/variables.tf @@ -0,0 +1,107 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# DataCommons GCS Bucket Module Variables +# ============================================ + +# ============================================ +# Required Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "bucket_name" { + description = "Name of the GCS bucket (must be globally unique)" + type = string +} + +# ============================================ +# Location Configuration +# ============================================ +variable "location" { + description = "Bucket location (e.g., US, EU, ASIA, or specific region like us-central1)" + type = string + default = "US" +} + +variable "storage_class" { + description = "Storage class: STANDARD, NEARLINE, COLDLINE, or ARCHIVE" + type = string + default = "STANDARD" + + validation { + condition = contains(["STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"], var.storage_class) + error_message = "Storage class must be STANDARD, NEARLINE, COLDLINE, or ARCHIVE." 
+ } +} + +# ============================================ +# Versioning Configuration +# ============================================ +variable "versioning_enabled" { + description = "Enable object versioning (recommended for data protection)" + type = bool + default = true +} + +# ============================================ +# Lifecycle Rules +# ============================================ +variable "lifecycle_rules" { + description = "Lifecycle rules for object management; each rule has an action (type, optional storage_class) and a condition map, applied through a dynamic lifecycle_rule block on the bucket" + type = any + default = [] +} + +# ============================================ +# IAM Configuration +# ============================================ +variable "iam_members" { + description = "IAM members and their roles for bucket access" + type = list(object({ + role = string + member = string + })) + default = [] +} + +# ============================================ +# Encryption Configuration +# ============================================ +variable "encryption_key_name" { + description = "Cloud KMS key name for customer-managed encryption (optional, uses Google-managed keys by default)" + type = string + default = null +} + +# ============================================ +# Deletion Protection +# ============================================ +variable "force_destroy" { + description = "Allow bucket deletion even if it contains objects (use with caution)" + type = bool + default = false +} + +# ============================================ +# Labels +# ============================================ +variable "labels" { + description = "Resource labels/tags for organization and cost tracking" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/gcs-bucket/versions.tf b/mp-pkg/terraform/modules/gcs-bucket/versions.tf new file mode 100644 index 0000000..43edb22 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/versions.tf @@ -0,0 +1,24 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the
"License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/main.tf b/mp-pkg/terraform/modules/k8s-secrets/main.tf new file mode 100644 index 0000000..48df4ed --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/main.tf @@ -0,0 +1,30 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Kubernetes Secrets +# ============================================ + +resource "kubernetes_secret_v1" "secrets" { + for_each = var.secrets + + metadata { + name = each.key + namespace = var.namespace + labels = merge(var.labels, each.value.labels) + } + + data = each.value.data + type = var.secret_type +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/outputs.tf b/mp-pkg/terraform/modules/k8s-secrets/outputs.tf new file mode 100644 index 0000000..2878f9c --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/outputs.tf @@ -0,0 +1,32 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Kubernetes Secrets Outputs +# ============================================ + +output "secret_names" { + description = "Map of secret keys to their created names in Kubernetes" + value = { for k, v in kubernetes_secret_v1.secrets : k => v.metadata[0].name } +} + +output "secret_namespaces" { + description = "Map of secret keys to their namespaces" + value = { for k, v in kubernetes_secret_v1.secrets : k => v.metadata[0].namespace } +} + +output "secret_ids" { + description = "Map of secret keys to their full Kubernetes resource IDs (namespace/name)" + value = { for k, v in kubernetes_secret_v1.secrets : k => "${v.metadata[0].namespace}/${v.metadata[0].name}" } +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/variables.tf b/mp-pkg/terraform/modules/k8s-secrets/variables.tf new file mode 100644 index 0000000..134dbeb --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/variables.tf @@ -0,0 +1,75 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Kubernetes Secrets +# ============================================ +variable "namespace" { + description = "Kubernetes namespace where secrets will be created" + type = string +} + +variable "secrets" { + description = <<-EOT + Map of secrets to create. 
Each secret can have: + - data: Map of key-value pairs (values will be base64 encoded by Kubernetes) + - labels: Optional additional labels for this specific secret + + Example: + { + "db-credentials" = { + data = { + DB_PASSWORD = "secret-value" + DB_USER = "myuser" + } + labels = { + component = "database" + } + } + } + EOT + type = map(object({ + data = map(string) + labels = optional(map(string), {}) + })) +} + +# ============================================ +# Optional Variables +# ============================================ +variable "secret_type" { + description = "Kubernetes secret type (Opaque, kubernetes.io/tls, etc.)" + type = string + default = "Opaque" + + validation { + condition = contains([ + "Opaque", + "kubernetes.io/service-account-token", + "kubernetes.io/dockercfg", + "kubernetes.io/dockerconfigjson", + "kubernetes.io/basic-auth", + "kubernetes.io/ssh-auth", + "kubernetes.io/tls", + "bootstrap.kubernetes.io/token" + ], var.secret_type) + error_message = "Secret type must be a valid Kubernetes secret type." + } +} + +variable "labels" { + description = "Common labels to apply to all secrets" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/versions.tf b/mp-pkg/terraform/modules/k8s-secrets/versions.tf new file mode 100644 index 0000000..3b1b330 --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/versions.tf @@ -0,0 +1,24 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +terraform { + required_version = ">= 1.5.7" + + required_providers { + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/main.tf b/mp-pkg/terraform/modules/maps-api-keys/main.tf new file mode 100644 index 0000000..d04ea6f --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/main.tf @@ -0,0 +1,70 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# API Keys +# ============================================ + +# Random suffix for unique naming +resource "random_id" "key_suffix" { + byte_length = 4 +} + +# ============================================ +# Google Maps API Key +# ============================================ +resource "google_apikeys_key" "maps_key" { + project = var.project_id + name = "${var.name_prefix}-key-${random_id.key_suffix.hex}" + display_name = "${var.name_prefix} Maps API Key" + + restrictions { + dynamic "api_targets" { + for_each = var.api_targets + content { + service = api_targets.value + } + } + + # Optional browser key restrictions + dynamic "browser_key_restrictions" { + for_each = length(var.allowed_referrers) > 0 ? [1] : [] + content { + allowed_referrers = var.allowed_referrers + } + } + + # Optional Android app restrictions + dynamic "android_key_restrictions" { + for_each = length(var.allowed_android_applications) > 0 ? 
[1] : [] + content { + dynamic "allowed_applications" { + for_each = var.allowed_android_applications + content { + package_name = allowed_applications.value.package_name + sha1_fingerprint = allowed_applications.value.sha1_fingerprint + } + } + } + } + + # Optional iOS app restrictions + dynamic "ios_key_restrictions" { + for_each = length(var.allowed_ios_bundle_ids) > 0 ? [1] : [] + content { + allowed_bundle_ids = var.allowed_ios_bundle_ids + } + } + } +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/outputs.tf b/mp-pkg/terraform/modules/maps-api-keys/outputs.tf new file mode 100644 index 0000000..d217e11 --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/outputs.tf @@ -0,0 +1,49 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# API Keys Outputs +# ============================================ + +output "api_key_id" { + description = "The unique identifier of the API key resource" + value = google_apikeys_key.maps_key.id +} + +output "api_key_name" { + description = "The resource name of the API key" + value = google_apikeys_key.maps_key.name +} + +output "api_key_uid" { + description = "The unique ID assigned to the API key by Google" + value = google_apikeys_key.maps_key.uid +} + +output "api_key" { + description = "The actual API key value" + value = google_apikeys_key.maps_key.key_string + sensitive = true +} + +# ============================================ +# Kubernetes Secret Data +# ============================================ +output "k8s_secret_data" { + description = "Secret data for Kubernetes Secret resource" + value = { + MAPS_API_KEY = google_apikeys_key.maps_key.key_string + } + sensitive = true +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/variables.tf b/mp-pkg/terraform/modules/maps-api-keys/variables.tf new file mode 100644 index 0000000..683d7bc --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/variables.tf @@ -0,0 +1,75 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# API Keys Variables +# ============================================ + +# ============================================ +# Required Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "name_prefix" { + description = "Prefix for API key naming (e.g., 'datacommons-prod')" + type = string +} + +# ============================================ +# API Configuration +# ============================================ +variable "api_targets" { + description = "List of Google API services this key can access" + type = list(string) + default = [ + "maps-backend.googleapis.com", + "places-backend.googleapis.com" + ] +} + +# ============================================ +# Application Restrictions +# ============================================ +variable "allowed_referrers" { + description = "List of allowed HTTP referrers for browser applications (e.g., ['https://example.com/*'])" + type = list(string) + default = [] +} + +variable "allowed_android_applications" { + description = "List of allowed Android applications (package_name and sha1_fingerprint)" + type = list(object({ + package_name = string + sha1_fingerprint = string + })) + default = [] +} + +variable "allowed_ios_bundle_ids" { + description = "List of allowed iOS bundle IDs" + type = list(string) + default = [] +} + +# ============================================ +# Labels +# ============================================ +variable "labels" { + description = "Resource labels/tags for organization and cost tracking" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/versions.tf b/mp-pkg/terraform/modules/maps-api-keys/versions.tf new file mode 100644 index 0000000..d360c68 --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/versions.tf @@ -0,0 +1,28 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 
2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + random = { + source = "hashicorp/random" + version = ">= 3.0.0" + } + } +} diff --git a/mp-pkg/terraform/nat.tf b/mp-pkg/terraform/nat.tf new file mode 100644 index 0000000..d0c4850 --- /dev/null +++ b/mp-pkg/terraform/nat.tf @@ -0,0 +1,54 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Cloud Router + Cloud NAT +# ============================================ + +resource "google_compute_router" "nat_router" { + count = var.create_new_cluster ? 
1 : 0 + + provider = google + + name = "${local.deployment_name}-router" + region = local.region + network = google_compute_network.vpc[0].id + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} + +resource "google_compute_router_nat" "nat" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-nat" + router = google_compute_router.nat_router[0].name + region = local.region + project = var.project_id + nat_ip_allocate_option = "AUTO_ONLY" + source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES" + + log_config { + enable = false + filter = "ERRORS_ONLY" + } + + depends_on = [ + google_compute_subnetwork.primary + ] +} diff --git a/mp-pkg/terraform/outputs.tf b/mp-pkg/terraform/outputs.tf new file mode 100644 index 0000000..ab1e6f0 --- /dev/null +++ b/mp-pkg/terraform/outputs.tf @@ -0,0 +1,68 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Namespace +# ============================================ +output "namespace" { + description = "Kubernetes namespace where DataCommons is deployed" + value = local.namespace_name +} + +# ============================================ +# GCS Bucket +# ============================================ +output "gcs_bucket_url" { + description = "GCS bucket URL (gs://)" + value = module.gcs_bucket.bucket_url +} + +# ============================================ +# Access Commands +# ============================================ +output "kubectl_configure" { + description = "Command to configure kubectl for your GKE cluster" + value = "gcloud container clusters get-credentials ${local.cluster_name} --location=${local.cluster_location} --project=${var.project_id}" +} + +output "verify_pods" { + description = "Command to verify Data Commons pods are running" + value = "kubectl get pods -n ${local.namespace_name}" +} + +output "port_forward" { + description = "Port-forward command to access Data Commons locally (with auto-retry)" + value = "until kubectl port-forward -n ${local.namespace_name} svc/datacommons 8080:8080; do echo 'Port-forward crashed. Respawning...' >&2; sleep 1; done" +} + +output "cloud_shell_access" { + description = "Cloud Shell quick access: GKE Console > cluster > Connect > Run in Cloud Shell, then run the port-forward command" + value = "GKE Console > ${local.cluster_name} > Connect > Run in Cloud Shell, then run: until kubectl port-forward -n ${local.namespace_name} svc/datacommons 8080:8080; do echo 'Respawning...' 
>&2; sleep 1; done — then click 'Web Preview' > 'Preview on port 8080'" +} + +output "upload_data" { + description = "Command to upload custom data to GCS bucket" + value = "gsutil cp -r /path/to/your/data gs://${module.gcs_bucket.bucket_name}/input" +} + +output "view_logs" { + description = "Command to view application logs" + value = "kubectl logs -n ${local.namespace_name} -l app=datacommons --tail=100 -f" +} + +output "retrieve_admin_credentials" { + description = "Commands to retrieve admin panel credentials (username and password)" + value = "echo 'Admin Username:' && kubectl get secret datacommons -n ${local.namespace_name} -o jsonpath='{.data.ADMIN_PANEL_USERNAME}' | base64 -d && echo && echo 'Admin Password:' && kubectl get secret datacommons -n ${local.namespace_name} -o jsonpath='{.data.ADMIN_PANEL_PASSWORD}' | base64 -d && echo" +} + diff --git a/mp-pkg/terraform/schema.yaml b/mp-pkg/terraform/schema.yaml new file mode 100644 index 0000000..978da5b --- /dev/null +++ b/mp-pkg/terraform/schema.yaml @@ -0,0 +1,31 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# GCP Marketplace Image Mapping Schema +# ============================================ + +images: + cdc-services: + variables: + cdc_services_image_repo: + type: REPO_WITH_REGISTRY_WITH_NAME + cdc_services_image_tag: + type: TAG + data: + variables: + data_image_repo: + type: REPO_WITH_REGISTRY_WITH_NAME + data_image_tag: + type: TAG diff --git a/mp-pkg/terraform/secrets.tf b/mp-pkg/terraform/secrets.tf new file mode 100644 index 0000000..d6b2cd2 --- /dev/null +++ b/mp-pkg/terraform/secrets.tf @@ -0,0 +1,89 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Kubernetes Secrets Configuration +# ============================================ +resource "kubernetes_namespace_v1" "datacommons" { + count = var.create_namespace ? 1 : 0 + + metadata { + name = local.namespace + + labels = merge( + local.common_labels, + { + name = local.namespace + } + ) + } + + depends_on = [ + data.google_container_cluster.gke, + google_container_cluster.autopilot, + ] +} + +# Data source for existing namespace (when create_namespace=false) +data "kubernetes_namespace_v1" "existing" { + count = var.create_namespace ? 
0 : 1 + + metadata { + name = local.namespace + } + + depends_on = [ + data.google_container_cluster.gke, + google_container_cluster.autopilot, + ] +} + +# ============================================ +# Kubernetes Secrets +# ============================================ + +module "k8s_secrets" { + source = "./modules/k8s-secrets" + + namespace = local.namespace_name + + secrets = { + "datacommons-secrets" = { + data = { + # Database password from CloudSQL module + DB_PASS = module.cloudsql.user_password + + # Google Maps API key from maps-api-keys module + MAPS_API_KEY = module.maps_api_keys.api_key + + # DataCommons API key from user input + DC_API_KEY = var.dc_api_key + } + } + } + + labels = merge( + local.common_labels, + { + component = "secrets" + } + ) + + depends_on = [ + kubernetes_namespace_v1.datacommons, + data.kubernetes_namespace_v1.existing, + module.cloudsql, + module.maps_api_keys + ] +} diff --git a/mp-pkg/terraform/variables.tf b/mp-pkg/terraform/variables.tf new file mode 100644 index 0000000..4ba24f6 --- /dev/null +++ b/mp-pkg/terraform/variables.tf @@ -0,0 +1,252 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Variables +# ============================================ + +# ============================================ +# Marketplace Deployment Configuration +# ============================================ +variable "goog_cm_deployment_name" { + description = "Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions)" + type = string + + validation { + condition = length(var.goog_cm_deployment_name) >= 2 && length(var.goog_cm_deployment_name) <= 18 + error_message = "Deployment name must be between 2 and 18 characters (limited by GCP service account 30-char max: name + '-sa-' + 8-char suffix)." + } + + validation { + condition = can(regex("^[a-z][a-z0-9-]{0,16}[a-z0-9]$", var.goog_cm_deployment_name)) + error_message = "Deployment name must start with a lowercase letter, contain only lowercase letters, numbers, and hyphens, and end with a letter or number. Maximum 18 characters." + } +} + +# ============================================ +# Project & Region Configuration +# ============================================ +variable "project_id" { + description = "GCP project ID where Data Commons Accelerator will be deployed" + type = string + + validation { + condition = can(regex("^[a-z][a-z0-9-]{4,28}[a-z0-9]$", var.project_id)) + error_message = "Project ID must be 6-30 characters, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +variable "create_new_cluster" { + description = "Create a new GKE Autopilot cluster with VPC networking. Set to false to use an existing cluster." + type = bool + default = true +} + +variable "region" { + description = "GCP region for new cluster and resources (e.g., us-central1). Only used when create_new_cluster is true." 
+ type = string + default = "us-central1" + + validation { + condition = var.region == "" || can(regex("^[a-z]+-[a-z]+[0-9]+$", var.region)) + error_message = "Region must be a valid GCP region (e.g., us-central1, europe-west3)." + } +} + +# ============================================ +# GKE Cluster Configuration (Bring-Your-Own) +# ============================================ +variable "gke_cluster_name" { + description = "Name of an existing GKE cluster. Only used when create_new_cluster is false." + type = string + default = "" +} + +variable "gke_cluster_location" { + description = "Location (region or zone) of the existing GKE cluster. Only used when create_new_cluster is false." + type = string + default = "" +} + +variable "namespace" { + description = "Kubernetes namespace for Data Commons Accelerator deployment. Defaults to the deployment name (goog_cm_deployment_name) if not provided." + type = string + default = "" + + validation { + condition = var.namespace == "" || can(regex("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$", var.namespace)) + error_message = "Namespace must consist of lowercase alphanumeric characters or '-', and must start and end with an alphanumeric character." + } +} + +variable "create_namespace" { + description = "Create new Kubernetes namespace. Set to false if namespace already exists in the cluster." + type = bool + default = true +} + +# ============================================ +# Resource Name Overrides (Optional) +# ============================================ +# These variables allow enterprise customers to specify exact resource names +# when they have naming conventions or pre-existing resources to integrate with. +# If not specified, resources use auto-generated names with random suffixes. 
+ +variable "cloudsql_instance_name_override" { + description = "Override CloudSQL instance name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.cloudsql_instance_name_override == "" || can(regex("^[a-z][a-z0-9-]{0,78}[a-z0-9]$", var.cloudsql_instance_name_override)) + error_message = "CloudSQL instance name must be lowercase, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +variable "gcs_bucket_name_override" { + description = "Override GCS bucket name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.gcs_bucket_name_override == "" || can(regex("^[a-z0-9][a-z0-9-_.]{1,61}[a-z0-9]$", var.gcs_bucket_name_override)) + error_message = "Bucket name must be 3-63 characters, start and end with lowercase letter or number, and contain only lowercase letters, numbers, hyphens, underscores, and dots." + } +} + +variable "service_account_name_override" { + description = "Override GCP service account name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.service_account_name_override == "" || can(regex("^[a-z][a-z0-9-]{4,28}[a-z0-9]$", var.service_account_name_override)) + error_message = "Service account name must be 6-30 characters, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +# ============================================ +# API Keys Configuration +# ============================================ +variable "dc_api_key" { + description = "Data Commons API key for accessing Data Commons APIs" + type = string + sensitive = true + + validation { + condition = length(var.dc_api_key) > 0 + error_message = "Data Commons API key cannot be empty." 
+ } +} + +# ============================================ +# Application Configuration +# ============================================ +variable "app_replicas" { + description = "Number of replicas for the Data Commons Accelerator application deployment" + type = number + default = 1 + + validation { + condition = var.app_replicas >= 1 && var.app_replicas <= 10 + error_message = "Application replicas must be between 1 and 10." + } +} + +variable "resource_tier" { + description = "Resource allocation tier controlling application pod resources and CloudSQL database sizing (small, medium, large)" + type = string + default = "medium" + + validation { + condition = contains(["small", "medium", "large"], var.resource_tier) + error_message = "Resource tier must be one of: small, medium, large." + } +} + +variable "enable_natural_language" { + description = "Enable natural language query features" + type = bool + default = true +} + +variable "enable_data_sync" { + description = "Enable automatic synchronization of custom data from GCS bucket to CloudSQL database" + type = bool + default = true +} + +variable "flask_env" { + description = "Data Commons sample (pre-built configurations for specific domains)" + type = string + default = "health" + + validation { + condition = contains(["health", "education", "energy", "custom"], var.flask_env) + error_message = "Sample must be one of: health, education, energy, custom." 
+ } +} + +# ============================================ +# Container Image Variables (Marketplace-populated) +# ============================================ +variable "cdc_services_image_repo" { + description = "Container image repository for CDC Services (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "cdc_services_image_tag" { + description = "Container image tag for CDC Services (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "data_image_repo" { + description = "Container image repository for Data service (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "data_image_tag" { + description = "Container image tag for Data service (populated by GCP Marketplace)" + type = string + default = "" +} + +# ============================================ +# Helm Chart Variables (Marketplace-populated) +# ============================================ +variable "helm_chart_repo" { + description = "Helm chart repository URL (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "helm_chart_name" { + description = "Helm chart name (populated by GCP Marketplace)" + type = string + default = "datacommons" + + validation { + condition = length(var.helm_chart_name) > 0 + error_message = "Helm chart name cannot be empty." + } +} + +variable "helm_chart_version" { + description = "Helm chart version (populated by GCP Marketplace)" + type = string + default = "" +} \ No newline at end of file diff --git a/mp-pkg/terraform/versions.tf b/mp-pkg/terraform/versions.tf new file mode 100644 index 0000000..cb72278 --- /dev/null +++ b/mp-pkg/terraform/versions.tf @@ -0,0 +1,48 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Terraform Version Constraints +# ============================================ + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + google-beta = { + source = "hashicorp/google-beta" + version = ">= 7.0.0, < 8.0.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.20.0" + } + helm = { + source = "hashicorp/helm" + version = "~> 2.12.0" + } + random = { + source = "hashicorp/random" + version = "~> 3.6.0" + } + http = { + source = "hashicorp/http" + version = "~> 3.4.0" + } + } +} diff --git a/mp-pkg/terraform/vpc.tf b/mp-pkg/terraform/vpc.tf new file mode 100644 index 0000000..ac51e2c --- /dev/null +++ b/mp-pkg/terraform/vpc.tf @@ -0,0 +1,68 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# VPC Network +# ============================================ + +resource "google_compute_network" "vpc" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-vpc" + auto_create_subnetworks = false + routing_mode = "REGIONAL" + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} + +# ============================================ +# Primary Subnet with GKE Secondary Ranges +# ============================================ + +resource "google_compute_subnetwork" "primary" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-subnet" + network = google_compute_network.vpc[0].id + region = local.region + ip_cidr_range = "10.0.0.0/20" + private_ip_google_access = true + project = var.project_id + + secondary_ip_range { + range_name = "pods" + ip_cidr_range = "10.1.0.0/17" + } + + secondary_ip_range { + range_name = "services" + ip_cidr_range = "10.2.0.0/22" + } + + log_config { + aggregation_interval = "INTERVAL_30_SEC" + flow_sampling = 0.5 + } + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} diff --git a/mp-pkg/terraform/workload-identity.tf b/mp-pkg/terraform/workload-identity.tf new file mode 100644 index 0000000..89c0817 --- /dev/null +++ b/mp-pkg/terraform/workload-identity.tf @@ -0,0 +1,64 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Workload Identity Configuration +# ============================================ + +# ============================================ +# GCP Service Account for DataCommons Workload +# ============================================ +resource "google_service_account" "datacommons_workload" { + provider = google + + account_id = local.service_account_name + display_name = "DataCommons Workload Service Account (${local.deployment_name})" + description = "Service account for DataCommons application running on GKE with Workload Identity" + project = var.project_id + + depends_on = [google_project_service.apis["iam.googleapis.com"]] +} + +# ============================================ +# IAM Role Bindings - Project Level +# ============================================ + +# CloudSQL Client - Required for CloudSQL access via Cloud SQL Auth Proxy or private IP +resource "google_project_iam_member" "datacommons_cloudsql_client" { + provider = google + + project = var.project_id + role = "roles/cloudsql.client" + member = "serviceAccount:${google_service_account.datacommons_workload.email}" + + depends_on = [google_service_account.datacommons_workload] +} + +# NOTE: Storage Object Admin IAM is granted at the bucket level (not project level) + +# ============================================ +# Workload Identity Binding +# ============================================ + +resource "google_service_account_iam_member" "datacommons_workload_identity_user" { + provider = google + + service_account_id = google_service_account.datacommons_workload.name + role = "roles/iam.workloadIdentityUser" + + # Member format: serviceAccount:{PROJECT_ID}.svc.id.goog[{NAMESPACE}/{KSA_NAME}] + member = "serviceAccount:${var.project_id}.svc.id.goog[${local.namespace_name}/datacommons-ksa]" + + depends_on = [google_service_account.datacommons_workload] 
+}
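Note on the inputs declared in `variables.tf` above: the following is a minimal sketch of a variable definitions file for the bring-your-own-cluster path (`create_new_cluster = false`). It assumes Terraform is run directly rather than through the Marketplace form. The file name and every value are hypothetical placeholders chosen only to satisfy the validation rules in the diff; none of them come from the change itself.

```hcl
# terraform.tfvars (hypothetical example; all values are placeholders)

# Required inputs with no default.
goog_cm_deployment_name = "dc-demo"        # 2-18 chars: lowercase letters, digits, hyphens
project_id              = "my-gcp-project" # 6-30 chars, starts with a letter

# Reuse an existing cluster instead of creating a new Autopilot cluster and VPC.
create_new_cluster   = false
gke_cluster_name     = "existing-cluster"
gke_cluster_location = "us-central1"

# Deploy into a namespace that already exists in that cluster.
namespace        = "datacommons"
create_namespace = false

# Optional application settings; each value is one the validation blocks accept.
resource_tier = "small"  # small | medium | large
app_replicas  = 2        # 1 to 10
flask_env     = "custom" # health | education | energy | custom

# dc_api_key is marked sensitive, so it is left out of this file on purpose.
# One option is to export it in the environment as TF_VAR_dc_api_key.
```

`region` is omitted because its description says it is only used when `create_new_cluster` is true. Keeping `dc_api_key` out of the file avoids committing a secret alongside the rest of the configuration.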