From 6758951d13569ccaa4992fd6e0283e21584ffe6c Mon Sep 17 00:00:00 2001 From: Artur Date: Thu, 15 Jan 2026 17:19:10 +0100 Subject: [PATCH 1/3] Add Deployment guide --- docs/DEPLOYMENT_GUIDE.md | 1067 +++++++++++++++++++++++++++++++------- 1 file changed, 886 insertions(+), 181 deletions(-) diff --git a/docs/DEPLOYMENT_GUIDE.md b/docs/DEPLOYMENT_GUIDE.md index 5f16e13..83fa300 100644 --- a/docs/DEPLOYMENT_GUIDE.md +++ b/docs/DEPLOYMENT_GUIDE.md @@ -1,4 +1,6 @@ -# Data Commons Accelerator - GCP Marketplace Deployment Guide +# Data Commons Accelerator - GCP Marketplace User Guide + +A complete guide to deploying and managing Data Commons Accelerator through Google Cloud Marketplace. --- @@ -8,9 +10,11 @@ 2. [Architecture](#architecture) 3. [Prerequisites](#prerequisites) 4. [Deployment via GCP Marketplace](#deployment-via-gcp-marketplace) -5. [Using Data Commons Accelerator](#using-data-commons-accelerator) -6. [Troubleshooting](#troubleshooting) -7. [Deleting Your Deployment](#deleting-your-deployment) +5. [Accessing Your Deployment](#accessing-your-deployment) +6. [Using Data Commons Accelerator](#using-data-commons-accelerator) +7. [Managing Your Deployment](#managing-your-deployment) +8. [Troubleshooting](#troubleshooting) +9. [Deleting Your Deployment](#deleting-your-deployment) --- @@ -18,22 +22,28 @@ ### What is Data Commons Accelerator? -Data Commons Accelerator is a ready-to-deploy instance of **Custom Data Commons** on Google Kubernetes Engine (GKE). [Data Commons](https://docs.datacommons.org/what_is.html) is an open knowledge repository providing unified access to public datasets and statistics. For more details on custom Data Commons and its benefits, see [Custom Data Commons documentation](https://docs.datacommons.org/custom_dc/). +Data Commons Accelerator is a ready-to-deploy instance of the Data Commons platform on Google Kubernetes Engine (GKE). 
Data Commons is an open knowledge repository providing unified access to public datasets and statistics, enabling your organization to explore data without manually aggregating from multiple sources. ### What Problems Does It Solve? -Data Commons Accelerator simplifies deploying a custom Data Commons instance—removing the complexity of infrastructure setup, Kubernetes configuration, and cloud resource provisioning. It empowers domain experts and data analysts to quickly create a custom Data Commons, integrate their datasets, and leverage the public Data Commons knowledge graph. - -**Value Proposition:** +Data Commons Accelerator addresses these common data exploration challenges: -- **Simplicity**: Deploy a complex data solution with a few clicks, eliminating manual setup and configuration -- **Efficiency**: Streamline custom Data Commons adoption via Marketplace, bypassing traditional sales cycles -- **Accelerated Time-to-Insight**: Quickly join proprietary datasets with public data (census, economic, weather) to unlock new correlations -- **Empowerment**: Natural language interface allows non-technical users to query data without code +- **Data Fragmentation**: Public datasets scattered across multiple sources with different formats +- **Time-to-Insight**: Analysts spend weeks aggregating and standardizing data manually +- **Lack of Context**: Difficulty understanding relationships and connections between datasets +- **Scalability**: Traditional approaches break down with large datasets and many concurrent users +- **Customization**: Need to combine public data with proprietary internal datasets ### Who Should Use It? -See [Custom Data Commons: When do I need a custom instance?](https://docs.datacommons.org/custom_dc/) for use cases and audience guidance. 
+Data Commons Accelerator is ideal for: + +- **Data analysts** exploring public datasets and statistics +- **Researchers** studying demographic, economic, or environmental trends +- **Government agencies** publishing and analyzing public statistics +- **Non-profit organizations** working with community data +- **Academic institutions** teaching data analysis and visualization +- **Enterprise teams** integrating public data with business intelligence ### What Gets Deployed? @@ -41,8 +51,8 @@ This solution provisions a complete data exploration platform: - **Data Commons Accelerator Web Application**: Interactive interface for data exploration and visualization - **CloudSQL MySQL Database**: Persistent storage for datasets and metadata (with optional high availability) -- **Cloud Storage Bucket**: Scalable storage for custom data imports -- **Kubernetes Workload**: Application deployed to your existing GKE cluster (not created by this solution) with Workload Identity authentication +- **Cloud Storage Bucket**: Scalable storage for custom data imports and exports +- **Kubernetes Workload**: Application deployed to your existing GKE cluster with Workload Identity authentication - **Service Account**: Secure identity for accessing cloud resources No additional infrastructure setup is required—everything integrates with your existing GCP project. @@ -53,27 +63,58 @@ No additional infrastructure setup is required—everything integrates with your ### Components -This Marketplace solution deploys the following GCP resources: +The Data Commons Accelerator solution consists of four primary components: + +**1. GKE Application Container** + +Your Data Commons Accelerator application runs as Kubernetes pods in your existing GKE cluster. 
The application: + +- Provides the web UI for data exploration +- Handles API requests from users and external clients +- Processes statistical queries and visualizations +- Integrates with the Google Maps API for geospatial features + +The application runs in a dedicated Kubernetes namespace (default: `datacommons`) to keep it isolated from other workloads on your cluster. + +**2. CloudSQL Database** + +A managed MySQL database stores: + +- Statistical datasets and curated public data +- Metadata describing available datasets +- User-created custom datasets +- Query history and saved visualizations + +The database is deployed to a private IP address (via VPC Private Service Access) for security. It never exposes a public IP to the internet. Database replicas and backups are automatically managed by Google Cloud. + +**3. Cloud Storage Bucket** -| Component | Description | -|-----------|-------------| -| **GKE Workload** | Data Commons application pods running in your existing cluster (namespace matches your deployment name) | -| **CloudSQL MySQL** | Managed database with private IP (via VPC Private Service Access) for dataset storage | -| **GCS Bucket** | Cloud Storage for custom data imports | -| **Service Account** | Workload Identity-enabled SA for secure access to CloudSQL, GCS, and Maps API | -| **db-init Job** | One-time Kubernetes Job that initializes the database schema | -| **db-sync CronJob** | Recurring job that syncs custom data from GCS to CloudSQL (every 3 hours) | +A GCS bucket stores: -For details on Data Commons architecture and how the application works internally, see [Custom Data Commons documentation](https://docs.datacommons.org/custom_dc/). +- Custom datasets you import +- Exported data in various formats +- Query results and visualizations +- Temporary files during data processing + +You control who can access the bucket via GCP IAM permissions. + +**4. 
Workload Identity** + +A Google Cloud service account authenticates the application to cloud resources: + +- CloudSQL access via Workload Identity binding +- GCS bucket access via IAM roles +- Google Maps API calls via API key +- All credentials managed securely (no keys stored in pods) ### How Components Interact ``` -User Browser GCS Bucket (Custom Data) - │ │ - ├─> GKE Pod (Data Commons App) ├─> db-sync CronJob (every 3 hours) - │ │ │ - │ ├─> CloudSQL Database <───────┘ +User Browser + │ + ├─> GKE Pod (Data Commons Application) + │ │ + │ ├─> CloudSQL Database (private IP) │ │ └─> Dataset storage, queries │ │ │ ├─> GCS Bucket @@ -83,7 +124,7 @@ User Browser GCS Bucket (Custom Data) │ └─> Geospatial visualization │ └─> Infrastructure Manager - └─> Creates all related components/infrastructure of Data Commons (k8s resources, CloudSQL, IAM, GCS etc.) + └─> Manages Kubernetes resources ``` **Deployment Workflow:** @@ -96,75 +137,147 @@ User Browser GCS Bucket (Custom Data) ## Prerequisites -Before deploying Data Commons Accelerator, ensure you have the following: +### Required Before Deployment -### Infrastructure Requirements +**1. Existing GKE Cluster** -| Requirement | Details | -|-------------|---------| -| **GKE Cluster** | Kubernetes 1.27+, Standard or Autopilot cluster | -| **Workload Identity** | Must be enabled (default on GKE 1.27+) | -| **VPC Network** | VPC with Private Service Access configured (PSA could be created via Marketplace form if it's not existing) | +You must have an existing GKE cluster in your GCP project where Data Commons will be deployed. -**GKE Cluster:** If you need to create a cluster first, use the [GCP Kubernetes Engine creation form](https://console.cloud.google.com/kubernetes/add?). 
+- **Minimum version**: Kubernetes 1.27 or higher +- **Cluster type**: Standard or Autopilot (both supported) +- **Workload Identity**: Must be enabled (enabled by default on GKE 1.27+) +- **Network**: VPC with Private Service Access configured -### Required IAM Roles +To verify your cluster exists and meets requirements: -Your Google Cloud user account must have at least these roles assigned on the GCP project to deploy solution from Marketplace: +1. Go to **Kubernetes Engine** in the GCP Console +2. Click **Clusters** +3. Find your cluster in the list +4. Click the cluster name to see details +5. Verify the **Cluster version** is 1.27 or higher +6. Check that **Workload Identity** is shown as "Enabled" -| Role | Purpose | -|------|---------| -| Cloud Infrastructure Manager Admin | Deploy and manage Infrastructure Manager deployments | -| Infrastructure Administrator | Manage infrastructure resources provisioned by Terraform | -| Kubernetes Engine Developer | Deploy workloads to GKE clusters | -| Project IAM Admin | Assign IAM roles to service accounts | -| Service Account Admin | Create and manage GCP service accounts | -| Service Account User | Act as service accounts for Workload Identity | +If you need to create a cluster first, see the [GCP Kubernetes Engine documentation](https://docs.cloud.google.com/kubernetes-engine/docs/resources/autopilot-standard-feature-comparison). + +**Important**: The deployment form will ask for your cluster name and location. + +**2. Private Service Access (PSA) Configuration** + +Data Commons Accelerator uses CloudSQL with private IP connectivity, which requires Private Service Access (PSA) to be configured between your VPC network and Google's service producer network. + +**Option A: Automatic PSA Creation (Recommended for New Users)** -Contact your GCP administrator if you are missing any roles. 
+**Default behavior** - The solution will automatically: +- Allocate a /20 IP range (4,096 addresses) in your VPC +- Create Private Service Access connection for CloudSQL +- Configure VPC peering with Google's service producer network -#### Deployment Service Account (Automatic) +**Requirements:** +- Service Networking API must be enabled (solution enables automatically) +- User must have `compute.networkAdmin` role or equivalent +- Sufficient IP space in your VPC (solution uses /20 by default) -The deployment automatically creates a Service Account with the following roles. These are **not** assigned to users — they are used by the application and Infrastructure Manager: +**Configuration:** +```yaml +create_psa_connection: true # default - no action needed +psa_range_prefix_length: 20 # 4,096 IPs for production +``` + +**Option B: Use Existing PSA (Recommended for Enterprise)** + +**When to use:** +- Your organization already has PSA configured for the VPC +- Multiple deployments in the same project/VPC +- Centralized network management by dedicated team +- Existing CloudSQL or other services using PSA + +**Find your existing PSA range:** +```bash +gcloud compute addresses list --global \ + --filter="purpose=VPC_PEERING AND network~YOUR_VPC_NAME" \ + --format="table(name,address,prefixLength,network)" +``` + +**Verify PSA connection exists:** +```bash +gcloud services vpc-peerings list \ + --network=YOUR_VPC_NAME \ + --project=YOUR_PROJECT_ID +``` + +**Configuration:** +```yaml +create_psa_connection: false +existing_psa_range_name: "your-psa-range-name" # from gcloud command above +``` + +**Important:** The PSA range must be in the same VPC network as your GKE cluster. The solution automatically derives the VPC from your GKE cluster configuration. 
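If Option B applies but your VPC has no PSA range yet, you can pre-create one before deploying. This is a sketch using placeholder names (`my-vpc`; the range name follows the common `google-managed-services-` convention), not values produced by this solution:

```shell
# Placeholders: substitute your own VPC name.
VPC_NAME="my-vpc"
RANGE_NAME="google-managed-services-${VPC_NAME}"

# Reserve a /20 block in your VPC for service producers
gcloud compute addresses create "$RANGE_NAME" \
  --global \
  --purpose=VPC_PEERING \
  --prefix-length=20 \
  --network="$VPC_NAME"

# Peer the VPC with Google's service networking producer
gcloud services vpc-peerings connect \
  --service=servicenetworking.googleapis.com \
  --ranges="$RANGE_NAME" \
  --network="$VPC_NAME"

echo "Use this as existing_psa_range_name: $RANGE_NAME"
```

With this in place, set `create_psa_connection: false` and pass the range name as `existing_psa_range_name` in the deployment form.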
+ +**Checking Your GKE Cluster's VPC** + +To verify which VPC network your GKE cluster uses: +```bash +gcloud container clusters describe YOUR_CLUSTER_NAME \ + --location=YOUR_LOCATION \ + --format="value(network)" +``` + +The solution automatically uses this VPC for PSA configuration. + +**3. Required IAM Permissions** + +Your Google Cloud user account must have these roles on the GCP project: | Role | Purpose | |------|---------| -| roles/container.developer | Application workload management on GKE | -| roles/storage.admin | Read/write access to the GCS data bucket | -| roles/cloudsql.admin | Database instance management | -| roles/config.agent | Infrastructure Manager operations | -| roles/iam.infrastructureAdmin | Infrastructure resource management | -| roles/iam.serviceAccountAdmin | Service account lifecycle management | -| roles/serviceusage.serviceUsageAdmin | API enablement | -| roles/serviceusage.apiKeysAdmin | API key management | -| roles/resourcemanager.projectIamAdmin | IAM binding management | - -### Roles for Working with the Deployed Solution - -After deployment, your team members will need GCP IAM roles to interact with the deployed resources. 
- -| Role | What It Allows | -|------|----------------| -| Kubernetes Engine Developer | Application workload management on GKE | -| Cloud SQL Client | Connect directly to CloudSQL for database debugging | -| Cloud SQL Admin | Modify database configuration, manage backups and replicas | -| Cloud Infrastructure Manager Viewer | View deployment state and Terraform outputs | -| Cloud Infrastructure Manager Admin | Redeploy, update Terraform variables, manage deployment lifecycle | -| Storage Object Viewer | Download and read files from the GCS data bucket | -| Storage Object Admin | Upload, modify, and delete files in the GCS data bucket | -| API Keys Admin | Rotate or manage the Google Maps API key | - -### Data Commons API Key +| Marketplace Admin | Deploy solutions from GCP Marketplace | +| Kubernetes Engine Admin | Create and manage GKE resources | +| Cloud SQL Admin | Create and manage CloudSQL instances | +| Storage Admin | Create and manage GCS buckets | +| Service Account Admin | Create and manage service accounts | +| Service Account User | Bind identities to workloads | + +To verify your permissions: + +1. Go to **IAM & Admin** in the GCP Console +2. Click **IAM** +3. Find your user email in the list +4. Click the role(s) next to your email +5. Verify the required roles are listed + +If you're missing roles, contact your GCP administrator to grant them. + +**4. Data Commons API Key** Data Commons Accelerator requires an API key to access the Data Commons knowledge graph. -To obtain your API key, visit [Data Commons API documentation](https://docs.datacommons.org/custom_dc/quickstart.html/) and follow the "Get a Data Commons API Key" section. Save the key securely—you will need it during deployment. +To get your API key: + +1. Visit https://docs.datacommons.org/custom_dc/quickstart.html +2. Follow the "Get a Data Commons API Key" section +3. You'll receive your API key +4. 
Save it somewhere secure—you'll need it during deployment + +**Security note**: This key authenticates your instance to Data Commons. Keep it confidential and never commit it to version control. + +### Pre-Deployment Checklist + +Before proceeding to the GCP Marketplace form, verify: + +- [ ] GKE cluster exists and version is 1.27+ +- [ ] Workload Identity is enabled on your cluster +- [ ] Private Service Access is configured +- [ ] You have required IAM roles +- [ ] You have your Data Commons API key saved +- [ ] You've estimated monthly costs and have budget approved +- [ ] You have 15-20 minutes available for deployment +- [ ] You know your GKE cluster name and location --- ## Deployment via GCP Marketplace -This section walks through deploying Data Commons Accelerator via GCP Marketplace. +This section walks through the complete GCP Marketplace deployment process, field by field. ### Step 1: Navigate to GCP Marketplace @@ -176,172 +289,764 @@ This section walks through deploying Data Commons Accelerator via GCP Marketplac 6. Review the solution overview, pricing information, and documentation 7. Click the **Deploy** button (or **Get Started**, depending on UI) -### Step 2: Complete the Deployment Configuration Form +### Step 2: Deployment Configuration Form + +The Marketplace will open a deployment configuration form. This form has multiple sections. We'll walk through each field. + +#### Section 1: Basic Configuration + +This section identifies your deployment and project. 
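Before filling in the fields, you can sanity-check a proposed deployment name locally. The helper below simply mirrors the "lowercase letters, numbers, hyphens only" rule stated for the Deployment Name field; the Marketplace form performs its own authoritative validation:

```shell
# Local sanity check for the naming rule; the form's validation is authoritative.
valid_deployment_name() {
  printf '%s' "$1" | grep -Eq '^[a-z][a-z0-9-]*[a-z0-9]$'
}

valid_deployment_name "datacommons-prod" && echo "ok: datacommons-prod"
valid_deployment_name "Deploy 123" || echo "rejected: Deploy 123"
```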
+ +**Deployment Name** + +- **Field name**: `deployment_name` +- **What it means**: A friendly identifier for this deployment in GCP Marketplace and Infrastructure Manager +- **Format**: Lowercase letters, numbers, hyphens only (no spaces or special characters) +- **Examples**: `datacommons-prod`, `data-commons-team-analytics`, `dc-staging` +- **Why it matters**: Used for tracking multiple deployments and resource naming +- **Recommendation**: Use a descriptive name that includes environment and purpose (e.g., `datacommons-prod` not `deploy123`) + +**Project** + +- **Field name**: `project_id` +- **What it means**: Your GCP project where all resources will be created +- **Format**: Project ID (find in **Settings** > **Project** > **Project ID**) +- **Examples**: `my-project-123456`, `data-analytics-prod` +- **Why it matters**: All resources (database, storage, service accounts) are created in this project +- **Recommendation**: Must be the same project as your GKE cluster. If you don't see your project in the dropdown, you may lack Marketplace Admin permissions—contact your GCP administrator + +**Region** + +- **Field name**: `region` +- **What it means**: Default region for regional resources (CloudSQL, networking) +- **Format**: Region code (see [GCP Regions](https://cloud.google.com/compute/docs/regions-zones)) +- **Examples**: `us-central1`, `us-east1`, `europe-west1`, `asia-southeast1` +- **Why it matters**: Affects latency (choose region near your users) and pricing (slight variations by region) +- **Recommendation**: + - Choose region closest to your users + - Must have same VPC connectivity as your GKE cluster + - Can't be changed after deployment (don't change later) + +#### Section 2: GKE Cluster Configuration + +This section specifies which existing GKE cluster to deploy to. 
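The GKE fields in this section must match an existing cluster exactly. You can list candidates from the command line, and a location string tells you whether a cluster is regional or zonal (one dash-separated suffix is a region, two is a zone):

```shell
# Classify a GKE location string: us-central1 is a region, us-central1-a a zone.
location_type() {
  case "$1" in
    *-*-*) echo "zonal" ;;
    *-*)   echo "regional" ;;
    *)     echo "unknown" ;;
  esac
}

location_type "us-central1"    # prints "regional"
location_type "us-central1-a"  # prints "zonal"

# List clusters with the exact name/location/version values the form needs
gcloud container clusters list \
  --format="table(name,location,currentMasterVersion)"
```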
+ +**GKE Cluster Name** + +- **Field name**: `gke_cluster_name` +- **What it means**: Name of your existing GKE cluster (not created by this solution) +- **Format**: Cluster name as shown in **Kubernetes Engine** > **Clusters** +- **Examples**: `prod-gke-cluster`, `analytics-cluster-1`, `us-central1-cluster` +- **Why it matters**: The application pods will run on this cluster +- **Requirement**: Cluster must exist, be running, and have Workload Identity enabled +- **Recommendation**: Use dropdown to select from existing clusters in your project. Don't type manually—use the selection dropdown to avoid typos + +**GKE Cluster Location** + +- **Field name**: `gke_cluster_location` +- **What it means**: Geographic location of your GKE cluster (region or zone) +- **Format**: Region (e.g., `us-central1`) or Zone (e.g., `us-central1-a`) +- **Examples**: `us-central1` (regional cluster), `us-central1-a` (zonal cluster) +- **Why it matters**: Must match actual cluster location for deployment to work +- **Recommendation**: If you selected a cluster name above, this may auto-populate. Verify it matches the cluster's actual location from the **Kubernetes Engine** console + +**Kubernetes Namespace** + +- **Field name**: `namespace` +- **What it means**: Kubernetes namespace where the application pods will run +- **What is a namespace?**: A logical partition of your Kubernetes cluster, like a folder. Namespaces allow multiple applications to run on the same cluster without interfering +- **Format**: Lowercase letters and hyphens only +- **Default**: `datacommons` +- **Examples**: `datacommons`, `datacommons-prod`, `analytics` +- **Why it matters**: Keeps this deployment isolated from other applications on your cluster +- **Recommendation**: Use the default `datacommons` unless your organization has a naming standard (e.g., `namespace-{environment}`). 
The namespace will be created automatically if it doesn't exist + +#### Section 3: CloudSQL Database Configuration + +This section configures the managed MySQL database that stores datasets. + +**CloudSQL Machine Tier** + +- **Field name**: `cloudsql_tier` +- **What it means**: The processing power and memory of your database server +- **Why it matters**: Directly impacts query performance and cost. Higher tiers = faster queries but higher monthly cost +- **Default**: `db-n1-standard-1` + +**Recommendation:** +- **New deployments**: Choose `db-n1-standard-1` (good balance of cost and performance) +- **Proof-of-concept**: Use `db-f1-micro` to minimize costs +- **High-traffic production**: Use `db-n1-standard-2` or higher +- **You can upgrade later**: After deployment, you can change the tier via Infrastructure Manager update (5-10 minutes downtime) + +**CloudSQL Disk Size** + +- **Field name**: `cloudsql_disk_size` +- **What it means**: Storage space available for the database +- **Unit**: Gigabytes (GB) +- **Minimum**: 10 GB +- **Default**: 20 GB +- **Maximum**: 65,536 GB (65 TB) +- **Why it matters**: Stores datasets and metadata. Database automatically expands as you add data + +**Recommendation:** +- **Default (20 GB)** is suitable for most deployments +- **Start small**: You don't need to predict exact size. 
Disk auto-grows +- **Monitor usage**: Check **Cloud SQL** > **Instances** > [your instance] > **Overview** tab for current size +- **Estimate**: + - Empty installation: ~2 GB + - With public Data Commons data: +5–10 GB + - Custom datasets: Add estimated size +- **Example**: If you plan to import 50 GB of custom data, allocate 60 GB initially + +**CloudSQL Region (Optional)** + +- **Field name**: `cloudsql_region` +- **What it means**: Geographic region where the database is created (default: uses the `region` parameter from Section 1) +- **Format**: Region code, or leave empty to use default +- **Examples**: `us-central1`, `us-east1`, `europe-west1` +- **Why it matters**: Affects latency and must align with your VPC's Private Service Access configuration +- **Recommendation**: Leave empty unless you have a specific reason to override. The default region aligns with your application region + +**High Availability** + +- **Field name**: `cloudsql_ha_enabled` +- **What it means**: Enables automatic database replication to a different availability zone +- **How it works**: + - **Disabled (default)**: Single database instance in one zone. If zone fails, data is lost + - **Enabled**: Two instances (primary + replica) in different zones. If one zone fails, automatically switches to replica with zero downtime +- **Cost impact**: Approximately doubles the monthly database cost +- **Downtime**: 0 minutes (automatic failover) + +**Recommendation:** +- **Production deployments**: Enable High Availability. The extra cost is worth the reliability +- **Non-production**: Disable to save costs +- **Can be changed later**: Update via Infrastructure Manager + +#### Section 4: Cloud Storage Configuration + +This section configures the GCS bucket for data storage. + +**GCS Bucket Name** + +- **Field name**: `gcs_bucket_name` +- **What it means**: Name of the Cloud Storage bucket (like a drive on the cloud) +- **Format**: Must be globally unique across all GCP projects. 
Only lowercase letters, numbers, hyphens, and periods
- **Pattern**: `{project}-{purpose}-{random}` (for example, a short random suffix) to ensure uniqueness
- **Examples**:
  - `datacommons-prod`
  - `mycompany-dc-analytics`
  - `datacommons-team-project1`
- **Why it matters**: Stores custom datasets, exports, and query results. Must be unique
- **Recommendation**: Include your project name and environment to make the bucket identifiable. Avoid generic names like `data` or `bucket1`

**GCS Storage Class**

- **Field name**: `gcs_storage_class`
- **What it means**: How your data is stored (affects cost and access speed)
- **Default**: `STANDARD`
- **Monthly cost per GB**: Varies by class. `STANDARD` costs the most to store but has no retrieval fees; `NEARLINE`, `COLDLINE`, and `ARCHIVE` are cheaper to store but add retrieval charges and minimum storage durations

**Recommendation:**
- **New deployments**: Use `STANDARD` (best performance, reasonable cost)
- **Cost optimization**: Use `NEARLINE` if you access archived data less than once a month
- **Can't be changed later**: Choose carefully. To change, you'd need to migrate data to a new bucket
- **Best practice**: Use `STANDARD` initially; if you accumulate archived data, create separate buckets with different storage classes

**GCS Location**

- **Field name**: `gcs_location`
- **What it means**: Geographic region or multi-region where the bucket is stored
- **Default**: `US`
- **Multi-region options**: `US`, `EU`, `ASIA` (replicates across multiple regions for redundancy)
- **Single-region options**: `us-central1`, `us-east1`, `europe-west1`, etc.

**Recommendation:**
- **Most deployments**: Use `US`, `EU`, or `ASIA` based on your users' location
- **Cost optimization**: Use single-region locations only if cost is critical and you accept the single-region failure risk
- **Data residency**: Use `EU` if you store European data subject to GDPR residency requirements

#### Section 5: API Configuration

This section provides the Data Commons API key that authenticates your instance.
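If you ever need to rotate the key after deployment, the general pattern is to recreate the Kubernetes secret and restart the pods. The namespace, secret name, key name, and pod label below are illustrative placeholders, not guaranteed names from this chart; list the secrets in the namespace first to find the real ones:

```shell
# Placeholders: check `kubectl get secrets` for the names your release uses.
NAMESPACE="datacommons"
SECRET_NAME="datacommons-api-key"

# Find the real secret name first
kubectl -n "$NAMESPACE" get secrets

# Recreate the secret with the new key value
kubectl -n "$NAMESPACE" create secret generic "$SECRET_NAME" \
  --from-literal=dc-api-key="NEW_KEY_VALUE" \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart pods so they pick up the new secret
kubectl -n "$NAMESPACE" rollout restart deployment -l app=datacommons
```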
+ +**Data Commons API Key** + +- **Field name**: `dc_api_key` +- **What it means**: Authentication token for accessing the Data Commons knowledge graph +- **Where to get it**: You obtained this in the Prerequisites section (https://docs.datacommons.org/custom_dc/quickstart.html) +- **Why it matters**: Authenticates your instance to Data Commons for accessing the knowledge graph +- **Security**: The key is stored as a Kubernetes secret. Never commit to version control + +**Recommendation:** +- **Paste carefully**: Copy from your secure note +- **Verify**: Double-check there are no extra spaces or characters +- **Can't be changed via form**: If you enter the wrong key, you'll need to manually update the Kubernetes secret after deployment. Contact your GCP administrator if needed + +#### Section 6: Application Configuration + +This section controls how the application runs on your cluster. + +**Application Replicas** + +- **Field name**: `app_replicas` +- **What it means**: Number of application pods running simultaneously +- **Default**: `1` +- **Range**: 1–10 pods +- **How it affects cost**: Each replica uses cloud resources. 
2 replicas ≈ 2x resource cost +- **How it affects reliability**: More replicas = better resilience to pod failures + +**Recommendation:** +- **Development**: Use 1 replica +- **Production**: Use 2 replicas +- **High-traffic**: Use 3–4 replicas +- **Can be changed later**: Update via Infrastructure Manager + +**Resource Tier** + +- **Field name**: `resource_tier` +- **What it means**: CPU and memory allocated to each application pod +- **Default**: `medium` +- **Options**: `small`, `medium`, `large` +- **How it affects cost**: Each pod's resources are billed separately + +**Resource Tier Specifications:** + +| Tier | CPU | Memory | Best For | +|------|-----|--------|--------| +| `small` | 1.0 | 2 GB| Light workloads, <10 concurrent users | +| `medium` | 2.0 | 4 GB | Standard workloads, 10–100 concurrent users | +| `large` | 4.0 | 8 GB | Heavy workloads, >100 concurrent users, complex queries | + + +**Recommendation:** +- **Default (`medium`)** works for most deployments +- **Start with medium**: If you experience slow queries or high CPU, upgrade to `large` +- **Reduce to small**: If you're under-utilizing resources +- **Can be changed later**: Update via Infrastructure Manager (requires pod restart, 1–2 minutes downtime) + +**Enable Natural Language Queries** + +- **Field name**: `enable_natural_language` +- **What it means**: Allow users to ask questions in plain English (e.g., "Show me healthcare spending by state") +- **Default**: Enabled +- **How it works**: System translates natural language to statistical queries + +**Recommendation:** +- **Most deployments**: Keep enabled. 
Users find it helpful
- **Specialized deployments**: Disable only if you have specific technical requirements
- **Can be changed later**: Update via Infrastructure Manager

**Enable Data Sync**

- **Field name**: `enable_data_sync`
- **What it means**: Automatically synchronize custom data from the GCS bucket to the CloudSQL database
- **Default**: Enabled
- **What it does**: Schedules recurring imports of the custom datasets you upload to GCS into CloudSQL for querying. Without it, you trigger data imports manually
- **Performance impact**: Runs during off-peak hours; minimal performance impact

**Recommendation:**
- **Most deployments**: Keep enabled. Newly uploaded data becomes queryable automatically
- **Manual import workflows**: Disable only if you prefer to trigger imports yourself on your own schedule
- **Can be changed later**: Update via Infrastructure Manager (background process, no downtime)

### Step 3: Review and Deploy

**Review Deployment Configuration**

Before submitting:

1. Scroll to the top of the form
2. Review each section:
   - Deployment name is descriptive
   - Project ID is correct
   - GKE cluster name and location match your cluster
   - CloudSQL tier is appropriate for your use case
   - GCS bucket name is globally unique
   - Data Commons API key is correct
3. If any field is wrong, click on the section and correct it

**Accept Terms of Service**

1. Scroll to the bottom of the form
2. Read and check the checkbox: "I accept the Google Cloud Marketplace Terms"
3. Check any additional terms specific to Data Commons

**Click Deploy**

1. Click the **Deploy** button
2. A progress indicator appears
3. **Do not close the browser tab** during deployment

### Step 4: Monitor Deployment Progress

The deployment takes 10–15 minutes. Here's what's happening:

1. 
**Infrastructure Manager creates resources** (2–3 minutes): + - CloudSQL instance being provisioned + - GCS bucket being created + - Service account being created + - VPC network connections being established + +2. **Terraform applies configuration** (3–5 minutes): + - Resources are configured with your settings + - Workload Identity bindings are created + - Kubernetes secrets are created + +3. **Helm deploys application** (3–5 minutes): + - Container images are pulled from registry + - Application pods are scheduled on your GKE cluster + - Readiness probes verify the application is healthy + +**To monitor progress:** + +1. You should see a progress page in your browser showing deployment status +2. Alternatively, go to **Infrastructure Manager** > **Deployments** +3. Click on your deployment name +4. View the status indicator: + - **Creating**: Deployment in progress + - **Active**: Deployment succeeded (you can now access the application) + - **Error**: Something failed (see Troubleshooting section) + +**If deployment fails:** + +1. Click on the deployment in Infrastructure Manager +2. Click **Details** or **Logs** +3. Review the error message +4. Common issues and solutions are in the [Troubleshooting](#troubleshooting) section + +--- + +## Accessing Your Deployment + +After successful deployment, you can access your Data Commons Accelerator application. -The Marketplace will open a deployment configuration form organized into several sections: **Basic** (deployment name and project), **GKE** (cluster details), **CloudSQL** (database settings), **Cloud Storage** (bucket configuration), **API** (Data Commons API key), and **Application** (pod replicas and resource sizing). +### View Deployment Outputs -Each field has built-in tooltips with detailed guidance—hover over or click the help icon next to any field for clarification. The form validates your inputs and shows clear error messages if anything is incorrect. 
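As an alternative to the console, the deployment state can be polled with the `gcloud infra-manager` command group (available in recent gcloud releases). The project, region, and deployment name below are placeholders:

```shell
# Placeholders: substitute the values you used in the deployment form.
PROJECT_ID="my-project"
REGION="us-central1"
DEPLOYMENT_NAME="datacommons-prod"

# Prints the deployment state, e.g. CREATING, ACTIVE, or FAILED
gcloud infra-manager deployments describe "$DEPLOYMENT_NAME" \
  --project="$PROJECT_ID" \
  --location="$REGION" \
  --format="value(state)"
```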
+Deployment outputs contain important information needed to access and manage your application. -For detailed descriptions of every form field, valid values, and tips, see [Marketplace Fields Reference](MARKETPLACE_FIELDS.md). +**To view outputs:** -**Before you start, gather these from Prerequisites:** -- Your **GKE cluster name and location** -- Your **Data Commons API key** +1. Go to **Infrastructure Manager** > **Deployments** in GCP Console +2. Click your deployment name +3. Click the **Outputs** tab +4. Review the outputs (see descriptions below) -#### Private Service Access (PSA) +**Key Outputs:** -The CloudSQL section of the form asks how to configure Private Service Access for database connectivity: +| Output | Purpose | Example | +|--------|---------|---------| +| `namespace` | Kubernetes namespace where app is deployed | `datacommons` | +| `helm_release_name` | Helm release identifier | `datacommons-123abc` | +| `cloudsql_instance_name` | Database instance name | `datacommons-db-xyz123` | +| `cloudsql_connection_name` | Database connection string | `my-project:us-central1:datacommons-db-xyz123` | +| `gcs_bucket_name` | Cloud Storage bucket name | `data-datacommons-prod` | +| `workload_service_account` | Service account for the application | `datacommons-sa@my-project.iam.gserviceaccount.com` | +| `gke_cluster_name` | Your GKE cluster name | `prod-cluster` | +| `gke_cluster_location` | Cluster location | `us-central1` | +| `next_steps` | Post-deployment instructions | (Detailed instructions) | -| Option | When to Use | Configuration | -|--------|------------|----------------| -| **Create New PSA** | First deployment, no existing PSA | The form will create a /20 PSA range automatically | -| **Use Existing PSA** | PSA already configured, multiple deployments in same VPC | Provide your existing PSA range name | +### Verify Deployment -**To find your existing PSA range:** +Before accessing the application, verify that all components are running correctly: + 
+**Check Pods** ```bash -gcloud compute addresses list --global \ - --filter="purpose=VPC_PEERING AND network=YOUR_VPC_NAME" \ - --format="table(name,address,prefixLength,network)" +# Check that application pods are running +kubectl get pods -n datacommons + +# Expected output: All pods should show status "Running" ``` -### Step 3: Review and Deploy +**Check Services** -Once you've completed all sections: +```bash +# Verify the service exists +kubectl get services -n datacommons -1. **Review your selections** by scrolling through the form -2. **Accept the terms** by checking the Terms checkbox -3. **Click the Deploy button** +# Note: Service should be ClusterIP type +``` -Deployment takes approximately **10–15 minutes**. A progress indicator will appear. **Do not close the browser tab** during deployment. +**View Application Logs** -When the status shows **"Active"**, your deployment is complete. Proceed to the next section for accessing your application. +```bash +# Check for any errors in the application logs +kubectl logs -n datacommons -l app=datacommons --tail=50 +``` -### Step 4: Access Your Deployment +**Check Database** -All deployment outputs—resource names, connection strings, and commands—are available in: -**Infrastructure Manager** > **Deployments** > your deployment > **Outputs** tab. +1. Go to **Cloud SQL** > **Instances** in GCP Console +2. Click your instance +3. **Status** should show **Available** +4. Verify initialization is complete (takes 2–3 minutes after deployment) -#### Quick Access via Cloud Shell (Recommended) +### Access Application for Testing -The easiest way to access your deployment—no local tools needed: +**Important Security Note** -1. Go to **GKE** > click your cluster > click **Connect** -2. Click **Run in Cloud Shell** -3. Run the port-forward command: - ```bash - until kubectl port-forward -n NAMESPACE svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." 
>&2; sleep 1; done - ``` - (Replace `NAMESPACE` with your deployment name — the namespace matches your deployment name) -4. In the Cloud Shell toolbar, click **Web Preview** > **Preview on port 8080** +For security reasons, the application is deployed as a ClusterIP service without external access by default. This ensures your deployment is secure until you explicitly choose an exposure method. -#### Local Access via kubectl +**Quick Verification with Port-Forward** -If you have `gcloud` and `kubectl` installed locally: +For quick testing and verification of your deployment, use `kubectl port-forward` to create a temporary tunnel from your local machine to the application: -1. Configure kubectl: - ```bash - gcloud container clusters get-credentials CLUSTER --location=LOCATION --project=PROJECT - ``` -2. Port-forward: - ```bash - until kubectl port-forward -n NAMESPACE svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done - ``` -3. Open http://localhost:8080 in your browser +```bash +# Forward local port 8080 to the application service +kubectl port-forward -n datacommons svc/datacommons 8080:80 -#### Production Access +# Keep this terminal open; the port-forward will run in the foreground +# In another terminal, test the application: +curl http://localhost:8080 -For external access, follow: -**[GCP Guide: Exposing Applications in GKE](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps)** +# Or open http://localhost:8080 in your browser +``` -#### Next Steps +The application is NOT exposed to the internet with this method—you can only access it from your local machine. -Once your application is running, see the [User Guide](USER_GUIDE.md) for instructions on logging in, uploading custom data, and using the dashboards. +**For Production Access** + +To expose your application for production use, you must explicitly choose an exposure method and implement appropriate security controls. 
Follow the official GCP documentation: + +**[GCP Guide: Exposing Applications in GKE](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps)** --- ## Using Data Commons Accelerator -For detailed instructions on configuring and using your deployed instance, see the **[User Guide](USER_GUIDE.md)**. +### Key Features + +**Statistical Data Explorer** + +- Browse curated datasets from public sources (census data, economic indicators, health statistics) +- Filter by country, region, or time period +- Compare data across different dimensions + +**Place Explorer** + +- Geographic data visualization (maps, charts) +- Population and demographic analysis by location +- Economic indicators by region + +**Timeline Tool** + +- Time-series analysis for tracking trends +- Historical data visualization +- Forecasting and trend analysis (if enabled) + +**Natural Language Queries** (if enabled) + +- Ask questions in plain English +- Example queries: + - "What is the population of California?" 
+ - "Show me healthcare spending by country" + - "List the top 10 countries by GDP" + +**Custom Datasets** -For additional resources, refer to the official Data Commons documentation: +- Import your own data +- Combine with public Data Commons data +- Share datasets with team members -- [Custom Data Commons Documentation](https://docs.datacommons.org/custom_dc/) -- [Data Commons API Documentation](https://docs.datacommons.org/api) -- [Knowledge Graph Browser](https://datacommons.org/browser) +**Data Export** + +- Download query results in CSV, JSON, or other formats +- Export to Cloud Storage for analysis in other tools +- Build API integrations with external systems + +### Learning Resources + +To get started using Data Commons: + +- **Official Tutorials**: https://datacommons.org/tutorials +- **API Documentation**: https://docs.datacommons.org/api +- **Knowledge Graph Explorer**: https://datacommons.org/ (official site for learning about available data) +- **Custom Data Commons Guide**: https://docs.datacommons.org/custom_dc/ + +### Common Use Cases + +**Government Agencies** +- Publish open data to citizens +- Combine multiple datasets for policy analysis +- Track public health and economic indicators + +**Non-Profit Organizations** +- Research community statistics +- Track progress toward social impact goals +- Share data insights with stakeholders + +**Researchers and Academics** +- Access curated datasets for studies +- Combine public data with primary research +- Teach data analysis and visualization + +**Businesses** +- Market research using demographic data +- Competitive analysis by geography +- Economic trend analysis for strategy --- ## Troubleshooting -### Deployment Logs +### Deployment Fails During Provisioning + +**Symptoms:** +- Deployment shows "Error" status +- Infrastructure Manager shows failure message + +**Common causes and solutions:** + +**Issue: "GKE cluster not found"** +- Verify cluster name is spelled correctly +- Verify cluster is in 
the same project +- Go to **Kubernetes Engine** > **Clusters** to confirm cluster exists -1. Go to **Solution deployments** -2. Click the three-dot menu (⋮) next to your deployment -3. Select **Cloud Build log** -4. Review the Terraform execution log for provisioning errors +**Issue: "Insufficient permissions"** +- You lack required IAM roles +- Ask your GCP administrator to grant: + - Kubernetes Engine Admin + - Cloud SQL Admin + - Storage Admin + - Service Account Admin -**Common deployment errors:** -- **"GKE cluster not found"** — Verify cluster name and project match -- **"Insufficient permissions"** — Check [Required IAM Roles](#required-iam-roles) -- **"PSA not configured"** — See [PSA Issues](#private-service-access-issues) below +**Issue: "Private Service Access not configured"** +- VPC doesn't have Private Service Access enabled +- Ask your GCP administrator to configure it +- Or follow the [GCP Private Service Access guide](https://docs.cloud.google.com/sql/docs/mysql/configure-private-services-access) -### Pod Status and Logs +**Issue: "CloudSQL quota exceeded"** +- Your project has created too many CloudSQL instances +- Delete unused instances, or request quota increase +- Go to **Quotas & System Limits** to view and request increases -**GKE Console:** Kubernetes Engine > Workloads > filter by your deployment namespace (namespace matches your deployment name) +### Application Pods Don't Start -**Quick diagnostics from Cloud Shell:** +**Symptoms:** +- Pods show **Pending** or **CrashLoopBackOff** status +- Application URL doesn't load + +**To diagnose:** + +1. Go to **Kubernetes Engine** > **Workloads** +2. Filter by namespace: `datacommons` +3. Click the deployment +4. Click a pod name +5. 
**Logs** tab shows error messages + +**Common issues:** + +**"ImagePullBackOff"** or **"ErrImagePull"** +- Container images can't be pulled from registry +- Likely: Wrong API key or image tag +- Solution: Verify the `dc_api_key` in the deployment form is correct +- If wrong, contact support for instructions to update the secret + +**"CrashLoopBackOff"** +- Application is crashing immediately after starting +- Check logs (click pod > **Logs** tab) +- Common causes: + - Database not reachable (wait 2–3 minutes for CloudSQL to initialize) + - Invalid configuration in Kubernetes secrets + - Insufficient memory (try upgrading `resource_tier`) + +**"Pending" (pods won't start)** +- Cluster doesn't have capacity +- Check **Kubernetes Engine** > **Nodes** for available capacity +- If nodes are full, add more nodes to the cluster or delete other workloads + +### Can't Access Application from Browser + +**Symptoms:** +- Browser shows "Connection refused" or "ERR_CONNECTION_REFUSED" +- Or shows "Host unreachable" +- Port-forward fails to connect + +**Troubleshooting Steps:** + +**1. Verify Application Pods Are Running** ```bash -kubectl get pods -n NAMESPACE -kubectl describe pod POD_NAME -n NAMESPACE -kubectl logs -n NAMESPACE -l app.kubernetes.io/name=datacommons +# Check pod status +kubectl get pods -n datacommons + +# All pods should show "Running" status +# If any show "Pending" or "CrashLoopBackOff", see "Application Pods Don't Start" section ``` -**Common pod issues:** -- **Pending** — Cluster needs more capacity -- **CrashLoopBackOff** — Check logs; often CloudSQL still initializing (wait 2–3 min) -- **ImagePullBackOff** — Verify `dc_api_key` is correct - -### Private Service Access Issues +**2. 
Check the Service Exists** ```bash -# Check existing PSA ranges -gcloud compute addresses list --global --filter="purpose=VPC_PEERING" +# Verify service is created +kubectl get services -n datacommons + +# Service should exist and be type "ClusterIP" by default +# By default, the application is NOT exposed externally for security ``` -- **"Couldn't find free blocks"** — Use `psa_range_configuration: "create_16"` for more IPs -- **"Peering already exists"** — Use `psa_range_configuration: "existing"` with your existing range name +**3. Test Access with Port-Forward** + +```bash +# Create a temporary tunnel to test the application +kubectl port-forward -n datacommons svc/datacommons 8080:80 + +# In another terminal, verify connectivity +curl http://localhost:8080/healthz -### Port-Forward Connection Refused +# If this works, the application is running correctly +# Issue is with your exposure method (see below) +``` + +**4. Review Application Logs** -**Error:** +```bash +# Check for errors in application logs +kubectl logs -n datacommons -l app=datacommons --tail=100 -```text -E0206 portforward.go:424 "Unhandled Error" err="an error occurred forwarding 8080 -> 8080: connection refused" +# Look for error messages that might indicate misconfiguration +# Check for database connection errors (usually appear in first log entries) ``` -**Cause:** The port-forward connection drops when the application receives too many concurrent requests — for example, opening the `/explore` page which loads many data widgets simultaneously. It can also occur during pod startup while the application is initializing. +**Important Note** + +By default, the application is deployed as a **ClusterIP service**. This is secure by default. 
You must explicitly expose it using:
+- `kubectl port-forward` (for testing)
+- Ingress (for production with TLS)
+- LoadBalancer service (for direct external IP)
+- Other methods described in the [GCP Exposing Applications guide](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps)
+
+If port-forward works but browser access doesn't, the issue is with your exposure method, not the application itself.
+
+### Private Service Access Issues
+
+**Error: "Failed to create subnetwork. Couldn't find free blocks"**
+
+**Cause:** The allocated IP range is exhausted or conflicts with existing ranges.
+
+**Solution:**
+1. Use a larger range (smaller prefix length): Set `psa_range_prefix_length: 16` (a /16 provides ~65k IPs)
+2. Or use existing PSA: Set `create_psa_connection: false` and provide an existing range
+
+**Error: "Cannot modify allocated ranges" or "Peering already exists"**
+
+**Cause:** PSA connection already exists with different configuration.
 
-**Fix:**
-1. If using the auto-retry loop (`until kubectl port-forward ...`), it will reconnect automatically
-2. If running a single port-forward, simply re-run the command
-3. If the error persists, check pod status: `kubectl get pods -n NAMESPACE` — ensure the pod is `Running` with `1/1` Ready
+**Solution:**
+1. Set `create_psa_connection: false`
+2. Find your existing range: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING"`
+3. Provide the range name in `existing_psa_range_name`
 
-### Error Loading GKE Cluster Location
+**Error: "Address 'xxx' is not in VPC 'yyy'"**
 
-**Error:** The Marketplace form shows "Error loading GKE Cluster Location" when selecting a cluster.
+**Cause:** The provided PSA range is in a different VPC than your GKE cluster.
 
-**Fix:** Refresh the browser page. This is a transient UI loading error. Your previously entered values may need to be re-entered.
+**Solution:**
+1. Verify your GKE cluster's VPC: `gcloud container clusters describe CLUSTER --format="value(network)"`
+2. 
Find PSA ranges in that VPC: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING AND network~YOUR_VPC"` +3. Provide the correct range name + +**Verifying PSA Configuration** + +Check if PSA is properly configured: +```bash +# List PSA IP ranges +gcloud compute addresses list --global --filter="purpose=VPC_PEERING" + +# List PSA connections +gcloud services vpc-peerings list --network=YOUR_VPC_NAME + +# Verify CloudSQL can use the range +gcloud sql instances describe INSTANCE_NAME --format="value(ipConfiguration.privateNetwork)" +``` --- ## Deleting Your Deployment -If you no longer need the Data Commons Accelerator, delete the deployment to stop incurring costs. +If you no longer need the Data Commons Accelerator, you can delete it to stop incurring costs. + +### Delete via Infrastructure Manager + +1. Go to **Infrastructure Manager** > **Deployments** +2. Click your deployment name +3. Click the **Delete** button +4. Confirm the deletion + +**Duration:** 2–5 minutes + +### What Gets Deleted + +These resources are automatically deleted: + +- Kubernetes namespace and all pods +- Helm release +- IAM service account bindings +- Kubernetes secrets + +### What Persists (Manual Cleanup Required) + +**Important:** These resources are NOT automatically deleted to prevent accidental data loss. You must delete them manually. + +**CloudSQL Instance** + +- Contains your database and datasets +- **To delete**: + 1. Go to **Cloud SQL** > **Instances** + 2. Click your instance + 3. Click **Delete** button + 4. Type the instance name to confirm + 5. Click **Delete** +- **Cost if left running**: ~$50–200/month depending on tier + +**Cloud Storage Bucket** + +- Contains your data exports and custom datasets +- **To delete**: + 1. Go to **Cloud Storage** > **Buckets** + 2. Click your bucket + 3. Click **Delete bucket** button + 4. Type the bucket name to confirm + 5. 
Click **Delete** +- **Warning**: Deletion is permanent if versioning not enabled +- **Cost if left running**: ~$0.015–0.020 per GB per month + +**Service Account** + +- Created for Workload Identity authentication +- **To delete** (optional): + 1. Go to **IAM & Admin** > **Service Accounts** + 2. Find the service account (name contains `datacommons`) + 3. Click the account + 4. Click **Delete** button + 5. Confirm deletion +- **Cost if left running**: Free (service accounts have no direct cost) + +### Before Deleting + +**Backup Important Data** + +1. **Export from CloudSQL**: + - Go to **Cloud SQL** > **Instances** > [your instance] + - Click **Export** button + - Choose export format (CSV, JSON, SQL dump) + - Export to GCS bucket (if you're keeping the bucket) + +2. **Download from GCS**: + - Go to **Cloud Storage** > **Buckets** > [your bucket] + - Select files you want to keep + - Click **Download** to save locally + +### After Deletion + +**Cost Implications:** + +- If you deleted the Infrastructure Manager deployment but kept CloudSQL and GCS: + - You continue paying ~$50–200/month for CloudSQL + - You continue paying for GCS storage +- **Delete CloudSQL and GCS** to stop all charges + +**Redeployment:** -1. Go to [Google Cloud Console](https://console.cloud.google.com) -2. Search for "Solution deployments" -3. Find your deployment and click the **three-dot menu** (⋮) -4. Click **Delete** -5. 
Confirm the deletion +- CloudSQL and GCS data are deleted when you delete those resources +- To deploy again, you'll start from scratch +- The Infrastructure Manager deployment can be deleted and redeployed (you have the configuration saved) From 02746dd3989fb4d71abe13c687c7437b5e080546 Mon Sep 17 00:00:00 2001 From: Artur Date: Wed, 25 Feb 2026 15:28:02 +0100 Subject: [PATCH 2/3] update documentation, terraform, helm charts to support new cluster logic --- docs/DEPLOYMENT_GUIDE.md | 1073 +++-------------- docs/MARKETPLACE_FIELDS.md | 390 +----- docs/USER_GUIDE.md | 258 ++-- mp-pkg/charts/datacommons/.helmignore | 35 + mp-pkg/charts/datacommons/Chart.yaml | 34 + mp-pkg/charts/datacommons/crds/app-crd.yaml | 532 ++++++++ mp-pkg/charts/datacommons/templates/NOTES.txt | 85 ++ .../charts/datacommons/templates/_helpers.tpl | 104 ++ .../datacommons/templates/application.yaml | 92 ++ .../datacommons/templates/configmap.yaml | 82 ++ .../datacommons/templates/db-init-job.yaml | 76 ++ .../templates/db-sync-cronjob.yaml | 77 ++ .../datacommons/templates/deployment.yaml | 130 ++ .../charts/datacommons/templates/secret.yaml | 37 + .../charts/datacommons/templates/service.yaml | 34 + .../datacommons/templates/serviceaccount.yaml | 33 + mp-pkg/charts/datacommons/values.yaml | 221 ++++ mp-pkg/terraform/README.md | 152 +++ mp-pkg/terraform/api-keys.tf | 37 + mp-pkg/terraform/cloudsql.tf | 131 ++ mp-pkg/terraform/gcs.tf | 71 ++ mp-pkg/terraform/gke.tf | 109 ++ mp-pkg/terraform/helm.tf | 210 ++++ mp-pkg/terraform/locals.tf | 164 +++ mp-pkg/terraform/main.tf | 131 ++ mp-pkg/terraform/marketplace_test.tfvars | 31 + mp-pkg/terraform/metadata.display.yaml | 164 +++ mp-pkg/terraform/metadata.yaml | 141 +++ mp-pkg/terraform/modules/cloudsql/main.tf | 127 ++ mp-pkg/terraform/modules/cloudsql/outputs.tf | 107 ++ .../terraform/modules/cloudsql/variables.tf | 170 +++ mp-pkg/terraform/modules/cloudsql/versions.tf | 32 + mp-pkg/terraform/modules/gcs-bucket/main.tf | 92 ++ 
.../terraform/modules/gcs-bucket/outputs.tf | 37 + .../terraform/modules/gcs-bucket/variables.tf | 107 ++ .../terraform/modules/gcs-bucket/versions.tf | 24 + mp-pkg/terraform/modules/k8s-secrets/main.tf | 30 + .../terraform/modules/k8s-secrets/outputs.tf | 32 + .../modules/k8s-secrets/variables.tf | 75 ++ .../terraform/modules/k8s-secrets/versions.tf | 24 + .../terraform/modules/maps-api-keys/main.tf | 70 ++ .../modules/maps-api-keys/outputs.tf | 49 + .../modules/maps-api-keys/variables.tf | 75 ++ .../modules/maps-api-keys/versions.tf | 28 + mp-pkg/terraform/nat.tf | 54 + mp-pkg/terraform/outputs.tf | 68 ++ mp-pkg/terraform/schema.yaml | 31 + mp-pkg/terraform/secrets.tf | 89 ++ mp-pkg/terraform/variables.tf | 252 ++++ mp-pkg/terraform/versions.tf | 48 + mp-pkg/terraform/vpc.tf | 68 ++ mp-pkg/terraform/workload-identity.tf | 64 + 52 files changed, 5028 insertions(+), 1359 deletions(-) create mode 100644 mp-pkg/charts/datacommons/.helmignore create mode 100644 mp-pkg/charts/datacommons/Chart.yaml create mode 100644 mp-pkg/charts/datacommons/crds/app-crd.yaml create mode 100644 mp-pkg/charts/datacommons/templates/NOTES.txt create mode 100644 mp-pkg/charts/datacommons/templates/_helpers.tpl create mode 100644 mp-pkg/charts/datacommons/templates/application.yaml create mode 100644 mp-pkg/charts/datacommons/templates/configmap.yaml create mode 100644 mp-pkg/charts/datacommons/templates/db-init-job.yaml create mode 100644 mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml create mode 100644 mp-pkg/charts/datacommons/templates/deployment.yaml create mode 100644 mp-pkg/charts/datacommons/templates/secret.yaml create mode 100644 mp-pkg/charts/datacommons/templates/service.yaml create mode 100644 mp-pkg/charts/datacommons/templates/serviceaccount.yaml create mode 100644 mp-pkg/charts/datacommons/values.yaml create mode 100644 mp-pkg/terraform/README.md create mode 100644 mp-pkg/terraform/api-keys.tf create mode 100644 mp-pkg/terraform/cloudsql.tf create mode 100644 
mp-pkg/terraform/gcs.tf create mode 100644 mp-pkg/terraform/gke.tf create mode 100644 mp-pkg/terraform/helm.tf create mode 100644 mp-pkg/terraform/locals.tf create mode 100644 mp-pkg/terraform/main.tf create mode 100644 mp-pkg/terraform/marketplace_test.tfvars create mode 100644 mp-pkg/terraform/metadata.display.yaml create mode 100644 mp-pkg/terraform/metadata.yaml create mode 100644 mp-pkg/terraform/modules/cloudsql/main.tf create mode 100644 mp-pkg/terraform/modules/cloudsql/outputs.tf create mode 100644 mp-pkg/terraform/modules/cloudsql/variables.tf create mode 100644 mp-pkg/terraform/modules/cloudsql/versions.tf create mode 100644 mp-pkg/terraform/modules/gcs-bucket/main.tf create mode 100644 mp-pkg/terraform/modules/gcs-bucket/outputs.tf create mode 100644 mp-pkg/terraform/modules/gcs-bucket/variables.tf create mode 100644 mp-pkg/terraform/modules/gcs-bucket/versions.tf create mode 100644 mp-pkg/terraform/modules/k8s-secrets/main.tf create mode 100644 mp-pkg/terraform/modules/k8s-secrets/outputs.tf create mode 100644 mp-pkg/terraform/modules/k8s-secrets/variables.tf create mode 100644 mp-pkg/terraform/modules/k8s-secrets/versions.tf create mode 100644 mp-pkg/terraform/modules/maps-api-keys/main.tf create mode 100644 mp-pkg/terraform/modules/maps-api-keys/outputs.tf create mode 100644 mp-pkg/terraform/modules/maps-api-keys/variables.tf create mode 100644 mp-pkg/terraform/modules/maps-api-keys/versions.tf create mode 100644 mp-pkg/terraform/nat.tf create mode 100644 mp-pkg/terraform/outputs.tf create mode 100644 mp-pkg/terraform/schema.yaml create mode 100644 mp-pkg/terraform/secrets.tf create mode 100644 mp-pkg/terraform/variables.tf create mode 100644 mp-pkg/terraform/versions.tf create mode 100644 mp-pkg/terraform/vpc.tf create mode 100644 mp-pkg/terraform/workload-identity.tf diff --git a/docs/DEPLOYMENT_GUIDE.md b/docs/DEPLOYMENT_GUIDE.md index 83fa300..a12c094 100644 --- a/docs/DEPLOYMENT_GUIDE.md +++ b/docs/DEPLOYMENT_GUIDE.md @@ -1,6 +1,4 @@ -# Data 
Commons Accelerator - GCP Marketplace User Guide - -A complete guide to deploying and managing Data Commons Accelerator through Google Cloud Marketplace. +# Data Commons Accelerator - GCP Marketplace Deployment Guide --- @@ -10,11 +8,9 @@ A complete guide to deploying and managing Data Commons Accelerator through Goog 2. [Architecture](#architecture) 3. [Prerequisites](#prerequisites) 4. [Deployment via GCP Marketplace](#deployment-via-gcp-marketplace) -5. [Accessing Your Deployment](#accessing-your-deployment) -6. [Using Data Commons Accelerator](#using-data-commons-accelerator) -7. [Managing Your Deployment](#managing-your-deployment) -8. [Troubleshooting](#troubleshooting) -9. [Deleting Your Deployment](#deleting-your-deployment) +5. [Using Data Commons Accelerator](#using-data-commons-accelerator) +6. [Troubleshooting](#troubleshooting) +7. [Deleting Your Deployment](#deleting-your-deployment) --- @@ -22,37 +18,32 @@ A complete guide to deploying and managing Data Commons Accelerator through Goog ### What is Data Commons Accelerator? -Data Commons Accelerator is a ready-to-deploy instance of the Data Commons platform on Google Kubernetes Engine (GKE). Data Commons is an open knowledge repository providing unified access to public datasets and statistics, enabling your organization to explore data without manually aggregating from multiple sources. +Data Commons Accelerator is a ready-to-deploy instance of **Custom Data Commons** on Google Kubernetes Engine (GKE). [Data Commons](https://docs.datacommons.org/what_is.html) is an open knowledge repository providing unified access to public datasets and statistics. For more details on custom Data Commons and its benefits, see [Custom Data Commons documentation](https://docs.datacommons.org/custom_dc/). ### What Problems Does It Solve? 
-Data Commons Accelerator addresses these common data exploration challenges: +Data Commons Accelerator simplifies deploying a custom Data Commons instance—removing the complexity of infrastructure setup, Kubernetes configuration, and cloud resource provisioning. It empowers domain experts and data analysts to quickly create a custom Data Commons, integrate their datasets, and leverage the public Data Commons knowledge graph. -- **Data Fragmentation**: Public datasets scattered across multiple sources with different formats -- **Time-to-Insight**: Analysts spend weeks aggregating and standardizing data manually -- **Lack of Context**: Difficulty understanding relationships and connections between datasets -- **Scalability**: Traditional approaches break down with large datasets and many concurrent users -- **Customization**: Need to combine public data with proprietary internal datasets +**Value Proposition:** -### Who Should Use It? +- **Simplicity**: Deploy a complex data solution with a few clicks, eliminating manual setup and configuration +- **Efficiency**: Streamline custom Data Commons adoption via Marketplace, bypassing traditional sales cycles +- **Accelerated Time-to-Insight**: Quickly join proprietary datasets with public data (census, economic, weather) to unlock new correlations +- **Empowerment**: Natural language interface allows non-technical users to query data without code -Data Commons Accelerator is ideal for: +### Who Should Use It? 
-- **Data analysts** exploring public datasets and statistics -- **Researchers** studying demographic, economic, or environmental trends -- **Government agencies** publishing and analyzing public statistics -- **Non-profit organizations** working with community data -- **Academic institutions** teaching data analysis and visualization -- **Enterprise teams** integrating public data with business intelligence +See [Custom Data Commons: When do I need a custom instance?](https://docs.datacommons.org/custom_dc/) for use cases and audience guidance. ### What Gets Deployed? This solution provisions a complete data exploration platform: - **Data Commons Accelerator Web Application**: Interactive interface for data exploration and visualization +- **GKE Cluster**: A new GKE cluster is created automatically +- **VPC and networking**: Network, subnet, Cloud Router, and Cloud NAT are set up automatically - **CloudSQL MySQL Database**: Persistent storage for datasets and metadata (with optional high availability) -- **Cloud Storage Bucket**: Scalable storage for custom data imports and exports -- **Kubernetes Workload**: Application deployed to your existing GKE cluster with Workload Identity authentication +- **Cloud Storage Bucket**: Scalable storage for custom data imports - **Service Account**: Secure identity for accessing cloud resources No additional infrastructure setup is required—everything integrates with your existing GCP project. @@ -63,58 +54,29 @@ No additional infrastructure setup is required—everything integrates with your ### Components -The Data Commons Accelerator solution consists of four primary components: - -**1. GKE Application Container** - -Your Data Commons Accelerator application runs as Kubernetes pods in your existing GKE cluster. 
The application: - -- Provides the web UI for data exploration -- Handles API requests from users and external clients -- Processes statistical queries and visualizations -- Integrates with the Google Maps API for geospatial features - -The application runs in a dedicated Kubernetes namespace (default: `datacommons`) to keep it isolated from other workloads on your cluster. - -**2. CloudSQL Database** - -A managed MySQL database stores: - -- Statistical datasets and curated public data -- Metadata describing available datasets -- User-created custom datasets -- Query history and saved visualizations - -The database is deployed to a private IP address (via VPC Private Service Access) for security. It never exposes a public IP to the internet. Database replicas and backups are automatically managed by Google Cloud. - -**3. Cloud Storage Bucket** +This Marketplace solution deploys the following GCP resources: -A GCS bucket stores: +| Component | Description | +|-----------|-------------| +| **GKE Cluster** | A new GKE cluster is created for you during deployment | +| **VPC and networking** | Network, subnet, Cloud Router, and Cloud NAT — all created automatically | +| **GKE Workload** | Data Commons application pods deployed to the cluster (namespace matches your deployment name) | +| **CloudSQL MySQL** | Managed database with private connectivity for dataset storage | +| **GCS Bucket** | Cloud Storage for custom data imports | +| **Service Account** | Secure identity for accessing CloudSQL, GCS, and Maps API | +| **db-init Job** | One-time job that initializes the database schema | +| **db-sync CronJob** | Recurring job that syncs custom data from Cloud Storage to the database (every 3 hours) | -- Custom datasets you import -- Exported data in various formats -- Query results and visualizations -- Temporary files during data processing - -You control who can access the bucket via GCP IAM permissions. - -**4. 
Workload Identity** - -A Google Cloud service account authenticates the application to cloud resources: - -- CloudSQL access via Workload Identity binding -- GCS bucket access via IAM roles -- Google Maps API calls via API key -- All credentials managed securely (no keys stored in pods) +For details on Data Commons architecture and how the application works internally, see [Custom Data Commons documentation](https://docs.datacommons.org/custom_dc/). ### How Components Interact ``` -User Browser - │ - ├─> GKE Pod (Data Commons Application) - │ │ - │ ├─> CloudSQL Database (private IP) +User Browser GCS Bucket (Custom Data) + │ │ + ├─> GKE Pod (Data Commons App) ├─> db-sync CronJob (every 3 hours) + │ │ │ + │ ├─> CloudSQL Database <───────┘ │ │ └─> Dataset storage, queries │ │ │ ├─> GCS Bucket @@ -124,160 +86,89 @@ User Browser │ └─> Geospatial visualization │ └─> Infrastructure Manager - └─> Manages Kubernetes resources + └─> Creates all related components/infrastructure of Data Commons (k8s resources, CloudSQL, IAM, GCS etc.) ``` **Deployment Workflow:** 1. You fill out the GCP Marketplace form with your preferences -2. Infrastructure Manager uses Terraform to provision CloudSQL, create GCS bucket, and bind service account -3. Helm deploys the Data Commons application to your GKE cluster -4. All resources are linked via Workload Identity and VPC Private Service Access +2. Infrastructure Manager uses Terraform to provision the cluster, database, Cloud Storage bucket, and service account +3. The Data Commons application is deployed to your cluster +4. All resources are linked via Workload Identity and private database connectivity --- ## Prerequisites -### Required Before Deployment - -**1. Existing GKE Cluster** - -You must have an existing GKE cluster in your GCP project where Data Commons will be deployed. 
- -- **Minimum version**: Kubernetes 1.27 or higher -- **Cluster type**: Standard or Autopilot (both supported) -- **Workload Identity**: Must be enabled (enabled by default on GKE 1.27+) -- **Network**: VPC with Private Service Access configured - -To verify your cluster exists and meets requirements: - -1. Go to **Kubernetes Engine** in the GCP Console -2. Click **Clusters** -3. Find your cluster in the list -4. Click the cluster name to see details -5. Verify the **Cluster version** is 1.27 or higher -6. Check that **Workload Identity** is shown as "Enabled" - -If you need to create a cluster first, see the [GCP Kubernetes Engine documentation](https://docs.cloud.google.com/kubernetes-engine/docs/resources/autopilot-standard-feature-comparison). +Before deploying Data Commons Accelerator, ensure you have the following: -**Important**: The deployment form will ask for your cluster name and location. +### Infrastructure Requirements -**2. Private Service Access (PSA) Configuration** +No existing infrastructure is required. The solution creates all necessary resources automatically, including: -Data Commons Accelerator uses CloudSQL with private IP connectivity, which requires Private Service Access (PSA) to be configured between your VPC network and Google's service producer network. 
+- GKE cluster +- VPC network with subnet, Cloud Router, and Cloud NAT +- CloudSQL MySQL database with private connectivity +- Cloud Storage bucket +- Service accounts and IAM bindings -**Option A: Automatic PSA Creation (Recommended for New Users)** +### Required IAM Roles -**Default behavior** - The solution will automatically: -- Allocate a /20 IP range (4,096 addresses) in your VPC -- Create Private Service Access connection for CloudSQL -- Configure VPC peering with Google's service producer network - -**Requirements:** -- Service Networking API must be enabled (solution enables automatically) -- User must have `compute.networkAdmin` role or equivalent -- Sufficient IP space in your VPC (solution uses /20 by default) - -**Configuration:** -```yaml -create_psa_connection: true # default - no action needed -psa_range_prefix_length: 20 # 4,096 IPs for production -``` - -**Option B: Use Existing PSA (Recommended for Enterprise)** - -**When to use:** -- Your organization already has PSA configured for the VPC -- Multiple deployments in the same project/VPC -- Centralized network management by dedicated team -- Existing CloudSQL or other services using PSA - -**Find your existing PSA range:** -```bash -gcloud compute addresses list --global \ - --filter="purpose=VPC_PEERING AND network~YOUR_VPC_NAME" \ - --format="table(name,address,prefixLength,network)" -``` - -**Verify PSA connection exists:** -```bash -gcloud services vpc-peerings list \ - --network=YOUR_VPC_NAME \ - --project=YOUR_PROJECT_ID -``` - -**Configuration:** -```yaml -create_psa_connection: false -existing_psa_range_name: "your-psa-range-name" # from gcloud command above -``` - -**Important:** The PSA range must be in the same VPC network as your GKE cluster. The solution automatically derives the VPC from your GKE cluster configuration. 
- -**Checking Your GKE Cluster's VPC** - -To verify which VPC network your GKE cluster uses: -```bash -gcloud container clusters describe YOUR_CLUSTER_NAME \ - --location=YOUR_LOCATION \ - --format="value(network)" -``` - -The solution automatically uses this VPC for PSA configuration. - -**3. Required IAM Permissions** - -Your Google Cloud user account must have these roles on the GCP project: +Your Google Cloud user account must have at least these roles assigned on the GCP project to deploy solution from Marketplace: | Role | Purpose | |------|---------| -| Marketplace Admin | Deploy solutions from GCP Marketplace | -| Kubernetes Engine Admin | Create and manage GKE resources | -| Cloud SQL Admin | Create and manage CloudSQL instances | -| Storage Admin | Create and manage GCS buckets | -| Service Account Admin | Create and manage service accounts | -| Service Account User | Bind identities to workloads | +| Cloud Infrastructure Manager Admin | Deploy and manage Infrastructure Manager deployments | +| Infrastructure Administrator | Manage infrastructure resources provisioned by Terraform | +| Kubernetes Engine Developer | Deploy workloads to GKE clusters | +| Project IAM Admin | Assign IAM roles to service accounts | +| Service Account Admin | Create and manage GCP service accounts | +| Service Account User | Act as service accounts for Workload Identity | -To verify your permissions: +Contact your GCP administrator if you are missing any roles. -1. Go to **IAM & Admin** in the GCP Console -2. Click **IAM** -3. Find your user email in the list -4. Click the role(s) next to your email -5. Verify the required roles are listed +#### Deployment Service Account (Automatic) -If you're missing roles, contact your GCP administrator to grant them. +The deployment automatically creates a Service Account with the following roles. These are **not** assigned to users — they are used by the application and Infrastructure Manager: -**4. 
Data Commons API Key** +| Role | Purpose | +|------|---------| +| roles/container.admin | GKE cluster and workload management | +| roles/storage.admin | Read/write access to the GCS data bucket | +| roles/cloudsql.admin | Database instance management | +| roles/config.agent | Infrastructure Manager operations | +| roles/iam.infrastructureAdmin | Infrastructure resource management | +| roles/iam.serviceAccountAdmin | Service account lifecycle management | +| roles/iam.serviceAccountUser | Act as service accounts for Workload Identity binding | +| roles/serviceusage.serviceUsageAdmin | API enablement | +| roles/serviceusage.apiKeysAdmin | API key management | +| roles/resourcemanager.projectIamAdmin | IAM binding management | + +### Roles for Working with the Deployed Solution + +After deployment, your team members will need GCP IAM roles to interact with the deployed resources. + +| Role | What It Allows | +|------|----------------| +| Kubernetes Engine Developer | Application workload management on GKE | +| Cloud SQL Client | Connect directly to CloudSQL for database debugging | +| Cloud SQL Admin | Modify database configuration, manage backups and replicas | +| Cloud Infrastructure Manager Viewer | View deployment state and Terraform outputs | +| Cloud Infrastructure Manager Admin | Redeploy, update Terraform variables, manage deployment lifecycle | +| Storage Object Viewer | Download and read files from the GCS data bucket | +| Storage Object Admin | Upload, modify, and delete files in the GCS data bucket | +| API Keys Admin | Rotate or manage the Google Maps API key | + +### Data Commons API Key Data Commons Accelerator requires an API key to access the Data Commons knowledge graph. -To get your API key: - -1. Visit https://docs.datacommons.org/custom_dc/quickstart.html -2. Follow the "Get a Data Commons API Key" section -3. You'll receive your API key -4. 
Save it somewhere secure—you'll need it during deployment - -**Security note**: This key authenticates your instance to Data Commons. Keep it confidential and never commit it to version control. - -### Pre-Deployment Checklist - -Before proceeding to the GCP Marketplace form, verify: - -- [ ] GKE cluster exists and version is 1.27+ -- [ ] Workload Identity is enabled on your cluster -- [ ] Private Service Access is configured -- [ ] You have required IAM roles -- [ ] You have your Data Commons API key saved -- [ ] You've estimated monthly costs and have budget approved -- [ ] You have 15-20 minutes available for deployment -- [ ] You know your GKE cluster name and location +To obtain your API key, visit [Data Commons API documentation](https://docs.datacommons.org/custom_dc/quickstart.html/) and follow the "Get a Data Commons API Key" section. Save the key securely—you will need it during deployment. --- ## Deployment via GCP Marketplace -This section walks through the complete GCP Marketplace deployment process, field by field. +This section walks through deploying Data Commons Accelerator via GCP Marketplace. ### Step 1: Navigate to GCP Marketplace @@ -289,764 +180,160 @@ This section walks through the complete GCP Marketplace deployment process, fiel 6. Review the solution overview, pricing information, and documentation 7. Click the **Deploy** button (or **Get Started**, depending on UI) -### Step 2: Deployment Configuration Form - -The Marketplace will open a deployment configuration form. This form has multiple sections. We'll walk through each field. - -#### Section 1: Basic Configuration - -This section identifies your deployment and project. 
- -**Deployment Name** - -- **Field name**: `deployment_name` -- **What it means**: A friendly identifier for this deployment in GCP Marketplace and Infrastructure Manager -- **Format**: Lowercase letters, numbers, hyphens only (no spaces or special characters) -- **Examples**: `datacommons-prod`, `data-commons-team-analytics`, `dc-staging` -- **Why it matters**: Used for tracking multiple deployments and resource naming -- **Recommendation**: Use a descriptive name that includes environment and purpose (e.g., `datacommons-prod` not `deploy123`) - -**Project** - -- **Field name**: `project_id` -- **What it means**: Your GCP project where all resources will be created -- **Format**: Project ID (find in **Settings** > **Project** > **Project ID**) -- **Examples**: `my-project-123456`, `data-analytics-prod` -- **Why it matters**: All resources (database, storage, service accounts) are created in this project -- **Recommendation**: Must be the same project as your GKE cluster. If you don't see your project in the dropdown, you may lack Marketplace Admin permissions—contact your GCP administrator - -**Region** - -- **Field name**: `region` -- **What it means**: Default region for regional resources (CloudSQL, networking) -- **Format**: Region code (see [GCP Regions](https://cloud.google.com/compute/docs/regions-zones)) -- **Examples**: `us-central1`, `us-east1`, `europe-west1`, `asia-southeast1` -- **Why it matters**: Affects latency (choose region near your users) and pricing (slight variations by region) -- **Recommendation**: - - Choose region closest to your users - - Must have same VPC connectivity as your GKE cluster - - Can't be changed after deployment (don't change later) - -#### Section 2: GKE Cluster Configuration - -This section specifies which existing GKE cluster to deploy to. 
- -**GKE Cluster Name** - -- **Field name**: `gke_cluster_name` -- **What it means**: Name of your existing GKE cluster (not created by this solution) -- **Format**: Cluster name as shown in **Kubernetes Engine** > **Clusters** -- **Examples**: `prod-gke-cluster`, `analytics-cluster-1`, `us-central1-cluster` -- **Why it matters**: The application pods will run on this cluster -- **Requirement**: Cluster must exist, be running, and have Workload Identity enabled -- **Recommendation**: Use dropdown to select from existing clusters in your project. Don't type manually—use the selection dropdown to avoid typos - -**GKE Cluster Location** - -- **Field name**: `gke_cluster_location` -- **What it means**: Geographic location of your GKE cluster (region or zone) -- **Format**: Region (e.g., `us-central1`) or Zone (e.g., `us-central1-a`) -- **Examples**: `us-central1` (regional cluster), `us-central1-a` (zonal cluster) -- **Why it matters**: Must match actual cluster location for deployment to work -- **Recommendation**: If you selected a cluster name above, this may auto-populate. Verify it matches the cluster's actual location from the **Kubernetes Engine** console - -**Kubernetes Namespace** - -- **Field name**: `namespace` -- **What it means**: Kubernetes namespace where the application pods will run -- **What is a namespace?**: A logical partition of your Kubernetes cluster, like a folder. Namespaces allow multiple applications to run on the same cluster without interfering -- **Format**: Lowercase letters and hyphens only -- **Default**: `datacommons` -- **Examples**: `datacommons`, `datacommons-prod`, `analytics` -- **Why it matters**: Keeps this deployment isolated from other applications on your cluster -- **Recommendation**: Use the default `datacommons` unless your organization has a naming standard (e.g., `namespace-{environment}`). 
The namespace will be created automatically if it doesn't exist - -#### Section 3: CloudSQL Database Configuration - -This section configures the managed MySQL database that stores datasets. - -**CloudSQL Machine Tier** - -- **Field name**: `cloudsql_tier` -- **What it means**: The processing power and memory of your database server -- **Why it matters**: Directly impacts query performance and cost. Higher tiers = faster queries but higher monthly cost -- **Default**: `db-n1-standard-1` - -**Recommendation:** -- **New deployments**: Choose `db-n1-standard-1` (good balance of cost and performance) -- **Proof-of-concept**: Use `db-f1-micro` to minimize costs -- **High-traffic production**: Use `db-n1-standard-2` or higher -- **You can upgrade later**: After deployment, you can change the tier via Infrastructure Manager update (5-10 minutes downtime) - -**CloudSQL Disk Size** - -- **Field name**: `cloudsql_disk_size` -- **What it means**: Storage space available for the database -- **Unit**: Gigabytes (GB) -- **Minimum**: 10 GB -- **Default**: 20 GB -- **Maximum**: 65,536 GB (65 TB) -- **Why it matters**: Stores datasets and metadata. Database automatically expands as you add data +### Step 2: Complete the Deployment Configuration Form -**Recommendation:** -- **Default (20 GB)** is suitable for most deployments -- **Start small**: You don't need to predict exact size. 
Disk auto-grows -- **Monitor usage**: Check **Cloud SQL** > **Instances** > [your instance] > **Overview** tab for current size -- **Estimate**: - - Empty installation: ~2 GB - - With public Data Commons data: +5–10 GB - - Custom datasets: Add estimated size -- **Example**: If you plan to import 50 GB of custom data, allocate 60 GB initially - -**CloudSQL Region (Optional)** - -- **Field name**: `cloudsql_region` -- **What it means**: Geographic region where the database is created (default: uses the `region` parameter from Section 1) -- **Format**: Region code, or leave empty to use default -- **Examples**: `us-central1`, `us-east1`, `europe-west1` -- **Why it matters**: Affects latency and must align with your VPC's Private Service Access configuration -- **Recommendation**: Leave empty unless you have a specific reason to override. The default region aligns with your application region - -**High Availability** - -- **Field name**: `cloudsql_ha_enabled` -- **What it means**: Enables automatic database replication to a different availability zone -- **How it works**: - - **Disabled (default)**: Single database instance in one zone. If zone fails, data is lost - - **Enabled**: Two instances (primary + replica) in different zones. If one zone fails, automatically switches to replica with zero downtime -- **Cost impact**: Approximately doubles the monthly database cost -- **Downtime**: 0 minutes (automatic failover) - -**Recommendation:** -- **Production deployments**: Enable High Availability. The extra cost is worth the reliability -- **Non-production**: Disable to save costs -- **Can be changed later**: Update via Infrastructure Manager - -#### Section 4: Cloud Storage Configuration - -This section configures the GCS bucket for data storage. - -**GCS Bucket Name** - -- **Field name**: `gcs_bucket_name` -- **What it means**: Name of the Cloud Storage bucket (like a drive on the cloud) -- **Format**: Must be globally unique across all GCP projects. 
Only lowercase letters, numbers, hyphens, periods -- **Pattern**: `{project}-{purpose}-{random}` to ensure uniqueness -- **Examples**: - - `datacommons-prod` - - `mycompany-dc-analytics` - - `datacommons-team-project1` -- **Why it matters**: Stores custom datasets, exports, and query results. Must be unique -- **Recommendation**: Include your project name and environment to make it identifiable. Avoid generic names like `data` or `bucket1` - -**GCS Storage Class** - -- **Field name**: `gcs_storage_class` -- **What it means**: How your data is stored (affects cost and access speed) -- **Default**: `STANDARD` -- **Monthly cost per GB**: Varies by class (see table below) - -**Recommendation:** -- **New deployments**: Use `STANDARD` (best performance, reasonable cost) -- **Cost optimization**: Use `NEARLINE` if you access archived data less than monthly -- **Can't be changed later**: Choose carefully. To change, you'd need to migrate data to a new bucket -- **Best practice**: Use `STANDARD` initially; if you have archived data, create separate buckets with different storage classes - -**GCS Location** - -- **Field name**: `gcs_location` -- **What it means**: Geographic region or multi-region where the bucket is stored -- **Default**: `US` -- **Multi-region options**: `US`, `EU`, `ASIA` (replicates across multiple regions for redundancy) -- **Single-region options**: `us-central1`, `us-east1`, `europe-west1`, etc. - -**Recommendation:** -- **Most deployments**: Use `US`, `EU`, or `ASIA` based on your users' location -- **Cost optimization**: Use single-region locations only if cost is critical and you accept single-point-of-failure risk -- **Data residency**: EU if you have European data requiring GDPR compliance - -#### Section 5: API Configuration - -This section provides the Data Commons API key that authenticates your instance. 
- -**Data Commons API Key** - -- **Field name**: `dc_api_key` -- **What it means**: Authentication token for accessing the Data Commons knowledge graph -- **Where to get it**: You obtained this in the Prerequisites section (https://docs.datacommons.org/custom_dc/quickstart.html) -- **Why it matters**: Authenticates your instance to Data Commons for accessing the knowledge graph -- **Security**: The key is stored as a Kubernetes secret. Never commit to version control - -**Recommendation:** -- **Paste carefully**: Copy from your secure note -- **Verify**: Double-check there are no extra spaces or characters -- **Can't be changed via form**: If you enter the wrong key, you'll need to manually update the Kubernetes secret after deployment. Contact your GCP administrator if needed - -#### Section 6: Application Configuration - -This section controls how the application runs on your cluster. - -**Application Replicas** - -- **Field name**: `app_replicas` -- **What it means**: Number of application pods running simultaneously -- **Default**: `1` -- **Range**: 1–10 pods -- **How it affects cost**: Each replica uses cloud resources. 2 replicas ≈ 2x resource cost -- **How it affects reliability**: More replicas = better resilience to pod failures +The Marketplace will open a deployment configuration form. Enter your **Deployment Name** and select a **GCP Region** at the top, then configure **Application Settings** (resource tier and domain template) and provide your **API Keys** (Data Commons API key). A new GKE cluster is created automatically. -**Recommendation:** -- **Development**: Use 1 replica -- **Production**: Use 2 replicas -- **High-traffic**: Use 3–4 replicas -- **Can be changed later**: Update via Infrastructure Manager - -**Resource Tier** +> [!TIP] +> Each field has built-in tooltips with detailed guidance—hover over or click the help icon next to any field for clarification. 
The form validates your inputs and shows clear error messages if anything is incorrect. -- **Field name**: `resource_tier` -- **What it means**: CPU and memory allocated to each application pod -- **Default**: `medium` -- **Options**: `small`, `medium`, `large` -- **How it affects cost**: Each pod's resources are billed separately +For detailed descriptions of every form field, valid values, and tips, see [Marketplace Fields Reference](MARKETPLACE_FIELDS.md). -**Resource Tier Specifications:** - -| Tier | CPU | Memory | Best For | -|------|-----|--------|--------| -| `small` | 1.0 | 2 GB| Light workloads, <10 concurrent users | -| `medium` | 2.0 | 4 GB | Standard workloads, 10–100 concurrent users | -| `large` | 4.0 | 8 GB | Heavy workloads, >100 concurrent users, complex queries | - - -**Recommendation:** -- **Default (`medium`)** works for most deployments -- **Start with medium**: If you experience slow queries or high CPU, upgrade to `large` -- **Reduce to small**: If you're under-utilizing resources -- **Can be changed later**: Update via Infrastructure Manager (requires pod restart, 1–2 minutes downtime) - -**Enable Natural Language Queries** - -- **Field name**: `enable_natural_language` -- **What it means**: Allow users to ask questions in plain English (e.g., "Show me healthcare spending by state") -- **Default**: Enabled -- **How it works**: System translates natural language to statistical queries - -**Recommendation:** -- **Most deployments**: Keep enabled. Users find it helpful -- **Specialized deployments**: Disable only if you have specific technical requirements -- **Can be changed later**: Update via Infrastructure Manager - -**Enable Data Sync** - -- **Field name**: `enable_data_sync` -- **What it means**: Automatically synchronize custom data from GCS bucket to CloudSQL database -- **Default**: Enabled -- **What it does**: Enables scheduled import of custom datasets you upload to GCS into CloudSQL for querying. 
Without it, you manage data imports manually -- **Performance impact**: Runs during off-peak hours; minimal performance impact - -**Recommendation:** -- **Most deployments**: Keep enabled. You get fresh data automatically -- **Custom-only deployments**: Disable only if using only proprietary data and never want public data -- **Can be changed later**: Update via Infrastructure Manager (background process, no downtime) +**Before you start, have your Data Commons API key ready** (required for all deployments). ### Step 3: Review and Deploy -**Review Deployment Configuration** - -Before submitting: - -1. Scroll to the top of the form -2. Review each section: - - Deployment name is descriptive - - Project ID is correct - - GKE cluster name and location match your cluster - - CloudSQL tier is appropriate for your use case - - GCS bucket name is globally unique - - Data Commons API key is correct -3. If any field is wrong, click on the section and correct it - -**Accept Terms of Service** - -1. Scroll to the bottom of the form -2. Read and check the checkbox: "I accept the Google Cloud Marketplace Terms" -3. Check any additional terms specific to Data Commons - -**Click Deploy** - -1. Click the **Deploy** button -2. A progress indicator appears -3. **Do not close the browser tab** during deployment - -### Step 4: Monitor Deployment Progress - -The deployment takes 10–15 minutes. Here's what's happening: - -1. **Infrastructure Manager creates resources** (2–3 minutes): - - CloudSQL instance being provisioned - - GCS bucket being created - - Service account being created - - VPC network connections being established - -2. **Terraform applies configuration** (3–5 minutes): - - Resources are configured with your settings - - Workload Identity bindings are created - - Kubernetes secrets are created - -3. 
**Helm deploys application** (3–5 minutes): - - Container images are pulled from registry - - Application pods are scheduled on your GKE cluster - - Readiness probes verify the application is healthy - -**To monitor progress:** - -1. You should see a progress page in your browser showing deployment status -2. Alternatively, go to **Infrastructure Manager** > **Deployments** -3. Click on your deployment name -4. View the status indicator: - - **Creating**: Deployment in progress - - **Active**: Deployment succeeded (you can now access the application) - - **Error**: Something failed (see Troubleshooting section) - -**If deployment fails:** - -1. Click on the deployment in Infrastructure Manager -2. Click **Details** or **Logs** -3. Review the error message -4. Common issues and solutions are in the [Troubleshooting](#troubleshooting) section - ---- - -## Accessing Your Deployment - -After successful deployment, you can access your Data Commons Accelerator application. - -### View Deployment Outputs - -Deployment outputs contain important information needed to access and manage your application. - -**To view outputs:** - -1. Go to **Infrastructure Manager** > **Deployments** in GCP Console -2. Click your deployment name -3. Click the **Outputs** tab -4. 
Review the outputs (see descriptions below) - -**Key Outputs:** - -| Output | Purpose | Example | -|--------|---------|---------| -| `namespace` | Kubernetes namespace where app is deployed | `datacommons` | -| `helm_release_name` | Helm release identifier | `datacommons-123abc` | -| `cloudsql_instance_name` | Database instance name | `datacommons-db-xyz123` | -| `cloudsql_connection_name` | Database connection string | `my-project:us-central1:datacommons-db-xyz123` | -| `gcs_bucket_name` | Cloud Storage bucket name | `data-datacommons-prod` | -| `workload_service_account` | Service account for the application | `datacommons-sa@my-project.iam.gserviceaccount.com` | -| `gke_cluster_name` | Your GKE cluster name | `prod-cluster` | -| `gke_cluster_location` | Cluster location | `us-central1` | -| `next_steps` | Post-deployment instructions | (Detailed instructions) | - -### Verify Deployment +Once you've completed all sections: -Before accessing the application, verify that all components are running correctly: +1. **Review your selections** by scrolling through the form +2. **Accept the terms** by checking the Terms checkbox +3. **Click the Deploy button** -**Check Pods** +Deployment takes approximately **15-20 minutes**. A progress indicator will appear. -```bash -# Check that application pods are running -kubectl get pods -n datacommons - -# Expected output: All pods should show status "Running" -``` - -**Check Services** - -```bash -# Verify the service exists -kubectl get services -n datacommons - -# Note: Service should be ClusterIP type -``` - -**View Application Logs** - -```bash -# Check for any errors in the application logs -kubectl logs -n datacommons -l app=datacommons --tail=50 -``` +> [!WARNING] +> **Do not close the browser tab** during deployment. Closing it may interrupt the provisioning process. -**Check Database** +When the status shows **"Active"**, your deployment is complete. Proceed to the next section for accessing your application. -1. 
Go to **Cloud SQL** > **Instances** in GCP Console -2. Click your instance -3. **Status** should show **Available** -4. Verify initialization is complete (takes 2–3 minutes after deployment) +### Step 4: Access Your Deployment -### Access Application for Testing +> [!TIP] +> After deployment completes, useful commands and resource information are available in the deployment details: +> [Infrastructure Manager Deployments](https://console.cloud.google.com/infra-manager/deployments) > your deployment > **Outputs** tab. +> +> The Outputs tab contains ready-to-use commands for connecting to your cluster, port-forwarding, uploading data, and viewing logs — you can copy and run them directly. -**Important Security Note** +#### Quick Access via Cloud Shell (Recommended) -For security reasons, the application is deployed as a ClusterIP service without external access by default. This ensures your deployment is secure until you explicitly choose an exposure method. +The easiest way to access your deployment—no local tools needed: -**Quick Verification with Port-Forward** +1. Go to **GKE** > click your cluster > click **Connect** +2. Click **Run in Cloud Shell** +3. Run the port-forward command: + ```bash + until kubectl port-forward -n [NAMESPACE] svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done + ``` + Replace `[NAMESPACE]` with your deployment name — the namespace matches your deployment name. +4. 
In the Cloud Shell toolbar, click **Web Preview** > **Preview on port 8080**

#### Local Access via kubectl

If you have `gcloud` and `kubectl` installed locally:

1. Configure kubectl:
   ```bash
   gcloud container clusters get-credentials [CLUSTER_NAME] --location=[LOCATION] --project=[PROJECT_ID]
   ```
2. Start the port-forward (the loop restarts it automatically if it drops):
   ```bash
   until kubectl port-forward -n [NAMESPACE] svc/datacommons 8080:8080; do echo "Port-forward crashed. Respawning..." >&2; sleep 1; done
   ```
3. Open http://localhost:8080 in your browser

#### Production Access

To expose the application externally for production use, follow:
**[GCP Guide: Exposing Applications in GKE](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps)**

#### Next Steps

Once your application is running, see the [User Guide](USER_GUIDE.md) for instructions on logging in, uploading custom data, and using the dashboards.
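The custom-data upload mentioned above can be staged locally before you push anything to the cloud. The sketch below writes a tiny illustrative dataset and shows (commented out) the copy command for your deployment's data bucket. The `[BUCKET_NAME]` placeholder comes from the deployment's Outputs tab, and the CSV column layout here is only an assumption for illustration—check the User Guide for the exact format your instance expects.

```bash
# Sketch: stage a minimal custom dataset for the GCS data bucket.
# The db-sync CronJob imports bucket contents into CloudSQL every 3 hours.
# NOTE: BUCKET is a placeholder and the CSV columns are illustrative
# assumptions — consult the User Guide for the real expected layout.
set -eu

BUCKET="gs://[BUCKET_NAME]"     # take the bucket name from the Outputs tab
STAGE_DIR="$(mktemp -d)"        # local staging area

# One statistical variable observed for two places in one year.
cat > "${STAGE_DIR}/observations.csv" <<'EOF'
place,variable,date,value
country/USA,ExampleVariable,2024,123
country/CAN,ExampleVariable,2024,45
EOF

echo "Staged files in ${STAGE_DIR}:"
ls "${STAGE_DIR}"

# Upload when ready (requires gsutil and Storage Object Admin on the bucket):
# gsutil cp "${STAGE_DIR}/observations.csv" "${BUCKET}/"
```

After the next db-sync run, the imported data should become queryable in the application.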
--- ## Using Data Commons Accelerator -### Key Features - -**Statistical Data Explorer** - -- Browse curated datasets from public sources (census data, economic indicators, health statistics) -- Filter by country, region, or time period -- Compare data across different dimensions - -**Place Explorer** - -- Geographic data visualization (maps, charts) -- Population and demographic analysis by location -- Economic indicators by region - -**Timeline Tool** - -- Time-series analysis for tracking trends -- Historical data visualization -- Forecasting and trend analysis (if enabled) - -**Natural Language Queries** (if enabled) +For detailed instructions on configuring and using your deployed instance, see the **[User Guide](USER_GUIDE.md)**. -- Ask questions in plain English -- Example queries: - - "What is the population of California?" - - "Show me healthcare spending by country" - - "List the top 10 countries by GDP" +For additional resources, refer to the official Data Commons documentation: -**Custom Datasets** - -- Import your own data -- Combine with public Data Commons data -- Share datasets with team members - -**Data Export** - -- Download query results in CSV, JSON, or other formats -- Export to Cloud Storage for analysis in other tools -- Build API integrations with external systems - -### Learning Resources - -To get started using Data Commons: - -- **Official Tutorials**: https://datacommons.org/tutorials -- **API Documentation**: https://docs.datacommons.org/api -- **Knowledge Graph Explorer**: https://datacommons.org/ (official site for learning about available data) -- **Custom Data Commons Guide**: https://docs.datacommons.org/custom_dc/ - -### Common Use Cases - -**Government Agencies** -- Publish open data to citizens -- Combine multiple datasets for policy analysis -- Track public health and economic indicators - -**Non-Profit Organizations** -- Research community statistics -- Track progress toward social impact goals -- Share data insights with 
stakeholders - -**Researchers and Academics** -- Access curated datasets for studies -- Combine public data with primary research -- Teach data analysis and visualization - -**Businesses** -- Market research using demographic data -- Competitive analysis by geography -- Economic trend analysis for strategy +- [Custom Data Commons Documentation](https://docs.datacommons.org/custom_dc/) +- [Data Commons API Documentation](https://docs.datacommons.org/api) +- [Knowledge Graph Browser](https://datacommons.org/browser) --- ## Troubleshooting -### Deployment Fails During Provisioning - -**Symptoms:** -- Deployment shows "Error" status -- Infrastructure Manager shows failure message - -**Common causes and solutions:** - -**Issue: "GKE cluster not found"** -- Verify cluster name is spelled correctly -- Verify cluster is in the same project -- Go to **Kubernetes Engine** > **Clusters** to confirm cluster exists - -**Issue: "Insufficient permissions"** -- You lack required IAM roles -- Ask your GCP administrator to grant: - - Kubernetes Engine Admin - - Cloud SQL Admin - - Storage Admin - - Service Account Admin - -**Issue: "Private Service Access not configured"** -- VPC doesn't have Private Service Access enabled -- Ask your GCP administrator to configure it -- Or follow the [GCP Private Service Access guide](https://docs.cloud.google.com/sql/docs/mysql/configure-private-services-access) - -**Issue: "CloudSQL quota exceeded"** -- Your project has created too many CloudSQL instances -- Delete unused instances, or request quota increase -- Go to **Quotas & System Limits** to view and request increases - -### Application Pods Don't Start - -**Symptoms:** -- Pods show **Pending** or **CrashLoopBackOff** status -- Application URL doesn't load - -**To diagnose:** - -1. Go to **Kubernetes Engine** > **Workloads** -2. Filter by namespace: `datacommons` -3. Click the deployment -4. Click a pod name -5. 
**Logs** tab shows error messages - -**Common issues:** - -**"ImagePullBackOff"** or **"ErrImagePull"** -- Container images can't be pulled from registry -- Likely: Wrong API key or image tag -- Solution: Verify the `dc_api_key` in the deployment form is correct -- If wrong, contact support for instructions to update the secret +### Deployment Logs -**"CrashLoopBackOff"** -- Application is crashing immediately after starting -- Check logs (click pod > **Logs** tab) -- Common causes: - - Database not reachable (wait 2–3 minutes for CloudSQL to initialize) - - Invalid configuration in Kubernetes secrets - - Insufficient memory (try upgrading `resource_tier`) +1. Go to **Solution deployments** +2. Click the three-dot menu (⋮) next to your deployment +3. Select **Cloud Build log** +4. Review the Terraform execution log for provisioning errors -**"Pending" (pods won't start)** -- Cluster doesn't have capacity -- Check **Kubernetes Engine** > **Nodes** for available capacity -- If nodes are full, add more nodes to the cluster or delete other workloads +**Common deployment errors:** -### Can't Access Application from Browser +- **"Insufficient permissions"** — Check [Required IAM Roles](#required-iam-roles) +- **"PSA not configured"** — See [Private Connectivity Issues](#private-connectivity-issues) below -**Symptoms:** -- Browser shows "Connection refused" or "ERR_CONNECTION_REFUSED" -- Or shows "Host unreachable" -- Port-forward fails to connect +### Pod Status and Logs -**Troubleshooting Steps:** +**GKE Console:** Kubernetes Engine > Workloads > filter by your deployment namespace (namespace matches your deployment name) -**1. Verify Application Pods Are Running** +**Quick diagnostics from Cloud Shell:** ```bash -# Check pod status -kubectl get pods -n datacommons - -# All pods should show "Running" status -# If any show "Pending" or "CrashLoopBackOff", see "Application Pods Don't Start" section -``` - -**2. 
Check the Service Exists** - -```bash -# Verify service is created -kubectl get services -n datacommons - -# Service should exist and be type "ClusterIP" by default -# By default, the application is NOT exposed externally for security +kubectl get pods -n [NAMESPACE] +kubectl describe pod [POD_NAME] -n [NAMESPACE] +kubectl logs -n [NAMESPACE] -l app.kubernetes.io/name=datacommons ``` -**3. Test Access with Port-Forward** +**Common pod issues:** -```bash -# Create a temporary tunnel to test the application -kubectl port-forward -n datacommons svc/datacommons 8080:80 - -# In another terminal, verify connectivity -curl http://localhost:8080/healthz - -# If this works, the application is running correctly -# Issue is with your exposure method (see below) -``` +- **Pending** — Cluster needs more capacity +- **CrashLoopBackOff** — Check logs; often the database is still initializing (wait 2–3 min) +- **ImagePullBackOff** — Verify your Data Commons API key is correct -**4. Review Application Logs** +### Private Connectivity Issues ```bash -# Check for errors in application logs -kubectl logs -n datacommons -l app=datacommons --tail=100 - -# Look for error messages that might indicate misconfiguration -# Check for database connection errors (usually appear in first log entries) +# Check existing private connectivity ranges +gcloud compute addresses list --global --filter="purpose=VPC_PEERING" ``` -**Important Note** +- **"Couldn't find free blocks"** — Select the /16 range option for more IP addresses +- **"Peering already exists"** — Select "Use my existing range" and enter your existing range name -By default, the application is deployed as a **ClusterIP service**. This is secure by default. 
You must explicitly expose it using: -- `kubectl port-forward` (for testing) -- Ingress (for production with TLS) -- LoadBalancer service (for direct external IP) -- Other methods described in the [GCP Exposing Applications guide](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps) +### Port-Forward Connection Refused -If port-forward works but browser access doesn't, the issue is with your exposure method, not the application itself. +**Error:** -### Private Service Access Issues - -**Error: "Failed to create subnetwork. Couldn't find free blocks"** - -**Cause:** The allocated IP range is exhausted or conflicts with existing ranges. - -**Solution:** -1. Use a larger prefix length: Set `psa_range_prefix_length: 16` (65k IPs) -2. Or use existing PSA: Set `create_psa_connection: false` and provide existing range - -**Error: "Cannot modify allocated ranges" or "Peering already exists"** - -**Cause:** PSA connection already exists with different configuration. - -**Solution:** -1. Set `create_psa_connection: false` -2. Find your existing range: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING"` -3. Provide the range name in `existing_psa_range_name` - -**Error: "Address 'xxx' is not in VPC 'yyy'"** - -**Cause:** The provided PSA range is in a different VPC than your GKE cluster. - -**Solution:** -1. Verify your GKE cluster's VPC: `gcloud container clusters describe CLUSTER --format="value(network)"` -2. Find PSA ranges in that VPC: `gcloud compute addresses list --global --filter="purpose=VPC_PEERING AND network~YOUR_VPC"` -3. Provide the correct range name +```text +E0206 portforward.go:424 "Unhandled Error" err="an error occurred forwarding 8080 -> 8080: connection refused" +``` -**Verifying PSA Configuration** +> [!NOTE] +> This is expected behavior, not a critical error. 
The connection drops when the application receives too many concurrent requests — for example, opening the `/explore` page which loads many data widgets simultaneously. It can also occur during pod startup while the application is initializing. -Check if PSA is properly configured: -```bash -# List PSA IP ranges -gcloud compute addresses list --global --filter="purpose=VPC_PEERING" +**Fix:** -# List PSA connections -gcloud services vpc-peerings list --network=YOUR_VPC_NAME - -# Verify CloudSQL can use the range -gcloud sql instances describe INSTANCE_NAME --format="value(ipConfiguration.privateNetwork)" -``` +1. If using the auto-retry loop (`until kubectl port-forward ...`), it will reconnect automatically +2. If running a single port-forward, simply re-run the command +3. If the error persists, check pod status: `kubectl get pods -n [NAMESPACE]` — ensure the pod is `Running` with `1/1` Ready --- ## Deleting Your Deployment -If you no longer need the Data Commons Accelerator, you can delete it to stop incurring costs. - -### Delete via Infrastructure Manager - -1. Go to **Infrastructure Manager** > **Deployments** -2. Click your deployment name -3. Click the **Delete** button -4. Confirm the deletion - -**Duration:** 2–5 minutes - -### What Gets Deleted - -These resources are automatically deleted: - -- Kubernetes namespace and all pods -- Helm release -- IAM service account bindings -- Kubernetes secrets - -### What Persists (Manual Cleanup Required) - -**Important:** These resources are NOT automatically deleted to prevent accidental data loss. You must delete them manually. - -**CloudSQL Instance** - -- Contains your database and datasets -- **To delete**: - 1. Go to **Cloud SQL** > **Instances** - 2. Click your instance - 3. Click **Delete** button - 4. Type the instance name to confirm - 5. 
Click **Delete** -- **Cost if left running**: ~$50–200/month depending on tier - -**Cloud Storage Bucket** - -- Contains your data exports and custom datasets -- **To delete**: - 1. Go to **Cloud Storage** > **Buckets** - 2. Click your bucket - 3. Click **Delete bucket** button - 4. Type the bucket name to confirm - 5. Click **Delete** -- **Warning**: Deletion is permanent if versioning not enabled -- **Cost if left running**: ~$0.015–0.020 per GB per month - -**Service Account** - -- Created for Workload Identity authentication -- **To delete** (optional): - 1. Go to **IAM & Admin** > **Service Accounts** - 2. Find the service account (name contains `datacommons`) - 3. Click the account - 4. Click **Delete** button - 5. Confirm deletion -- **Cost if left running**: Free (service accounts have no direct cost) - -### Before Deleting - -**Backup Important Data** - -1. **Export from CloudSQL**: - - Go to **Cloud SQL** > **Instances** > [your instance] - - Click **Export** button - - Choose export format (CSV, JSON, SQL dump) - - Export to GCS bucket (if you're keeping the bucket) - -2. **Download from GCS**: - - Go to **Cloud Storage** > **Buckets** > [your bucket] - - Select files you want to keep - - Click **Download** to save locally - -### After Deletion - -**Cost Implications:** - -- If you deleted the Infrastructure Manager deployment but kept CloudSQL and GCS: - - You continue paying ~$50–200/month for CloudSQL - - You continue paying for GCS storage -- **Delete CloudSQL and GCS** to stop all charges +> [!IMPORTANT] +> Delete your deployment when no longer needed to stop incurring costs for the database, Kubernetes workloads, and Cloud Storage. -**Redeployment:** +1. Go to [Google Cloud Console](https://console.cloud.google.com) +2. Search for "Solution deployments" +3. Find your deployment and click the **three-dot menu** (⋮) +4. Click **Delete** +5. 
Confirm the deletion -- CloudSQL and GCS data are deleted when you delete those resources -- To deploy again, you'll start from scratch -- The Infrastructure Manager deployment can be deleted and redeployed (you have the configuration saved) +**What gets deleted:** All resources provisioned by this deployment — the database, Cloud Storage bucket, Kubernetes workloads, service account, and IAM bindings. If the deployment created a new GKE cluster and network, those are deleted as well. diff --git a/docs/MARKETPLACE_FIELDS.md b/docs/MARKETPLACE_FIELDS.md index 3fd7c46..4283c3b 100644 --- a/docs/MARKETPLACE_FIELDS.md +++ b/docs/MARKETPLACE_FIELDS.md @@ -4,387 +4,55 @@ ## Form Overview -The Marketplace deployment form is organized into five sections. Fields are listed below in the order they appear in the form. +The deployment form has **5 fields** across **2 sections**. A new GKE cluster is created automatically in your selected region. --- -## 1. Kubernetes Cluster +## Deployment Name -Configure your existing GKE cluster for Data Commons Accelerator deployment. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **GKE Cluster Location** | Location selector | Yes | — | Region or zone where your GKE cluster runs | -| **GKE Cluster Name** | Cluster selector | Yes | — | Your existing GKE cluster | - -### GKE Cluster Location - -**Description:** Select the region or zone where your existing GKE cluster is located. The Marketplace UI defaults to showing the "Regional" tab. - -**Details:** -- This must match the actual location of your GKE cluster -- CloudSQL will be automatically deployed in the same region -- Choose a region close to your users to minimize latency - -**Validation:** Must be a valid GCP region or zone. - ---- - -### GKE Cluster Name - -**Description:** Select your existing GKE cluster from the dropdown. The list shows only clusters in the location you selected above. 
- -**Details:** -- You must have an existing GKE cluster before deploying -- The cluster must have sufficient resources for Data Commons workloads -- Workload Identity must be enabled on the cluster for GCP service integration - -**Validation:** Must be an existing GKE cluster in the selected location. - ---- - -## 2. CloudSQL Database - -MySQL database configuration for Data Commons Accelerator metadata and state. CloudSQL is automatically deployed in the same region as your GKE cluster with private IP connectivity. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **CloudSQL Instance Tier** | Dropdown | Yes | `db-n1-standard-1` | Machine type for CloudSQL instance | -| **CloudSQL Disk Size (GB)** | Number | Yes | `20` | Initial disk size in GB | -| **Enable High Availability** | Boolean | Yes | `false` | Enable CloudSQL HA for production | -| **Private Service Access Configuration** | Dropdown | Yes | `create_20` | PSA range configuration | -| **Existing PSA Range Name** | String | Conditional | — | Name of existing PSA IP range | - -### CloudSQL Instance Tier - -**Description:** Machine tier for the CloudSQL MySQL instance with private IP connectivity. Choose based on workload requirements. 
- -**Default:** `db-n1-standard-1` (3.75GB RAM, 1 vCPU) - -**Valid Options:** - -| Option | Value | RAM | vCPU | Use Case | -|--------|-------|-----|------|----------| -| Micro - 0.6GB RAM (dev/test only) | `db-f1-micro` | 0.6GB | Shared | Development/testing only | -| Small - 1.7GB RAM, 1 vCPU | `db-g1-small` | 1.7GB | 1 | Small dev/test environments | -| Standard-1 - 3.75GB RAM, 1 vCPU | `db-n1-standard-1` | 3.75GB | 1 | Small production workloads | -| Standard-2 - 7.5GB RAM, 2 vCPU (recommended) | `db-n1-standard-2` | 7.5GB | 2 | **Recommended for production** | -| Standard-4 - 15GB RAM, 4 vCPU | `db-n1-standard-4` | 15GB | 4 | High-traffic production | -| Standard-8 - 30GB RAM, 8 vCPU | `db-n1-standard-8` | 30GB | 8 | Very high-traffic (100+ users) | - -**Recommendations:** -- **Dev/Test:** `db-f1-micro` or `db-g1-small` -- **Production:** `db-n1-standard-2` or higher -- **High-traffic:** `db-n1-standard-4` or `db-n1-standard-8` (100+ concurrent users) - -**Note:** Micro and Small tiers are NOT recommended for production use. +| Field | Default | Description | +|-------|---------|-------------| +| **Deployment Name** | — | A unique name for this deployment (2-18 characters). Used to generate resource names for the cluster, database, and storage. | --- -### CloudSQL Disk Size (GB) - -**Description:** Initial disk size in GB for the CloudSQL instance. CloudSQL will automatically grow the disk if needed. - -**Type:** Number -**Default:** `20` -**Minimum:** `10` (MySQL requirement) +## GCP Region -**Details:** -- Storage auto-grows when needed, so start conservatively -- You can adjust this later if needed -- Storage costs scale with size - -**Validation:** Must be a positive integer >= 10. 
-**Regex Pattern:** `^[1-9][0-9]*$` - -**Recommendations:** -- **Dev/Test:** 10-20 GB -- **Production:** 20-50 GB (depending on dataset size) -- **Large datasets:** 100+ GB +| Field | Default | Description | +|-------|---------|-------------| +| **GCP Region** | us-central1 | The GCP region where the cluster and all cloud resources will be created. | --- -### Enable High Availability - -**Description:** Enable CloudSQL high availability for production workloads. Creates a standby instance in a different zone for automatic failover. - -**Type:** Boolean -**Default:** `false` -**Recommended for production:** `true` - -**Details:** -- **Enabled:** Creates a standby replica in a different zone within the same region -- **Disabled:** Single instance (no automatic failover) -- HA provides automatic failover with minimal downtime -- HA increases cost (standby instance + synchronous replication) - -**Use Cases:** -- **Enable (true):** Production workloads requiring high availability and automatic failover -- **Disable (false):** Development/testing environments, cost-sensitive non-critical workloads - ---- - -### Private Service Access Configuration - -**Description:** Choose how to configure Private Service Access (PSA) for CloudSQL private IP connectivity. 
- -**Type:** Dropdown -**Default:** `create_20` (Create new /20 range) -**Required:** Yes - -**Valid Options:** - -| Option | Value | IP Count | Use Case | -|--------|-------|----------|----------| -| Create new /20 range (4,096 IPs - recommended) | `create_20` | 4,096 | **Recommended** - Standard production | -| Create new /24 range (256 IPs - dev/test) | `create_24` | 256 | Development/testing | -| Create new /16 range (65,536 IPs - large deployments) | `create_16` | 65,536 | Very large multi-deployment environments | -| Use my existing PSA range (enter name below) | `existing` | — | VPCs with existing PSA configuration | - -**⚠️ CRITICAL WARNING:** - -"Create new" options should **ONLY** be used on VPCs with **NO existing PSA configuration**. - -Creating a new range on a VPC with existing PSA will **REPLACE all existing reserved peering ranges**, which may disrupt connectivity for other services using Private Service Access (CloudSQL, Cloud Composer, etc.). - -**When to use each option:** - -1. **`create_20` (default)** - Use for first PSA deployment on a VPC, standard production workloads -2. **`create_24`** - Use for dev/test on a dedicated VPC with no other PSA services -3. **`create_16`** - Use for large environments with many CloudSQL instances planned -4. **`existing`** - **Use this if your VPC already has PSA configured** (requires "Existing PSA Range Name" below) - -**How to check if your VPC has existing PSA:** -```bash -gcloud compute addresses list --global --filter="purpose=VPC_PEERING" --format="value(name)" -``` - -If this returns results, select **"Use my existing PSA range"** and enter the name below. - ---- - -### Existing PSA Range Name - -**Description:** Name of your existing Private Service Access IP range. Required when you selected "Use my existing PSA range" above. 
- -**Type:** String -**Default:** — (empty) -**Required:** Only if `psa_range_configuration = existing` -**Placeholder:** `google-managed-services-default` - -**Details:** -- This field is ignored unless you selected "Use my existing PSA range" above -- Enter the exact name of your existing PSA IP allocation -- Common names: `google-managed-services-default`, `cloudsql-private-ip-range` - -**How to find your PSA range name:** -```bash -gcloud compute addresses list --global \ - --filter="purpose=VPC_PEERING" \ - --format="value(name)" -``` - -**Example values:** -- `google-managed-services-default` -- `cloudsql-private-ip-range` -- `private-service-access` - ---- +## Section 0: Application Settings -## 3. Cloud Storage - -GCS bucket configuration for Data Commons Accelerator data and custom datasets. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **GCS Bucket Location** | Dropdown | Yes | `US` | Storage location for your bucket | -| **GCS Storage Class** | Dropdown | Yes | `STANDARD` | Storage class (cost vs access speed) | - -### GCS Bucket Location - -**Description:** Storage location for your Data Commons bucket. Choose between multi-region (higher availability) or single-region (lower latency and cost). 
- -**Type:** Dropdown -**Default:** `US` (Multi-region) -**Required:** Yes - -**Valid Options:** - -**Multi-Region:** -| Option | Value | Description | -|--------|-------|-------------| -| US (Multi-region) | `US` | United States multi-region (highest availability) | -| EU (Multi-region) | `EU` | European Union multi-region | -| ASIA (Multi-region) | `ASIA` | Asia-Pacific multi-region | - -**Single-Region (Americas):** -| Option | Value | Description | -|--------|-------|-------------| -| us-central1 | `us-central1` | Iowa, USA | -| us-east1 | `us-east1` | South Carolina, USA | -| us-west1 | `us-west1` | Oregon, USA | - -**Single-Region (Europe):** -| Option | Value | Description | -|--------|-------|-------------| -| europe-west1 | `europe-west1` | Belgium | -| europe-west3 | `europe-west3` | Frankfurt, Germany | - -**Single-Region (Asia):** -| Option | Value | Description | -|--------|-------|-------------| -| asia-southeast1 | `asia-southeast1` | Singapore | -| asia-northeast1 | `asia-northeast1` | Tokyo, Japan | - -**💡 TIP:** Choose a location that matches your GKE cluster region to minimize latency and data transfer costs. - -**Examples:** -- GKE cluster in `us-central1` → choose `US` or `us-central1` -- GKE cluster in `europe-west3` → choose `EU` or `europe-west1` -- GKE cluster in `asia-southeast1` → choose `ASIA` or `asia-southeast1` - -**Cost & Performance Trade-offs:** -- **Multi-region:** Higher availability, higher cost, geo-redundant -- **Single-region:** Lower cost, lower latency within region, single-region redundancy - ---- - -### GCS Storage Class - -**Description:** Storage class determines cost and access speed. Choose based on how frequently you'll access your data. 
- -**Type:** Dropdown -**Default:** `STANDARD` -**Required:** Yes - -**Valid Options:** - -| Option | Value | Use Case | Access Pattern | -|--------|-------|----------|----------------| -| Standard - Frequent access | `STANDARD` | Production workloads | Frequent/real-time access | -| Nearline - Monthly access | `NEARLINE` | Infrequent access | ~1x/month | -| Coldline - Quarterly access | `COLDLINE` | Archival data | ~1x/quarter | -| Archive - Long-term storage | `ARCHIVE` | Long-term archival | Rarely accessed | - -**Recommendations:** -- **`STANDARD`** - Recommended for most Data Commons deployments (active datasets) -- **`NEARLINE`** - For backup datasets accessed occasionally -- **`COLDLINE`** - For compliance/archival datasets -- **`ARCHIVE`** - For long-term retention with rare access - -**Note:** Lower-tier classes have retrieval costs and minimum storage durations (e.g., Nearline = 30 days minimum). - ---- - -## 4. API Keys - -API keys required for Data Commons integration. - -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **Data Commons API Key** | String | Yes | — | API key for Data Commons API access | - -### Data Commons API Key - -**Description:** API key for accessing Data Commons APIs. Required for the application to function. - -**Type:** String (sensitive) -**Required:** Yes -**Placeholder:** `...` - -**Where to get it:** -Follow the instructions at: [https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key](https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key) - -**Format:** -Alphanumeric string with underscores and hyphens. - -**Security:** -- Stored as Kubernetes Secret in your cluster -- Rotatable after deployment by updating the secret - ---- - -## 5. Application Settings - -Data Commons Accelerator application configuration and resource allocation. 
- -| Field | Type | Required | Default | Description | -|-------|------|----------|---------|-------------| -| **Resource Tier** | Dropdown | Yes | `medium` | Resource allocation for pods | -| **Domain Template** | Dropdown | Yes | — | Pre-built domain configuration | -| **Application Replicas** | Number | Yes | `1` | Number of application replicas | +| Field | Default | Description | +|-------|---------|-------------| +| **Resource Tier** | Medium | Controls how much CPU and memory the application gets, and the size of the database | +| **Domain Template** | Health | Pre-built configuration optimized for your domain | ### Resource Tier -**Description:** Resource allocation tier for Data Commons Accelerator pods. Determines CPU and memory limits. - -**Type:** Dropdown -**Default:** `medium` (recommended) -**Required:** Yes - -**Valid Options:** - -| Option | Value | RAM | CPU | Use Case | -|--------|-------|-----|-----|----------| -| Small - 2Gi RAM, 1 CPU | `small` | 2Gi | 1 | Development, small datasets | -| Medium - 4Gi RAM, 2 CPU (recommended) | `medium` | 4Gi | 2 | **Recommended for production** | -| Large - 8Gi RAM, 4 CPU | `large` | 8Gi | 4 | Large datasets, high concurrency | - -**Recommendations:** -- **`small`** - Dev/test only, small datasets -- **`medium`** - Standard production deployments (recommended starting point) -- **`large`** - High-traffic production, large datasets (>100GB), many concurrent users - -**Note:** Ensure your GKE cluster has sufficient capacity for the selected tier × number of replicas. - ---- +| Option | Memory | CPU | Replicas | Database size | High availability | +|--------|--------|-----|----------|---------------|-------------------| +| Small | 2 GB | 1 core | 1 | Standard | No | +| Medium (recommended) | 4 GB | 2 cores | 2 | Standard | No | +| Large | 8 GB | 4 cores | 3 | Large | Yes | ### Domain Template -**Description:** Select a pre-built Data Commons configuration optimized for a specific domain. 
Each domain includes curated datasets, statistical variables, and visualizations tailored to that subject area. Choose the domain that best matches your use case. - -**Type:** Dropdown -**Required:** Yes - -**Valid Options:** - -| Option | Value | Description | -|--------|-------|-------------| -| Education (education related data) | `education` | Pre-configured for education-related datasets (schools, enrollment, outcomes) | -| Health (health related data) | `health` | Pre-configured for health-related datasets (epidemiology, healthcare) | -| Energy (energy related data) | `energy` | Pre-configured for energy-related datasets (consumption, generation, emissions) | +| Option | Best for | +|--------|----------| +| Health | Health and epidemiology data | +| Education | School, enrollment, and outcomes data | +| Energy | Energy consumption and generation data | -**Note:** You can customize any template after deployment. The template just provides a starting point. +You can customize the template after deployment. --- -### Application Replicas (Advanced) - -**Description:** Number of Data Commons Accelerator application replicas for high availability and load distribution. 
- -**Type:** Number -**Default:** `1` -**Required:** Yes -**Level:** 1 (Advanced) - -**Valid Range:** 1-10 - -**Details:** -- **1 replica** - Single instance (no HA, suitable for dev/test) -- **2-3 replicas** - High availability with load balancing (recommended for production) -- **4+ replicas** - High-traffic production with automatic scaling - -**Recommendations:** -- **Dev/Test:** 1 replica -- **Production:** 2-3 replicas (for HA and rolling updates) -- **High-traffic:** 4+ replicas - -**Capacity Planning:** -- Total resource usage = resource_tier × app_replicas -- Example: `medium` tier (4Gi RAM, 2 CPU) × 3 replicas = 12Gi RAM, 6 CPU total -- Ensure your GKE cluster has sufficient capacity +## Section 1: API Keys -**Note:** Multiple replicas provide: -- High availability (if one pod fails, others continue serving) -- Load distribution across pods -- Zero-downtime rolling updates +| Field | Default | Description | +|-------|---------|-------------| +| **Data Commons API Key** | — | Required for the application to access Data Commons data. Get yours at [docs.datacommons.org](https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key). | diff --git a/docs/USER_GUIDE.md b/docs/USER_GUIDE.md index 1a8a174..b5d2fa1 100644 --- a/docs/USER_GUIDE.md +++ b/docs/USER_GUIDE.md @@ -2,200 +2,248 @@ This guide explains how to access, configure, and use your Custom Data Commons instance deployed via the Google Cloud Marketplace. -# Getting Started +--- -To configure the User Interface of the landing page, upload your company logo and private data you need to login as custom Data Commons Administrator. This is applicable to all domains. +## Table of Contents -## Retrieve Administrator Credentials +1. [Getting Started](#getting-started) +2. [Data Commons for Education](#data-commons-for-education) +3. [Data Commons for Health](#data-commons-for-health) +4. [Data Commons for Energy](#data-commons-for-energy) +5. [Known Limitations](#known-limitations) +6. 
[Request Support](#request-support) + +--- + +## Getting Started + +To configure the landing page, upload your company logo, and manage private data, you need to log in as the Data Commons Administrator. The steps below apply to all domain templates (Education, Health, Energy). + +> [!TIP] +> For deployment and initial setup instructions, see the [Deployment Guide](DEPLOYMENT_GUIDE.md). + +### Retrieve Administrator Credentials The application administrator password is not provided in the deployment outputs for security reasons. To retrieve your initial credentials: +> [!TIP] +> These commands are available pre-populated with your deployment values in the [Infrastructure Manager Deployments](https://console.cloud.google.com/infra-manager/deployments) > your deployment > **Outputs** tab. You can copy and run them directly. + 1. **Connect to your cluster** via Cloud Shell: - ```bash - gcloud container clusters get-credentials [CLUSTER_NAME] --region [REGION] + ```bash + gcloud container clusters get-credentials [CLUSTER_NAME] --region [REGION] ``` -2. **Run the secret retrieval command**: +2. **Run the secret retrieval command**: ```bash - kubectl get secret [DEPLOYMENT_NAME] -n [NAMESPACE] -o json | jq -r '.data | to_entries[] | "\\(.key): \\(.value | @base64d)"' + echo 'Admin Username:' && kubectl get secret datacommons -n [NAMESPACE] -o jsonpath='{.data.ADMIN_PANEL_USERNAME}' | base64 -d && echo && echo 'Admin Password:' && kubectl get secret datacommons -n [NAMESPACE] -o jsonpath='{.data.ADMIN_PANEL_PASSWORD}' | base64 -d && echo ``` -## Administrator Log In + Replace `[CLUSTER_NAME]`, `[REGION]`, and `[NAMESPACE]` with your deployment values. The namespace matches your deployment name. -1. Navigate to your application URL. -e.g. `https://education.example.com/`. -2. To access the **Admin Panel**, append `/admin` to the URL: -e.g.`https://education.example.com/admin/` -3. Enter the username and password generated in the previous step. -4. 
Depending on your choice during deployment you will be logged in as an admin for one of the custom data Commons templates for different domains (education, health, energy etc). +### Administrator Log In -# Data Commons for Education +1. Navigate to your application URL (e.g., `https://education.example.com/`) +2. To access the **Admin Panel**, append `/admin` to the URL (e.g., `https://education.example.com/admin/`) +3. Enter the username and password retrieved in the previous step +4. You will be logged in as an administrator for the domain template selected during deployment (Education, Health, or Energy) -***Template: Student Recruitment Intelligence Center*** +### Upload Custom Data -## Overview +To populate the dashboard with your custom data: -The Education template combines your private applicant data with public demographic trends to help universities identify high-potential recruitment regions. +1. See [Prepare and load your own data](https://docs.datacommons.org/custom_dc/custom_data.html). +2. Ensure your data matches the required schema for your domain template. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. +3. Log in and navigate to the **Admin Panel**. +4. Go to **Data & Files** tab. +5. Locate the **Data Upload** section. +6. Click **Choose File**, select your CSV, and click **Upload**. + - *Success*: You will see a "Rows successfully uploaded" message. + - *Error*: The system will indicate specific line/column issues. + +> [!NOTE] +> Large CSV files may take a few moments to process. The dashboard refreshes automatically after upload. + +> [!TIP] +> **Trigger data sync immediately** — After uploading your CSV, the data is synced to the database by a CronJob (`datacommons-db-sync`) that runs every 3 hours. To avoid waiting, trigger it manually: +> +> **Via GKE Console (recommended):** +> 1. Go to **Kubernetes Engine** > **Workloads** +> 2. 
Filter by your namespace (matches your deployment name) +> 3. Click the **datacommons-db-sync** CronJob +> 4. Click **Run now** in the top toolbar +> 5. Monitor progress in the **Events** and **Logs** tabs +> +> **Via kubectl:** +> ```bash +> kubectl create job --from=cronjob/datacommons-db-sync manual-sync -n [NAMESPACE] +> kubectl logs -n [NAMESPACE] -l job-name=manual-sync -f +> ``` -## For Administrators +### Customize User Interface -### Prepare Custom Data +1. In the Admin Panel, navigate to **Theme Settings**. +2. **Organization Branding**: + - **Name:** Update the organization name displayed in the top bar. + - **Logo:** Upload a PNG image. -To populate the dashboard with your university's private data: +3. **Dashboard Text**: + - **Header Text:** Edit the main title (e.g., "Student Recruitment Intelligence Center"). + - **Hero Description:** Update the subtitle describing the purpose of your custom Data Commons. -1. See [Prepare and load your own data](https://docs.datacommons.org/custom_dc/custom_data.html). -2. Ensure your data matches the required schema for Education template. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. +4. Click **Save Changes**. Updates are applied immediately. -### Upload Custom Data +### Data Security -To populate the dashboard with your university's private data: +Your data resides within a Google Cloud Storage bucket inside your dedicated GCP environment project. This setup allows you to control who can upload and manage the data that will be used for subsequent analytics. -1. Log in and navigate to the **Admin Panel**. -2. Go to **Data & Files** tab. -3. Locate **Applicant Data Upload** section. -4. Click **Choose File**, select your CSV, and click **Upload**. - *Success*: You will see a "Rows successfully uploaded" message. - *Error*: The system will indicate specific line/column issues. 
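
To audit who can upload and manage that data, you can inspect the bucket's IAM policy from Cloud Shell. This is a suggested check, not a required step; `[BUCKET_NAME]` is a placeholder, and the actual bucket name is listed in your deployment outputs:

```bash
# Show role bindings on the data upload bucket (replace the placeholder first)
gcloud storage buckets get-iam-policy gs://[BUCKET_NAME]
```

Review the returned bindings (for example, `roles/storage.objectAdmin`) and remove any principals that should not be able to manage uploaded data.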
+> [!NOTE] +> Your data is kept private and is not shared with the public Data Commons. Data mixing and processing occur only on your deployed Custom Data Commons instance. -### Customize User Interface +--- -1. In the Admin Panel, navigate to **Theme Settings**. -2. **University Branding**: +## Data Commons for Education -* **Name:** Update the university name displayed in the top bar. -* **Logo:** Upload a PNG image. +***Template: Student Recruitment Intelligence Center*** -3. **Dashboard Text**: +### Education Overview -* **Header Text:** Edit the main title (e.g., "Student Recruitment Intelligence Center"). -* **Hero Description:** Update the subtitle describing the purpose of your custom Data Commons. +The Education template combines your private applicant data with public demographic trends to help universities identify high-potential recruitment regions. -4. Click **Save Changes**. Updates are applied immediately. +### Education: For Administrators -### Data Security +Your CSV data must match the Education template schema. Download a sample CSV from the **Data & Files** tab in the Admin Panel. -Your uploaded CSV data is stored securely within your Google Cloud SQL instance. It is combined with public Data Commons data only at the visualization layer and is not shared externally. +For upload instructions and UI customization, see [Getting Started](#getting-started). -## Data commons for Data Analysts & Researchers +### Education: For Data Analysts & Researchers -### Explore Recruitment Metrics +#### Explore Recruitment Metrics The dashboard provides a high-level view of your recruitment landscape: -* **Total Applicants:** Aggregated count for the target year. -* **Avg Opportunity Score:** A calculated metric indicating regional potential. -* **High Opportunity Markets:** Count of regions exceeding your target criteria. -* **Avg Household Income:** Public demographic data correlated with your target regions. 
+- **Total Applicants:** Aggregated count for the target year. +- **Avg Opportunity Score:** A calculated metric indicating regional potential. +- **High Opportunity Markets:** Count of regions exceeding your target criteria. +- **Avg Household Income:** Public demographic data correlated with your target regions. -### Interactive Maps +#### Interactive Maps The **Recruitment Potential by State** map visualizes where your applicants are coming from versus high-opportunity areas. -* **Hover:** Hover over a state to see specific applicant counts and opportunity scores. +- **Hover:** Hover over a state to see specific applicant counts and opportunity scores. + +#### Filtering & Deep Dives -### Filtering & Deep Dives +- **Filters:** Use the dropdowns at the top (e.g., Target Year) to filter all widgets on the page. +- **Standard Tools:** Click "Explore in Timeline Tool" on specific widgets to analyze the data using standard Data Commons graphing tools. -Filters: Use the dropdowns at the top (e.g., Target Year) to filter all widgets on the page. -Standard Tools: Click "Explore in Timeline Tool" on specific widgets to analyze the data using standard Data Commons graphing tools. +--- -# Data Commons for Health +## Data Commons for Health ***Template: Population Health & City Comparison*** -## Overview +### Health Overview The Health template allows organizations to compare specific health metrics (e.g., obesity, diabetes, smoking) across different cities, blending local private data with public CDC data. -## For Administrators +### Health: For Administrators -### Upload & Configuration +Your CSV must contain city-level health metrics formatted according to the Health template schema. -Follow the standard upload procedure outlined in the ***getting-started*** section. +For upload instructions and UI customization, see [Getting Started](#getting-started). -* **Data Requirement:** Ensure your CSV contains city-level health metrics formatted according to the template schema. 
+### Health: For Data Analysts & Researchers -### Customize Branding - -Follow the standard Theme Settings instructions to update the Organization Name and Logo. - -## For Data Analysts & Researchers - -### Compare Cities +#### Compare Cities The primary feature of this dashboard is the City Comparator. 1. Locate the "Compare Cities" section at the top. -2. The default city (e.g., Boston, MA) is selected. -3. Click **+ Add City** to select up to 4 additional cities. +2. The default city (e.g., Boston, MA) is selected. +3. Click **+ Add City** to select up to 4 additional cities. 4. The dashboard will update to show side-by-side metrics. -### Key Metrics Indicators +> [!TIP] +> The dashboard updates in real-time as you add or remove cities from the comparison. + +#### Key Metrics Indicators View cards displaying current percentages for: -* Obesity -* Smoking -* Physical Health -* Diabetes -* High Blood Pressure +- Obesity +- Smoking +- Physical Health +- Diabetes +- High Blood Pressure -### Visual Comparison & Distribution +#### Visual Comparison & Distribution -* **Bar Charts:** Compare the selected cities against each other across multiple health categories (e.g., "People Vaccinated," "People Who Are Sick"). -* **Trend Lines:** View the "Health Issue Distribution" over time (2020–2024) to identify rising or falling trends. +- **Bar Charts:** Compare the selected cities against each other across multiple health categories (e.g., "People Vaccinated," "People Who Are Sick"). +- **Trend Lines:** View the "Health Issue Distribution" over time (2020–2024) to identify rising or falling trends. -# Data Commons for Energy +--- + +## Data Commons for Energy ***Template: Methane Insights & Asset Risk*** -## Overview +### Energy Overview The Energy template focuses on environmental monitoring, specifically correlating private asset locations with public methane plume data to identify high-risk leaks and community impact. 
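
As an illustration of the correlation described above, plume-to-asset matching can be approximated with a great-circle distance check. This is a simplified sketch under assumed inputs (lists of latitude/longitude pairs and an arbitrary 5 km radius), not the application's actual scoring logic:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def intersection_rate(assets, plumes, radius_km=5.0):
    """Fraction of assets within radius_km of any detected plume center.

    assets and plumes are lists of (lat, lon) tuples; the 5 km default
    radius is an assumption for illustration, not a product setting.
    """
    if not assets:
        return 0.0
    hits = sum(
        1 for a in assets
        if any(haversine_km(*a, *p) <= radius_km for p in plumes)
    )
    return hits / len(assets)

# Toy data: two hypothetical assets; one plume detection close to the first.
assets = [(31.99, -102.08), (32.80, -101.50)]
plumes = [(32.00, -102.10)]
print(f"{intersection_rate(assets, plumes):.0%} of assets intersect a plume")
```

For the short distances involved, the haversine formula is a standard and sufficiently accurate choice.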
-## For Administrators +### Energy: For Administrators -### Upload Asset Data +> [!IMPORTANT] +> Your CSV must contain latitude/longitude coordinates for each asset. See the sample file in the Admin Panel for the required format. -* Navigate to **Admin Panel > Data & Files.** -* Upload your **Asset Locations CSV.** This file must contain coordinates (latitude/longitude) of your infrastructure. +For upload instructions and UI customization, see [Getting Started](#getting-started). -## For Data Analysts & Researchers +### Energy: For Data Analysts & Researchers -### Risk Overview (KPI Cards) +#### Risk Overview (KPI Cards) -* **Total Plume-Asset Intersections:** Percentage of assets currently intersecting with detected methane plumes. -* **Assets Near Communities:** Count of assets within a specific radius of populated areas. -* **High-Risk Issues:** Critical alerts detected in the last 30 days. +- **Total Plume-Asset Intersections:** Percentage of assets currently intersecting with detected methane plumes. +- **Assets Near Communities:** Count of assets within a specific radius of populated areas. +- **High-Risk Issues:** Critical alerts detected in the last 30 days. -### Methane Map Chart +#### Methane Map Chart This interactive map layers three datasets: -* **Methane Plumes (Public):** Satellite detection data. -* **Asset Density (Private):** Your uploaded infrastructure. -* **Community Risk Index (Public):** Census data indicating vulnerable populations. +- **Methane Plumes (Public):** Satellite detection data. +- **Asset Density (Private):** Your uploaded infrastructure. +- **Community Risk Index (Public):** Census data indicating vulnerable populations. -### Detailed Intersections Table +#### Detailed Intersections Table Review specific leak events in the table at the bottom of the dashboard: -* **Risk Score:** Low / Medium / High / Critical. -* **Leak Event ID:** Unique identifier for the detection. 
-* **Suspected Asset:** The specific asset ID linked to the leak.
-* **Vulnerability Level:** Demographic risk score of the nearby community.
-* **Action Status:** Current operational status (e.g., "Normal Operations").
+- **Risk Score:** Low / Medium / High / Critical.
+- **Leak Event ID:** Unique identifier for the detection.
+- **Suspected Asset:** The specific asset ID linked to the leak.
+- **Vulnerability Level:** Demographic risk score of the nearby community.
+- **Action Status:** Current operational status (e.g., "Normal Operations").
+
+---
+
+## Known Limitations
+
+- **Data Sync:** Dashboard data refreshes automatically after upload, but large CSVs may take a few moments to process.
+- **Browser Support:** For best performance, use the latest version of Chrome.
 
-# Known Limitations
+> [!TIP]
+> For troubleshooting deployment issues, port-forwarding errors, or pod status problems, see the [Deployment Guide — Troubleshooting](DEPLOYMENT_GUIDE.md#troubleshooting).
 
-* **Data Sync:** Dashboard data refreshes automatically after upload, but large CSVs may take a few moments to process.
-* **Browser Support:** For best performance, use the latest version of Chrome.
+---
 
-# Request Support
+## Request Support
 
-If you encounter issues not covered in this guide:
+If you encounter issues not covered in this guide:
 
 1. Check the deployment logs in your Google Cloud Console.
-2. Contact your organization's system administrator.
-3. To report bugs, request new features [Get Data Commons support](https://docs.datacommons.org/support.html).
+2. Contact your organization's system administrator.
+3. To report bugs or request new features, see [Get Data Commons support](https://docs.datacommons.org/support.html). 
\ No newline at end of file diff --git a/mp-pkg/charts/datacommons/.helmignore b/mp-pkg/charts/datacommons/.helmignore new file mode 100644 index 0000000..266489d --- /dev/null +++ b/mp-pkg/charts/datacommons/.helmignore @@ -0,0 +1,35 @@ +.DS_Store +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ + +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ + +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ + +# Testing +tests/ +test/ +ci/ + +# Documentation +README.md +CHANGELOG.md +LICENSE + +# CI/CD +.gitlab-ci.yml +Makefile diff --git a/mp-pkg/charts/datacommons/Chart.yaml b/mp-pkg/charts/datacommons/Chart.yaml new file mode 100644 index 0000000..69d677e --- /dev/null +++ b/mp-pkg/charts/datacommons/Chart.yaml @@ -0,0 +1,34 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +apiVersion: v2 +name: datacommons +description: Custom Data Commons Accelerator deployment +type: application +version: 3.3.12 +appVersion: "v3.3.12" + +# GCP Marketplace metadata +annotations: + marketplace.cloud.google.com/deploy-info: '{}' + +# Chart metadata +icon: https://datacommons.org/images/dc-logo.svg +keywords: + - datacommons + - data + - visualization + - gcp + - kubernetes + - marketplace diff --git a/mp-pkg/charts/datacommons/crds/app-crd.yaml b/mp-pkg/charts/datacommons/crds/app-crd.yaml new file mode 100644 index 0000000..8e08c7d --- /dev/null +++ b/mp-pkg/charts/datacommons/crds/app-crd.yaml @@ -0,0 +1,532 @@ +# Copyright 2020 The Kubernetes Authors. +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apiextensions.k8s.io/v1 +kind: CustomResourceDefinition +metadata: + annotations: + api-approved.kubernetes.io: https://github.com/kubernetes-sigs/application/pull/2 + controller-gen.kubebuilder.io/version: v0.4.0 + creationTimestamp: null + name: applications.app.k8s.io +spec: + group: app.k8s.io + names: + categories: + - all + kind: Application + listKind: ApplicationList + plural: applications + shortNames: + - app + singular: application + scope: Namespaced + versions: + - additionalPrinterColumns: + - description: The type of the application + jsonPath: .spec.descriptor.type + name: Type + type: string + - description: The creation date + jsonPath: .spec.descriptor.version + name: Version + type: string + - description: The application object owns the matched resources + jsonPath: .spec.addOwnerRef + name: Owner + type: boolean + - description: Numbers of components ready + jsonPath: .status.componentsReady + name: Ready + type: string + - description: The creation date + jsonPath: .metadata.creationTimestamp + name: Age + type: date + name: v1beta1 + schema: + openAPIV3Schema: + description: Application is the Schema for the applications API + properties: + apiVersion: + description: 'APIVersion defines the versioned schema of this 
representation + of an object. Servers should convert recognized schemas to the latest + internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources' + type: string + kind: + description: 'Kind is a string value representing the REST resource this + object represents. Servers may infer this from the endpoint the client + submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + metadata: + type: object + spec: + description: ApplicationSpec defines the specification for an Application. + properties: + addOwnerRef: + description: AddOwnerRef objects - flag to indicate if we need to + add OwnerRefs to matching objects Matching is done by using Selector + to query all ComponentGroupKinds + type: boolean + assemblyPhase: + description: AssemblyPhase represents the current phase of the application's + assembly. An empty value is equivalent to "Succeeded". + type: string + componentKinds: + description: ComponentGroupKinds is a list of Kinds for Application's + components (e.g. Deployments, Pods, Services, CRDs). It can be used + in conjunction with the Application's Selector to list or watch + the Applications components. + items: + description: GroupKind specifies a Group and a Kind, but does not + force a version. This is useful for identifying concepts during + lookup stages without having partially valid types + properties: + group: + type: string + kind: + type: string + required: + - group + - kind + type: object + type: array + descriptor: + description: Descriptor regroups information and metadata about an + application. + properties: + description: + description: Description is a brief string description of the + Application. + type: string + icons: + description: Icons is an optional list of icons for an application. 
+ Icon information includes the source, size, and mime type. + items: + description: ImageSpec contains information about an image used + as an icon. + properties: + size: + description: (optional) The size of the image in pixels + (e.g., 25x25). + type: string + src: + description: The source for image represented as either + an absolute URL to the image or a Data URL containing + the image. Data URLs are defined in RFC 2397. + type: string + type: + description: (optional) The mine type of the image (e.g., + "image/png"). + type: string + required: + - src + type: object + type: array + keywords: + description: Keywords is an optional list of key words associated + with the application (e.g. MySQL, RDBMS, database). + items: + type: string + type: array + links: + description: Links are a list of descriptive URLs intended to + be used to surface additional documentation, dashboards, etc. + items: + description: Link contains information about an URL to surface + documentation, dashboards, etc. + properties: + description: + description: Description is human readable content explaining + the purpose of the link. + type: string + url: + description: Url typically points at a website address. + type: string + type: object + type: array + maintainers: + description: Maintainers is an optional list of maintainers of + the application. The maintainers in this list maintain the the + source code, images, and package for the application. + items: + description: ContactData contains information about an individual + or organization. + properties: + email: + description: Email is the email address. + type: string + name: + description: Name is the descriptive name. + type: string + url: + description: Url could typically be a website address. + type: string + type: object + type: array + notes: + description: Notes contain a human readable snippets intended + as a quick start for the users of the Application. 
CommonMark + markdown syntax may be used for rich text representation. + type: string + owners: + description: Owners is an optional list of the owners of the installed + application. The owners of the application should be contacted + in the event of a planned or unplanned disruption affecting + the application. + items: + description: ContactData contains information about an individual + or organization. + properties: + email: + description: Email is the email address. + type: string + name: + description: Name is the descriptive name. + type: string + url: + description: Url could typically be a website address. + type: string + type: object + type: array + type: + description: Type is the type of the application (e.g. WordPress, + MySQL, Cassandra). + type: string + version: + description: Version is an optional version indicator for the + Application. + type: string + type: object + info: + description: Info contains human readable key,value pairs for the + Application. + items: + description: InfoItem is a human readable key,value pair containing + important information about how to access the Application. + properties: + name: + description: Name is a human readable title for this piece of + information. + type: string + type: + description: Type of the value for this InfoItem. + type: string + value: + description: Value is human readable content. + type: string + valueFrom: + description: ValueFrom defines a reference to derive the value + from another source. + properties: + configMapKeyRef: + description: Selects a key of a ConfigMap. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. 
+ For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + key: + description: The key to select. + type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + ingressRef: + description: Select an Ingress. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. 
+ For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + host: + description: The optional host to select. + type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + path: + description: The optional HTTP path. + type: string + protocol: + description: Protocol for the ingress + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + secretKeyRef: + description: Selects a key of a Secret. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. 
+ For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + key: + description: The key to select. + type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + serviceRef: + description: Select a Service. + properties: + apiVersion: + description: API version of the referent. + type: string + fieldPath: + description: 'If referring to a piece of an object instead + of an entire object, this string should contain a + valid JSON/Go field access statement, such as desiredState.manifest.containers[2]. 
+ For example, if the object reference is to a container + within a pod, this would take on a value like: "spec.containers{name}" + (where "name" refers to the name of the container + that triggered the event) or if no container name + is specified "spec.containers[2]" (container with + index 2 in this pod). This syntax is chosen only to + have some well-defined way of referencing a part of + an object. TODO: this design is not final and this + field is subject to change in the future.' + type: string + kind: + description: 'Kind of the referent. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds' + type: string + name: + description: 'Name of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names' + type: string + namespace: + description: 'Namespace of the referent. More info: + https://kubernetes.io/docs/concepts/overview/working-with-objects/namespaces/' + type: string + path: + description: The optional HTTP path. + type: string + port: + description: The optional port to select. + format: int32 + type: integer + protocol: + description: Protocol for the service + type: string + resourceVersion: + description: 'Specific resourceVersion to which this + reference is made, if any. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#concurrency-control-and-consistency' + type: string + uid: + description: 'UID of the referent. More info: https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#uids' + type: string + type: object + type: + description: Type of source. + type: string + type: object + type: object + type: array + selector: + description: 'Selector is a label query over kinds that created by + the application. It must match the component objects'' labels. 
More + info: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#label-selectors' + properties: + matchExpressions: + description: matchExpressions is a list of label selector requirements. + The requirements are ANDed. + items: + description: A label selector requirement is a selector that + contains values, a key, and an operator that relates the key + and values. + properties: + key: + description: key is the label key that the selector applies + to. + type: string + operator: + description: operator represents a key's relationship to + a set of values. Valid operators are In, NotIn, Exists + and DoesNotExist. + type: string + values: + description: values is an array of string values. If the + operator is In or NotIn, the values array must be non-empty. + If the operator is Exists or DoesNotExist, the values + array must be empty. This array is replaced during a strategic + merge patch. + items: + type: string + type: array + required: + - key + - operator + type: object + type: array + matchLabels: + additionalProperties: + type: string + description: matchLabels is a map of {key,value} pairs. A single + {key,value} in the matchLabels map is equivalent to an element + of matchExpressions, whose key field is "key", the operator + is "In", and the values array contains only "value". The requirements + are ANDed. + type: object + type: object + type: object + status: + description: ApplicationStatus defines controller's the observed state + of Application + properties: + components: + description: Object status array for all matching objects + items: + description: ObjectStatus is a generic status holder for objects + properties: + group: + description: Object group + type: string + kind: + description: Kind of object + type: string + link: + description: Link to object + type: string + name: + description: Name of object + type: string + status: + description: 'Status. 
Values: InProgress, Ready, Unknown' + type: string + type: object + type: array + componentsReady: + description: 'ComponentsReady: status of the components in the format + ready/total' + type: string + conditions: + description: Conditions represents the latest state of the object + items: + description: Condition describes the state of an object at a certain + point. + properties: + lastTransitionTime: + description: Last time the condition transitioned from one status + to another. + format: date-time + type: string + lastUpdateTime: + description: Last time the condition was probed + format: date-time + type: string + message: + description: A human readable message indicating details about + the transition. + type: string + reason: + description: The reason for the condition's last transition. + type: string + status: + description: Status of the condition, one of True, False, Unknown. + type: string + type: + description: Type of condition. + type: string + required: + - status + - type + type: object + type: array + observedGeneration: + description: ObservedGeneration is the most recent generation observed. + It corresponds to the Object's generation, which is updated on mutation + by the API Server. + format: int64 + type: integer + type: object + type: object + served: true + storage: true + subresources: + status: {} +status: + acceptedNames: + kind: "" + plural: "" + conditions: [] + storedVersions: [] \ No newline at end of file diff --git a/mp-pkg/charts/datacommons/templates/NOTES.txt b/mp-pkg/charts/datacommons/templates/NOTES.txt new file mode 100644 index 0000000..00d6c86 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/NOTES.txt @@ -0,0 +1,85 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +********************************************************************** +* * +* Data Commons has been successfully deployed! * +* * +********************************************************************** + +Your Data Commons application is now running in the {{ .Release.Namespace }} namespace. + +## Quick Status Check + +To verify your deployment: + + kubectl get pods -n {{ .Release.Namespace }} + kubectl get service {{ include "datacommons.fullname" . }} -n {{ .Release.Namespace }} + kubectl get application {{ .Release.Name }} -n {{ .Release.Namespace }} + +## Accessing the Application + +The Data Commons service is running on ClusterIP. To access it: + +### Option 1: Port Forward (for testing) + + until kubectl port-forward svc/{{ include "datacommons.fullname" . }} 8080:8080 -n {{ .Release.Namespace }}; do + echo "Port-forward crashed. Respawning..." >&2 + sleep 1 + done + +Then open http://localhost:8080 in your browser. + +### Option 2: Quick Access via Cloud Shell (recommended) + +For the fastest way to access Data Commons: + +1. Go to GKE Console: https://console.cloud.google.com/kubernetes/list +2. Click your cluster name +3. Click the three-dot menu (more options) next to your cluster +4. Select "Connect" and click "Run in Cloud Shell" +5. Once Cloud Shell connects, run: + + until kubectl port-forward svc/{{ include "datacommons.fullname" . }} 8080:8080 -n {{ .Release.Namespace }}; do + echo "Port-forward crashed. Respawning..." >&2 + sleep 1 + done + +6. 
Click "Web Preview" (top-right toolbar) > "Preview on port 8080" + +### Option 3: Configure Ingress (recommended for production) + +This chart does NOT create ingress resources. Create your own Ingress or Gateway: +See more: https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps + +## Monitoring + +View application logs: + + kubectl logs -l app=datacommons -n {{ .Release.Namespace }} --tail=100 -f + +Monitor pod health: + + kubectl describe pod -l app=datacommons -n {{ .Release.Namespace }} + +## Troubleshooting + +If pods are not starting: +1. Check events: kubectl describe pod -l app=datacommons -n {{ .Release.Namespace }} +2. Verify secrets exist: kubectl get secret {{ .Values.existingSecret }} -n {{ .Release.Namespace }} +3. Check Workload Identity: kubectl get serviceaccount {{ include "datacommons.serviceAccountName" . }} -n {{ .Release.Namespace }} -o yaml + +For more information, see: https://docs.datacommons.org/ + +********************************************************************** diff --git a/mp-pkg/charts/datacommons/templates/_helpers.tpl b/mp-pkg/charts/datacommons/templates/_helpers.tpl new file mode 100644 index 0000000..20a1ac6 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/_helpers.tpl @@ -0,0 +1,104 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{/* +Expand the name of the chart. 
+*/}} +{{- define "datacommons.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a default fully qualified app name. +*/}} +{{- define "datacommons.fullname" -}} +{{- if .Values.fullnameOverride }} +{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }} +{{- else }} +{{- .Release.Name | trunc 63 | trimSuffix "-" }} +{{- end }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "datacommons.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Common labels +*/}} +{{- define "datacommons.labels" -}} +helm.sh/chart: {{ include "datacommons.chart" . }} +{{ include "datacommons.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +app.kubernetes.io/part-of: datacommons +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "datacommons.selectorLabels" -}} +app.kubernetes.io/name: {{ include "datacommons.name" . 
}} +app.kubernetes.io/instance: {{ .Release.Name }} +app: datacommons +{{- end }} + +{{/* +Service Account name +*/}} +{{- define "datacommons.serviceAccountName" -}} +{{- if .Values.serviceAccount.create }} +{{- default "datacommons-ksa" .Values.serviceAccount.name }} +{{- else }} +{{- default "default" .Values.serviceAccount.name }} +{{- end }} +{{- end }} + +{{/* +GCS output directory +*/}} +{{- define "datacommons.gcsOutputDir" -}} +{{- $bucket := .Values.config.gcs.bucket }} +{{- $prefix := .Values.config.gcs.pathPrefix }} +{{- if $prefix }} +{{- printf "%s/%s/output" $bucket $prefix }} +{{- else }} +{{- printf "%s/output" $bucket }} +{{- end }} +{{- end }} + +{{/* +GCS input directory +*/}} +{{- define "datacommons.gcsInputDir" -}} +{{- $bucket := .Values.config.gcs.bucket }} +{{- $prefix := .Values.config.gcs.pathPrefix }} +{{- if $prefix }} +{{- printf "%s/%s/input" $bucket $prefix }} +{{- else }} +{{- printf "%s/input" $bucket }} +{{- end }} +{{- end }} + +{{/* +Namespace for resources +*/}} +{{- define "datacommons.namespace" -}} +{{- .Release.Namespace }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/application.yaml b/mp-pkg/charts/datacommons/templates/application.yaml new file mode 100644 index 0000000..9652a61 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/application.yaml @@ -0,0 +1,92 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +apiVersion: app.k8s.io/v1beta1 +kind: Application +metadata: + name: {{ .Release.Name }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + kubernetes-engine.cloud.google.com/icon: >- + data:image/svg+xml;base64,PHN2ZyB3aWR0aD0iMjgiIGhlaWdodD0iMjgiIHZpZXdCb3g9IjAgMCAyOCAyOCIgZmlsbD0ibm9uZSIgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KPGcgaWQ9InVwZGF0ZWQtZGMtbG9nby5zdmciIGNsaXAtcGF0aD0idXJsKCNjbGlwMF81MTI5XzIyNTA4KSI+CjxnIGlkPSJkYy1sb2dvLnN2ZyBmaWxsIiBjbGlwLXBhdGg9InVybCgjY2xpcDFfNTEyOV8yMjUwOCkiPgo8ZyBpZD0iZGMtbG9nby5zdmciIGNsaXAtcGF0aD0idXJsKCNjbGlwMl81MTI5XzIyNTA4KSI+CjxnIGlkPSJHcm91cCI+CjxwYXRoIGlkPSJWZWN0b3IiIGQ9Ik0yNCAwSDRDMS43OTA4NiAwIDAgMS43OTA4NiAwIDRWMjRDMCAyNi4yMDkxIDEuNzkwODYgMjggNCAyOEgyNEMyNi4yMDkxIDI4IDI4IDI2LjIwOTEgMjggMjRWNEMyOCAxLjc5MDg2IDI2LjIwOTEgMCAyNCAwWiIgZmlsbD0iIzA3NTFCMyIvPgo8L2c+CjwvZz4KPC9nPgo8cGF0aCBpZD0iREMiIGQ9Ik0xOS42MTMyIDkuMDE4MzlDMjAuMTI1NCA5LjAxODQyIDIwLjYwMjkgOS4wODE4NyAyMS4wNDQ5IDkuMjEwNzdDMjEuNDg1OCA5LjMzOTM4IDIxLjg5MTEgOS41MjM5MyAyMi4yNTk3IDkuNzYzNTFDMjIuNjI3MiAxMC4wMDI0IDIyLjk2MjEgMTAuMjkxNiAyMy4yNjQ2IDEwLjYzMDdMMjMuMzI5IDEwLjcwMkwyMy4yNTk3IDEwLjc2OTRMMjIuMjIwNiAxMS43ODIxTDIyLjE0MjUgMTEuODU3M0wyMi4wNzQyIDExLjc3MzNDMjEuODYzNiAxMS41MTg4IDIxLjYzNTggMTEuMzA5NSAyMS4zOTI1IDExLjE0NDRMMjEuMzkwNiAxMS4xNDM0QzIxLjE0OCAxMC45NzAyIDIwLjg3OTQgMTAuODM5NyAyMC41ODM5IDEwLjc1MjhIMjAuNTgyOUMyMC4yOTY3IDEwLjY2NiAxOS45Nzc2IDEwLjYyMTkgMTkuNjI1OSAxMC42MjE5QzE4Ljk5NDYgMTAuNjIyIDE4LjQxNzcgMTAuNzcxMiAxNy44OTM1IDExLjA2ODJIMTcuODkyNUMxNy4zNjg4IDExLjM1NjIgMTYuOTQ4NyAxMS43NjE0IDE2LjYzMzcgMTIuMjg2TDE2LjYzNDcgMTIuMjg2OUMxNi4zMzAzIDEyLjgwODkgMTYuMTc1NyAxMy40MzQyIDE2LjE3NTcgMTQuMTY2OEMxNi4xNzU3IDE0Ljg4OTMgMTYuMzI5NiAxNS41MTQ3IDE2LjYzMzcgMTYuMDQ1N0MxNi45NDg5IDE2LjU3MTEgMTcuMzY4OSAxNi45ODE4IDE3Ljg5MzUgMTcuMjc5MUgxNy44OTI1QzE4LjQxNjggMTcuNTY3NSAxOC45OTQxIDE3LjcxMTcgMTkuNjI1OSAxNy43MTE4QzIwLjE5NjcgMTcuNzExOCAyMC42OTg0IDE3LjU5MzMgMjEuMTMxOCAxNy4zNTkyTDIxLjI5NjggMTcuMjY2NEMyMS42NzY1IDE3LjA0MDIgMjI
uMDE3IDE2Ljc0NjYgMjIuMzE3MyAxNi4zODQ2TDIyLjM4NTcgMTYuMzAxNkwyMi40NjI4IDE2LjM3NjhMMjMuNTMwMiAxNy40MDIyTDIzLjU5NzYgMTcuNDY3NkwyMy41MzcgMTcuNTM4OUMyMy4yMzM3IDE3Ljg5NzQgMjIuODc5MiAxOC4yMTAzIDIyLjQ3NTUgMTguNDc2NEMyMi4wNzE1IDE4Ljc0MjYgMjEuNjMwOCAxOC45NDg5IDIxLjE1NDIgMTkuMDk1NUMyMC42NzU5IDE5LjI0MjcgMjAuMTYyIDE5LjMxNjIgMTkuNjEzMiAxOS4zMTYyQzE4Ljg5MDMgMTkuMzE2MiAxOC4yMTU2IDE5LjE4NzcgMTcuNTkwOCAxOC45MzA1TDE3LjU4ODggMTguOTI5NUMxNi45NjYgMTguNjYzOCAxNi40MjAyIDE4LjMwMTUgMTUuOTUyMSAxNy44NDI2TDE1Ljk1MDEgMTcuODQxNkMxNS40OTEyIDE3LjM3MzUgMTUuMTI4OSAxNi44MjY5IDE0Ljg2MzIgMTYuMjAzOUwxNC44NjIyIDE2LjIwM1YxNi4yMDJDMTQuNjA1MyAxNS41Njg1IDE0LjQ3NzUgMTQuODg5NyAxNC40Nzc1IDE0LjE2NjhDMTQuNDc3NSAxMy40MzUgMTQuNjA1OCAxMi43NTU3IDE0Ljg2MzIgMTIuMTMwN1YxMi4xMjk3QzE1LjEyOSAxMS41MDY2IDE1LjQ5MTggMTAuOTY1MSAxNS45NTExIDEwLjUwNTdDMTYuMzYxMSAxMC4wOTU3IDE2LjgzMTEgOS43NjYzNyAxNy4zNjAzIDkuNTE4MzlMMTcuNTkwOCA5LjQxNjgzQzE4LjIxNTUgOS4xNTA2NCAxOC44OTAzIDkuMDE4MzkgMTkuNjEzMiA5LjAxODM5Wk04LjE5NTI1IDkuMjM0MjFDOS4yMjQ4NCA5LjIzNDI1IDEwLjExNDggOS40NDM2MyAxMC44NjEzIDkuODY3MDJDMTEuNjE1NCAxMC4yODA5IDEyLjE5NjIgMTAuODYxOCAxMi42MDE1IDExLjYwNzNDMTMuMDE1OCAxMi4zNTMgMTMuMjIyNiAxMy4yMDcyIDEzLjIyMjYgMTQuMTY2OEMxMy4yMjI2IDE1LjEyNjUgMTMuMDE2NyAxNS45ODA3IDEyLjYwMjUgMTYuNzI2NEwxMi42MDE1IDE2LjcyNTRDMTIuMTk2NCAxNy40NzEgMTEuNjE3IDE4LjA1NjQgMTAuODYzMiAxOC40NzkzTDEwLjg2MjIgMTguNDgwM0MxMC4xMTU2IDE4Ljg5NSA5LjIyNTI0IDE5LjEwMDQgOC4xOTUyNSAxOS4xMDA0SDUuMDAzODVWOS4yMzQyMUg4LjE5NTI1Wk02LjY4ODQyIDE3LjQ5NTlIOC4xNTQyNEM4Ljg3MDM2IDE3LjQ5NTkgOS40Nzg4MiAxNy4zNjMgOS45ODMzNCAxNy4xMDI0QzEwLjQ4OCAxNi44MzI2IDEwLjg3MDMgMTYuNDUxNyAxMS4xMzA4IDE1Ljk1NjlDMTEuMzkxNyAxNS40NjExIDExLjUyNDQgMTQuODY1MiAxMS41MjQ0IDE0LjE2NjhDMTEuNTI0MyAxMy40Njg2IDExLjM5MTYgMTIuODczNCAxMS4xMzA4IDEyLjM3NzhDMTAuODcwNSAxMS44ODMyIDEwLjQ4OTQgMTEuNTA1NyA5Ljk4NTI5IDExLjI0NUg5Ljk4MzM0QzkuNDc4ODggMTAuOTc1NCA4Ljg3MDQyIDEwLjgzNzcgOC4xNTQyNCAxMC44Mzc3SDYuNjg4NDJWMTcuNDk1OVoiIGZpbGw9IndoaXRlIiBzdHJva2U9IndoaXRlIiBzdHJva2Utd2lkdGg9IjAuMiIvPgo8L2c+CjxkZWZzPgo8Y2xpcFBhdGggaWQ9ImNsaXAwXzUxMjlfMjI1MDgiPgo8cmV
jdCB3aWR0aD0iMjgiIGhlaWdodD0iMjgiIGZpbGw9IndoaXRlIi8+CjwvY2xpcFBhdGg+CjxjbGlwUGF0aCBpZD0iY2xpcDFfNTEyOV8yMjUwOCI+CjxyZWN0IHdpZHRoPSIyOCIgaGVpZ2h0PSIyOCIgZmlsbD0id2hpdGUiLz4KPC9jbGlwUGF0aD4KPGNsaXBQYXRoIGlkPSJjbGlwMl81MTI5XzIyNTA4Ij4KPHJlY3Qgd2lkdGg9IjI4IiBoZWlnaHQ9IjI4IiBmaWxsPSJ3aGl0ZSIvPgo8L2NsaXBQYXRoPgo8L2RlZnM+Cjwvc3ZnPgo= + marketplace.cloud.google.com/deploy-info: '{}' +spec: + descriptor: + type: "Data Commons Accelerator" + version: "{{ .Chart.AppVersion }}" + description: |- + Data Commons Accelerator - a ready-to-deploy instance of Custom Data Commons + for exploring and visualizing structured data on GKE. + + **Features:** + - Interactive data exploration and visualization + - Integration with Google Cloud SQL for data storage + - Scalable Kubernetes-native architecture + - Support for custom data imports from GCS + - Pre-built domain templates (Education, Health, Energy) + links: + - description: "Data Commons Documentation" + url: "https://docs.datacommons.org/" + - description: "Source Code" + url: "https://github.com/datacommonsorg/website" + - description: "User Guide" + url: "https://docs.datacommons.org/custom_dc/" + notes: |- + # Getting Started + + Your Data Commons application has been deployed successfully! + + ## Next Steps + + 1. **Configure Ingress**: Create an Ingress or Gateway API to expose the service: + See [Exposing Apps](https://docs.cloud.google.com/kubernetes-engine/docs/how-to/exposing-apps) for more details. + ``` + kubectl get service {{ include "datacommons.fullname" . }} -n {{ .Release.Namespace }} + ``` + + 2. **Verify Database**: Check that the database initialization completed: + ``` + kubectl get jobs -n {{ .Release.Namespace }} + ``` + + 3. **Monitor Pods**: Ensure all pods are running: + ``` + kubectl get pods -n {{ .Release.Namespace }} + ``` + + 4. 
**View Logs**: Check application logs: + ``` + kubectl logs -l app.kubernetes.io/name=datacommons -n {{ .Release.Namespace }} + ``` + selector: + matchLabels: + {{- include "datacommons.selectorLabels" . | nindent 6 }} + componentKinds: + - group: apps/v1 + kind: Deployment + - group: v1 + kind: Service + - group: v1 + kind: ConfigMap + - group: v1 + kind: ServiceAccount + - group: batch/v1 + kind: Job + {{- if .Values.dbSync.enabled }} + - group: batch/v1 + kind: CronJob + {{- end }} + addOwnerRef: false diff --git a/mp-pkg/charts/datacommons/templates/configmap.yaml b/mp-pkg/charts/datacommons/templates/configmap.yaml new file mode 100644 index 0000000..ff93168 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/configmap.yaml @@ -0,0 +1,82 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: ConfigMap +metadata: + name: {{ include "datacommons.fullname" . }}-config + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . 
| nindent 4 }} + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-10" + "helm.sh/hook-delete-policy": before-hook-creation +data: + # Data Commons API Configuration + DC_API_ROOT: {{ .Values.config.dcApiRoot | quote }} + WEBSITE_MIXER_API_ROOT: "http://localhost:8081" + + # CloudSQL Configuration + {{- if .Values.config.cloudsql.enabled }} + USE_CLOUDSQL: "true" + CLOUDSQL_INSTANCE: {{ .Values.config.cloudsql.instance | quote }} + CLOUDSQL_USE_PRIVATE_IP: {{ .Values.config.cloudsql.usePrivateIP | default "true" | quote }} + DB_NAME: {{ .Values.config.cloudsql.database | quote }} + DB_USER: {{ .Values.config.cloudsql.user | quote }} + {{- end }} + + # GCS Configuration + {{- if .Values.config.gcs.bucket }} + OUTPUT_DIR: {{ include "datacommons.gcsOutputDir" . | quote }} + INPUT_DIR: {{ include "datacommons.gcsInputDir" . | quote }} + {{- end }} + + # Natural Language / Model Configuration + ENABLE_MODEL: {{ .Values.config.enableNaturalLanguage | quote }} + {{- if not .Values.config.enableNaturalLanguage }} + NL_DISASTER_CONFIG: "" + NL_FULFILLMENT_CONFIG: "" + {{- end }} + + # GCP Project + {{- if .Values.global.projectId }} + GCP_PROJECT_ID: {{ .Values.global.projectId | quote }} + {{- end }} + + # Domain Template + FLASK_ENV: {{ .Values.config.flaskEnv | quote }} + + # Mixer Settings + {{- if .Values.config.gomaxprocs }} + GOMAXPROCS: {{ .Values.config.gomaxprocs | quote }} + {{- end }} + {{- if .Values.config.maxConnections }} + MAX_CONNECTIONS: {{ .Values.config.maxConnections | quote }} + {{- end }} + + # Debug Settings + {{- if .Values.config.debug }} + DEBUG: "true" + {{- end }} + {{- if .Values.config.enableAdmin }} + ENABLE_ADMIN: "true" + {{- end }} + + {{- with .Values.config.extraEnv }} + # Additional environment variables + {{- range $key, $value := . 
}} + {{ $key }}: {{ $value | quote }} + {{- end }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/db-init-job.yaml b/mp-pkg/charts/datacommons/templates/db-init-job.yaml new file mode 100644 index 0000000..5ff766f --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/db-init-job.yaml @@ -0,0 +1,76 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.dbInit.enabled }} +apiVersion: batch/v1 +kind: Job +metadata: + name: {{ include "datacommons.fullname" . }}-db-init + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + app.kubernetes.io/component: db-init + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-5" + "helm.sh/hook-delete-policy": before-hook-creation +spec: + ttlSecondsAfterFinished: {{ .Values.dbInit.ttlSecondsAfterFinished | default 3600 }} + backoffLimit: {{ .Values.dbInit.backoffLimit | default 3 }} + activeDeadlineSeconds: {{ .Values.dbInit.activeDeadlineSeconds | default 120 }} + template: + metadata: + labels: + {{- include "datacommons.selectorLabels" . | nindent 8 }} + app.kubernetes.io/component: db-init + annotations: + # Checksum to ensure fresh config on each run + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . 
| sha256sum }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + restartPolicy: Never + + containers: + - name: db-init + image: "{{ .Values.dbInit.image.repository }}:{{ .Values.dbInit.image.tag }}" + imagePullPolicy: {{ .Values.dbInit.image.pullPolicy | default "IfNotPresent" }} + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . }} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + # Additional job-specific environment variables + env: + # Data run mode + - name: DATA_RUN_MODE + value: {{ .Values.dbInit.mode | quote }} + # Input/Output directories + - name: INPUT_DIR + value: {{ include "datacommons.gcsInputDir" . | quote }} + - name: OUTPUT_DIR + value: {{ include "datacommons.gcsOutputDir" . | quote }} + + resources: + {{- toYaml .Values.dbInit.resources | nindent 12 }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml b/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml new file mode 100644 index 0000000..5a1a962 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/db-sync-cronjob.yaml @@ -0,0 +1,77 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.dbSync.enabled }} +apiVersion: batch/v1 +kind: CronJob +metadata: + name: {{ include "datacommons.fullname" . }}-db-sync + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + app.kubernetes.io/component: db-sync +spec: + schedule: {{ .Values.dbSync.schedule | default "0 */3 * * *" | quote }} + successfulJobsHistoryLimit: {{ .Values.dbSync.successfulJobsHistoryLimit | default 3 }} + failedJobsHistoryLimit: {{ .Values.dbSync.failedJobsHistoryLimit | default 1 }} + concurrencyPolicy: {{ .Values.dbSync.concurrencyPolicy | default "Forbid" }} + jobTemplate: + spec: + ttlSecondsAfterFinished: {{ .Values.dbSync.ttlSecondsAfterFinished | default 3600 }} + backoffLimit: {{ .Values.dbSync.backoffLimit | default 3 }} + template: + metadata: + labels: + {{- include "datacommons.selectorLabels" . | nindent 12 }} + app.kubernetes.io/component: db-sync + annotations: + # Checksum to ensure fresh config on each run + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + restartPolicy: Never + + containers: + - name: db-sync + image: "{{ .Values.dbSync.image.repository }}:{{ .Values.dbSync.image.tag }}" + imagePullPolicy: {{ .Values.dbSync.image.pullPolicy | default "IfNotPresent" }} + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . 
}} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + # Additional job-specific environment variables + env: + # Data run mode + - name: DATA_RUN_MODE + value: {{ .Values.dbSync.mode | quote }} + # Input/Output directories + - name: INPUT_DIR + value: {{ include "datacommons.gcsInputDir" . | quote }} + - name: OUTPUT_DIR + value: {{ include "datacommons.gcsOutputDir" . | quote }} + + resources: + {{- toYaml .Values.dbSync.resources | nindent 16 }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/templates/deployment.yaml b/mp-pkg/charts/datacommons/templates/deployment.yaml new file mode 100644 index 0000000..b93c02e --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/deployment.yaml @@ -0,0 +1,130 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} +spec: + replicas: {{ .Values.deployment.replicas }} + selector: + matchLabels: + {{- include "datacommons.selectorLabels" . | nindent 6 }} + app.kubernetes.io/component: website + template: + metadata: + labels: + {{- include "datacommons.labels" . | nindent 8 }} + app.kubernetes.io/component: website + {{- with .Values.deployment.podLabels }} + {{- toYaml . 
| nindent 8 }} + {{- end }} + annotations: + # Force pod restart when ConfigMap or Secret changes + checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }} + checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }} + {{- with .Values.deployment.podAnnotations }} + {{- toYaml . | nindent 8 }} + {{- end }} + spec: + serviceAccountName: {{ include "datacommons.serviceAccountName" . }} + + {{- with .Values.deployment.securityContext }} + securityContext: + {{- toYaml . | nindent 8 }} + {{- end }} + + containers: + - name: website + image: "{{ .Values.deployment.image.repository }}:{{ .Values.deployment.image.tag }}" + imagePullPolicy: {{ .Values.deployment.image.pullPolicy }} + + ports: + - name: http + containerPort: {{ .Values.service.targetPort }} + protocol: TCP + + # Environment variables from ConfigMap and Secrets + envFrom: + # Application configuration + - configMapRef: + name: {{ include "datacommons.fullname" . }}-config + # Helm-generated admin credentials + - secretRef: + name: {{ include "datacommons.fullname" . }} + {{- if .Values.existingSecret }} + # Terraform-managed secrets (DB_PASS, DC_API_KEY, MAPS_API_KEY) + - secretRef: + name: {{ .Values.existingSecret }} + {{- end }} + + {{- with .Values.deployment.resources }} + resources: + {{- toYaml . 
| nindent 12 }} + {{- end }} + + {{- if .Values.deployment.probes.startup.enabled }} + startupProbe: + httpGet: + path: /healthz + port: http + initialDelaySeconds: {{ .Values.deployment.probes.startup.initialDelaySeconds }} + periodSeconds: {{ .Values.deployment.probes.startup.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.startup.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.startup.failureThreshold }} + {{- end }} + + {{- if .Values.deployment.probes.readiness.enabled }} + readinessProbe: + httpGet: + path: /healthz + port: http + periodSeconds: {{ .Values.deployment.probes.readiness.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.readiness.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.readiness.failureThreshold }} + {{- end }} + + {{- if .Values.deployment.probes.liveness.enabled }} + livenessProbe: + httpGet: + path: /healthz + port: http + initialDelaySeconds: {{ .Values.deployment.probes.liveness.initialDelaySeconds }} + periodSeconds: {{ .Values.deployment.probes.liveness.periodSeconds }} + timeoutSeconds: {{ .Values.deployment.probes.liveness.timeoutSeconds }} + failureThreshold: {{ .Values.deployment.probes.liveness.failureThreshold }} + {{- end }} + + {{- with .Values.deployment.containerSecurityContext }} + securityContext: + {{- toYaml . | nindent 12 }} + {{- end }} + + {{- with .Values.deployment.nodeSelector }} + nodeSelector: + {{- toYaml . | nindent 8 }} + {{- end }} + + {{- with .Values.deployment.affinity }} + affinity: + {{- toYaml . | nindent 8 }} + {{- end }} + + {{- with .Values.deployment.tolerations }} + tolerations: + {{- toYaml . 
| nindent 8 }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/secret.yaml b/mp-pkg/charts/datacommons/templates/secret.yaml new file mode 100644 index 0000000..76db52c --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/secret.yaml @@ -0,0 +1,37 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: Secret +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ include "datacommons.namespace" . }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + "helm.sh/hook": pre-install + "helm.sh/hook-weight": "-10" +type: Opaque +data: + # Admin Panel Credentials + {{- $existingSecret := lookup "v1" "Secret" (include "datacommons.namespace" .) (include "datacommons.fullname" .) 
}} + {{- if $existingSecret }} + # Preserve existing admin credentials on upgrade + ADMIN_PANEL_USERNAME: {{ index $existingSecret.data "ADMIN_PANEL_USERNAME" }} + ADMIN_PANEL_PASSWORD: {{ index $existingSecret.data "ADMIN_PANEL_PASSWORD" }} + {{- else }} + # Generate new admin credentials on first install + ADMIN_PANEL_USERNAME: {{ "admin" | b64enc | quote }} + ADMIN_PANEL_PASSWORD: {{ randAlphaNum 12 | b64enc | quote }} + {{- end }} diff --git a/mp-pkg/charts/datacommons/templates/service.yaml b/mp-pkg/charts/datacommons/templates/service.yaml new file mode 100644 index 0000000..855e443 --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/service.yaml @@ -0,0 +1,34 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +apiVersion: v1 +kind: Service +metadata: + name: {{ include "datacommons.fullname" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + {{- with .Values.service.annotations }} + annotations: + {{- toYaml . | nindent 4 }} + {{- end }} +spec: + type: {{ .Values.service.type }} + ports: + - port: {{ .Values.service.port }} + targetPort: {{ .Values.service.targetPort }} + protocol: TCP + name: http + selector: + {{- include "datacommons.selectorLabels" . 
| nindent 4 }} diff --git a/mp-pkg/charts/datacommons/templates/serviceaccount.yaml b/mp-pkg/charts/datacommons/templates/serviceaccount.yaml new file mode 100644 index 0000000..5f450fa --- /dev/null +++ b/mp-pkg/charts/datacommons/templates/serviceaccount.yaml @@ -0,0 +1,33 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +{{- if .Values.serviceAccount.create }} +apiVersion: v1 +kind: ServiceAccount +metadata: + name: {{ include "datacommons.serviceAccountName" . }} + namespace: {{ .Release.Namespace }} + labels: + {{- include "datacommons.labels" . | nindent 4 }} + annotations: + "helm.sh/hook": pre-install,pre-upgrade + "helm.sh/hook-weight": "-10" + "helm.sh/hook-delete-policy": before-hook-creation + {{- if .Values.serviceAccount.gcpServiceAccountEmail }} + iam.gke.io/gcp-service-account: {{ .Values.serviceAccount.gcpServiceAccountEmail }} + {{- end }} + {{- with .Values.serviceAccount.annotations }} + {{- toYaml . | nindent 4 }} + {{- end }} +{{- end }} diff --git a/mp-pkg/charts/datacommons/values.yaml b/mp-pkg/charts/datacommons/values.yaml new file mode 100644 index 0000000..36edf8e --- /dev/null +++ b/mp-pkg/charts/datacommons/values.yaml @@ -0,0 +1,221 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Global Settings +# ============================================ +global: + # GCP Project ID (injected by Terraform) + projectId: "" + +# ============================================ +# Deployment Configuration +# ============================================ +deployment: + replicas: 1 + + # Container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Resource limits + resources: + requests: + memory: "4Gi" + cpu: "2" + limits: + memory: "4Gi" + cpu: "2" + + # Health probes + probes: + startup: + enabled: true + initialDelaySeconds: 60 + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 30 + readiness: + enabled: true + periodSeconds: 10 + timeoutSeconds: 5 + failureThreshold: 3 + liveness: + enabled: true + initialDelaySeconds: 120 + periodSeconds: 30 + timeoutSeconds: 5 + failureThreshold: 3 + + # Pod security context + securityContext: + fsGroup: 1000 + + # Container security context + containerSecurityContext: {} + + # Node selector + nodeSelector: {} + + # Tolerations + tolerations: [] + + # Affinity + affinity: {} + + # Additional pod annotations + podAnnotations: {} + + # Additional pod labels + podLabels: {} + +# ============================================ +# Database Initialization Job +# ============================================ +dbInit: + enabled: true + + # Data container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Run mode: "schemaupdate" for schema creation only + mode: 
"schemaupdate" + + # Job settings + ttlSecondsAfterFinished: 3600 + backoffLimit: 3 + activeDeadlineSeconds: 600 + + # Resource limits + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "2" + memory: "4Gi" + +# ============================================ +# Database Sync CronJob +# ============================================ +dbSync: + enabled: true + + # Cron schedule (default: every 3 hours) + schedule: "0 */3 * * *" + + # Data container image (injected by Terraform) + image: + repository: "" + tag: "" + pullPolicy: IfNotPresent + + # Run mode: "customdc" for data import. But should be left empty to enable it. + mode: "" + + # CronJob settings + concurrencyPolicy: "Forbid" + successfulJobsHistoryLimit: 3 + failedJobsHistoryLimit: 1 + ttlSecondsAfterFinished: 3600 + backoffLimit: 3 + + # Resource limits + resources: + requests: + cpu: "500m" + memory: "1Gi" + limits: + cpu: "2" + memory: "4Gi" + +# ============================================ +# Application Configuration +# ============================================ +config: + # Enable Natural Language features + enableNaturalLanguage: true + + # Enable periodic data sync + enableDataSync: true + + # Domain template (FLASK_ENV) + flaskEnv: "custom" + + # Data Commons API root + dcApiRoot: "https://api.datacommons.org" + + # CloudSQL configuration (injected by Terraform) + cloudsql: + enabled: true + # Instance connection name: project:region:instance + instance: "" + # Database name + database: "datacommons" + # Database user + user: "datacommons" + # Use private IP connection + usePrivateIP: true + + # GCS configuration (injected by Terraform) + gcs: + # Base bucket URL + bucket: "" + # Optional path prefix within bucket + pathPrefix: "" + + # Mixer settings + gomaxprocs: 50 + maxConnections: 100 + + # Debug settings + debug: false + enableAdmin: false + + # Additional environment variables + extraEnv: {} + +# ============================================ +# Service Account (Workload 
Identity) +# ============================================ +serviceAccount: + create: true + # K8s ServiceAccount name + name: "datacommons-ksa" + # GCP Service Account email (injected by Terraform) + gcpServiceAccountEmail: "" + # Additional annotations + annotations: {} + +# ============================================ +# Secrets Configuration +# ============================================ +# Secrets are created by Terraform and referenced here +# The chart does NOT create secrets - only references them +existingSecret: "datacommons-secrets" + + +# ============================================ +# Service Configuration +# ============================================ +service: + type: ClusterIP + port: 8080 + targetPort: 8080 + annotations: {} diff --git a/mp-pkg/terraform/README.md b/mp-pkg/terraform/README.md new file mode 100644 index 0000000..b7214e7 --- /dev/null +++ b/mp-pkg/terraform/README.md @@ -0,0 +1,152 @@ +# Data Commons Accelerator - Terraform Infrastructure + +This Terraform configuration deploys Data Commons Accelerator to Google Kubernetes Engine (GKE) with CloudSQL MySQL and Cloud Storage backends. It manages Private Service Access for database connectivity, Workload Identity for secure GCP service integration, and deploys the application via Helm. 
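+
+For a manual (non-Marketplace) run, the required inputs (see [Inputs](#inputs) below) can be supplied in a `terraform.tfvars` file. The sketch below uses illustrative placeholder values only; in a Marketplace deployment the image and Helm chart variables are injected automatically:
+
+```hcl
+# terraform.tfvars — all values are illustrative placeholders
+goog_cm_deployment_name = "datacommons-demo"
+project_id              = "my-gcp-project"
+dc_api_key              = "REPLACE_WITH_DC_API_KEY"
+
+# Normally populated by GCP Marketplace
+cdc_services_image_repo = "gcr.io/example/cdc-services"
+cdc_services_image_tag  = "latest"
+data_image_repo         = "gcr.io/example/data"
+data_image_tag          = "latest"
+helm_chart_repo         = "oci://example/charts"
+helm_chart_version      = "1.0.0"
+
+# Optional: deploy into an existing cluster instead of creating one
+create_new_cluster   = false
+gke_cluster_name     = "my-existing-cluster"
+gke_cluster_location = "us-central1"
+```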
+ +## Enterprise Flexibility Features + +This solution is designed for enterprise environments with maximum flexibility: + +### Resource Naming with Random Suffixes + +By default, resources use auto-generated names with random suffixes to prevent collisions: +- CloudSQL instance: `{deployment-name}-db-{random-suffix}` +- GCS bucket: `{deployment-name}-data-{random-suffix}` +- Service account: `{deployment-name}-sa-{random-suffix}` + +### Name Overrides (Optional) + +For environments requiring specific resource names (compliance, naming conventions): +- `cloudsql_instance_name_override` - Specify exact CloudSQL instance name +- `gcs_bucket_name_override` - Specify exact GCS bucket name +- `service_account_name_override` - Specify exact service account name + +### Pre-existing Resources Support + +- **Namespace**: Set `create_namespace = false` if namespace already exists +- **APIs**: Handles pre-enabled APIs idempotently (no errors if already enabled) + +### Example: Enterprise Deployment with Existing Resources + +```hcl +# Use existing namespace +create_namespace = false +namespace = "datacommons-prod" + +# Override resource names to match enterprise conventions +cloudsql_instance_name_override = "dc-mysql-prod-001" +gcs_bucket_name_override = "company-dc-data-prod" +service_account_name_override = "svc-datacommons-prod" +``` + +## Requirements + +- GCP project with required APIs enabled +- Existing GKE cluster (VPC-native, Workload Identity enabled) — only required when deploying to an existing cluster +- Terraform >= 1.5.7 +- Google Cloud Provider >= 7.0.0 +- Kubernetes Provider >= 2.20 +- Helm Provider >= 2.12 + +### Deployment Service Account IAM Roles + +The deployment service account (created by GCP Marketplace and used by Infrastructure Manager to run Terraform) must have the following IAM roles assigned at the project level: + +| Role | Purpose | +|------|---------| +| `roles/container.developer` | Deploy to GKE cluster (Helm releases, namespaces, secrets) 
| +| `roles/cloudsql.admin` | Create and manage CloudSQL MySQL instances | +| `roles/storage.admin` | Create and manage GCS buckets | +| `roles/iam.serviceAccountAdmin` | Create and manage GCP service accounts (Workload Identity) | +| `roles/compute.networkAdmin` | Manage VPC networking and Private Service Access | +| `roles/serviceusage.serviceUsageAdmin` | Enable required GCP APIs | +| `roles/serviceusage.apiKeysAdmin` | Create Google Maps API keys | +| `roles/resourcemanager.projectIamAdmin` | Assign IAM roles to service accounts at project level | + +These roles are configured in the GCP Marketplace Producer Portal during solution setup. Additionally, the **Infrastructure Manager Agent** service account requires `roles/resourcemanager.projectIamAdmin` to function. + +## What Gets Deployed + +- CloudSQL MySQL instance with private IP +- Cloud Storage bucket for data artifacts +- Workload Identity binding (GKE service account ↔ GCP service account) +- Google Maps API keys (if needed) +- Kubernetes secrets for database credentials +- Helm release of Data Commons application + +## Private Service Access + +CloudSQL uses private IP connectivity via Private Service Access (PSA). A /20 IP range is automatically allocated and a PSA connection is created during deployment. If the VPC already has PSA configured, the existing ranges are preserved alongside the new one. 
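+
+The preservation behavior comes from merging any ranges already reserved on the VPC with the newly allocated /20 when the connection is created or updated, as sketched below (simplified from `cloudsql.tf`):
+
+```hcl
+resource "google_service_networking_connection" "cloudsql_private_vpc_connection" {
+  network = local.vpc_network_self_link
+  service = "servicenetworking.googleapis.com"
+
+  # Keep any pre-existing reserved ranges and add the new /20, so an
+  # existing PSA configuration is never replaced destructively.
+  reserved_peering_ranges = distinct(concat(
+    local.existing_psa_range_names,
+    [google_compute_global_address.cloudsql_private_ip[0].name]
+  ))
+}
+```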
+ +## Configuration + +### Terraform Requirements + +| Name | Version | +|------|---------| +| terraform | >= 1.5.7 | +| google | >= 7.0.0, < 8.0.0 | +| google-beta | >= 7.0.0, < 8.0.0 | +| helm | ~> 2.12.0 | +| kubernetes | >= 2.20.0 | +| random | ~> 3.6.0 | + +### Providers + +| Name | Version | +|------|---------| +| google | 7.15.0 | +| helm | 2.12.1 | +| kubernetes | 3.0.1 | +| random | 3.6.3 | + +### Modules + +| Name | Source | +|------|--------| +| cloudsql | ./modules/cloudsql | +| gcs_bucket | ./modules/gcs-bucket | +| k8s_secrets | ./modules/k8s-secrets | +| maps_api_keys | ./modules/maps-api-keys | + +## Inputs + +| Name | Description | Type | Default | Required | +|------|-------------|------|---------|:--------:| +| goog_cm_deployment_name | Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions) | string | n/a | yes | +| project_id | GCP project ID where Data Commons Accelerator will be deployed | string | n/a | yes | +| create_new_cluster | Create a new GKE cluster with VPC networking (VPC, subnet, Cloud Router, Cloud NAT, and PSA). When false, deploy to an existing cluster specified by gke_cluster_name. | bool | `true` | no | +| region | GCP region for the new GKE cluster and networking resources. Only used when create_new_cluster is true. | string | `"us-central1"` | no | +| gke_cluster_name | Name of the existing GKE cluster to deploy to. Only used when create_new_cluster is false. | string | `""` | no | +| gke_cluster_location | Location (region or zone) of the existing GKE cluster. The GCP region for CloudSQL and other resources is derived from this value. Only used when create_new_cluster is false. | string | `""` | no | +| namespace | Kubernetes namespace for Data Commons Accelerator deployment. Defaults to the deployment name if not provided. | string | `""` | no | +| create_namespace | Create new Kubernetes namespace. Set to false if namespace already exists. 
| bool | `true` | no | +| cloudsql_instance_name_override | Override CloudSQL instance name (uses generated name if not specified) | string | `""` | no | +| gcs_bucket_name_override | Override GCS bucket name (uses generated name if not specified) | string | `""` | no | +| service_account_name_override | Override service account name (uses generated name if not specified) | string | `""` | no | +| cdc_services_image_repo | Container image repository for CDC Services (populated by GCP Marketplace) | string | n/a | yes | +| cdc_services_image_tag | Container image tag for CDC Services (populated by GCP Marketplace) | string | n/a | yes | +| data_image_repo | Container image repository for Data service (populated by GCP Marketplace) | string | n/a | yes | +| data_image_tag | Container image tag for Data service (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_repo | Helm chart repository URL (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_version | Helm chart version (populated by GCP Marketplace) | string | n/a | yes | +| helm_chart_name | Helm chart name (populated by GCP Marketplace) | string | `"datacommons"` | no | +| app_replicas | Number of replicas for the Data Commons Accelerator application deployment | number | `1` | no | +| resource_tier | Resource allocation tier for the application (small, medium, large). Also controls CloudSQL machine tier and high availability. 
| string | `"medium"` | no |
+| flask_env | Data Commons domain template (health, education, energy) | string | `"health"` | no |
+| dc_api_key | Data Commons API key for accessing Data Commons APIs | string | n/a | yes |
+| enable_natural_language | Enable natural language query features | bool | `true` | no |
+| enable_data_sync | Enable automatic synchronization of custom data from GCS bucket to CloudSQL database | bool | `true` | no |
+
+## Outputs
+
+| Name | Description |
+|------|-------------|
+| namespace | Kubernetes namespace where DataCommons is deployed |
+| gcs\_bucket\_url | GCS bucket URL (gs://\<bucket-name\>) |
+| kubectl\_configure | Command to configure kubectl for your GKE cluster |
+| verify\_pods | Command to verify Data Commons pods are running |
+| port\_forward | Port-forward command to access Data Commons locally (with auto-retry) |
+| cloud\_shell\_access | Cloud Shell quick access instructions for Data Commons |
+| upload\_data | Command to upload custom data to GCS bucket |
+| view\_logs | Command to view application logs |
+| retrieve\_admin\_credentials | Commands to retrieve admin panel credentials (username and password) |
diff --git a/mp-pkg/terraform/api-keys.tf b/mp-pkg/terraform/api-keys.tf
new file mode 100644
index 0000000..f1d956f
--- /dev/null
+++ b/mp-pkg/terraform/api-keys.tf
@@ -0,0 +1,37 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License. 
+ +# ============================================ +# Google Maps API Keys +# ============================================ +module "maps_api_keys" { + source = "./modules/maps-api-keys" + + project_id = var.project_id + name_prefix = "${local.deployment_name}-maps" + api_targets = [ + "maps-backend.googleapis.com", + "places-backend.googleapis.com" + ] + + labels = merge( + local.common_labels, + { + component = "api-keys" + purpose = "maps-integration" + } + ) + + depends_on = [google_project_service.apis["apikeys.googleapis.com"]] +} diff --git a/mp-pkg/terraform/cloudsql.tf b/mp-pkg/terraform/cloudsql.tf new file mode 100644 index 0000000..e4a0c35 --- /dev/null +++ b/mp-pkg/terraform/cloudsql.tf @@ -0,0 +1,131 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Private Service Access (PSA) - Auto-Detection +# ============================================ + +# Query the Service Networking API to detect existing PSA connections on the VPC. +# Only active in BYO mode — in create mode, the new VPC has no existing PSA. +data "http" "psa_connections" { + count = var.create_new_cluster ? 
0 : 1 + + url = "https://servicenetworking.googleapis.com/v1/services/servicenetworking.googleapis.com/connections?network=projects/${data.google_project.current.number}/global/networks/${local.vpc_network_name}" + + request_headers = { + Authorization = "Bearer ${data.google_client_config.default.access_token}" + Accept = "application/json" + } + + depends_on = [ + google_project_service.apis["servicenetworking.googleapis.com"], + ] + + lifecycle { + postcondition { + # Accept 200 (success) as well as 403/404 (no access or no connections) — all are non-fatal. + # Any of these statuses means we can safely fall through to the create path. + condition = contains([200, 403, 404], self.status_code) + error_message = "Service Networking API returned unexpected status ${self.status_code}. Expected 200, 403, or 404." + } + } +} + +# Allocate a /20 IP range for Private Service Access. +# Always created; in BYO mode existing ranges are preserved via concat below. +resource "google_compute_global_address" "cloudsql_private_ip" { + count = local.create_psa_range ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-psa-${local.resource_suffix}" + purpose = "VPC_PEERING" + address_type = "INTERNAL" + prefix_length = local.psa_range_prefix_length + network = local.vpc_network_self_link + project = var.project_id + + labels = merge( + local.common_labels, + { + component = "networking" + purpose = "cloudsql-psa" + } + ) + + depends_on = [ + google_project_service.apis["compute.googleapis.com"], + google_project_service.apis["servicenetworking.googleapis.com"], + ] +} + +# Create or update the Private Service Access connection. +# Always created; reserved_peering_ranges includes BOTH any existing ranges AND the new +# range to prevent destructive replacement of existing peering configuration. +resource "google_service_networking_connection" "cloudsql_private_vpc_connection" { + count = local.create_psa_connection ? 
1 : 0 + + provider = google + + network = local.vpc_network_self_link + service = "servicenetworking.googleapis.com" + reserved_peering_ranges = distinct(concat( + local.existing_psa_range_names, + [google_compute_global_address.cloudsql_private_ip[0].name] + )) + + # Prevent deletion of VPC peering connection before CloudSQL instance is removed + deletion_policy = "ABANDON" + + # Handle case where connection already exists with different ranges + update_on_creation_fail = true + + depends_on = [ + google_project_service.apis["compute.googleapis.com"], + google_project_service.apis["servicenetworking.googleapis.com"], + google_compute_global_address.cloudsql_private_ip[0], + ] +} + +# ============================================ +# CloudSQL MySQL Instance +# ============================================ + +module "cloudsql" { + source = "./modules/cloudsql" + + project_id = var.project_id + region = local.cloudsql_region + instance_name = local.cloudsql_instance_name + tier = local.cloudsql_tier + disk_size = local.cloudsql_disk_size + availability_type = local.cloudsql_availability_type + network_self_link = local.vpc_network_self_link + allocated_ip_range = local.psa_range_name + database_name = "datacommons" + user_name = "datacommons" + + labels = merge( + local.common_labels, + { + component = "database" + tier = replace(local.cloudsql_tier, "db-", "") + } + ) + + depends_on = [ + google_service_networking_connection.cloudsql_private_vpc_connection, + data.http.psa_connections, + ] +} \ No newline at end of file diff --git a/mp-pkg/terraform/gcs.tf b/mp-pkg/terraform/gcs.tf new file mode 100644 index 0000000..84ea3ce --- /dev/null +++ b/mp-pkg/terraform/gcs.tf @@ -0,0 +1,71 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket for DataCommons Data +# ============================================ +module "gcs_bucket" { + source = "./modules/gcs-bucket" + + project_id = var.project_id + bucket_name = local.gcs_bucket_name + location = local.gcs_location + storage_class = "STANDARD" + force_destroy = true + lifecycle_rules = [] + + # IAM Members - empty during plan to avoid for_each dependency issues + # IAM will be granted separately via google_storage_bucket_iam_member + iam_members = [] + + labels = merge( + local.common_labels, + { + component = "storage" + purpose = "datacommons-data" + } + ) + + depends_on = [google_project_service.apis["storage.googleapis.com"]] +} + +# Grant workload service account access to the bucket +# Using separate resource to avoid for_each dependency issue during plan +resource "google_storage_bucket_iam_member" "datacommons_workload_storage_admin" { + bucket = module.gcs_bucket.bucket_name + role = "roles/storage.objectAdmin" + member = "serviceAccount:${google_service_account.datacommons_workload.email}" + + depends_on = [ + module.gcs_bucket, + google_service_account.datacommons_workload + ] +} + +# Create default directory structure required by DataCommons application +resource "google_storage_bucket_object" "input_dir" { + bucket = module.gcs_bucket.bucket_name + name = "input/" + content = " " + + depends_on = [module.gcs_bucket] +} + +resource "google_storage_bucket_object" "output_dir" { + bucket = module.gcs_bucket.bucket_name + name = "output/" + content = " " + + depends_on = 
[module.gcs_bucket] +} diff --git a/mp-pkg/terraform/gke.tf b/mp-pkg/terraform/gke.tf new file mode 100644 index 0000000..0a98e8d --- /dev/null +++ b/mp-pkg/terraform/gke.tf @@ -0,0 +1,109 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GKE Node Service Account +# ============================================ + +resource "google_service_account" "gke_node" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + account_id = local.gke_node_sa_name + display_name = "GKE Node SA (${local.deployment_name})" + description = "Service account for GKE node pools" + project = var.project_id + + depends_on = [google_project_service.apis["iam.googleapis.com"]] +} + +resource "google_project_iam_member" "gke_node_default_sa" { + count = var.create_new_cluster ? 1 : 0 + project = var.project_id + role = "roles/container.defaultNodeServiceAccount" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +resource "google_project_iam_member" "gke_node_metric_writer" { + count = var.create_new_cluster ? 1 : 0 + project = var.project_id + role = "roles/monitoring.metricWriter" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +resource "google_project_iam_member" "gke_node_resource_metadata_writer" { + count = var.create_new_cluster ? 
1 : 0 + project = var.project_id + role = "roles/stackdriver.resourceMetadata.writer" + member = "serviceAccount:${google_service_account.gke_node[0].email}" +} + +# ============================================ +# GKE Cluster +# ============================================ + +resource "google_container_cluster" "autopilot" { + count = var.create_new_cluster ? 1 : 0 + + provider = google-beta + + name = "${local.deployment_name}-gke" + location = local.region + project = var.project_id + + enable_autopilot = true + + network = google_compute_network.vpc[0].name + subnetwork = google_compute_subnetwork.primary[0].name + + ip_allocation_policy { + cluster_secondary_range_name = "pods" + services_secondary_range_name = "services" + } + + private_cluster_config { + enable_private_nodes = true + enable_private_endpoint = false + master_ipv4_cidr_block = "172.16.0.0/28" + + master_global_access_config { + enabled = true + } + } + + release_channel { + channel = "STABLE" + } + + workload_identity_config { + workload_pool = "${var.project_id}.svc.id.goog" + } + + deletion_protection = false + + cluster_autoscaling { + auto_provisioning_defaults { + service_account = google_service_account.gke_node[0].email + } + } + + depends_on = [ + google_project_service.apis, + google_compute_subnetwork.primary, + google_project_iam_member.gke_node_default_sa, + google_project_iam_member.gke_node_metric_writer, + google_project_iam_member.gke_node_resource_metadata_writer, + ] +} diff --git a/mp-pkg/terraform/helm.tf b/mp-pkg/terraform/helm.tf new file mode 100644 index 0000000..3e1eb5f --- /dev/null +++ b/mp-pkg/terraform/helm.tf @@ -0,0 +1,210 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Helm Release +# ============================================ +resource "helm_release" "datacommons" { + name = "datacommons" + repository = var.helm_chart_repo + chart = var.helm_chart_name + version = var.helm_chart_version + namespace = local.namespace_name + + wait = true + wait_for_jobs = true + timeout = 900 + + # ============================================ + # Container Images (Marketplace-populated) + # ============================================ + + # CDC Services image + set { + name = "deployment.image.repository" + value = var.cdc_services_image_repo + } + + set { + name = "deployment.image.tag" + value = var.cdc_services_image_tag + } + + # Data service image (db-init) + set { + name = "dbInit.image.repository" + value = var.data_image_repo + } + + set { + name = "dbInit.image.tag" + value = var.data_image_tag + } + + # Data service image (db-sync) + set { + name = "dbSync.image.repository" + value = var.data_image_repo + } + + set { + name = "dbSync.image.tag" + value = var.data_image_tag + } + + # ============================================ + # Application Configuration + # ============================================ + + set { + name = "deployment.replicas" + value = local.tier_preset.replicas + } + + set { + name = "config.enableNaturalLanguage" + value = var.enable_natural_language + } + + set { + name = "config.enableDataSync" + value = var.enable_data_sync + } + + set { + name = "config.flaskEnv" + value = var.flask_env + } + + set { + name = "dbInit.activeDeadlineSeconds" + value = 
"900" + } + + # ============================================ + # Resource Allocation + # ============================================ + + set { + name = "deployment.resources.limits.memory" + value = local.tier_preset.memory + } + + set { + name = "deployment.resources.limits.cpu" + value = local.tier_preset.cpu + } + + set { + name = "deployment.resources.requests.memory" + value = local.tier_preset.memory + } + + set { + name = "deployment.resources.requests.cpu" + value = local.tier_preset.cpu + } + + # ============================================ + # CloudSQL Configuration + # ============================================ + + set { + name = "config.cloudsql.enabled" + value = "true" + } + + set { + name = "config.cloudsql.instance" + value = module.cloudsql.instance_connection_name + } + + set { + name = "config.cloudsql.database" + value = module.cloudsql.database_name + } + + set { + name = "config.cloudsql.user" + value = module.cloudsql.user_name + } + + set { + name = "config.cloudsql.usePrivateIP" + value = "true" + } + + # ============================================ + # GCS Bucket Configuration + # ============================================ + + set { + name = "config.gcs.bucket" + value = module.gcs_bucket.bucket_url + } + + # ============================================ + # Workload Identity + # ============================================ + + set { + name = "serviceAccount.gcpServiceAccountEmail" + value = google_service_account.datacommons_workload.email + } + + set { + name = "serviceAccount.name" + value = "datacommons-ksa" + } + + set { + name = "serviceAccount.create" + value = "true" + } + + set { + name = "serviceAccount.annotations.iam\\.gke\\.io/gcp-service-account" + value = google_service_account.datacommons_workload.email + } + + # ============================================ + # Secrets Configuration + # ============================================ + + # Use existing Kubernetes secret created by Terraform + set { + name = 
"existingSecret" + value = "datacommons-secrets" + } + + depends_on = [ + kubernetes_namespace_v1.datacommons, + data.kubernetes_namespace_v1.existing, + module.cloudsql, + module.gcs_bucket, + module.maps_api_keys, + module.k8s_secrets, + google_service_account.datacommons_workload, + google_project_iam_member.datacommons_cloudsql_client, + google_storage_bucket_iam_member.datacommons_workload_storage_admin, + google_storage_bucket_object.input_dir, + google_storage_bucket_object.output_dir, + google_service_account_iam_member.datacommons_workload_identity_user, + google_service_networking_connection.cloudsql_private_vpc_connection, + data.google_container_cluster.gke, + data.google_compute_network.vpc, + google_container_cluster.autopilot, + google_compute_router_nat.nat, + ] +} diff --git a/mp-pkg/terraform/locals.tf b/mp-pkg/terraform/locals.tf new file mode 100644 index 0000000..1b428c2 --- /dev/null +++ b/mp-pkg/terraform/locals.tf @@ -0,0 +1,164 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Local Values +# ============================================ + +locals { + # ============================================ + # Common Labels + # ============================================ + # Applied to all resources + common_labels = { + project = "datacommons" + solution = "datacommons-marketplace" + } + + # ============================================ + # Resource Presets + # ============================================ + # Pre-configured resource allocations for different deployment tiers + # small: Development/testing workloads + # medium: Production workloads with moderate traffic + # large: Production workloads with high traffic + resource_presets = { + small = { + memory = "2Gi" + cpu = "1" + replicas = 1 + cloudsql_tier = "db-n1-standard-1" + cloudsql_disk = 30 + cloudsql_ha = false + } + medium = { + memory = "4Gi" + cpu = "2" + replicas = 2 + cloudsql_tier = "db-n1-standard-2" + cloudsql_disk = 50 + cloudsql_ha = false + } + large = { + memory = "8Gi" + cpu = "4" + replicas = 3 + cloudsql_tier = "db-n1-standard-4" + cloudsql_disk = 100 + cloudsql_ha = true + } + } + + # ============================================ + # Computed Values + # ============================================ + # Derive GCP region from cluster location. + # In create mode: use var.region directly. + # In BYO mode: derive from gke_cluster_location (strip zone suffix if present). + # "europe-west3" → "europe-west3", "europe-west3-a" → "europe-west3" + region = var.create_new_cluster ? var.region : join("-", slice(split("-", var.gke_cluster_location), 0, 2)) + cloudsql_region = local.region + + # Auto-derive GCS location from region + # Maps region prefix to nearest GCS multi-region + gcs_location = ( + startswith(local.region, "us-") || startswith(local.region, "northamerica-") ? "US" : + startswith(local.region, "europe-") ? "EU" : + startswith(local.region, "asia-") || startswith(local.region, "australia-") ? 
"ASIA" :
+    "US" # Default fallback
+  )
+
+  # All tier-derived values accessed from resource_presets
+  tier_preset                = local.resource_presets[var.resource_tier]
+  cloudsql_tier              = local.tier_preset.cloudsql_tier
+  cloudsql_ha_enabled        = local.tier_preset.cloudsql_ha
+  cloudsql_availability_type = local.cloudsql_ha_enabled ? "REGIONAL" : "ZONAL"
+  cloudsql_disk_size         = local.tier_preset.cloudsql_disk
+
+  # Resource naming with deployment name for collision avoidance
+  deployment_name = var.goog_cm_deployment_name
+  resource_suffix = random_id.suffix.hex
+
+  # ============================================
+  # Resource Names (with override support)
+  # ============================================
+  # CloudSQL instance name - use override or computed with random suffix
+  cloudsql_instance_name = var.cloudsql_instance_name_override != "" ? var.cloudsql_instance_name_override : "${local.deployment_name}-db-${local.resource_suffix}"
+
+  # GCS bucket name - use override or computed with random suffix
+  gcs_bucket_name_computed = "${local.deployment_name}-data-${local.resource_suffix}"
+  gcs_bucket_name          = var.gcs_bucket_name_override != "" ? var.gcs_bucket_name_override : local.gcs_bucket_name_computed
+
+  # Service account name - use override or computed with random suffix
+  service_account_name = var.service_account_name_override != "" ? var.service_account_name_override : "${local.deployment_name}-sa-${local.resource_suffix}"
+
+  # GKE node service account - only created for new clusters
+  gke_node_sa_name = "${substr(local.deployment_name, 0, 14)}-gke-${local.resource_suffix}"
+
+  # ============================================
+  # Namespace Derivation
+  # ============================================
+  # Derive namespace from deployment name if not explicitly provided.
+  # This prevents namespace collisions when users deploy the same solution twice.
+  namespace = var.namespace != "" ? var.namespace : var.goog_cm_deployment_name
+
+  # ============================================
+  # Namespace Reference (conditional creation)
+  # ============================================
+  # Use created namespace or existing namespace based on create_namespace flag
+  namespace_name = var.create_namespace ? kubernetes_namespace_v1.datacommons[0].metadata[0].name : data.kubernetes_namespace_v1.existing[0].metadata[0].name
+
+  # ============================================
+  # VPC Network (conditional: created vs. discovered)
+  # ============================================
+  # In create mode: use the new VPC created in vpc.tf.
+  # In BYO mode: use the VPC discovered from the existing GKE cluster.
+  vpc_network_name      = var.create_new_cluster ? google_compute_network.vpc[0].name : data.google_compute_network.vpc[0].name
+  vpc_network_self_link = var.create_new_cluster ? google_compute_network.vpc[0].self_link : data.google_compute_network.vpc[0].self_link
+
+  # ============================================
+  # Cluster Configuration (dual-mode)
+  # ============================================
+  # In create mode: use the new Autopilot cluster created in gke.tf.
+  # In BYO mode: use the existing cluster data source from main.tf.
+  cluster_endpoint = var.create_new_cluster ? google_container_cluster.autopilot[0].endpoint : data.google_container_cluster.gke[0].endpoint
+  cluster_ca_cert  = var.create_new_cluster ? google_container_cluster.autopilot[0].master_auth[0].cluster_ca_certificate : data.google_container_cluster.gke[0].master_auth[0].cluster_ca_certificate
+  cluster_name     = var.create_new_cluster ? google_container_cluster.autopilot[0].name : var.gke_cluster_name
+  cluster_location = var.create_new_cluster ? google_container_cluster.autopilot[0].location : var.gke_cluster_location
+
+  # ============================================
+  # PSA Auto-Detection Configuration
+  # ============================================
+  # In create mode: always create new PSA (new VPC has no existing PSA).
+  # In BYO mode: query the Service Networking API to detect existing connections.
+
+  # Parse the HTTP response from the Service Networking API
+  psa_api_response         = var.create_new_cluster ? {} : try(jsondecode(data.http.psa_connections[0].response_body), {})
+  existing_psa_connections = try(local.psa_api_response.connections, [])
+  psa_already_exists       = length(local.existing_psa_connections) > 0
+  existing_psa_range_names = local.psa_already_exists ? local.existing_psa_connections[0].reservedPeeringRanges : []
+
+  # Always create new PSA range and connection in both modes.
+  # In BYO mode, reserved_peering_ranges (in cloudsql.tf) uses distinct(concat())
+  # to preserve any existing ranges alongside the new one.
+  # This eliminates the Terraform 1.5.7 plan-time count limitation entirely.
+  create_psa_range      = true
+  create_psa_connection = true
+
+  # PSA range prefix length: /20 for all auto-created ranges
+  psa_range_prefix_length = 20
+
+  # Always use the newly created range for CloudSQL allocated_ip_range
+  psa_range_name = google_compute_global_address.cloudsql_private_ip[0].name
+}
diff --git a/mp-pkg/terraform/main.tf b/mp-pkg/terraform/main.tf
new file mode 100644
index 0000000..318b81b
--- /dev/null
+++ b/mp-pkg/terraform/main.tf
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Provider Configurations +# ============================================ + +# Google Cloud provider +provider "google" { + project = var.project_id + region = local.region +} + +# Google Cloud Beta provider +provider "google-beta" { + project = var.project_id + region = local.region +} + +# Kubernetes provider - configured from cluster locals (works in both create and BYO modes) +provider "kubernetes" { + host = "https://${local.cluster_endpoint}" + token = data.google_client_config.default.access_token + cluster_ca_certificate = base64decode(local.cluster_ca_cert) +} + +# Helm provider - configured from cluster locals (works in both create and BYO modes) +provider "helm" { + kubernetes { + host = "https://${local.cluster_endpoint}" + token = data.google_client_config.default.access_token + cluster_ca_certificate = base64decode(local.cluster_ca_cert) + } +} + +# ============================================ +# Data Sources +# ============================================ +# Get current GCP client configuration for authentication +data "google_client_config" "default" {} + +# Get current GCP project details +data "google_project" "current" { + project_id = var.project_id +} + +# ============================================ +# GCP API Services +# ============================================ + +locals { + # List of all required GCP APIs + required_apis = toset([ + "cloudresourcemanager.googleapis.com", # Project-level resource management + "compute.googleapis.com", # VPC networking and compute resources + 
"container.googleapis.com",           # GKE cluster management
+    "sqladmin.googleapis.com",            # CloudSQL MySQL database
+    "storage.googleapis.com",             # GCS bucket management
+    "places-backend.googleapis.com",      # Places API (Maps)
+    "maps-backend.googleapis.com",        # Maps JavaScript API
+    "apikeys.googleapis.com",             # Maps API key generation
+    "serviceusage.googleapis.com",        # API usage and quota monitoring
+    "servicenetworking.googleapis.com",   # Private Service Access (CloudSQL)
+    "iam.googleapis.com",                 # Service account and IAM management
+  ])
+}
+
+resource "google_project_service" "apis" {
+  for_each = local.required_apis
+
+  project                    = var.project_id
+  service                    = each.value
+  disable_on_destroy         = false
+  disable_dependent_services = false
+
+  # Allow extra time for parallel API enablement to complete and propagate
+  timeouts {
+    create = "30m"
+    update = "30m"
+  }
+}
+
+# ============================================
+# Resource Naming Utilities
+# ============================================
+
+resource "random_id" "suffix" {
+  byte_length = 4
+
+  keepers = {
+    project_id = var.project_id
+  }
+}
+
+# ============================================
+# GKE Cluster and VPC Network Discovery (BYO mode only)
+# ============================================
+# Only active when create_new_cluster = false.
+# In create mode, cluster and VPC are managed resources in gke.tf / vpc.tf.
+data "google_container_cluster" "gke" {
+  count    = var.create_new_cluster ? 0 : 1
+  name     = var.gke_cluster_name
+  location = var.gke_cluster_location
+  project  = var.project_id
+
+  depends_on = [
+    google_project_service.apis["container.googleapis.com"]
+  ]
+}
+
+# Get full VPC network details for Private Service Access (BYO mode only)
+data "google_compute_network" "vpc" {
+  count = var.create_new_cluster ?
0 : 1 + name = element(split("/", data.google_container_cluster.gke[0].network), length(split("/", data.google_container_cluster.gke[0].network)) - 1) + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} diff --git a/mp-pkg/terraform/marketplace_test.tfvars b/mp-pkg/terraform/marketplace_test.tfvars new file mode 100644 index 0000000..5fcdb10 --- /dev/null +++ b/mp-pkg/terraform/marketplace_test.tfvars @@ -0,0 +1,31 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# The marketplace_test.tfvars file is used to validate the Terraform template. +# Marketplace will validate your product with this file as its `--var-file` +# argument. 
+#
+# Do not include the following variables in marketplace_test.tfvars, as they
+# will be provided by Marketplace:
+#
+# - project_id
+# - helm_chart_repo
+# - helm_chart_name
+# - helm_chart_version
+# - Any image or chart variables auto-populated by Marketplace
+
+goog_cm_deployment_name = "datacommons-test"
+create_new_cluster      = true
+region                  = "us-central1"
+dc_api_key              = "test-api-key-placeholder"
diff --git a/mp-pkg/terraform/metadata.display.yaml b/mp-pkg/terraform/metadata.display.yaml
new file mode 100644
index 0000000..2f256b4
--- /dev/null
+++ b/mp-pkg/terraform/metadata.display.yaml
+apiVersion: blueprints.cloud.google.com/v1alpha1
+kind: BlueprintMetadata
+metadata:
+  name: datacommons-marketplace-display
+  annotations:
+    config.kubernetes.io/local-config: "true"
+spec:
+  info:
+    title: Data Commons Accelerator
+    source: {}
+  ui:
+    input:
+      variables:
+        goog_cm_deployment_name:
+          name: goog_cm_deployment_name
+          title: Deployment Name
+          tooltip: Name for this deployment (2-18 characters)
+          placeholder: dc
+          validation: ^[a-z][a-z0-9-]{0,16}[a-z0-9]$
+        create_new_cluster:
+          name: create_new_cluster
+          title: Create New GKE Cluster
+          invisible: true
+          # To enable Create/Existing cluster toggle, uncomment below and remove invisible:
+          # tooltip: Choose whether to create a new GKE Autopilot cluster or use an existing one.
+          # xGoogleProperty:
+          #   type: ET_CREATE_RESOURCE
+        region:
+          name: region
+          title: GCP Region
+          tooltip: GCP region where the cluster and cloud resources will be created.
+ xGoogleProperty: + type: ET_GCE_REGION + gke_cluster_name: + name: gke_cluster_name + title: GKE Cluster Name + invisible: true + # To enable cluster picker, uncomment below and remove invisible: + # tooltip: Select your existing GKE cluster for deployment + # section: cluster + # xGoogleProperty: + # type: ET_GKE_CLUSTER + # gkeCluster: + # locationVariable: gke_cluster_location + # clusterCreationVariable: create_new_cluster + resource_tier: + name: resource_tier + title: Resource Tier + tooltip: | + Resource allocation for the deployment. Controls both application pod sizing and database tier. + Small: App 2Gi/1CPU, DB Standard-1 (dev/test) + Medium: App 4Gi/2CPU, DB Standard-2 + Large: App 8Gi/4CPU, DB Standard-4 + section: application + enumValueLabels: + - label: Small - 2Gi RAM, 1 CPU + value: small + - label: Medium - 4Gi RAM, 2 CPU (recommended) + value: medium + - label: Large - 8Gi RAM, 4 CPU + value: large + flask_env: + name: flask_env + title: Domain Template + tooltip: | + Select a pre-built Data Commons configuration optimized for a specific domain. + Each domain includes curated datasets, statistical variables, and visualizations tailored to that subject area. + Select the domain that best matches your use case. + placeholder: Choose from the list + section: application + enumValueLabels: + - label: Education + value: education + - label: Health + value: health + - label: Energy + value: energy + dc_api_key: + name: dc_api_key + title: Data Commons API Key + tooltip: | + API key for Data Commons API access. + Get yours at: https://docs.datacommons.org/custom_dc/quickstart.html#get-a-data-commons-api-key + placeholder: "..." 
+ section: api + validation: ^[A-Za-z0-9_-]+$ + gke_cluster_location: + name: gke_cluster_location + title: GKE Cluster Location + invisible: true + # To enable location picker, uncomment below and remove invisible: + # xGoogleProperty: + # type: ET_GCE_LOCATION + project_id: + name: project_id + title: Project ID + invisible: true + create_namespace: + name: create_namespace + title: Create Namespace + invisible: true + namespace: + name: namespace + title: Kubernetes Namespace + invisible: true + app_replicas: + name: app_replicas + title: Application Replicas + invisible: true + enable_natural_language: + name: enable_natural_language + title: Enable Natural Language Queries + invisible: true + enable_data_sync: + name: enable_data_sync + title: Enable Custom Data Sync + invisible: true + cloudsql_instance_name_override: + name: cloudsql_instance_name_override + title: CloudSQL Instance Name Override + invisible: true + gcs_bucket_name_override: + name: gcs_bucket_name_override + title: GCS Bucket Name Override + invisible: true + service_account_name_override: + name: service_account_name_override + title: Service Account Name Override + invisible: true + cdc_services_image_repo: + name: cdc_services_image_repo + title: CDC Services Image Repository + invisible: true + cdc_services_image_tag: + name: cdc_services_image_tag + title: CDC Services Image Tag + invisible: true + data_image_repo: + name: data_image_repo + title: Data Image Repository + invisible: true + data_image_tag: + name: data_image_tag + title: Data Image Tag + invisible: true + helm_chart_repo: + name: helm_chart_repo + title: Helm Chart Repository + invisible: true + helm_chart_name: + name: helm_chart_name + title: Helm Chart Name + invisible: true + helm_chart_version: + name: helm_chart_version + title: Helm Chart Version + invisible: true + sections: + - name: application + title: Application Settings + tooltip: Data Commons Accelerator application configuration and resource allocation + 
- name: api + title: API Keys + tooltip: API keys required for Data Commons and Google Maps integration diff --git a/mp-pkg/terraform/metadata.yaml b/mp-pkg/terraform/metadata.yaml new file mode 100644 index 0000000..f48a0a8 --- /dev/null +++ b/mp-pkg/terraform/metadata.yaml @@ -0,0 +1,141 @@ +apiVersion: blueprints.cloud.google.com/v1alpha1 +kind: BlueprintMetadata +metadata: + name: datacommons-marketplace + annotations: + config.kubernetes.io/local-config: "true" +spec: + info: + title: Data Commons Accelerator + source: {} + actuationTool: + flavor: Terraform + version: ">= 1.5.7" + description: {} + content: + subBlueprints: + - name: cloudsql + location: modules/cloudsql + - name: gcs-bucket + location: modules/gcs-bucket + - name: k8s-secrets + location: modules/k8s-secrets + - name: maps-api-keys + location: modules/maps-api-keys + interfaces: + variables: + - name: goog_cm_deployment_name + description: Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions) + varType: string + required: true + - name: project_id + description: GCP project ID where Data Commons Accelerator will be deployed + varType: string + required: true + - name: create_new_cluster + description: Create a new GKE cluster with VPC networking. + varType: bool + defaultValue: true + - name: region + description: GCP region for new cluster and resources (e.g., us-central1). Only used when create_new_cluster is true. + varType: string + defaultValue: us-central1 + - name: gke_cluster_name + description: Name of an existing GKE cluster. Only used when create_new_cluster is false. + varType: string + defaultValue: "" + - name: gke_cluster_location + description: Location (region or zone) of the existing GKE cluster. Only used when create_new_cluster is false. + varType: string + defaultValue: "" + - name: namespace + description: Kubernetes namespace for Data Commons Accelerator deployment. 
Defaults to the deployment name (goog_cm_deployment_name) if not provided. + varType: string + defaultValue: "" + - name: create_namespace + description: Create new Kubernetes namespace. Set to false if namespace already exists in the cluster. + varType: bool + defaultValue: true + - name: cloudsql_instance_name_override + description: Override CloudSQL instance name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: gcs_bucket_name_override + description: Override GCS bucket name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: service_account_name_override + description: Override GCP service account name (uses generated name with random suffix if not specified) + varType: string + defaultValue: "" + - name: dc_api_key + description: Data Commons API key for accessing Data Commons APIs + varType: string + required: true + - name: app_replicas + description: Number of replicas for the Data Commons Accelerator application deployment + varType: number + defaultValue: 1 + - name: resource_tier + description: Resource allocation tier controlling application pod resources and CloudSQL database sizing (small, medium, large) + varType: string + defaultValue: medium + - name: enable_natural_language + description: Enable natural language query features + varType: bool + defaultValue: true + - name: enable_data_sync + description: Enable automatic synchronization of custom data from GCS bucket to CloudSQL database + varType: bool + defaultValue: true + - name: flask_env + description: Data Commons domain template (pre-built configurations for specific domains) + varType: string + defaultValue: health + - name: cdc_services_image_repo + description: Container image repository for CDC Services (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: cdc_services_image_tag + description: Container image tag for CDC Services (populated by GCP 
Marketplace) + varType: string + defaultValue: "" + - name: data_image_repo + description: Container image repository for Data service (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: data_image_tag + description: Container image tag for Data service (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: helm_chart_repo + description: Helm chart repository URL (populated by GCP Marketplace) + varType: string + defaultValue: "" + - name: helm_chart_name + description: Helm chart name (populated by GCP Marketplace) + varType: string + defaultValue: datacommons + - name: helm_chart_version + description: Helm chart version (populated by GCP Marketplace) + varType: string + defaultValue: "" + outputs: + - name: cloud_shell_access + description: "Cloud Shell quick access: GKE Console > cluster > Connect > Run in Cloud Shell, then run the port-forward command" + - name: gcs_bucket_url + description: GCS bucket URL (gs://) + - name: kubectl_configure + description: Command to configure kubectl for your GKE cluster + - name: namespace + description: Kubernetes namespace where DataCommons is deployed + - name: port_forward + description: Port-forward command to access Data Commons locally (with auto-retry) + - name: retrieve_admin_credentials + description: Commands to retrieve admin panel credentials (username and password) + - name: upload_data + description: Command to upload custom data to GCS bucket + - name: verify_pods + description: Command to verify Data Commons pods are running + - name: view_logs + description: Command to view application logs diff --git a/mp-pkg/terraform/modules/cloudsql/main.tf b/mp-pkg/terraform/modules/cloudsql/main.tf new file mode 100644 index 0000000..fd2e20c --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/main.tf @@ -0,0 +1,127 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance 
with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# ============================================
+# CloudSQL MySQL Instance
+# ============================================
+resource "google_sql_database_instance" "instance" {
+  project          = var.project_id
+  name             = var.instance_name
+  region           = var.region
+  database_version = var.database_version
+
+  settings {
+    tier              = var.tier
+    availability_type = var.availability_type
+
+    # Storage configuration
+    disk_size             = var.disk_size
+    disk_type             = var.disk_type
+    disk_autoresize       = var.disk_autoresize
+    disk_autoresize_limit = var.disk_autoresize_limit
+
+    # Backup configuration
+    # Note: for MySQL, point-in-time recovery is driven by binary logging;
+    # the point_in_time_recovery_enabled field applies to PostgreSQL only.
+    backup_configuration {
+      enabled                        = true
+      start_time                     = "03:00"
+      binary_log_enabled             = var.point_in_time_recovery_enabled
+      location                       = var.backup_location
+      transaction_log_retention_days = 7
+      backup_retention_settings {
+        retained_backups = 7
+        retention_unit   = "COUNT"
+      }
+    }
+
+    # Network configuration - Private IP only
+    ip_configuration {
+      ipv4_enabled                                  = false
+      private_network                               = var.network_self_link
+      ssl_mode                                      = "ENCRYPTED_ONLY"
+      allocated_ip_range                            = var.allocated_ip_range
+      enable_private_path_for_google_cloud_services = true
+    }
+
+    # Maintenance window
+    maintenance_window {
+      day          = 7 # Sunday
+      hour         = 3 # 3 AM
+      update_track = "stable"
+    }
+
+    # Database flags (MySQL specific)
+    database_flags {
+      name  = "local_infile"
+      value = "off"
+    }
+
+    # Query Insights
+    insights_config {
+      query_insights_enabled = true
+      query_plans_per_minute = 5
+      query_string_length    = 1024
record_application_tags = true + record_client_address = true + } + + # User labels + user_labels = var.labels + } + + deletion_protection = var.deletion_protection + + # Prevent disk size downgrade after autoresize + lifecycle { + ignore_changes = [ + settings[0].disk_size + ] + } +} + +# ============================================ +# Database Creation +# ============================================ +resource "google_sql_database" "database" { + project = var.project_id + instance = google_sql_database_instance.instance.name + name = var.database_name + charset = "utf8mb4" + collation = "utf8mb4_unicode_ci" +} + +# ============================================ +# Password Management +# ============================================ +# Generate random password +resource "random_password" "db_password" { + length = 16 + special = true + # Ensure password meets MySQL requirements + min_lower = 1 + min_upper = 1 + min_numeric = 1 + min_special = 1 +} + +# Create database user with generated password +resource "google_sql_user" "user" { + project = var.project_id + instance = google_sql_database_instance.instance.name + name = var.user_name + host = "%" + password = random_password.db_password.result + + depends_on = [google_sql_database_instance.instance] +} diff --git a/mp-pkg/terraform/modules/cloudsql/outputs.tf b/mp-pkg/terraform/modules/cloudsql/outputs.tf new file mode 100644 index 0000000..5184ee1 --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/outputs.tf @@ -0,0 +1,107 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Outputs +# ============================================ + +# ============================================ +# Instance Information +# ============================================ +output "instance_name" { + description = "CloudSQL instance name" + value = google_sql_database_instance.instance.name +} + +output "instance_connection_name" { + description = "CloudSQL instance connection name (project:region:instance) for Cloud SQL Auth Proxy" + value = google_sql_database_instance.instance.connection_name +} + +output "instance_self_link" { + description = "CloudSQL instance self link" + value = google_sql_database_instance.instance.self_link +} + +output "instance_service_account_email" { + description = "CloudSQL instance service account email" + value = google_sql_database_instance.instance.service_account_email_address +} + +# ============================================ +# Network Information +# ============================================ +output "private_ip_address" { + description = "Private IP address for direct connection from GKE (within VPC)" + value = google_sql_database_instance.instance.private_ip_address +} + +# ============================================ +# Database Information +# ============================================ +output "database_name" { + description = "Name of the created database" + value = var.database_name +} + +# ============================================ +# User Credentials +# ============================================ +output "user_name" { + description = "Database user name" + value = var.user_name +} + +output "user_password" { + description = "Database user password (auto-generated)" + value = random_password.db_password.result + sensitive = true +} + +# ============================================ +# Connection Information +# 
============================================ +output "connection_string" { + description = "MySQL connection string for direct private IP connection" + value = "mysql://${var.user_name}@${google_sql_database_instance.instance.private_ip_address}:3306/${var.database_name}" + sensitive = true +} + +output "jdbc_connection_string" { + description = "JDBC connection string for applications" + value = "jdbc:mysql://${google_sql_database_instance.instance.private_ip_address}:3306/${var.database_name}?useSSL=true&requireSSL=true" +} + +# ============================================ +# Kubernetes Configuration +# ============================================ +output "k8s_env_vars" { + description = "Environment variables for Kubernetes deployments" + value = { + CLOUDSQL_INSTANCE = google_sql_database_instance.instance.connection_name + DB_HOST = google_sql_database_instance.instance.private_ip_address + DB_PORT = "3306" + DB_NAME = var.database_name + DB_USER = var.user_name + USE_CLOUDSQL = "true" + } +} + +output "k8s_secret_data" { + description = "Secret data for Kubernetes Secret resource (use with k8s-secrets module)" + value = { + DB_PASSWORD = random_password.db_password.result + } + sensitive = true +} diff --git a/mp-pkg/terraform/modules/cloudsql/variables.tf b/mp-pkg/terraform/modules/cloudsql/variables.tf new file mode 100644 index 0000000..8fb8259 --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/variables.tf @@ -0,0 +1,170 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "region" { + description = "The GCP region for the CloudSQL instance" + type = string +} + +variable "instance_name" { + description = "CloudSQL instance name (must be unique within project)" + type = string +} + +variable "network_self_link" { + description = "VPC network self link for private IP connection (from GKE cluster VPC)" + type = string +} + +variable "allocated_ip_range" { + description = "Name of the allocated IP range for Private Service Access (PSA)" + type = string +} + +# ============================================ +# Instance Configuration +# ============================================ +variable "database_version" { + description = "MySQL version (MYSQL_8_0, MYSQL_8_0_36, etc.)" + type = string + default = "MYSQL_8_0" +} + +variable "tier" { + description = "Machine tier (e.g., db-f1-micro, db-n1-standard-1, db-custom-2-7680)" + type = string + default = "db-n1-standard-1" +} + +variable "zone" { + description = "Primary zone for the CloudSQL instance" + type = string + default = null +} + +variable "secondary_zone" { + description = "Secondary zone for high availability (required if availability_type is REGIONAL)" + type = string + default = null +} + +variable "availability_type" { + description = "Availability type: ZONAL or REGIONAL (REGIONAL provides high availability)" + type = string + default = "ZONAL" + + validation { + condition = contains(["ZONAL", "REGIONAL"], var.availability_type) + error_message = "Availability type must be either ZONAL or REGIONAL." 
+ } +} + +# ============================================ +# Storage Configuration +# ============================================ +variable "disk_size" { + description = "Disk size in GB (minimum 10 GB for MySQL)" + type = number + default = 20 + + validation { + condition = var.disk_size >= 10 + error_message = "Disk size must be at least 10 GB for MySQL." + } +} + +variable "disk_type" { + description = "Disk type: PD_SSD (recommended for production) or PD_HDD" + type = string + default = "PD_SSD" + + validation { + condition = contains(["PD_SSD", "PD_HDD"], var.disk_type) + error_message = "Disk type must be either PD_SSD or PD_HDD." + } +} + +variable "disk_autoresize" { + description = "Enable automatic disk size increase" + type = bool + default = true +} + +variable "disk_autoresize_limit" { + description = "Maximum disk size in GB for autoresize (0 = unlimited)" + type = number + default = 0 +} + +# ============================================ +# Backup Configuration +# ============================================ +variable "backup_location" { + description = "Backup location (defaults to instance region if not specified)" + type = string + default = null +} + +variable "point_in_time_recovery_enabled" { + description = "Enable point-in-time recovery (requires binary logging)" + type = bool + default = false +} + +# ============================================ +# Database and User Configuration +# ============================================ +variable "database_name" { + description = "Name of the database to create" + type = string + default = "datacommons" +} + +variable "user_name" { + description = "Name of the database user to create" + type = string + default = "datacommons" +} + +# ============================================ +# Security Configuration +# ============================================ +variable "deletion_protection" { + description = "Terraform deletion protection (prevents accidental terraform destroy)" + type = bool + default = false 
+} + +variable "deletion_protection_enabled" { + description = "GCP deletion protection (prevents deletion via console/API)" + type = bool + default = false +} + +# ============================================ +# Labels +# ============================================ +variable "labels" { + description = "Resource labels/tags for organization and cost tracking" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/cloudsql/versions.tf b/mp-pkg/terraform/modules/cloudsql/versions.tf new file mode 100644 index 0000000..efad6fe --- /dev/null +++ b/mp-pkg/terraform/modules/cloudsql/versions.tf @@ -0,0 +1,32 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + google-beta = { + source = "hashicorp/google-beta" + version = ">= 7.0.0, < 8.0.0" + } + random = { + source = "hashicorp/random" + version = ">= 3.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/gcs-bucket/main.tf b/mp-pkg/terraform/modules/gcs-bucket/main.tf new file mode 100644 index 0000000..5d087a6 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/main.tf @@ -0,0 +1,92 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket Resource +# Creates the Cloud Storage bucket with security and lifecycle settings +# ============================================ +resource "google_storage_bucket" "bucket" { + name = var.bucket_name + project = var.project_id + location = var.location + storage_class = var.storage_class + force_destroy = var.force_destroy + labels = var.labels + + # Security: Uniform bucket-level access (replaces legacy object ACLs) + # This enforces IAM-only access control, improving security posture + uniform_bucket_level_access = true + + # Versioning: Protect against accidental overwrites and deletions + # Conditionally enabled based on var.versioning_enabled + dynamic "versioning" { + for_each = var.versioning_enabled ? [1] : [] + content { + enabled = true + } + } + + # Encryption: Customer-managed encryption keys (CMEK) via Cloud KMS + # If encryption_key_name is null, Google-managed keys are used automatically + dynamic "encryption" { + for_each = var.encryption_key_name != null ? 
[1] : [] + content { + default_kms_key_name = var.encryption_key_name + } + } + + # Lifecycle Rules: Automated object lifecycle management + # Supports transitions, deletions, and versioning actions + dynamic "lifecycle_rule" { + for_each = var.lifecycle_rules + content { + # Action block: What to do with matching objects + action { + type = lifecycle_rule.value.action.type + storage_class = lookup(lifecycle_rule.value.action, "storage_class", null) + } + + # Condition block: When to apply this rule + condition { + age = lookup(lifecycle_rule.value.condition, "age", null) + created_before = lookup(lifecycle_rule.value.condition, "created_before", null) + with_state = lookup(lifecycle_rule.value.condition, "with_state", null) + matches_storage_class = lookup(lifecycle_rule.value.condition, "matches_storage_class", null) + matches_prefix = lookup(lifecycle_rule.value.condition, "matches_prefix", null) + matches_suffix = lookup(lifecycle_rule.value.condition, "matches_suffix", null) + num_newer_versions = lookup(lifecycle_rule.value.condition, "num_newer_versions", null) + custom_time_before = lookup(lifecycle_rule.value.condition, "custom_time_before", null) + days_since_custom_time = lookup(lifecycle_rule.value.condition, "days_since_custom_time", null) + days_since_noncurrent_time = lookup(lifecycle_rule.value.condition, "days_since_noncurrent_time", null) + noncurrent_time_before = lookup(lifecycle_rule.value.condition, "noncurrent_time_before", null) + } + } + } +} + +# ============================================ +# IAM Bindings +# Grant access to bucket using IAM roles +# Each member is bound independently for safe incremental additions +# ============================================ +resource "google_storage_bucket_iam_member" "members" { + for_each = { + for idx, member in var.iam_members : + "${member.role}-${member.member}" => member + } + + bucket = google_storage_bucket.bucket.name + role = each.value.role + member = each.value.member +} diff --git 
a/mp-pkg/terraform/modules/gcs-bucket/outputs.tf b/mp-pkg/terraform/modules/gcs-bucket/outputs.tf new file mode 100644 index 0000000..dc03fa6 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/outputs.tf @@ -0,0 +1,37 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# GCS Bucket Outputs +# ============================================ + +output "bucket_name" { + description = "Name of the created GCS bucket" + value = google_storage_bucket.bucket.name +} + +output "bucket_url" { + description = "GCS bucket URL (gs://)" + value = google_storage_bucket.bucket.url +} + +output "bucket_self_link" { + description = "GCS bucket self link (full resource URI)" + value = google_storage_bucket.bucket.self_link +} + +output "bucket" { + description = "Full bucket resource object" + value = google_storage_bucket.bucket +} diff --git a/mp-pkg/terraform/modules/gcs-bucket/variables.tf b/mp-pkg/terraform/modules/gcs-bucket/variables.tf new file mode 100644 index 0000000..9c84356 --- /dev/null +++ b/mp-pkg/terraform/modules/gcs-bucket/variables.tf @@ -0,0 +1,107 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# DataCommons GCS Bucket Module Variables +# ============================================ + +# ============================================ +# Required Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "bucket_name" { + description = "Name of the GCS bucket (must be globally unique)" + type = string +} + +# ============================================ +# Location Configuration +# ============================================ +variable "location" { + description = "Bucket location (e.g., US, EU, ASIA, or specific region like us-central1)" + type = string + default = "US" +} + +variable "storage_class" { + description = "Storage class: STANDARD, NEARLINE, COLDLINE, or ARCHIVE" + type = string + default = "STANDARD" + + validation { + condition = contains(["STANDARD", "NEARLINE", "COLDLINE", "ARCHIVE"], var.storage_class) + error_message = "Storage class must be STANDARD, NEARLINE, COLDLINE, or ARCHIVE." 
+  }
+}
+
+# ============================================
+# Versioning Configuration
+# ============================================
+variable "versioning_enabled" {
+  description = "Enable object versioning (recommended for data protection)"
+  type        = bool
+  default     = true
+}
+
+# ============================================
+# Lifecycle Rules
+# ============================================
+variable "lifecycle_rules" {
+  description = "Lifecycle rules for object management (applied via the dynamic lifecycle_rule blocks in this module's main.tf)"
+  type        = any
+  default     = []
+}
+
+# ============================================
+# IAM Configuration
+# ============================================
+variable "iam_members" {
+  description = "IAM members and their roles for bucket access"
+  type = list(object({
+    role   = string
+    member = string
+  }))
+  default = []
+}
+
+# ============================================
+# Encryption Configuration
+# ============================================
+variable "encryption_key_name" {
+  description = "Cloud KMS key name for customer-managed encryption (optional, uses Google-managed keys by default)"
+  type        = string
+  default     = null
+}
+
+# ============================================
+# Deletion Protection
+# ============================================
+variable "force_destroy" {
+  description = "Allow bucket deletion even if it contains objects (use with caution)"
+  type        = bool
+  default     = false
+}
+
+# ============================================
+# Labels
+# ============================================
+variable "labels" {
+  description = "Resource labels/tags for organization and cost tracking"
+  type        = map(string)
+  default     = {}
+}
diff --git a/mp-pkg/terraform/modules/gcs-bucket/versions.tf b/mp-pkg/terraform/modules/gcs-bucket/versions.tf
new file mode 100644
index 0000000..43edb22
--- /dev/null
+++ b/mp-pkg/terraform/modules/gcs-bucket/versions.tf
@@ -0,0 +1,24 @@
+# Copyright 2025 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the 
"License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/main.tf b/mp-pkg/terraform/modules/k8s-secrets/main.tf new file mode 100644 index 0000000..48df4ed --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/main.tf @@ -0,0 +1,30 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
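+
+# Example invocation of this module (illustrative sketch; the module instance
+# name, secret name, and values below are placeholders, not part of this
+# deployment — they mirror the example documented in variables.tf):
+#
+#   module "app_secrets" {
+#     source    = "./modules/k8s-secrets"
+#     namespace = "datacommons"
+#     secrets = {
+#       "db-credentials" = {
+#         data = { DB_PASSWORD = "secret-value" }
+#       }
+#     }
+#   }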
+ +# ============================================ +# Kubernetes Secrets +# ============================================ + +resource "kubernetes_secret_v1" "secrets" { + for_each = var.secrets + + metadata { + name = each.key + namespace = var.namespace + labels = merge(var.labels, each.value.labels) + } + + data = each.value.data + type = var.secret_type +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/outputs.tf b/mp-pkg/terraform/modules/k8s-secrets/outputs.tf new file mode 100644 index 0000000..2878f9c --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/outputs.tf @@ -0,0 +1,32 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
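+
+# Example (illustrative), assuming the root module instantiates this module
+# as "k8s_secrets" with a secret named "datacommons-secrets"; the output maps
+# below are keyed by the secret names passed in var.secrets:
+#
+#   output "dc_secret_id" {
+#     value = module.k8s_secrets.secret_ids["datacommons-secrets"]
+#   }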
+ +# ============================================ +# Kubernetes Secrets Outputs +# ============================================ + +output "secret_names" { + description = "Map of secret keys to their created names in Kubernetes" + value = { for k, v in kubernetes_secret_v1.secrets : k => v.metadata[0].name } +} + +output "secret_namespaces" { + description = "Map of secret keys to their namespaces" + value = { for k, v in kubernetes_secret_v1.secrets : k => v.metadata[0].namespace } +} + +output "secret_ids" { + description = "Map of secret keys to their full Kubernetes resource IDs (namespace/name)" + value = { for k, v in kubernetes_secret_v1.secrets : k => "${v.metadata[0].namespace}/${v.metadata[0].name}" } +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/variables.tf b/mp-pkg/terraform/modules/k8s-secrets/variables.tf new file mode 100644 index 0000000..134dbeb --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/variables.tf @@ -0,0 +1,75 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Kubernetes Secrets +# ============================================ +variable "namespace" { + description = "Kubernetes namespace where secrets will be created" + type = string +} + +variable "secrets" { + description = <<-EOT + Map of secrets to create. 
Each secret can have: + - data: Map of key-value pairs (values will be base64 encoded by Kubernetes) + - labels: Optional additional labels for this specific secret + + Example: + { + "db-credentials" = { + data = { + DB_PASSWORD = "secret-value" + DB_USER = "myuser" + } + labels = { + component = "database" + } + } + } + EOT + type = map(object({ + data = map(string) + labels = optional(map(string), {}) + })) +} + +# ============================================ +# Optional Variables +# ============================================ +variable "secret_type" { + description = "Kubernetes secret type (Opaque, kubernetes.io/tls, etc.)" + type = string + default = "Opaque" + + validation { + condition = contains([ + "Opaque", + "kubernetes.io/service-account-token", + "kubernetes.io/dockercfg", + "kubernetes.io/dockerconfigjson", + "kubernetes.io/basic-auth", + "kubernetes.io/ssh-auth", + "kubernetes.io/tls", + "bootstrap.kubernetes.io/token" + ], var.secret_type) + error_message = "Secret type must be a valid Kubernetes secret type." + } +} + +variable "labels" { + description = "Common labels to apply to all secrets" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/k8s-secrets/versions.tf b/mp-pkg/terraform/modules/k8s-secrets/versions.tf new file mode 100644 index 0000000..3b1b330 --- /dev/null +++ b/mp-pkg/terraform/modules/k8s-secrets/versions.tf @@ -0,0 +1,24 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +terraform { + required_version = ">= 1.5.7" + + required_providers { + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.0.0" + } + } +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/main.tf b/mp-pkg/terraform/modules/maps-api-keys/main.tf new file mode 100644 index 0000000..d04ea6f --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/main.tf @@ -0,0 +1,70 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# API Keys +# ============================================ + +# Random suffix for unique naming +resource "random_id" "key_suffix" { + byte_length = 4 +} + +# ============================================ +# Google Maps API Key +# ============================================ +resource "google_apikeys_key" "maps_key" { + project = var.project_id + name = "${var.name_prefix}-key-${random_id.key_suffix.hex}" + display_name = "${var.name_prefix} Maps API Key" + + restrictions { + dynamic "api_targets" { + for_each = var.api_targets + content { + service = api_targets.value + } + } + + # Optional browser key restrictions + dynamic "browser_key_restrictions" { + for_each = length(var.allowed_referrers) > 0 ? [1] : [] + content { + allowed_referrers = var.allowed_referrers + } + } + + # Optional Android app restrictions + dynamic "android_key_restrictions" { + for_each = length(var.allowed_android_applications) > 0 ? 
[1] : [] + content { + dynamic "allowed_applications" { + for_each = var.allowed_android_applications + content { + package_name = allowed_applications.value.package_name + sha1_fingerprint = allowed_applications.value.sha1_fingerprint + } + } + } + } + + # Optional iOS app restrictions + dynamic "ios_key_restrictions" { + for_each = length(var.allowed_ios_bundle_ids) > 0 ? [1] : [] + content { + allowed_bundle_ids = var.allowed_ios_bundle_ids + } + } + } +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/outputs.tf b/mp-pkg/terraform/modules/maps-api-keys/outputs.tf new file mode 100644 index 0000000..d217e11 --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/outputs.tf @@ -0,0 +1,49 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# API Keys Outputs +# ============================================ + +output "api_key_id" { + description = "The unique identifier of the API key resource" + value = google_apikeys_key.maps_key.id +} + +output "api_key_name" { + description = "The resource name of the API key" + value = google_apikeys_key.maps_key.name +} + +output "api_key_uid" { + description = "The unique ID assigned to the API key by Google" + value = google_apikeys_key.maps_key.uid +} + +output "api_key" { + description = "The actual API key value" + value = google_apikeys_key.maps_key.key_string + sensitive = true +} + +# ============================================ +# Kubernetes Secret Data +# ============================================ +output "k8s_secret_data" { + description = "Secret data for Kubernetes Secret resource" + value = { + MAPS_API_KEY = google_apikeys_key.maps_key.key_string + } + sensitive = true +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/variables.tf b/mp-pkg/terraform/modules/maps-api-keys/variables.tf new file mode 100644 index 0000000..683d7bc --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/variables.tf @@ -0,0 +1,75 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
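+
+# Example invocation of this module (illustrative sketch; the prefix and
+# referrer values are placeholders taken from the variable descriptions
+# below, not part of this deployment):
+#
+#   module "maps_api_keys" {
+#     source            = "./modules/maps-api-keys"
+#     project_id        = var.project_id
+#     name_prefix       = "datacommons-prod"
+#     allowed_referrers = ["https://example.com/*"]
+#   }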
+ +# ============================================ +# API Keys Variables +# ============================================ + +# ============================================ +# Required Variables +# ============================================ +variable "project_id" { + description = "The GCP project ID" + type = string +} + +variable "name_prefix" { + description = "Prefix for API key naming (e.g., 'datacommons-prod')" + type = string +} + +# ============================================ +# API Configuration +# ============================================ +variable "api_targets" { + description = "List of Google API services this key can access" + type = list(string) + default = [ + "maps-backend.googleapis.com", + "places-backend.googleapis.com" + ] +} + +# ============================================ +# Application Restrictions +# ============================================ +variable "allowed_referrers" { + description = "List of allowed HTTP referrers for browser applications (e.g., ['https://example.com/*'])" + type = list(string) + default = [] +} + +variable "allowed_android_applications" { + description = "List of allowed Android applications (package_name and sha1_fingerprint)" + type = list(object({ + package_name = string + sha1_fingerprint = string + })) + default = [] +} + +variable "allowed_ios_bundle_ids" { + description = "List of allowed iOS bundle IDs" + type = list(string) + default = [] +} + +# ============================================ +# Labels +# ============================================ +variable "labels" { + description = "Resource labels/tags for organization and cost tracking" + type = map(string) + default = {} +} diff --git a/mp-pkg/terraform/modules/maps-api-keys/versions.tf b/mp-pkg/terraform/modules/maps-api-keys/versions.tf new file mode 100644 index 0000000..d360c68 --- /dev/null +++ b/mp-pkg/terraform/modules/maps-api-keys/versions.tf @@ -0,0 +1,28 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 
2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + random = { + source = "hashicorp/random" + version = ">= 3.0.0" + } + } +} diff --git a/mp-pkg/terraform/nat.tf b/mp-pkg/terraform/nat.tf new file mode 100644 index 0000000..d0c4850 --- /dev/null +++ b/mp-pkg/terraform/nat.tf @@ -0,0 +1,54 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Cloud Router + Cloud NAT +# ============================================ + +resource "google_compute_router" "nat_router" { + count = var.create_new_cluster ? 
1 : 0 + + provider = google + + name = "${local.deployment_name}-router" + region = local.region + network = google_compute_network.vpc[0].id + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} + +resource "google_compute_router_nat" "nat" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-nat" + router = google_compute_router.nat_router[0].name + region = local.region + project = var.project_id + nat_ip_allocate_option = "AUTO_ONLY" + source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES" + + log_config { + enable = false + filter = "ERRORS_ONLY" + } + + depends_on = [ + google_compute_subnetwork.primary + ] +} diff --git a/mp-pkg/terraform/outputs.tf b/mp-pkg/terraform/outputs.tf new file mode 100644 index 0000000..ab1e6f0 --- /dev/null +++ b/mp-pkg/terraform/outputs.tf @@ -0,0 +1,68 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
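+
+# Usage note (illustrative): after `terraform apply`, the command-style
+# outputs below can be printed with `terraform output` and executed directly,
+# e.g.:
+#
+#   terraform output -raw kubectl_configure            # print the command
+#   eval "$(terraform output -raw kubectl_configure)"  # or run it in one step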
+ +# ============================================ +# Namespace +# ============================================ +output "namespace" { + description = "Kubernetes namespace where DataCommons is deployed" + value = local.namespace_name +} + +# ============================================ +# GCS Bucket +# ============================================ +output "gcs_bucket_url" { + description = "GCS bucket URL (gs://)" + value = module.gcs_bucket.bucket_url +} + +# ============================================ +# Access Commands +# ============================================ +output "kubectl_configure" { + description = "Command to configure kubectl for your GKE cluster" + value = "gcloud container clusters get-credentials ${local.cluster_name} --location=${local.cluster_location} --project=${var.project_id}" +} + +output "verify_pods" { + description = "Command to verify Data Commons pods are running" + value = "kubectl get pods -n ${local.namespace_name}" +} + +output "port_forward" { + description = "Port-forward command to access Data Commons locally (with auto-retry)" + value = "until kubectl port-forward -n ${local.namespace_name} svc/datacommons 8080:8080; do echo 'Port-forward crashed. Respawning...' >&2; sleep 1; done" +} + +output "cloud_shell_access" { + description = "Cloud Shell quick access: GKE Console > cluster > Connect > Run in Cloud Shell, then run the port-forward command" + value = "GKE Console > ${local.cluster_name} > Connect > Run in Cloud Shell, then run: until kubectl port-forward -n ${local.namespace_name} svc/datacommons 8080:8080; do echo 'Respawning...' 
>&2; sleep 1; done — then click 'Web Preview' > 'Preview on port 8080'" +} + +output "upload_data" { + description = "Command to upload custom data to GCS bucket" + value = "gsutil cp -r /path/to/your/data gs://${module.gcs_bucket.bucket_name}/input" +} + +output "view_logs" { + description = "Command to view application logs" + value = "kubectl logs -n ${local.namespace_name} -l app=datacommons --tail=100 -f" +} + +output "retrieve_admin_credentials" { + description = "Commands to retrieve admin panel credentials (username and password)" + value = "echo 'Admin Username:' && kubectl get secret datacommons -n ${local.namespace_name} -o jsonpath='{.data.ADMIN_PANEL_USERNAME}' | base64 -d && echo && echo 'Admin Password:' && kubectl get secret datacommons -n ${local.namespace_name} -o jsonpath='{.data.ADMIN_PANEL_PASSWORD}' | base64 -d && echo" +} + diff --git a/mp-pkg/terraform/schema.yaml b/mp-pkg/terraform/schema.yaml new file mode 100644 index 0000000..978da5b --- /dev/null +++ b/mp-pkg/terraform/schema.yaml @@ -0,0 +1,31 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# GCP Marketplace Image Mapping Schema +# ============================================ + +images: + cdc-services: + variables: + cdc_services_image_repo: + type: REPO_WITH_REGISTRY_WITH_NAME + cdc_services_image_tag: + type: TAG + data: + variables: + data_image_repo: + type: REPO_WITH_REGISTRY_WITH_NAME + data_image_tag: + type: TAG diff --git a/mp-pkg/terraform/secrets.tf b/mp-pkg/terraform/secrets.tf new file mode 100644 index 0000000..d6b2cd2 --- /dev/null +++ b/mp-pkg/terraform/secrets.tf @@ -0,0 +1,89 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Kubernetes Secrets Configuration +# ============================================ +resource "kubernetes_namespace_v1" "datacommons" { + count = var.create_namespace ? 1 : 0 + + metadata { + name = local.namespace + + labels = merge( + local.common_labels, + { + name = local.namespace + } + ) + } + + depends_on = [ + data.google_container_cluster.gke, + google_container_cluster.autopilot, + ] +} + +# Data source for existing namespace (when create_namespace=false) +data "kubernetes_namespace_v1" "existing" { + count = var.create_namespace ? 
0 : 1 + + metadata { + name = local.namespace + } + + depends_on = [ + data.google_container_cluster.gke, + google_container_cluster.autopilot, + ] +} + +# ============================================ +# Kubernetes Secrets +# ============================================ + +module "k8s_secrets" { + source = "./modules/k8s-secrets" + + namespace = local.namespace_name + + secrets = { + "datacommons-secrets" = { + data = { + # Database password from CloudSQL module + DB_PASS = module.cloudsql.user_password + + # Google Maps API key from maps-api-keys module + MAPS_API_KEY = module.maps_api_keys.api_key + + # DataCommons API key from user input + DC_API_KEY = var.dc_api_key + } + } + } + + labels = merge( + local.common_labels, + { + component = "secrets" + } + ) + + depends_on = [ + kubernetes_namespace_v1.datacommons, + data.kubernetes_namespace_v1.existing, + module.cloudsql, + module.maps_api_keys + ] +} diff --git a/mp-pkg/terraform/variables.tf b/mp-pkg/terraform/variables.tf new file mode 100644 index 0000000..9d54060 --- /dev/null +++ b/mp-pkg/terraform/variables.tf @@ -0,0 +1,252 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# Variables +# ============================================ + +# ============================================ +# Marketplace Deployment Configuration +# ============================================ +variable "goog_cm_deployment_name" { + description = "Deployment name for the Data Commons Accelerator solution (used by GCP Marketplace for tracking and avoiding resource name collisions)" + type = string + + validation { + condition = length(var.goog_cm_deployment_name) >= 2 && length(var.goog_cm_deployment_name) <= 18 + error_message = "Deployment name must be between 2 and 18 characters (limited by GCP service account 30-char max: name + '-sa-' + 8-char suffix)." + } + + validation { + condition = can(regex("^[a-z][a-z0-9-]{0,16}[a-z0-9]$", var.goog_cm_deployment_name)) + error_message = "Deployment name must start with a lowercase letter, contain only lowercase letters, numbers, and hyphens, and end with a letter or number. Maximum 18 characters." + } +} + +# ============================================ +# Project & Region Configuration +# ============================================ +variable "project_id" { + description = "GCP project ID where Data Commons Accelerator will be deployed" + type = string + + validation { + condition = can(regex("^[a-z][a-z0-9-]{4,28}[a-z0-9]$", var.project_id)) + error_message = "Project ID must be 6-30 characters, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +variable "create_new_cluster" { + description = "Create a new GKE Autopilot cluster with VPC networking. Set to false to use an existing cluster." + type = bool + default = true +} + +variable "region" { + description = "GCP region for new cluster and resources (e.g., us-central1). Only used when create_new_cluster is true." 
+ type = string + default = "us-central1" + + validation { + condition = var.region == "" || can(regex("^[a-z]+-[a-z]+[0-9]+$", var.region)) + error_message = "Region must be a valid GCP region (e.g., us-central1, europe-west3)." + } +} + +# ============================================ +# GKE Cluster Configuration (Bring-Your-Own) +# ============================================ +variable "gke_cluster_name" { + description = "Name of an existing GKE cluster. Only used when create_new_cluster is false." + type = string + default = "" +} + +variable "gke_cluster_location" { + description = "Location (region or zone) of the existing GKE cluster. Only used when create_new_cluster is false." + type = string + default = "" +} + +variable "namespace" { + description = "Kubernetes namespace for Data Commons Accelerator deployment. Defaults to the deployment name (goog_cm_deployment_name) if not provided." + type = string + default = "" + + validation { + condition = var.namespace == "" || can(regex("^[a-z0-9]([-a-z0-9]*[a-z0-9])?$", var.namespace)) + error_message = "Namespace must consist of lowercase alphanumeric characters or '-', and must start and end with an alphanumeric character." + } +} + +variable "create_namespace" { + description = "Create new Kubernetes namespace. Set to false if namespace already exists in the cluster." + type = bool + default = true +} + +# ============================================ +# Resource Name Overrides (Optional) +# ============================================ +# These variables allow enterprise customers to specify exact resource names +# when they have naming conventions or pre-existing resources to integrate with. +# If not specified, resources use auto-generated names with random suffixes. 
+ +variable "cloudsql_instance_name_override" { + description = "Override CloudSQL instance name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.cloudsql_instance_name_override == "" || can(regex("^[a-z][a-z0-9-]{0,78}[a-z0-9]$", var.cloudsql_instance_name_override)) + error_message = "CloudSQL instance name must be lowercase, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +variable "gcs_bucket_name_override" { + description = "Override GCS bucket name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.gcs_bucket_name_override == "" || can(regex("^[a-z0-9][a-z0-9-_.]{1,61}[a-z0-9]$", var.gcs_bucket_name_override)) + error_message = "Bucket name must be 3-63 characters, start and end with lowercase letter or number, and contain only lowercase letters, numbers, hyphens, underscores, and dots." + } +} + +variable "service_account_name_override" { + description = "Override GCP service account name (uses generated name with random suffix if not specified)" + type = string + default = "" + + validation { + condition = var.service_account_name_override == "" || can(regex("^[a-z][a-z0-9-]{4,28}[a-z0-9]$", var.service_account_name_override)) + error_message = "Service account name must be 6-30 characters, start with a letter, and contain only lowercase letters, numbers, and hyphens." + } +} + +# ============================================ +# API Keys Configuration +# ============================================ +variable "dc_api_key" { + description = "Data Commons API key for accessing Data Commons APIs" + type = string + sensitive = true + + validation { + condition = length(var.dc_api_key) > 0 + error_message = "Data Commons API key cannot be empty." 
+ } +} + +# ============================================ +# Application Configuration +# ============================================ +variable "app_replicas" { + description = "Number of replicas for the Data Commons Accelerator application deployment" + type = number + default = 1 + + validation { + condition = var.app_replicas >= 1 && var.app_replicas <= 10 + error_message = "Application replicas must be between 1 and 10." + } +} + +variable "resource_tier" { + description = "Resource allocation tier controlling application pod resources and CloudSQL database sizing (small, medium, large)" + type = string + default = "medium" + + validation { + condition = contains(["small", "medium", "large"], var.resource_tier) + error_message = "Resource tier must be one of: small, medium, large." + } +} + +variable "enable_natural_language" { + description = "Enable natural language query features" + type = bool + default = true +} + +variable "enable_data_sync" { + description = "Enable automatic synchronization of custom data from GCS bucket to CloudSQL database" + type = bool + default = true +} + +variable "flask_env" { + description = "Data Commons domain template (pre-built configurations for specific domains)" + type = string + default = "health" + + validation { + condition = contains(["health", "education", "energy"], var.flask_env) + error_message = "Domain template must be one of: health, education, energy." 
+ } +} + +# ============================================ +# Container Image Variables (Marketplace-populated) +# ============================================ +variable "cdc_services_image_repo" { + description = "Container image repository for CDC Services (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "cdc_services_image_tag" { + description = "Container image tag for CDC Services (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "data_image_repo" { + description = "Container image repository for Data service (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "data_image_tag" { + description = "Container image tag for Data service (populated by GCP Marketplace)" + type = string + default = "" +} + +# ============================================ +# Helm Chart Variables (Marketplace-populated) +# ============================================ +variable "helm_chart_repo" { + description = "Helm chart repository URL (populated by GCP Marketplace)" + type = string + default = "" +} + +variable "helm_chart_name" { + description = "Helm chart name (populated by GCP Marketplace)" + type = string + default = "datacommons" + + validation { + condition = length(var.helm_chart_name) > 0 + error_message = "Helm chart name cannot be empty." + } +} + +variable "helm_chart_version" { + description = "Helm chart version (populated by GCP Marketplace)" + type = string + default = "" +} \ No newline at end of file diff --git a/mp-pkg/terraform/versions.tf b/mp-pkg/terraform/versions.tf new file mode 100644 index 0000000..cb72278 --- /dev/null +++ b/mp-pkg/terraform/versions.tf @@ -0,0 +1,48 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Terraform Version Constraints +# ============================================ + +terraform { + required_version = ">= 1.5.7" + + required_providers { + google = { + source = "hashicorp/google" + version = ">= 7.0.0, < 8.0.0" + } + google-beta = { + source = "hashicorp/google-beta" + version = ">= 7.0.0, < 8.0.0" + } + kubernetes = { + source = "hashicorp/kubernetes" + version = ">= 2.20.0" + } + helm = { + source = "hashicorp/helm" + version = "~> 2.12.0" + } + random = { + source = "hashicorp/random" + version = "~> 3.6.0" + } + http = { + source = "hashicorp/http" + version = "~> 3.4.0" + } + } +} diff --git a/mp-pkg/terraform/vpc.tf b/mp-pkg/terraform/vpc.tf new file mode 100644 index 0000000..ac51e2c --- /dev/null +++ b/mp-pkg/terraform/vpc.tf @@ -0,0 +1,68 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# ============================================ +# VPC Network +# ============================================ + +resource "google_compute_network" "vpc" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-vpc" + auto_create_subnetworks = false + routing_mode = "REGIONAL" + project = var.project_id + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} + +# ============================================ +# Primary Subnet with GKE Secondary Ranges +# ============================================ + +resource "google_compute_subnetwork" "primary" { + count = var.create_new_cluster ? 1 : 0 + + provider = google + + name = "${local.deployment_name}-subnet" + network = google_compute_network.vpc[0].id + region = local.region + ip_cidr_range = "10.0.0.0/20" + private_ip_google_access = true + project = var.project_id + + secondary_ip_range { + range_name = "pods" + ip_cidr_range = "10.1.0.0/17" + } + + secondary_ip_range { + range_name = "services" + ip_cidr_range = "10.2.0.0/22" + } + + log_config { + aggregation_interval = "INTERVAL_30_SEC" + flow_sampling = 0.5 + } + + depends_on = [ + google_project_service.apis["compute.googleapis.com"] + ] +} diff --git a/mp-pkg/terraform/workload-identity.tf b/mp-pkg/terraform/workload-identity.tf new file mode 100644 index 0000000..89c0817 --- /dev/null +++ b/mp-pkg/terraform/workload-identity.tf @@ -0,0 +1,64 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +# ============================================ +# Workload Identity Configuration +# ============================================ + +# ============================================ +# GCP Service Account for DataCommons Workload +# ============================================ +resource "google_service_account" "datacommons_workload" { + provider = google + + account_id = local.service_account_name + display_name = "DataCommons Workload Service Account (${local.deployment_name})" + description = "Service account for DataCommons application running on GKE with Workload Identity" + project = var.project_id + + depends_on = [google_project_service.apis["iam.googleapis.com"]] +} + +# ============================================ +# IAM Role Bindings - Project Level +# ============================================ + +# CloudSQL Client - Required for CloudSQL access via Cloud SQL Auth Proxy or private IP +resource "google_project_iam_member" "datacommons_cloudsql_client" { + provider = google + + project = var.project_id + role = "roles/cloudsql.client" + member = "serviceAccount:${google_service_account.datacommons_workload.email}" + + depends_on = [google_service_account.datacommons_workload] +} + +# NOTE: Storage Object Admin IAM is granted at the bucket level (not project level) + +# ============================================ +# Workload Identity Binding +# ============================================ + +resource "google_service_account_iam_member" "datacommons_workload_identity_user" { + provider = google + + service_account_id = google_service_account.datacommons_workload.name + role = "roles/iam.workloadIdentityUser" + + # Member format: serviceAccount:{PROJECT_ID}.svc.id.goog[{NAMESPACE}/{KSA_NAME}] + member = "serviceAccount:${var.project_id}.svc.id.goog[${local.namespace_name}/datacommons-ksa]" + + depends_on = [google_service_account.datacommons_workload] 
+} From 280f07f4d3588469b0dd78d2e9a65531948c46b1 Mon Sep 17 00:00:00 2001 From: Artur Date: Fri, 20 Mar 2026 18:14:59 +0100 Subject: [PATCH 3/3] update version to v3.4.3, docs --- docs/DEPLOYMENT_GUIDE.md | 4 +-- docs/MARKETPLACE_FIELDS.md | 7 +++-- docs/USER_GUIDE.md | 29 +++++++++++++++---- mp-pkg/charts/datacommons/Chart.yaml | 4 +-- .../datacommons/templates/application.yaml | 2 +- .../datacommons/templates/configmap.yaml | 2 +- mp-pkg/charts/datacommons/values.yaml | 2 +- mp-pkg/terraform/README.md | 2 +- mp-pkg/terraform/metadata.display.yaml | 8 +++-- mp-pkg/terraform/metadata.yaml | 2 +- mp-pkg/terraform/variables.tf | 6 ++-- 11 files changed, 45 insertions(+), 23 deletions(-) diff --git a/docs/DEPLOYMENT_GUIDE.md b/docs/DEPLOYMENT_GUIDE.md index a12c094..bb943ea 100644 --- a/docs/DEPLOYMENT_GUIDE.md +++ b/docs/DEPLOYMENT_GUIDE.md @@ -182,7 +182,7 @@ This section walks through deploying Data Commons Accelerator via GCP Marketplac ### Step 2: Complete the Deployment Configuration Form -The Marketplace will open a deployment configuration form. Enter your **Deployment Name** and select a **GCP Region** at the top, then configure **Application Settings** (resource tier and domain template) and provide your **API Keys** (Data Commons API key). A new GKE cluster is created automatically. +The Marketplace will open a deployment configuration form. Enter your **Deployment Name** and select a **GCP Region** at the top, then configure **Application Settings** (resource tier and sample) and provide your **API Keys** (Data Commons API key). A new GKE cluster is created automatically. > [!TIP] > Each field has built-in tooltips with detailed guidance—hover over or click the help icon next to any field for clarification. The form validates your inputs and shows clear error messages if anything is incorrect. @@ -199,7 +199,7 @@ Once you've completed all sections: 2. **Accept the terms** by checking the Terms checkbox 3. 
**Click the Deploy button** -Deployment takes approximately **15-20 minutes**. A progress indicator will appear. +Deployment takes approximately **20–30 minutes**. A progress indicator will appear. > [!WARNING] > **Do not close the browser tab** during deployment. Closing it may interrupt the provisioning process. diff --git a/docs/MARKETPLACE_FIELDS.md b/docs/MARKETPLACE_FIELDS.md index 4283c3b..39bf769 100644 --- a/docs/MARKETPLACE_FIELDS.md +++ b/docs/MARKETPLACE_FIELDS.md @@ -29,7 +29,7 @@ The deployment form has **5 fields** across **2 sections**. A new GKE cluster is | Field | Default | Description | |-------|---------|-------------| | **Resource Tier** | Medium | Controls how much CPU and memory the application gets, and the size of the database | -| **Domain Template** | Health | Pre-built configuration optimized for your domain | +| **Samples** | Health | Pre-built configuration optimized for your domain | ### Resource Tier @@ -39,15 +39,16 @@ The deployment form has **5 fields** across **2 sections**. A new GKE cluster is | Medium (recommended) | 4 GB | 2 cores | 2 | Standard | No | | Large | 8 GB | 4 cores | 3 | Large | Yes | -### Domain Template +### Samples | Option | Best for | |--------|----------| | Health | Health and epidemiology data | | Education | School, enrollment, and outcomes data | | Energy | Energy consumption and generation data | +| Custom | Custom data configuration with no pre-built datasets | -You can customize the template after deployment. +You can customize the sample after deployment. --- diff --git a/docs/USER_GUIDE.md b/docs/USER_GUIDE.md index b5d2fa1..d8d98d2 100644 --- a/docs/USER_GUIDE.md +++ b/docs/USER_GUIDE.md @@ -10,14 +10,15 @@ This guide explains how to access, configure, and use your Custom Data Commons i 2. [Data Commons for Education](#data-commons-for-education) 3. [Data Commons for Health](#data-commons-for-health) 4. [Data Commons for Energy](#data-commons-for-energy) -5. 
[Known Limitations](#known-limitations) -6. [Request Support](#request-support) +5. [Data Commons for Custom](#data-commons-for-custom) +6. [Known Limitations](#known-limitations) +7. [Request Support](#request-support) --- ## Getting Started -To configure the landing page, upload your company logo, and manage private data, you need to log in as the Data Commons Administrator. The steps below apply to all domain templates (Education, Health, Energy). +To configure the landing page, upload your company logo, and manage private data, you need to log in as the Data Commons Administrator. The steps below apply to all samples (Education, Health, Energy, Custom). > [!TIP] > For deployment and initial setup instructions, see the [Deployment Guide](DEPLOYMENT_GUIDE.md). @@ -48,14 +49,14 @@ The application administrator password is not provided in the deployment outputs 1. Navigate to your application URL (e.g., `https://education.example.com/`) 2. To access the **Admin Panel**, append `/admin` to the URL (e.g., `https://education.example.com/admin/`) 3. Enter the username and password retrieved in the previous step -4. You will be logged in as an administrator for the domain template selected during deployment (Education, Health, or Energy) +4. You will be logged in as an administrator for the sample selected during deployment (Education, Health, Energy, or Custom) ### Upload Custom Data To populate the dashboard with your custom data: 1. See [Prepare and load your own data](https://docs.datacommons.org/custom_dc/custom_data.html). -2. Ensure your data matches the required schema for your domain template. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. +2. Ensure your data matches the required schema for your selected sample. You can download a sample CSV directly from the application **Data & Files** tab and fill in your data there. 3. Log in and navigate to the **Admin Panel**. 4. Go to **Data & Files** tab. 
5. Locate the **Data Upload** section. @@ -230,6 +231,24 @@ Review specific leak events in the table at the bottom of the dashboard: --- +## Data Commons for Custom + +***Template: Custom Configuration*** + +### Custom Overview + +The Custom sample provides a blank-slate Data Commons instance with no pre-loaded domain-specific datasets, statistical variables, or visualizations. Use this option when your use case doesn't align with the Education, Health, or Energy samples, or when you want to build a fully custom configuration from scratch. + +### Custom: For Administrators + +Upload your own data following the generic CSV schema. See [Getting Started](#getting-started) for upload instructions and UI customization. + +### Custom: For Data Analysts & Researchers + +The Custom instance starts with an empty dashboard. After your administrator uploads data, the dashboard will populate based on the uploaded dataset structure. Use the standard Data Commons exploration and visualization tools to analyze your data. + +--- + ## Known Limitations - **Data Sync:** Dashboard data refreshes automatically after upload, but large CSVs may take a few moments to process. 
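
---

Reviewer note: the USER_GUIDE hunk above has administrators retrieve credentials from the cluster, and the k8s-secrets module earlier in this patch creates a `datacommons-secrets` Secret in the deployment namespace. A minimal sketch of reading one of its keys — the Secret name and `DB_PASS` key come from this patch, but the namespace is deployment-specific and the admin-credential key name is not shown here, so treat both as assumptions:

```shell
# Hypothetical sketch: read a key from the datacommons-secrets Secret.
# (Secret name and DB_PASS key are taken from the k8s-secrets module in this
# patch; replace <namespace> with the namespace chosen at deploy time.)
#
#   kubectl -n <namespace> get secret datacommons-secrets \
#     -o jsonpath='{.data.DB_PASS}' | base64 --decode
#
# Secret values come back base64-encoded, hence the final decode step:
printf 'cGFzc3dvcmQ=' | base64 --decode
```

Only the local base64 round-trip executes here; the `kubectl` invocation is left commented because it requires a live cluster.

---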
diff --git a/mp-pkg/charts/datacommons/Chart.yaml b/mp-pkg/charts/datacommons/Chart.yaml index 69d677e..9ae914b 100644 --- a/mp-pkg/charts/datacommons/Chart.yaml +++ b/mp-pkg/charts/datacommons/Chart.yaml @@ -16,8 +16,8 @@ apiVersion: v2 name: datacommons description: Custom Data Commons Accelerator deployment type: application -version: 3.3.12 -appVersion: "v3.3.12" +version: 3.4.3 +appVersion: "v3.4.3" # GCP Marketplace metadata annotations: diff --git a/mp-pkg/charts/datacommons/templates/application.yaml b/mp-pkg/charts/datacommons/templates/application.yaml index 9652a61..8882cd5 100644 --- a/mp-pkg/charts/datacommons/templates/application.yaml +++ b/mp-pkg/charts/datacommons/templates/application.yaml @@ -36,7 +36,7 @@ spec: - Integration with Google Cloud SQL for data storage - Scalable Kubernetes-native architecture - Support for custom data imports from GCS - - Pre-built domain templates (Education, Health, Energy) + - Pre-built samples (Education, Health, Energy, Custom) links: - description: "Data Commons Documentation" url: "https://docs.datacommons.org/" diff --git a/mp-pkg/charts/datacommons/templates/configmap.yaml b/mp-pkg/charts/datacommons/templates/configmap.yaml index ff93168..cf619a3 100644 --- a/mp-pkg/charts/datacommons/templates/configmap.yaml +++ b/mp-pkg/charts/datacommons/templates/configmap.yaml @@ -55,7 +55,7 @@ data: GCP_PROJECT_ID: {{ .Values.global.projectId | quote }} {{- end }} - # Domain Template + # Sample FLASK_ENV: {{ .Values.config.flaskEnv | quote }} # Mixer Settings diff --git a/mp-pkg/charts/datacommons/values.yaml b/mp-pkg/charts/datacommons/values.yaml index 36edf8e..d5485a3 100644 --- a/mp-pkg/charts/datacommons/values.yaml +++ b/mp-pkg/charts/datacommons/values.yaml @@ -155,7 +155,7 @@ config: # Enable periodic data sync enableDataSync: true - # Domain template (FLASK_ENV) + # Sample (FLASK_ENV) flaskEnv: "custom" # Data Commons API root diff --git a/mp-pkg/terraform/README.md b/mp-pkg/terraform/README.md index 
b7214e7..f291d9f 100644 --- a/mp-pkg/terraform/README.md +++ b/mp-pkg/terraform/README.md @@ -132,7 +132,7 @@ CloudSQL uses private IP connectivity via Private Service Access (PSA). A /20 IP | helm_chart_name | Helm chart name (populated by GCP Marketplace) | string | `"datacommons"` | no | | app_replicas | Number of replicas for the Data Commons Accelerator application deployment | number | `1` | no | | resource_tier | Resource allocation tier for the application (small, medium, large). Also controls CloudSQL machine tier and high availability. | string | `"medium"` | no | -| flask_env | Data Commons domain template (health, education, energy) | string | `"health"` | no | +| flask_env | Data Commons sample (health, education, energy, custom) | string | `"health"` | no | | dc_api_key | Data Commons API key for accessing Data Commons APIs | string | n/a | yes | | enable_natural_language | Enable natural language query features | bool | `true` | no | | enable_data_sync | Enable automatic synchronization of custom data from GCS bucket to CloudSQL database | bool | `true` | no | diff --git a/mp-pkg/terraform/metadata.display.yaml b/mp-pkg/terraform/metadata.display.yaml index 2f256b4..57917f5 100644 --- a/mp-pkg/terraform/metadata.display.yaml +++ b/mp-pkg/terraform/metadata.display.yaml @@ -61,11 +61,11 @@ spec: value: large flask_env: name: flask_env - title: Domain Template + title: Samples tooltip: | Select a pre-built Data Commons configuration optimized for a specific domain. - Each domain includes curated datasets, statistical variables, and visualizations tailored to that subject area. - Select the domain that best matches your use case. + Each sample includes curated datasets, statistical variables, and visualizations tailored to that subject area. + Select the sample that best matches your use case. 
placeholder: Choose from the list section: application enumValueLabels: @@ -75,6 +75,8 @@ spec: value: health - label: Energy value: energy + - label: Custom + value: custom dc_api_key: name: dc_api_key title: Data Commons API Key diff --git a/mp-pkg/terraform/metadata.yaml b/mp-pkg/terraform/metadata.yaml index f48a0a8..4f663bb 100644 --- a/mp-pkg/terraform/metadata.yaml +++ b/mp-pkg/terraform/metadata.yaml @@ -89,7 +89,7 @@ spec: varType: bool defaultValue: true - name: flask_env - description: Data Commons domain template (pre-built configurations for specific domains) + description: Data Commons sample (pre-built configurations for specific domains) varType: string defaultValue: health - name: cdc_services_image_repo diff --git a/mp-pkg/terraform/variables.tf b/mp-pkg/terraform/variables.tf index 9d54060..4ba24f6 100644 --- a/mp-pkg/terraform/variables.tf +++ b/mp-pkg/terraform/variables.tf @@ -188,13 +188,13 @@ variable "enable_data_sync" { } variable "flask_env" { - description = "Data Commons domain template (pre-built configurations for specific domains)" + description = "Data Commons sample (pre-built configurations for specific domains)" type = string default = "health" validation { - condition = contains(["health", "education", "energy"], var.flask_env) - error_message = "Domain template must be one of: health, education, energy." + condition = contains(["health", "education", "energy", "custom"], var.flask_env) + error_message = "Sample must be one of: health, education, energy, custom." } }
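
---

Reviewer note: the name-validation rules in `variables.tf` are easy to sanity-check outside of `terraform plan`. A minimal Python sketch with the regexes copied verbatim from this patch (the `is_valid` helper is ours, purely illustrative):

```python
import re

# Validation regexes copied verbatim from mp-pkg/terraform/variables.tf.
PATTERNS = {
    "deployment_name": r"^[a-z][a-z0-9-]{0,16}[a-z0-9]$",   # 2-18 chars
    "project_id":      r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$",   # 6-30 chars
    "region":          r"^[a-z]+-[a-z]+[0-9]+$",            # e.g. us-central1
}

def is_valid(kind: str, value: str) -> bool:
    """Return True if `value` satisfies the Terraform validation for `kind`."""
    return re.fullmatch(PATTERNS[kind], value) is not None

print(is_valid("deployment_name", "my-dc-01"))    # True
print(is_valid("deployment_name", "My-DC"))       # False: uppercase not allowed
print(is_valid("region", "us-central1"))          # True
print(is_valid("project_id", "abc"))              # False: under 6 characters
```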