Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
275 changes: 275 additions & 0 deletions services-custom/dynamodb-enhanced/EC2_BENCHMARK_INSTRUCTIONS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,275 @@
# Enhanced Query Benchmark Instructions

This document covers two ways to run the Enhanced Query (join and aggregation) benchmarks: **local DynamoDB** (no AWS required) and **EC2 + real DynamoDB** (production-like numbers).

---

## Local DynamoDB benchmark (recommended for design doc)

No AWS account or credentials required. Uses in-process DynamoDB Local, creates and seeds 1000 customers × 1000 orders, then runs five scenarios and prints latency stats (avgMs, p50Ms, p95Ms, rows).

### How to run

From the **repository root**:

```bash
./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh
```

The script sets `USE_LOCAL_DYNAMODB=true` and invokes the benchmark runner. Results are printed to stdout. To save CSV results:

```bash
BENCHMARK_OUTPUT_FILE=benchmark_local.csv ./services-custom/dynamodb-enhanced/run-enhanced-query-benchmark-local.sh
```

Optional env vars (set before running the script): `BENCHMARK_ITERATIONS` (default 5), `BENCHMARK_WARMUP` (default 2), `BENCHMARK_OUTPUT_FILE` (optional path for CSV).

### Results

Example output (environment and scenario lines):

```
Using in-process DynamoDB Local.
Creating tables and seeding data (1000 customers x 1000 orders)...
...
Environment: DynamoDB Local (in-process) CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large
Warmup=2 Iterations=5
---
baseOnly_keyCondition: avgMs=... p50Ms=... p95Ms=... rows=1
joinInner_c1: avgMs=... p50Ms=... p95Ms=... rows=1000
...
```

Use this output (or the CSV file) in the design document. See [COMPLEX_QUERIES_DESIGN.md](COMPLEX_QUERIES_DESIGN.md#benchmarking) for where to reference the benchmark and link to results.

---

## EC2 + Real DynamoDB benchmark

Use this for production-like latency (e.g. external claims or SLA discussions). Requires AWS account and EC2.

### Prerequisites

- AWS account with permissions to create DynamoDB tables and launch EC2 instances (or use existing EC2).
- AWS CLI configured (`aws configure`) or IAM role for EC2 with DynamoDB access.
- Java 8+ and Maven 3.6+ (on your machine for building; on EC2 for running).

---

## Step 1: Create DynamoDB tables (AWS CLI)

Create two tables in your chosen region (e.g. `us-east-1`) with the same schema as the functional tests.

**Customers table** (partition key: `customerId` String):

```bash
export AWS_REGION=us-east-1
export CUSTOMERS_TABLE=customers_large
export ORDERS_TABLE=orders_large

aws dynamodb create-table \
--table-name $CUSTOMERS_TABLE \
--attribute-definitions AttributeName=customerId,AttributeType=S \
--key-schema AttributeName=customerId,KeyType=HASH \
--billing-mode PAY_PER_REQUEST \
--region $AWS_REGION
```

**Orders table** (partition key: `customerId` String, sort key: `orderId` String):

```bash
aws dynamodb create-table \
--table-name $ORDERS_TABLE \
--attribute-definitions \
AttributeName=customerId,AttributeType=S \
AttributeName=orderId,AttributeType=S \
--key-schema \
AttributeName=customerId,KeyType=HASH \
AttributeName=orderId,KeyType=RANGE \
--billing-mode PAY_PER_REQUEST \
--region $AWS_REGION
```

Wait until both tables are `ACTIVE`:

```bash
aws dynamodb describe-table --table-name $CUSTOMERS_TABLE --query 'Table.TableStatus'
aws dynamodb describe-table --table-name $ORDERS_TABLE --query 'Table.TableStatus'
```

---

## Step 2: Seed the tables (optional: use benchmark runner with CREATE_AND_SEED)

You can either seed from your **local machine** (or an EC2 instance) by running the benchmark runner once with `CREATE_AND_SEED=true`. This creates tables if they do not exist (skip if you already created them in Step 1) and seeds **1000 customers × 1000 orders** (1M orders). For tables you already created, use the same table names and set only the seed path.

**Option A – Seed from local (or EC2) with the runner**

From the **repo root**:

```bash
export AWS_REGION=us-east-1
export CUSTOMERS_TABLE=customers_large
export ORDERS_TABLE=orders_large
export CREATE_AND_SEED=true

mvn test-compile exec:java -pl services-custom/dynamodb-enhanced \
-Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \
-Dexec.classpathScope=test
```

If the tables already exist, the initializer will skip creation and only seed data (idempotent). If you use **pay-per-request** billing, no capacity settings are needed. Seeding 1M items may take several minutes and incur write cost.

**Option B – Create tables via runner (omit Step 1)**

If you omit Step 1 and set `CREATE_AND_SEED=true`, the runner will try to create the tables. The SDK’s `createTable` uses **provisioned** throughput by default (50 RCU/WCU). For pay-per-request, create tables in Step 1 and only seed via the runner (run with `CREATE_AND_SEED=true` once; the initializer skips create if tables exist).

---

## Step 3: Launch EC2 and install Java + Maven

1. Launch an EC2 instance (e.g. Amazon Linux 2 or Ubuntu) in the **same region** as your DynamoDB tables.
2. Attach an **IAM role** to the instance with at least:
- `dynamodb:GetItem`, `dynamodb:PutItem`, `dynamodb:Query`, `dynamodb:Scan`, `dynamodb:BatchWriteItem`, `dynamodb:DescribeTable`, `dynamodb:CreateTable` (if you use CREATE_AND_SEED).
3. SSH into the instance and install Java and Maven:

**Amazon Linux 2:**

```bash
sudo yum install -y java-11-amazon-corretto maven
```

**Ubuntu:**

```bash
sudo apt-get update && sudo apt-get install -y openjdk-11-jdk maven
```

4. Verify:

```bash
java -version
mvn -version
```

---

## Step 4: Build and copy the project to EC2

**On your local machine** (from repo root):

```bash
cd /path/to/aws-sdk-java-v2
mvn clean package -pl services-custom/dynamodb-enhanced -DskipTests -q
```

Copy the module and its dependencies to EC2. Option A: copy the whole repo and build on EC2. Option B: copy the built JAR and dependency JARs.

**Option A – Copy repo and build on EC2**

```bash
scp -r . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2
ssh ec2-user@<EC2_PUBLIC_IP> "cd ~/aws-sdk-java-v2 && mvn clean test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q"
```

**Option B – Copy only the dynamodb-enhanced module and run with mvn exec:java on EC2**

Copy the entire `aws-sdk-java-v2` repo (or at least the parent POMs and `services-custom/dynamodb-enhanced`) so that `mvn exec:java -pl services-custom/dynamodb-enhanced` can resolve the parent and run the benchmark. Building on EC2 is usually simpler:

```bash
rsync -avz --exclude='.git' . ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2
```

Then on EC2:

```bash
cd ~/aws-sdk-java-v2
mvn test-compile -pl services-custom/dynamodb-enhanced -DskipTests -q
```

---

## Step 5: Run the benchmark on EC2

SSH to the EC2 instance and set environment variables, then run the benchmark.

```bash
cd ~/aws-sdk-java-v2

export AWS_REGION=us-east-1
export CUSTOMERS_TABLE=customers_large
export ORDERS_TABLE=orders_large
export BENCHMARK_ITERATIONS=5
export BENCHMARK_WARMUP=2
# Optional: append CSV results to a file
export BENCHMARK_OUTPUT_FILE=benchmark_results.csv

# Do NOT set CREATE_AND_SEED unless you want to create/seed from this instance (tables should already exist and be seeded).

mvn exec:java -pl services-custom/dynamodb-enhanced \
-Dexec.mainClass="software.amazon.awssdk.enhanced.dynamodb.functionaltests.EnhancedQueryBenchmarkRunner" \
-Dexec.classpathScope=test -q
```

Example output:

```
Environment: AWS_REGION=us-east-1 CUSTOMERS_TABLE=customers_large ORDERS_TABLE=orders_large
Warmup=2 Iterations=5
---
baseOnly_keyCondition: avgMs=45.20 p50Ms=42 p95Ms=58 rows=1
joinInner_c1: avgMs=320.40 p50Ms=310 p95Ms=380 rows=1000
aggregation_groupByCount_c1: avgMs=305.20 p50Ms=298 p95Ms=350 rows=1
aggregation_groupBySum_c1: avgMs=318.60 p50Ms=312 p95Ms=355 rows=1
joinLeft_c1_limit50: avgMs=89.40 p50Ms=85 p95Ms=102 rows=50
```

---

## Step 6: Collect results

- **Stdout**: Redirect to a file, e.g. `mvn exec:java ... > benchmark_stdout.txt 2>&1`.
- **CSV**: If `BENCHMARK_OUTPUT_FILE` is set, the runner appends one CSV line per scenario to the file. Copy the file from EC2:

```bash
scp ec2-user@<EC2_PUBLIC_IP>:~/aws-sdk-java-v2/benchmark_results.csv .
```

Use the output (avgMs, p50Ms, p95Ms, rows) in your design doc. Document in the doc: **region**, **EC2 instance type**, **table names**, **dataset size** (e.g. 1000 customers × 1000 orders), and **billing mode** (pay-per-request or provisioned).

---

## Step 7: Cleanup (optional)

To avoid ongoing cost, delete the DynamoDB tables and terminate the EC2 instance when done:

```bash
aws dynamodb delete-table --table-name customers_large --region us-east-1
aws dynamodb delete-table --table-name orders_large --region us-east-1
# Terminate the EC2 instance from the AWS Console or CLI.
```

---

## Environment variable reference

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `AWS_REGION` | No | default SDK region | DynamoDB region (e.g. `us-east-1`). |
| `CUSTOMERS_TABLE` | No | `customers_large` | Customers table name. |
| `ORDERS_TABLE` | No | `orders_large` | Orders table name. |
| `CREATE_AND_SEED` | No | (unset) | Set to `true` to create tables (if missing) and seed 1000×1000 data. Requires DynamoDB create/put permissions. |
| `BENCHMARK_ITERATIONS` | No | `5` | Number of measured runs per scenario. |
| `BENCHMARK_WARMUP` | No | `2` | Warm-up runs per scenario before measuring. |
| `BENCHMARK_OUTPUT_FILE` | No | (none) | If set, CSV results are appended to this path. |

---

## Running locally against DynamoDB Local

To run the same benchmark against **DynamoDB Local** (e.g. for CI or no-AWS runs):

1. Start DynamoDB Local (e.g. `docker run -p 8000:8000 amazon/dynamodb-local` or the SDK’s embedded LocalDynamoDb).
2. Set `AWS_REGION` and point the SDK to the local endpoint (e.g. `DYNAMODB_ENDPOINT_OVERRIDE=http://localhost:8000` if your test setup supports it, or run the functional tests which use in-process LocalDynamoDb).

The benchmark runner does **not** set an endpoint override by default; it uses the default DynamoDB endpoint for the given region. To run against Local, you would need to configure the client with an endpoint override (e.g. in a variant of the runner or via a system property your client builder reads). The functional tests and `run-enhanced-query-tests-and-print-timing.sh` already run against Local and produce timing output for the design doc.
7 changes: 7 additions & 0 deletions services-custom/dynamodb-enhanced/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,13 @@
<artifactId>http-client-spi</artifactId>
<version>${awsjavasdk.version}</version>
</dependency>
<!-- DynamoDB Local ServerRunner telemetry loads UrlConnectionHttpClient (Pinpoint); tests and exec:java use test scope -->
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>url-connection-client</artifactId>
<version>${awsjavasdk.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>software.amazon.awssdk</groupId>
<artifactId>sdk-core</artifactId>
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,8 @@
import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedResponse;
import software.amazon.awssdk.enhanced.dynamodb.model.UpdateItemEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryRow;
import software.amazon.awssdk.enhanced.dynamodb.query.spec.QueryExpressionSpec;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;

/**
Expand Down Expand Up @@ -527,13 +529,22 @@ default CompletableFuture<TransactWriteItemsEnhancedResponse> transactWriteItems
throw new UnsupportedOperationException();
}

/**
* Executes an enhanced query (joins, aggregations, filters) described by the given spec, asynchronously.
*
* @param spec the query specification
* @return a publisher of result rows
*/
default SdkPublisher<EnhancedQueryRow> enhancedQuery(QueryExpressionSpec spec) {
throw new UnsupportedOperationException();
}

/**
* Returns the underlying low-level {@link DynamoDbAsyncClient} that this enhanced client uses to make API calls.
* <p>
* The returned client is the same instance that was provided during construction via
* {@link Builder#dynamoDbClient(DynamoDbAsyncClient)}, or the internally-created one if {@link #create()} was used.
* It is <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this
* enhanced client as well.
* {@link Builder#dynamoDbClient(DynamoDbAsyncClient)}, or the internally-created one if {@link #create()} was used. It is
* <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this enhanced client as well.
*
* @return the underlying {@link DynamoDbAsyncClient}
* @throws UnsupportedOperationException if the implementation does not support this operation
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@
import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.model.TransactWriteItemsEnhancedResponse;
import software.amazon.awssdk.enhanced.dynamodb.model.UpdateItemEnhancedRequest;
import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryLatencyReport;
import software.amazon.awssdk.enhanced.dynamodb.query.result.EnhancedQueryResult;
import software.amazon.awssdk.enhanced.dynamodb.query.spec.QueryExpressionSpec;
import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.BatchGetItemRequest;

Expand Down Expand Up @@ -537,13 +540,34 @@ default TransactWriteItemsEnhancedResponse transactWriteItemsWithResponse(
throw new UnsupportedOperationException();
}

/**
* Executes an enhanced query (joins, aggregations, filters) described by the given spec.
*
* @param spec the query specification
* @return iterable of result rows
*/
default EnhancedQueryResult enhancedQuery(QueryExpressionSpec spec) {
throw new UnsupportedOperationException();
}

/**
* Executes an enhanced query and optionally reports latency.
*
* @param spec the query specification
* @param reportConsumer optional consumer for the latency report; may be null
* @return iterable of result rows
*/
default EnhancedQueryResult enhancedQuery(QueryExpressionSpec spec,
Consumer<EnhancedQueryLatencyReport> reportConsumer) {
throw new UnsupportedOperationException();
}

/**
* Returns the underlying low-level {@link DynamoDbClient} that this enhanced client uses to make API calls.
* <p>
* The returned client is the same instance that was provided during construction via
* {@link Builder#dynamoDbClient(DynamoDbClient)}, or the internally-created one if {@link #create()} was used.
* It is <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this
* enhanced client as well.
* {@link Builder#dynamoDbClient(DynamoDbClient)}, or the internally-created one if {@link #create()} was used. It is
* <b>not</b> a copy — any operations performed on it (including {@code close()}) will affect this enhanced client as well.
*
* @return the underlying {@link DynamoDbClient}
* @throws UnsupportedOperationException if the implementation does not support this operation
Expand Down
Loading