29 changes: 25 additions & 4 deletions testframework/data/PQSDKTestData.md
@@ -5,19 +5,40 @@ record data** and the **Taxi Zone Lookup table**. The details of the data could
[TLC Trip Record Data](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page) page on the NYC Taxi & Limousine
Commission website.

The modified dataset is open for anyone to use under the [CDLA-Permissive-2.0 license](https://cdla.dev/permissive-2-0/).

## PQ SDK Test Framework - Test Data Details

The PQ SDK Test Framework dataset contains the following files:

- **nyc_taxi_tripdata.csv** file with 10000 rows sampled from the February 2023 green trip data
- **nyc_taxi_trip_date_data.csv** file with 10000 rows containing record identifier and two date columns processed from
the February 2023 green trip data
- **taxi+\_zone_lookup.csv** file with 265 rows from the taxi zone lookup table
- **PQSDKTestFrameworkDataSchema.sql** file containing the schema for the **NycTaxiData**, **NycTaxiDateData** and
**TaxiZoneLookup** tables
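The row counts in the list above can be checked after downloading the files. The following is a minimal sketch. It assumes that each CSV starts with a single header row, which the list does not state, and that you supply the file paths. `count_data_rows` and `check_row_counts` are hypothetical helpers and are not part of the test framework.

```python
import csv
import io
from typing import Dict, TextIO

# Row counts taken from the file list above. Assumption: each CSV has one
# header row, which is not included in these counts.
EXPECTED_ROW_COUNTS: Dict[str, int] = {
    "nyc_taxi_tripdata.csv": 10000,
    "nyc_taxi_trip_date_data.csv": 10000,
    "taxi+_zone_lookup.csv": 265,
}


def count_data_rows(stream: TextIO, has_header: bool = True) -> int:
    """Count CSV records in `stream`, skipping the first record if it is a header."""
    reader = csv.reader(stream)
    if has_header:
        next(reader, None)  # consume the header row (no-op on an empty stream)
    return sum(1 for _ in reader)


def check_row_counts(paths: Dict[str, str]) -> Dict[str, bool]:
    """For each documented file name, report whether the file at its path has the expected row count."""
    results: Dict[str, bool] = {}
    for name, path in paths.items():
        with open(path, newline="", encoding="utf-8") as f:
            results[name] = count_data_rows(f) == EXPECTED_ROW_COUNTS[name]
    return results


# Self-check on a made-up in-memory sample instead of the real files.
sample = io.StringIO(
    "LocationID,Borough,Zone,service_zone\n"
    "1,SampleBorough,Sample Zone,SampleServiceZone\n"
)
print(count_data_rows(sample))  # prints 1
```

Because the helper counts CSV records instead of physical lines, a quoted field that contains a newline is still counted as a single row.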

## PQ SDK Test Framework - Data Types and Precision

The schema in **PQSDKTestFrameworkDataSchema.sql** uses generic type names (`int`, `double`, `boolean`, `timestamp`,
`date`, `string`) that should be mapped to the equivalent types in your data source. In particular:

| Schema Type | Description | Example Mappings |
|-------------|-------------|------------------|
| `int` | Whole numbers | INTEGER, INT, NUMBER(38,0) |
| `double` | Floating-point values rounded to **two decimal places** in the taxi data | FLOAT, DOUBLE, REAL, DECIMAL(10,2), NUMBER |
| `boolean` | True/false flags | BOOLEAN, BIT |
| `timestamp` | Date and time | DATETIME, TIMESTAMP, TIMESTAMP_NTZ |
| `date` | Date only | DATE |
| `string` | Variable-length text | VARCHAR, NVARCHAR, TEXT |
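The schema file uses generic type names, so it usually has to be rewritten before it can run on a specific data source. The following sketch performs that rewrite textually, one column line at a time. `translate_schema` and `EXAMPLE_TYPE_MAPPING` are hypothetical names. The chosen targets are only one option from the table above: PostgreSQL-compatible type names, with an assumed `VARCHAR(255)` length for `string`.

```python
import re

# Hypothetical target mapping drawn from the "Example Mappings" column above,
# using PostgreSQL-compatible type names. VARCHAR(255) is an assumed length.
EXAMPLE_TYPE_MAPPING = {
    "int": "INTEGER",
    "double": "DECIMAL(10,2)",
    "boolean": "BOOLEAN",
    "timestamp": "TIMESTAMP",
    "date": "DATE",
    "string": "VARCHAR(255)",
}

# A column definition line: optional indent, column name, generic type, remainder.
_COLUMN_RE = re.compile(
    r"^(\s*)(\w+)\s+(" + "|".join(EXAMPLE_TYPE_MAPPING) + r")\b(.*)$"
)


def translate_schema(ddl: str, mapping: dict = EXAMPLE_TYPE_MAPPING) -> str:
    """Rewrite generic column types using `mapping`; other lines pass through unchanged."""
    out = []
    for line in ddl.splitlines():
        m = _COLUMN_RE.match(line)
        if m:
            indent, column, generic, rest = m.groups()
            line = f"{indent}{column} {mapping[generic]}{rest}"
        out.append(line)
    return "\n".join(out)


print(translate_schema("lpep_pickup_date date NOT NULL"))  # prints lpep_pickup_date DATE NOT NULL
```

Comment lines and `CREATE TABLE` lines do not match the pattern, so only lines of the form `<column> <generic type>...` are rewritten. A custom mapping must provide an entry for each of the six generic types.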

> **Note:** All `double` columns in the **NycTaxiData** table (e.g., `trip_distance`, `fare_amount`, `total_amount`)
> contain values with at most two decimal places. When choosing a data source type, either a floating-point type
> (FLOAT/DOUBLE) or a fixed-precision decimal type (e.g., DECIMAL(10,2)) will work.
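You can check the two-decimal-place property described in the note before loading, which matters most if you choose a fixed-precision type such as DECIMAL(10,2). The following is a minimal sketch that applies Python's `decimal` module to the raw CSV text. `values_exceeding_scale` is a hypothetical helper. Skipping empty fields, on the assumption that they represent NULLs, is also an assumption.

```python
from decimal import Decimal, InvalidOperation
from typing import Iterable, List


def values_exceeding_scale(values: Iterable[str], scale: int = 2) -> List[str]:
    """Return the values that are not numbers or have more than `scale` digits after the point."""
    offending: List[str] = []
    for raw in values:
        text = raw.strip()
        if text == "":
            continue  # assumption: empty fields are NULLs and are not checked
        try:
            exponent = Decimal(text).as_tuple().exponent
        except InvalidOperation:
            offending.append(raw)  # not parseable as a number
            continue
        # NaN/Infinity report a string exponent; finite values report an int,
        # where e.g. "12.34" -> -2, i.e. two digits after the decimal point.
        if not isinstance(exponent, int) or -exponent > scale:
            offending.append(raw)
    return offending


print(values_exceeding_scale(["12.34", "0.125", "", "7"]))  # prints ['0.125']
```

The check counts digits as written, so `1.250` is flagged even though it equals `1.25`. It checks only the scale, not the total precision of a DECIMAL column.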

## PQ SDK Test Framework - Test Data Loading

The PQ SDK Test Framework dataset must be loaded into the data source for your extension connector before running the
PQ SDK Test Framework test suites. The data is provided in CSV format so that it can easily be loaded into any data
source. The **nyc_taxi_tripdata.csv**, **nyc_taxi_trip_date_data.csv** and **taxi+\_zone_lookup.csv** files should be
loaded into the **NycTaxiData**, **NycTaxiDateData** and **TaxiZoneLookup** tables respectively, following the schema
specified in the **PQSDKTestFrameworkDataSchema.sql** file.
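The following sketch illustrates the loading step using Python's built-in `sqlite3` module, with SQLite standing in for the data source. A real connector's data source will typically have its own bulk-load tooling. `load_csv` is a hypothetical helper. It assumes that the first row of each CSV holds column names matching the schema and that empty fields should become NULLs.

```python
import csv
import io
import sqlite3
from typing import TextIO


def load_csv(conn: sqlite3.Connection, table: str, stream: TextIO) -> int:
    """Insert the data rows of a headered CSV stream into `table` and return how many were inserted."""
    reader = csv.reader(stream)
    header = next(reader, None)  # assumption: first row holds the schema's column names
    if header is None:
        return 0  # empty file: nothing to load
    # Identifiers cannot be bound as parameters, so they are quoted here;
    # this assumes table and column names come from a trusted schema.
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    sql = f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})'
    # Skip blank lines; assumption: empty CSV fields stand for NULLs.
    rows = [[field if field != "" else None for field in row] for row in reader if row]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TaxiZoneLookup "
    "(LocationID INTEGER, Borough TEXT, Zone TEXT, service_zone TEXT)"
)
sample = io.StringIO(
    "LocationID,Borough,Zone,service_zone\n"
    "1,SampleBorough,Sample Zone,SampleServiceZone\n"  # made-up row
)
print(load_csv(conn, "TaxiZoneLookup", sample))  # prints 1
```

For the full dataset, the `CREATE TABLE` statements would come from the schema file after its generic types are mapped. Each of the three files would then be loaded into its table in turn.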
68 changes: 38 additions & 30 deletions testframework/data/PQSDKTestFrameworkDataSchema.sql
@@ -1,35 +1,43 @@
-- Note: Columns defined as 'double' in the taxi data contain values rounded to
-- two decimal places. Map 'double' to the appropriate floating-point or decimal
-- type in your data source (e.g., FLOAT, DOUBLE, REAL, DECIMAL(10,2), NUMBER).
-- Similarly, map 'int' to INTEGER/INT, 'boolean' to BOOLEAN/BIT, 'timestamp' to
-- DATETIME/TIMESTAMP, 'date' to DATE, and 'string' to VARCHAR/NVARCHAR/TEXT as
-- supported by your data source.

CREATE TABLE NycTaxiData (
RecordID int,
VendorID int,
lpep_pickup_datetime timestamp,
lpep_dropoff_datetime timestamp,
store_and_fwd_flag boolean,
RatecodeID int,
PULocationID int,
DOLocationID int,
passenger_count int,
trip_distance double,
fare_amount double,
extra double,
mta_tax double,
tip_amount double,
tolls_amount double,
ehail_fee double,
improvement_surcharge double,
total_amount double,
payment_type int,
trip_type int,
congestion_surcharge double
);

CREATE TABLE NycTaxiDateData (
RecordID int NOT NULL,
lpep_pickup_date date NOT NULL,
lpep_dropoff_date date NOT NULL
);

CREATE TABLE TaxiZoneLookup (
LocationID int,
Borough string,
Zone string,
service_zone string
);