Skip to content

refactor(jdbc): connection properties and data source properties#4102

Draft
keshavdandeva wants to merge 2 commits intomainfrom
jdbc/refactor-connection-properties-and-data-source
Draft

refactor(jdbc): connection properties and data source properties#4102
keshavdandeva wants to merge 2 commits intomainfrom
jdbc/refactor-connection-properties-and-data-source

Conversation

@keshavdandeva
Copy link
Contributor

b/429272203

@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. api: bigquery Issues related to the googleapis/java-bigquery API. labels Feb 12, 2026
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @keshavdandeva, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request undertakes a substantial refactoring of how connection properties are handled within the BigQuery JDBC driver. The primary goal is to consolidate and simplify the parsing and management of these properties by introducing a DataSource object as the central repository for all connection-related configurations. This change moves away from a fragmented parsing approach to a more cohesive and maintainable design, where the BigQueryConnection directly consumes a pre-populated DataSource instance.

Highlights

  • Centralized Property Parsing: Connection string properties are now parsed into a DataSource object, centralizing the parsing logic previously scattered across multiple utility methods.
  • Simplified Connection Initialization: The BigQueryConnection constructor has been streamlined to retrieve all connection parameters from a single DataSource instance, improving readability and maintainability.
  • Enhanced DataSource Model: The DataSource class has been significantly expanded to hold a comprehensive set of BigQuery JDBC connection properties, providing a unified data model for configuration.
  • Reduced Redundancy in Utility: Numerous individual property parsing methods in BigQueryJdbcUrlUtility have been removed, as their functionality is now handled by the new DataSource-centric approach.
Changelog
  • google-cloud-bigquery-jdbc/src/main/java/com/google/cloud/bigquery/jdbc/BigQueryConnection.java
    • Updated imports to include Arrays, Collections, HashMap, and Collectors for new data structures and stream operations.
    • Refactored the constructor to utilize BigQueryJdbcUrlUtility.parseUrlProperties to get a map of properties, then populate a DataSource object using BigQueryJdbcUrlUtility.setDataSourceProperties, and finally initialize connection fields from this DataSource.
    • Removed direct calls to many individual property parsing methods from BigQueryJdbcUrlUtility, such as parseLabels, parseMaximumBytesBilled, parseRetryTimeoutInSecs, parseJobTimeout, parseStringProperty, parseOverrideProperties, parseBooleanProperty, parseIntProperty, parseLongProperty, parseJobCreationMode, and parseStringListProperty.
    • Adjusted the initialization of fields like authProperties, overrideProperties, additionalProjects, labels, maxBytesBilled, retryTimeoutInSeconds, retryInitialDelayInSeconds, retryMaxDelayInSeconds, jobTimeoutInSeconds, useStatelessQueryMode, httpConnectTimeout, httpReadTimeout, requestGoogleDriveScope, metadataFetchThreadCount, requestReason, enableHighThroughputAPI, highThroughputMinTableSize, highThroughputActivationRatio, enableSession, unsupportedHTAPIFallback, useQueryCache, queryDialect, allowLargeResults, destinationTable, destinationDataset, destinationDatasetExpirationTime, kmsKeyName, maxResults, enableWriteAPI, writeAPIAppendRowCount, writeAPIActivationRowCount, filterTablesOnDefaultDataset, sslTrustStorePath, and sslTrustStorePassword to retrieve values from the DataSource object, applying default values where necessary.
  • google-cloud-bigquery-jdbc/src/main/java/com/google/cloud/bigquery/jdbc/BigQueryJdbcUrlUtility.java
    • Removed the java.util.stream.Collectors import.
    • Added java.util.TreeMap import for case-insensitive property handling.
    • Modified parseUriProperty to delegate to the new parseUrlProperties method for parsing the URI.
    • Deleted numerous static helper methods responsible for parsing specific property types (e.g., convertIntToBoolean, parseQueryProperties, parseLabels, parseStringProperty, parseStringListProperty, parseIntProperty, parseLongProperty, parseBooleanProperty, parseJobCreationMode, parseBYOIDProperty, getConnectionPropertyDefaultValue, parseRetryTimeoutInSecs, parseJobTimeout, parseRetryInitialDelayInSecs, parseRetryMaxDelayInSecs, parseConnectionPoolSize, parseListenerPoolSize, parseMaximumBytesBilled, parsePropertiesMap).
    • Refactored parseOverrideProperties into parseOverridePropertiesString, which now directly accepts the override properties string instead of extracting it from the URL.
    • Introduced a new private helper method convertStrToIntBoolean for boolean conversion.
    • Added parseUrlProperties to parse the entire connection URL into a Map<String, String> using a case-insensitive TreeMap.
    • Implemented PROPERTY_SETTERS, a static map of BiConsumer functions, and setDataSourceProperties to dynamically set properties on a DataSource object based on the parsed URL properties. This method also includes validation for unknown properties.
  • google-cloud-bigquery-jdbc/src/main/java/com/google/cloud/bigquery/jdbc/DataSource.java
    • Expanded the class by adding new private fields for a wide range of connection properties, including labels, requestReason, maximumBytesBilled, retryTimeoutInSecs, retryInitialDelayInSecs, retryMaxDelayInSecs, jobTimeout, httpConnectTimeout, httpReadTimeout, swaAppendRowCount, swaActivationRowCount, connectionPoolSize, and listenerPoolSize.
    • Added corresponding public getter and setter methods for all newly introduced fields.
    • Modified the return types of existing getters for enableHighThroughputAPI, highThroughputMinTableSize, highThroughputActivationRatio, unsupportedHTAPIFallback, and enableSession from primitive types (boolean, int) to their wrapper classes (Boolean, Integer) to allow for null values, indicating properties not explicitly set.
    • Updated the createProperties method to include the new properties when generating the Properties object for the connection.
  • google-cloud-bigquery-jdbc/src/test/java/com/google/cloud/bigquery/jdbc/BigQueryJdbcUrlUtilityTest.java
    • Removed numerous static imports from org.junit.Assert and other utility classes (com.google.cloud.bigquery.exception.BigQueryJdbcRuntimeException, com.google.common.collect.Maps, java.util.Arrays, java.util.Collections, java.util.HashMap, java.util.List) that are no longer relevant due to the removal of the methods they tested.
    • Updated existing tests for testConnectionPropertiesFromURIMultiline and testConnectionPropertiesFromURIMultilineNoSemicolon to reflect the change in property name from "Multiline" to "AdditionalProjects".
    • Refactored tests related to override properties (testOverridePropertiesFromURICompatibility, testOverridePropertiesDoesNotAffectOriginalParsersAtEnd, testOverridePropertiesDoesNotParseOutsideOfPrivateServiceConnectUris, testOverridePropertiesDoesNotParserPropertiesInMiddle, testOverridePropertyBeforeProceedingOverrideParameterDoesNotParse, testOverridePropertiesFromURIGoogleExperience, testAllOverridePropertiesFromURIGoogleExperience, testCaseSensitivityOverridePropertiesFromURI) to now use the new parseOverridePropertiesString method and its expected behavior.
    • Removed all tests associated with the deleted parsing methods (e.g., testParseJobCreationMode, testGetConnectionPropertyDefaultValue, testParseRetryTimeoutInSecs, testParseJobTimeout, testParseRetryInitialDelayInSecs, testParseRetryMaxDelayInSecs, testParseRequestGoogleDriveScope_Default, testParseConnectionPoolSize, testParseListenerPoolSize, testParseStringListProperty, testParseIntProperty, testParseMaxBytesBilled, testParseLabels, testParseHttpConnectTimeout, testParseHttpReadTimeout, testParseRequestReason).
Activity
  • No specific activity (comments, reviews, etc.) has been recorded for this pull request.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the handling of JDBC connection properties by centralizing them into a DataSource object. While this aims to improve code structure, it has introduced critical security regressions, including the failure to apply proxy and SSL configurations to the BigQuery connection and a risk of CRLF injection due to unsanitized properties in HTTP headers. Additionally, the implementation contains a potential NullPointerException, incorrect OAuthType logic, broken service account impersonation, and incorrect serialization of map-like properties like labels. The removal of a significant number of tests also leaves the new property parsing logic largely untested.

Comment on lines +155 to 158
if (ds.getOAuthType() != 0) {
this.authProperties.put(
"OAuthType", BigQueryJdbcOAuthUtility.AuthType.values()[ds.getOAuthType()].name());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This block has a potential NullPointerException and incorrect logic for handling OAuthType.

  1. ds.getOAuthType() can throw an NPE if oAuthType is not set in the connection string. This is because DataSource.getOAuthType() returns int but the underlying field is a nullable Integer. To fix this, you should change the return type of DataSource.getOAuthType() to Integer, which would be consistent with other getters changed in this PR (e.g., getEnableHighThroughputAPI).
  2. The condition if (ds.getOAuthType() != 0) incorrectly skips OAuthType=0 (GOOGLE_SERVICE_ACCOUNT), which will cause getCredentials() to fail later as it requires the OAuthType to be present in authProperties.
  3. Based on the previous implementation, OAuthType is a required property. The logic should enforce this.

After changing DataSource.getOAuthType() to return Integer, I suggest replacing this block with the following logic:

    Integer oAuthType = ds.getOAuthType();
    if (oAuthType == null) {
      throw new BigQueryJdbcRuntimeException("Connection property 'OAuthType' is required.");
    }
    this.authProperties.put(
        "OAuthType", BigQueryJdbcOAuthUtility.AuthType.values()[oAuthType].name());

Comment on lines +856 to +869
boolean found =
BYOID_PROPERTIES.stream().anyMatch(key::equalsIgnoreCase)
|| OVERRIDE_PROPERTIES.stream().anyMatch(key::equalsIgnoreCase);
if (!found
&& !key.equalsIgnoreCase("Location")
&& !key.equalsIgnoreCase("OAuthType")
&& !key.equalsIgnoreCase("ProjectId")
&& !key.equalsIgnoreCase("ServiceAccountImpersonationEmail")
&& !key.equalsIgnoreCase("ServiceAccountImpersonationScopes")
&& !key.equalsIgnoreCase("EndpointOverrides")
&& !key.equalsIgnoreCase("ServiceAccountImpersonationTokenLifetime")
&& !key.equalsIgnoreCase("universeDomain")) {
throw new IllegalArgumentException("Unknown connection property: " + key);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This logic for handling unknown properties has two critical issues:

  1. Broken Service Account Impersonation: Properties required for service account impersonation (e.g., ServiceAccountImpersonationEmail, ServiceAccountImpersonationScopes) are whitelisted here but are not actually processed. They are not added to the DataSource object and are not retrieved in BigQueryConnection to be passed to the credential creation logic. This effectively breaks the service account impersonation feature.

  2. Confusing Whitelist: The whitelist contains properties like Location, OAuthType, and ProjectId which are already handled by the PROPERTY_SETTERS map. This is confusing and makes the code harder to maintain.

To fix the impersonation issue, you should add the impersonation properties to the DataSource class, create setters for them in the PROPERTY_SETTERS map, and then retrieve them in BigQueryConnection to populate the authProperties map correctly.

Comment on lines +150 to +152
Map<String, String> parsedProperties = BigQueryJdbcUrlUtility.parseUrlProperties(url);
DataSource ds = new DataSource();
BigQueryJdbcUrlUtility.setDataSourceProperties(ds, parsedProperties);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The refactoring of the BigQueryConnection constructor has introduced a significant security regression. The new implementation fails to initialize critical security-related fields such as httpTransportOptions and transportChannelProvider. These fields are essential for configuring proxy settings (ProxyHost, ProxyPort, etc.) and SSL truststore settings (SSLTrustStore, SSLTrustStorePwd) for the BigQuery client.

Additionally, the new constructor fails to populate authProperties with BYOID_PROPERTIES and overrideProperties with EndpointOverrides.

As a result, any security or authentication configurations provided in the JDBC URL are silently ignored. The driver will connect directly to BigQuery using default JVM settings, bypassing mandatory proxies and custom truststores, and failing to use specified external account credentials or custom endpoints. This can lead to unauthorized data exposure or man-in-the-middle attacks if the application relies on these settings for security.

}
if (this.labels != null) {
connectionProperties.setProperty(
BigQueryJdbcUrlUtility.LABELS_PROPERTY_NAME, this.labels.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using this.labels.toString() is incorrect for serializing the labels map. It will produce a string like {key1=value1, key2=value2}, but the property parser expects a format like key1=value1,key2=value2. This will break the functionality of passing labels via the DataSource.

You should manually format the map into a comma-separated string of key-value pairs. For example:

String labelsString = this.labels.entrySet().stream()
    .map(e -> e.getKey() + "=" + e.getValue())
    .collect(java.util.stream.Collectors.joining(","));
connectionProperties.setProperty(
    BigQueryJdbcUrlUtility.LABELS_PROPERTY_NAME, labelsString);

Note that you will need to add an import for java.util.stream.Collectors.

BigQueryJdbcUrlUtility.DEFAULT_METADATA_FETCH_THREAD_COUNT_VALUE;
}

this.requestReason = ds.getRequestReason();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The RequestReason property from the JDBC URL is used to populate the x-goog-request-reason HTTP header without proper sanitization. Since the value is URL-decoded in BigQueryJdbcUrlUtility.parseUrlProperties, an attacker can inject CRLF sequences (%0D%0A) into the JDBC URL to perform HTTP header injection.

Similarly, the PartnerToken property (parsed in BigQueryJdbcUrlUtility.parsePartnerTokenProperty) is included in the user-agent header and can contain literal newlines if they are present in the JDBC URL.

Attacker-controlled headers can be used to bypass security controls or manipulate API behavior.

if (eqIdx != -1) {
String key = part.substring(0, eqIdx).trim();
String val = part.substring(eqIdx + 1).trim();
properties.put(key, CharEscapers.decodeUriPath(val));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The parseUrlProperties method decodes property values using CharEscapers.decodeUriPath. When these decoded values are subsequently used in HTTP headers (such as RequestReason being used for the x-goog-request-reason header), it enables CRLF injection if the values are not sanitized. Any property intended for use in an HTTP header must be validated to ensure it does not contain control characters like \r or \n.

Comment on lines +301 to +305
if (ds.getHttpConnectTimeout() != null) {
this.httpConnectTimeout = ds.getHttpConnectTimeout();
} else {
this.httpConnectTimeout = null;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This if-else block is redundant. You can directly assign the value from ds.getHttpConnectTimeout() to this.httpConnectTimeout. The same applies to httpReadTimeout in the next block.

    this.httpConnectTimeout = ds.getHttpConnectTimeout();

Comment on lines +607 to +611
for (Map.Entry<String, String> entry : props.entrySet()) {
if (entry.getKey().equalsIgnoreCase(property)) {
return entry.getValue();
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The parseUrlProperties method returns a TreeMap that uses String.CASE_INSENSITIVE_ORDER. Therefore, you can directly use props.get(property) instead of iterating through the map to find the key in a case-insensitive manner. This will make the code more concise and efficient.

      return props.get(property);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: bigquery Issues related to the googleapis/java-bigquery API. size: xl Pull request size is extra large.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant