Skip to content

Core: Expose HostnameVerificationPolicy in TLSConfigurer#15500

Open
adutra wants to merge 5 commits intoapache:mainfrom
adutra:http-client-hostname-verification-policy
Open

Core: Expose HostnameVerificationPolicy in TLSConfigurer#15500
adutra wants to merge 5 commits intoapache:mainfrom
adutra:http-client-hostname-verification-policy

Conversation

@adutra
Copy link
Contributor

@adutra adutra commented Mar 3, 2026

Fixes #15598.

This change exposes HostnameVerificationPolicy in TLSConfigurer. The default policy is set to CLIENT.

I also added a unit test that intentionally uses a custom hostname verifier that is incompatible with JSSE hostname verification strategies. Thus, the test cannot pass unless the verification policy is set to CLIENT.

@adutra
Copy link
Contributor Author

adutra commented Mar 3, 2026

FYI @bryanck

@adutra adutra force-pushed the http-client-hostname-verification-policy branch from ad082ad to 2858de7 Compare March 3, 2026 11:17
@adutra adutra changed the title Expose HostnameVerificationPolicy in TLSConfigurer Core: Expose HostnameVerificationPolicy in TLSConfigurer Mar 3, 2026
@singhpk234
Copy link
Contributor

This component is particularly useful when attempting to bypass hostname verification, e.g. by using the NoopHostnameVerifier.

Alex, why would we allow such a verification to be bypassed, do you have some real world use case for this ?

awssdk-s3accessgrants = { module = "software.amazon.s3.accessgrants:aws-s3-accessgrants-java-plugin", version.ref = "awssdk-s3accessgrants" }
azuresdk-bom = { module = "com.azure:azure-sdk-bom", version.ref = "azuresdk-bom" }
bson = { module = "org.mongodb:bson", version.ref = "bson-ver"}
bouncycastle-bcpkix = { module = "org.bouncycastle:bcpkix-jdk18on", version.ref = "bouncycastle" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is new dependency, do we have to add the licence of this dependency to one of the LICENSE files in Iceberg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since it's used only in tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for clarifying.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good because bouncycastle complicates export rules, or at least due diligence

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumes the bouncycastle bcpkix is a transitive dep already pulled in by the mockerserver lib. is the purpose here just to upgrade the version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good question! Short answer: yes.

MockServer does bring BouncyCastle transitively — bcprov, bcpkix, and bcutil – but at version 1.72, which is somewhat old.

The problem is that other deps (Hadoop) bring bcprov 1.82, and Gradle resolves the conflict by upgrading bcprov to 1.82. But bcpkix and bcutil stay at 1.72, since nothing else declares a higher version. This creates a mismatch: bcpkix 1.72 + bcprov 1.82 = NoClassDefFoundError.

I only upgraded bcpkix here, because that is enough for running the TLS tests. If you prefer, I can also explicitly upgrade bcutil.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is probably good to just ensure the same version for bouncycastle family of jars. That is the current practice. E.g., there is just one version number for the two jetty jars in the toml file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right: I implemented your suggestion.

@adutra
Copy link
Contributor Author

adutra commented Mar 4, 2026

Alex, why would we allow such a verification to be bypassed, do you have some real world use case for this ?

You could find yourself in the situation where the catalog server has TLS enabled, and its certificate shows a SAN of e.g. catalog.bigcorp.com; but if the client/engine is in the same cluster/network, it could actually be contacting the catalog through its internal IP instead, e.g. https://1.2.3.4:8181/api/catalog. In that case, the hostname verification will fail.

(Please note: I mentioned NoopHostnameVerifier just as an example of possible usage of HostnameVerificationPolicy.)

@adutra
Copy link
Contributor Author

adutra commented Mar 4, 2026

And I forgot to mention: NoopHostnameVerifier is also quite useful in tests :-)

@adutra
Copy link
Contributor Author

adutra commented Mar 8, 2026

@bryanck since you introduced TLSConfigurer: are you OK with the changes in this PR?

adutra added 2 commits March 10, 2026 18:45
Apache HttpClient 5.4 introduced a new component: `HostnameVerificationPolicy`, which determines whether hostname verification is done by the JSSE provider (at socket level, during TLS handshake), the HttpClient (after TLS handshake), or both.

This change exposes `HostnameVerificationPolicy` in `TLSConfigurer`. This component is particularly useful when attempting to bypass hostname verification, e.g. by using the `NoopHostnameVerifier`. The default policy is set to `BOTH`, which produces the same result as before.
@adutra adutra force-pushed the http-client-hostname-verification-policy branch from 2858de7 to 6e3cfb8 Compare March 10, 2026 17:46
@adutra
Copy link
Contributor Author

adutra commented Mar 10, 2026

Update: I was able to find the exact change that caused this regression:

In 1.10 we use httpclient5 version 5.5, where the DefaultClientTlsStrategy constructor used by Iceberg is as follows:

    public DefaultClientTlsStrategy(
            final SSLContext sslContext,
            final String[] supportedProtocols,
            final String[] supportedCipherSuites,
            final SSLBufferMode sslBufferManagement,
            final HostnameVerifier hostnameVerifier) {
        this(sslContext, supportedProtocols, supportedCipherSuites, sslBufferManagement, HostnameVerificationPolicy.CLIENT, hostnameVerifier);
    }

https://github.com/apache/httpcomponents-client/blob/c5bd9af6a47af3f2683209f0b818f1cf109026f6/httpclient5/src/main/java/org/apache/hc/client5/http/ssl/DefaultClientTlsStrategy.java#L124-L131

But in 1.11 we upgraded httpclient5 to version 5.6, where the same constructor becomes:

    public DefaultClientTlsStrategy(
            final SSLContext sslContext,
            final String[] supportedProtocols,
            final String[] supportedCipherSuites,
            final SSLBufferMode sslBufferManagement,
            final HostnameVerifier hostnameVerifier) {
        this(sslContext, supportedProtocols, supportedCipherSuites, sslBufferManagement, null, hostnameVerifier);
    }

https://github.com/apache/httpcomponents-client/blob/cee67d86809aa23577968f9e7e7bf922a9892512/httpclient5/src/main/java/org/apache/hc/client5/http/ssl/DefaultClientTlsStrategy.java#L127

Passing null instead of HostnameVerificationPolicy.CLIENT is not the same when there is a non-null hostnameVerifier:

https://github.com/apache/httpcomponents-client/blob/master/httpclient5/src/main/java/org/apache/hc/client5/http/ssl/AbstractClientTlsStrategy.java#L101

So, this is imho clearly a regression, and in fact the default value for TLSConfigurer.hostnameVerificationPolicy() should be CLIENT, not BOTH if we want to restore the 1.10 behavior:

default HostnameVerificationPolicy hostnameVerificationPolicy() {
    return HostnameVerificationPolicy.CLIENT;
  }

I will change that.

@singhpk234 could you please add this to the 1.11 milestone? Now I'm really convinced it's a regression.

@danielcweeks danielcweeks self-requested a review March 11, 2026 16:25
@singhpk234 singhpk234 self-requested a review March 11, 2026 16:27
@adutra
Copy link
Contributor Author

adutra commented Mar 12, 2026

@danielcweeks and @singhpk234 : as requested, I created an issue to better explain why I think we have a regression and how this PR fixes it: #15598. PTAL 🙏

@stevenzwu
Copy link
Contributor

You could find yourself in the situation where the catalog server has TLS enabled, and its certificate shows a SAN of e.g. catalog.bigcorp.com; but if the client/engine is in the same cluster/network, it could actually be contacting the catalog through its internal IP instead, e.g. https://1.2.3.4:8181/api/catalog. In that case, the hostname verification will fail.

(Please note: I mentioned NoopHostnameVerifier just as an example of possible usage of HostnameVerificationPolicy.)

I am wondering about the default value of HostnameVerificationPolicy.CLIENT. I know the purpose is maintain the same behavior. Unit test can configure NoopHostnameVerifier to bypass the check or mapping from ip to the hostname.

Is HostnameVerificationPolicy.BOTH the safer config for prod env and used as the default? if prod env also wants IP address to work, the safer practice is to add the IP address to the Subject Alternative Name (SAN) .

@stevenzwu stevenzwu added this to the Iceberg 1.11.0 milestone Mar 20, 2026
exclude group: 'junit'
}
testImplementation libs.awaitility
testImplementation libs.bouncycastle.bcpkix
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth at least adding a comment on why these dependencies are being added, since it's not obvious

@steveloughran
Copy link
Contributor

Is HostnameVerificationPolicy.BOTH the safer config for prod env and used as the default?

Best to lock down and require relaxation in deployments than to unlock and regret it.

More formally CWE-1188: Initialization of a Resource with an Insecure Default

@adutra
Copy link
Contributor Author

adutra commented Mar 20, 2026

I am wondering about the default value of HostnameVerificationPolicy.CLIENT.

BOTH is safer, but also introduces a behavioral change compared to 1.10. Are we OK with that?

@stevenzwu
Copy link
Contributor

I am wondering about the default value of HostnameVerificationPolicy.CLIENT.

BOTH is safer, but also introduces a behavioral change compared to 1.10. Are we OK with that?

Can you send an email to dev@ to get broader feedback? It might be ok introduce a slight behavior change. Most production use cases probably should use DNS anyway. For IP address usage, users can add the IP address to the cert SAN.

This is the only question I have for this PR. Exposing hostnameVerificationPolicy in the TLS configurer makes sense. Unit test can configure it to noop to use the loopback address.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HTTPClient: regression in TLS hostname verification

6 participants