[Feature] Add FE constant folding for cosine_similarity and standardize test patterns #60403

Merged
zclllyybb merged 4 commits into master from copilot/implement-cosine-similarity-function
Feb 24, 2026

Conversation

Contributor

Copilot AI commented Jan 31, 2026

Adds FE constant folding support for cosine_similarity and replaces try-catch blocks in regression tests with the standard test{sql, exception} pattern.

FE Constant Folding

  • New ArrayArithmetic.java executable class with @ExecFunction annotation
  • Registered in ExpressionEvaluator.java for constant folding during query optimization
  • Implements null element validation, array size checks, and zero vector handling
@ExecFunction(name = "cosine_similarity")
public static Expression cosineSimilarity(ArrayLiteral array1, ArrayLiteral array2) {
    // Computes: dot(x, y) / (||x|| * ||y||)
    // Returns FloatLiteral
}
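
The formula in the stub above, together with the three checks listed in the bullets, can be sketched in plain Java. This is only an illustration, not the merged ArrayArithmetic code: the class name, the use of List<Double> in place of ArrayLiteral, the null-element message, and the choice to return 0.0 for a zero vector are all assumptions made for the example. The PR itself returns a FloatLiteral, whereas this sketch uses double for simplicity.

```java
import java.util.Arrays;
import java.util.List;

public class CosineSimilaritySketch {

    /**
     * Computes dot(x, y) / (||x|| * ||y||) for two vectors of equal length.
     * Hypothetical stand-in for the folded function; not the Doris API.
     */
    public static double cosineSimilarity(List<Double> x, List<Double> y) {
        // Array size check: message text taken from the regression test in this PR.
        if (x.size() != y.size()) {
            throw new IllegalArgumentException(
                    "function cosine_similarity have different input element sizes");
        }
        double dot = 0.0;
        double squaredNormX = 0.0;
        double squaredNormY = 0.0;
        for (int i = 0; i < x.size(); i++) {
            Double a = x.get(i);
            Double b = y.get(i);
            // Null element validation (the message here is an assumption).
            if (a == null || b == null) {
                throw new IllegalArgumentException("array element must not be null");
            }
            dot += a * b;
            squaredNormX += a * a;
            squaredNormY += b * b;
        }
        // Zero vector handling: returning 0.0 is an assumption of this sketch;
        // the actual behaviour should be checked against the merged code.
        if (squaredNormX == 0.0 || squaredNormY == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(squaredNormX) * Math.sqrt(squaredNormY));
    }

    public static void main(String[] args) {
        // Orthogonal vectors have a dot product of zero.
        System.out.println(cosineSimilarity(Arrays.asList(1.0, 0.0), Arrays.asList(0.0, 1.0)));
    }
}
```

Because both inputs are literals, a result like this can be computed once during planning and substituted into the query, which is the point of registering the function for constant folding.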

Test Pattern Standardization

Replaced 9 try-catch blocks with test{sql, exception} pattern in test_array_distance_functions.groovy:

// Before
try {
    sql "SELECT cosine_similarity([1, 2], [1, 2, 3])"
} catch (Exception ex) {
    assert("${ex}".contains("different input element sizes"))
}

// After
test {
    sql "SELECT cosine_similarity([1, 2], [1, 2, 3])"
    exception "function cosine_similarity have different input element sizes"
}
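
One weakness of the "Before" form is visible in the snippet itself: if the query succeeds, the catch block never runs and the test passes without asserting anything. The Java sketch below shows a helper that closes that gap by failing when no exception is thrown. The helper name expectException and its signature are hypothetical, and whether the regression framework's test { exception } block behaves exactly this way is an assumption rather than something stated in this PR.

```java
public class ExpectExceptionSketch {

    /**
     * Runs the action and requires it to fail with a message containing
     * expectedFragment. Hypothetical helper for illustration only.
     */
    public static void expectException(Runnable action, String expectedFragment) {
        try {
            action.run();
        } catch (RuntimeException ex) {
            // An exception was thrown: verify that it is the one we expected.
            if (!String.valueOf(ex.getMessage()).contains(expectedFragment)) {
                throw new AssertionError("unexpected message: " + ex.getMessage(), ex);
            }
            return; // expected failure observed
        }
        // Nothing was thrown: a bare try-catch would pass silently at this point.
        throw new AssertionError("expected an exception containing: " + expectedFragment);
    }

    public static void main(String[] args) {
        expectException(() -> {
            throw new IllegalStateException(
                    "function cosine_similarity have different input element sizes");
        }, "different input element sizes");
        System.out.println("ok");
    }
}
```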

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Copilot AI and others added 2 commits January 31, 2026 08:31
Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Array function cosine_similarity implementation" to "[Feature] Implement cosine_similarity array function" Jan 31, 2026
Copilot AI requested a review from zclllyybb January 31, 2026 08:36
@zclllyybb
Contributor

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.52% (19289/36724)
Line Coverage 36.00% (179249/497974)
Region Coverage 32.41% (139048/429043)
Branch Coverage 33.35% (60154/180353)

@doris-robot

TPC-H: Total hot run time: 31615 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee407798f640e58bcf140c4eeeed1363113ff6cf, data reload: false

------ Round 1 ----------------------------------
q1	17620	5381	5052	5052
q2	2067	354	191	191
q3	10154	1288	738	738
q4	10230	852	312	312
q5	7532	2150	1881	1881
q6	188	178	149	149
q7	881	748	621	621
q8	9260	1427	1090	1090
q9	5246	4843	4802	4802
q10	6827	1945	1550	1550
q11	490	293	266	266
q12	344	371	223	223
q13	17797	4076	3236	3236
q14	250	241	224	224
q15	909	808	819	808
q16	706	669	623	623
q17	651	831	436	436
q18	6787	6604	6426	6426
q19	1235	982	604	604
q20	384	342	230	230
q21	2575	1979	1891	1891
q22	355	315	262	262
Total cold run time: 102488 ms
Total hot run time: 31615 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5336	5318	5306	5306
q2	267	326	253	253
q3	2172	2681	2274	2274
q4	1352	1737	1313	1313
q5	4242	4188	4082	4082
q6	226	186	139	139
q7	2017	2102	1916	1916
q8	2710	2451	2397	2397
q9	7465	7444	7555	7444
q10	2817	3023	2661	2661
q11	559	474	460	460
q12	761	746	600	600
q13	3881	4432	3467	3467
q14	290	352	331	331
q15	891	881	809	809
q16	653	739	659	659
q17	1179	1617	1390	1390
q18	8144	8096	7970	7970
q19	864	840	804	804
q20	2147	2142	2057	2057
q21	4739	4149	4170	4149
q22	545	571	508	508
Total cold run time: 53257 ms
Total hot run time: 50989 ms

@doris-robot

ClickBench: Total hot run time: 28.07 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee407798f640e58bcf140c4eeeed1363113ff6cf, data reload: false

query1	0.05	0.05	0.04
query2	0.09	0.04	0.05
query3	0.26	0.08	0.09
query4	1.60	0.11	0.11
query5	0.27	0.24	0.26
query6	1.16	0.68	0.67
query7	0.03	0.02	0.02
query8	0.06	0.04	0.05
query9	0.56	0.50	0.50
query10	0.55	0.54	0.55
query11	0.14	0.10	0.09
query12	0.14	0.10	0.11
query13	0.64	0.61	0.63
query14	1.07	1.07	1.05
query15	0.88	0.86	0.88
query16	0.38	0.39	0.41
query17	1.15	1.12	1.11
query18	0.23	0.20	0.21
query19	2.10	1.88	2.08
query20	0.02	0.01	0.01
query21	15.44	0.23	0.14
query22	5.07	0.05	0.04
query23	15.73	0.27	0.10
query24	2.98	0.63	0.27
query25	0.08	0.11	0.08
query26	0.14	0.13	0.13
query27	0.06	0.05	0.06
query28	4.60	1.13	0.97
query29	12.55	3.98	3.16
query30	0.27	0.13	0.14
query31	2.81	0.66	0.40
query32	3.23	0.60	0.49
query33	3.26	3.23	3.37
query34	16.24	5.42	4.69
query35	4.74	4.74	4.83
query36	0.65	0.52	0.49
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.18	0.17	0.16
query41	0.09	0.04	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.82 s
Total hot run time: 28.07 s

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.58% (25767/35999)
Line Coverage 54.21% (269290/496797)
Region Coverage 51.85% (224741/433453)
Branch Coverage 53.16% (96262/181085)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 92.31% (12/13) 🎉
Increment coverage report
Complete coverage report

@github-actions
Contributor

github-actions Bot commented Feb 3, 2026

PR approved by anyone and no changes requested.

@linrrzqqq
Contributor

does this need constant folding?

…ion} pattern

Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
Copilot AI changed the title from "[Feature] Implement cosine_similarity array function" to "[Feature] Add FE constant folding for cosine_similarity and standardize test patterns" Feb 9, 2026
@zclllyybb
Contributor

run buildall

@zclllyybb zclllyybb marked this pull request as ready for review February 10, 2026 01:45
@doris-robot

TPC-H: Total hot run time: 30306 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 191b1a3a4412dbbed15fca78baed6e62621d2935, data reload: false

------ Round 1 ----------------------------------
q1	17602	4691	4342	4342
q2	2007	356	238	238
q3	10153	1276	744	744
q4	10196	775	307	307
q5	7534	2184	1925	1925
q6	194	175	145	145
q7	884	728	625	625
q8	9298	1420	1186	1186
q9	4674	4655	4581	4581
q10	6766	1930	1508	1508
q11	504	306	277	277
q12	335	369	224	224
q13	17779	4086	3220	3220
q14	242	234	216	216
q15	918	817	811	811
q16	676	659	619	619
q17	697	782	547	547
q18	6337	5745	5804	5745
q19	1097	968	611	611
q20	494	496	375	375
q21	2494	1830	1777	1777
q22	360	328	283	283
Total cold run time: 101241 ms
Total hot run time: 30306 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4376	4346	4370	4346
q2	260	332	263	263
q3	2094	2657	2191	2191
q4	1348	1700	1272	1272
q5	4289	4175	4289	4175
q6	211	174	137	137
q7	1884	1784	1662	1662
q8	2418	2709	2442	2442
q9	7580	7655	7459	7459
q10	2936	3108	2648	2648
q11	563	483	468	468
q12	718	810	663	663
q13	4108	4381	3704	3704
q14	417	327	288	288
q15	873	799	800	799
q16	689	712	863	712
q17	1181	1287	1334	1287
q18	8216	7972	7841	7841
q19	877	850	862	850
q20	2152	2145	2040	2040
q21	4880	4476	4090	4090
q22	596	542	523	523
Total cold run time: 52666 ms
Total hot run time: 49860 ms

@doris-robot

ClickBench: Total hot run time: 28.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 191b1a3a4412dbbed15fca78baed6e62621d2935, data reload: false

query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.26	0.09	0.09
query4	1.60	0.11	0.11
query5	0.26	0.24	0.27
query6	1.16	0.68	0.68
query7	0.03	0.03	0.02
query8	0.05	0.03	0.04
query9	0.56	0.49	0.51
query10	0.55	0.56	0.55
query11	0.13	0.09	0.10
query12	0.14	0.11	0.11
query13	0.64	0.61	0.61
query14	1.07	1.04	1.06
query15	0.87	0.87	0.87
query16	0.40	0.39	0.41
query17	1.09	1.17	1.15
query18	0.23	0.21	0.21
query19	2.11	1.98	2.01
query20	0.02	0.02	0.02
query21	15.44	0.26	0.15
query22	5.45	0.06	0.05
query23	16.30	0.27	0.11
query24	0.95	0.42	0.39
query25	0.13	0.07	0.07
query26	0.15	0.14	0.14
query27	0.06	0.06	0.05
query28	3.98	1.14	0.97
query29	12.55	3.97	3.18
query30	0.28	0.13	0.12
query31	2.82	0.66	0.41
query32	3.25	0.60	0.51
query33	3.29	3.21	3.25
query34	16.09	5.30	4.67
query35	4.80	4.79	4.81
query36	0.63	0.51	0.48
query37	0.12	0.07	0.07
query38	0.07	0.05	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.16
query41	0.09	0.03	0.02
query42	0.04	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 98.09 s
Total hot run time: 28.46 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 2.38% (1/42) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.72% (19443/36883)
Line Coverage 36.21% (180997/499916)
Region Coverage 32.58% (140487/431178)
Branch Coverage 33.62% (60864/181055)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (15/15) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.35% (26508/36139)
Line Coverage 56.39% (281221/498678)
Region Coverage 54.00% (235206/435560)
Branch Coverage 55.66% (101165/181759)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 95.24% (40/42) 🎉
Increment coverage report
Complete coverage report

@github-actions github-actions Bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 10, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@zclllyybb zclllyybb merged commit 579dd10 into master Feb 24, 2026
31 of 33 checks passed
@zclllyybb zclllyybb deleted the copilot/implement-cosine-similarity-function branch February 24, 2026 13:59

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.0.x-conflict, reviewed

6 participants