[Feature] Implement string function ord with unicode alias following DuckDB semantics#60409
Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
run buildall
TPC-H: Total hot run time: 31748 ms
ClickBench: Total hot run time: 28.61 s
void register_function_string(SimpleFunctionFactory& factory) {
    factory.register_function<FunctionStringParseDataSize>();
    factory.register_function<FunctionStringASCII>();
    factory.register_function<FunctionStringOrd>();
In DuckDB, `ord` is just an alias for `unicode`; see duckdb.string_function for details.
So I think we could add a `unicode` alias here.
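To make the alias suggestion concrete, here is a minimal sketch of what "alias" means at the registry level: two names resolving to the same implementation. `ToyFunctionRegistry`, its method signatures, and the argument order of `register_alias` are hypothetical and chosen for illustration; this is not Doris's `SimpleFunctionFactory`.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical toy registry for illustration only; not the Doris factory.
class ToyFunctionRegistry {
public:
    using Fn = std::function<std::int64_t(const std::string&)>;

    // Registers an implementation under its canonical name.
    void register_function(const std::string& name, Fn fn) {
        functions_[name] = std::move(fn);
    }

    // Maps `alias` to an already registered canonical `name`.
    // The (name, alias) argument order is an assumption of this sketch.
    void register_alias(const std::string& name, const std::string& alias) {
        aliases_[alias] = name;
    }

    // Resolves an alias to its canonical name first, then looks up the
    // implementation. Returns nullptr when nothing is registered.
    const Fn* find(const std::string& name) const {
        auto alias_it = aliases_.find(name);
        const std::string& canonical =
                (alias_it == aliases_.end()) ? name : alias_it->second;
        auto fn_it = functions_.find(canonical);
        return fn_it == functions_.end() ? nullptr : &fn_it->second;
    }

private:
    std::unordered_map<std::string, Fn> functions_;
    std::unordered_map<std::string, std::string> aliases_;
};
```

With this shape, registering `ord` once and adding `unicode` as an alias means both lookups return the same function object, so the two names cannot drift apart in behavior.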
Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
|
Add constant folding implementation to |
Co-authored-by: zclllyybb <61408379+zclllyybb@users.noreply.github.com>
run buildall
TPC-H: Total hot run time: 30407 ms
ClickBench: Total hot run time: 28.33 s
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
The implementation doesn't feel very good to me.
public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
        FunctionSignature.ret(BigIntType.INSTANCE).args(VarcharType.SYSTEM_DEFAULT),
        FunctionSignature.ret(BigIntType.INSTANCE).args(StringType.INSTANCE)
Why does this return int64? From the implementation, it looks like int32 would be enough.
ScalarFunction UnicodeFun::GetFunction() {
    return ScalarFunction({LogicalType::VARCHAR}, LogicalType::INTEGER,
                          ScalarFunction::UnaryFunction<string_t, int32_t, UnicodeOperator>);
}

DuckDB returns int32.
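The int32 versus int64 question above comes down to simple arithmetic: the largest Unicode code point is U+10FFFF, which is far below the maximum of a signed 32-bit integer. The following sketch checks that at compile time; `kMaxCodePoint` and `fits_in_int32` are names invented for this illustration and do not appear in Doris or DuckDB.

```cpp
#include <cstdint>
#include <limits>

// Largest valid Unicode code point (U+10FFFF = 1114111 in decimal).
constexpr std::uint32_t kMaxCodePoint = 0x10FFFF;

// Hypothetical helper: true when a code point is representable as int32_t.
constexpr bool fits_in_int32(std::uint32_t code_point) {
    return code_point <=
           static_cast<std::uint32_t>(std::numeric_limits<std::int32_t>::max());
}

// The whole Unicode range fits, so a 32-bit signed return type loses nothing.
static_assert(fits_in_int32(kMaxCodePoint),
              "every Unicode code point fits in a signed 32-bit integer");
```

So a 32-bit return type is sufficient for the value range; choosing a 64-bit type is a compatibility or convention decision rather than a necessity.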
Implements `ord(string)` function that returns the Unicode code point of the first character, following DuckDB semantics. Adds `unicode` as an alias.

Backend
- `StringOrd` in `function_string.cpp` with proper UTF-8 decoding (1-4 byte sequences)
- `Int64` to accommodate full Unicode range (U+0000 to U+10FFFF)
- `unicode` alias via `register_alias()`

Frontend
- `Ord.java` scalar function returning `BigIntType`
- `ord` and `unicode` in `BuiltinScalarFunctions.java`
- `ScalarFunctionVisitor.java`

FE Constant Folding
- `@ExecFunction` implementations in `StringArithmetic.java` for compile-time evaluation

Tests
- `query_p0` and `nereids_p0` suites
- `fold_constant_string_arithmatic.groovy`

Example

Key difference from `ascii()`

`ascii()` returns the first byte value; `ord()`/`unicode()` decodes UTF-8 and returns the actual Unicode code point.
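The first-byte versus code-point distinction can be sketched in a few lines of standalone C++. This is an illustration of UTF-8 decoding of a 1-4 byte sequence, not the code from this PR: `first_byte` and `first_code_point` are hypothetical names, and returning 0 for an empty string and falling back to the first byte on malformed input are assumptions of this sketch that may differ from what Doris or DuckDB actually do.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for ascii(): value of the first byte, 0 if empty.
inline std::int64_t first_byte(std::string_view s) {
    return s.empty() ? 0 : static_cast<unsigned char>(s[0]);
}

// Hypothetical stand-in for ord()/unicode(): decodes the first UTF-8
// sequence and returns its code point. Assumptions of this sketch: empty
// input yields 0, and malformed or truncated input yields the first byte.
inline std::int64_t first_code_point(std::string_view s) {
    if (s.empty()) return 0;
    const auto b0 = static_cast<unsigned char>(s[0]);
    if (b0 < 0x80) return b0;  // 1-byte sequence (ASCII)

    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0) {         // 110xxxxx: 2-byte sequence
        len = 2; cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {  // 1110xxxx: 3-byte sequence
        len = 3; cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {  // 11110xxx: 4-byte sequence
        len = 4; cp = b0 & 0x07;
    } else {
        return b0;  // stray continuation byte or invalid lead byte
    }
    if (s.size() < len) return b0;  // truncated sequence

    for (std::size_t i = 1; i < len; ++i) {
        const auto b = static_cast<unsigned char>(s[i]);
        if ((b & 0xC0) != 0x80) return b0;  // not a 10xxxxxx continuation byte
        cp = (cp << 6) | (b & 0x3F);        // append the 6 payload bits
    }
    return cp;
}
```

For ASCII input the two helpers agree (`"A"` gives 65 from both). For `"\xC3\xA9"` (é), `first_byte` gives 195 while `first_code_point` gives 233, which is the difference the description above refers to.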