[FLINK-14925][table] Support precision-aware TO_TIMESTAMP with format-based inference #27793
raminqaf wants to merge 6 commits into apache:master
...able-planner/src/test/java/org/apache/flink/table/planner/functions/TimeFunctionsITCase.java
...ntime/src/main/java/org/apache/flink/table/runtime/functions/scalar/ToTimestampFunction.java
    try {
        return parseTimestampData(timestamp.toString());
    } catch (DateTimeException e) {
        return null;
    }
@twalthr & @snuyanzin
I added this because of this test:
Currently, TO_TIMESTAMP_LTZ('abc') throws an exception instead of returning null. Should this be handled so that it returns null? If so, I can open a follow-up issue/PR.
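For illustration, the null-on-failure behavior in the snippet above can be sketched with plain java.time. This is only a sketch under stated assumptions: the class NullOnParseFailure and the method parseOrNull are hypothetical names, and LocalDateTime.parse stands in for Flink's parseTimestampData, whose exact behavior is not reproduced here.

```java
import java.time.DateTimeException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical stand-in for the runtime function; not Flink code.
final class NullOnParseFailure {

    // Default pattern quoted in the documentation snippet of this PR.
    private static final DateTimeFormatter DEFAULT_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Returns the parsed timestamp, or null when the input is null or unparsable.
    static LocalDateTime parseOrNull(String input) {
        if (input == null) {
            return null;
        }
        try {
            return LocalDateTime.parse(input, DEFAULT_FORMAT);
        } catch (DateTimeException e) {
            // DateTimeParseException is a subclass of DateTimeException.
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseOrNull("abc"));
        System.out.println(parseOrNull("2024-01-15 10:30:45"));
    }
}
```

With this shape, an unparsable string such as 'abc' yields null rather than propagating the exception, which is the behavior the question above asks about for TO_TIMESTAMP_LTZ.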
docs/data/sql_functions_zh.yml
    - string1: the timestamp string to parse
    - string2: the format pattern (default 'yyyy-MM-dd HH:mm:ss'). The pattern follows Java's DateTimeFormatter syntax, where 'S' represents fractional seconds (e.g., 'SSS' for milliseconds, 'SSSSSSSSS' for nanoseconds).
    - string2: the format pattern (default 'yyyy-MM-dd HH:mm:ss'). The pattern follows Java's [DateTimeFormatter](https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html) syntax, where 'S' represents fractional seconds (e.g., 'SSS' for milliseconds, 'SSSSSSSSS' for nanoseconds).
This does not seem to be fixed yet.
      "type" : "TIMESTAMP(3)"
    },
    "serializableString" : "TO_TIMESTAMP(`c`)"
    "serializableString" : "`TO_TIMESTAMP`(`c`)"
What is the reason for this change?
Usually we don't need to quote function names.
This change is a side effect of migrating TO_TIMESTAMP from the legacy FlinkSqlOperatorTable + StringCallGen code generation path to the modern BridgingSqlFunction + runtimeClass pattern (matching how TO_TIMESTAMP_LTZ is implemented).
The serializableString is produced by CallExpression.asSerializableString(). When the function is resolved through FunctionCatalog (the bridging path), the serialization goes through getSerializableFunctionName() → EncodingUtils.escapeIdentifier(), which wraps identifiers in backticks.
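To make the quoting effect concrete, here is a minimal sketch of backtick escaping. It assumes the escaping wraps the name in backticks and doubles any embedded backtick; the class IdentifierEscaping is a hypothetical stand-in, and the real EncodingUtils.escapeIdentifier() may differ in detail.

```java
// Hypothetical illustration of backtick identifier escaping; not Flink's EncodingUtils.
final class IdentifierEscaping {

    // Assumption: wrap in backticks and double any backtick inside the name.
    static String escapeIdentifier(String identifier) {
        return "`" + identifier.replace("`", "``") + "`";
    }

    // Serializes a one-argument call where both the function name and the column are escaped.
    static String serializeCall(String functionName, String column) {
        return escapeIdentifier(functionName) + "(" + escapeIdentifier(column) + ")";
    }

    public static void main(String[] args) {
        // prints `TO_TIMESTAMP`(`c`)
        System.out.println(serializeCall("TO_TIMESTAMP", "c"));
    }
}
```

Under this assumption, any function name that passes through the escaping step comes out quoted, which matches the new serializableString shown in the diff above.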
What is the purpose of the change
This pull request makes the TO_TIMESTAMP function precision-aware when a format pattern is provided. Previously, TO_TIMESTAMP always returned TIMESTAMP(3) regardless of the format pattern's fractional second precision, which forced users to lose sub-millisecond data. This is the TO_TIMESTAMP counterpart to the TO_TIMESTAMP_LTZ precision support added in FLINK-39244.

The output type for the 1-arg variant remains TIMESTAMP(3) for backward compatibility. For the 2-arg variant, precision is inferred from the format pattern's trailing S count (e.g., SSSSSS → TIMESTAMP(6)), with a minimum of 3.

As part of this change, TO_TIMESTAMP is migrated from the legacy Calcite-native function pattern (FlinkSqlOperatorTable + StringCallGen codegen) to the modern bridging function pattern (BuiltInFunctionDefinition + runtimeClass), matching how TO_TIMESTAMP_LTZ is implemented. This was made possible by fixing the function name from camelCase "toTimestamp" to "TO_TIMESTAMP", which allows CoreModule to resolve it correctly for SQL queries without needing a separate FlinkSqlOperatorTable entry.

Brief change log
- ToTimestampTypeStrategy: New output type strategy that returns TIMESTAMP(3) for the 1-arg variant and TIMESTAMP(max(sCount, 3)) for the 2-arg variant, where sCount is inferred from the format pattern's trailing S characters.
- ToTimestampFunction: New runtime class with eval(StringData) and eval(StringData, StringData) methods. The 2-arg variant passes precisionFromFormat(format) to parseTimestampData for precision-aware parsing.
- BuiltInFunctionDefinitions: Changed name from "toTimestamp" to "TO_TIMESTAMP" (removing the need for explicit sqlName), added runtimeClass, and switched output type strategy to SpecificTypeStrategies.TO_TIMESTAMP.
- Removed FlinkSqlOperatorTable.TO_TIMESTAMP, the DirectConvertRule mapping, StringCallGen cases, and BuiltInMethods.STRING_TO_TIMESTAMP/STRING_TO_TIMESTAMP_WITH_FORMAT, all superseded by the bridging function mechanism.
- Updated sql_functions.yml, sql_functions_zh.yml, and the Python expressions.py/expression.py docstrings with precision-dependent output types and examples.
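The precision rule described in the change log can be sketched as follows. This is an illustration only: the class FormatPrecision and its methods are hypothetical names, not the PR's ToTimestampTypeStrategy or precisionFromFormat, and the sketch implements just the stated rule (count of trailing S characters, with a minimum of 3).

```java
// Hypothetical sketch of format-based precision inference; not the PR's implementation.
final class FormatPrecision {

    // Counts the consecutive 'S' characters at the end of the format pattern.
    static int trailingSCount(String format) {
        int count = 0;
        for (int i = format.length() - 1; i >= 0 && format.charAt(i) == 'S'; i--) {
            count++;
        }
        return count;
    }

    // Applies the rule from the description: TIMESTAMP(max(sCount, 3)).
    static int inferPrecision(String format) {
        return Math.max(trailingSCount(format), 3);
    }

    public static void main(String[] args) {
        // prints 6
        System.out.println(inferPrecision("yyyy-MM-dd HH:mm:ss.SSSSSS"));
        // prints 3 (no trailing S, so the minimum applies)
        System.out.println(inferPrecision("yyyy-MM-dd HH:mm:ss"));
    }
}
```

A format with fewer than three trailing S characters still maps to precision 3, which keeps the 2-arg variant consistent with the 1-arg default of TIMESTAMP(3).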
Verifying this change

This change added tests and can be verified as follows:
- Added ToTimestampTypeStrategyTest covering 1-arg default precision, 2-arg format-based precision (SSS/SSSSSS/SSSSSSSSS/no-S), invalid argument types, and argument count validation.
- Added tests to TimeFunctionsITCase for 1-arg truncation to precision 3, 2-arg precision 6/9 from format, SSS format staying at precision 3, fewer input digits than format precision, unparsable string, and null input.
- Removed tests from TemporalTypesTest.scala that are now covered by the new TimeFunctionsITCase tests.

Does this pull request potentially affect one of the following parts:
- @Public(Evolving): no
- ToTimestampFunction.eval() methods call the same DateTimeUtils.parseTimestampData methods as before.

Documentation