Add file_extension, enable_legacy_filename fields to BlobType #3
Merged
ddl-rliu merged 1 commit into 1.16.4-domino on Mar 26, 2026
Add a new `file_extension` string field to the BlobType protobuf message, allowing FlyteFile to optionally specify a file extension (e.g. "csv") that flytecopilot appends when writing blobs to local disk during the download phase (e.g. "data.csv"). When empty (the default), behavior is unchanged. Add a new `enable_legacy_filename` bool field to the BlobType protobuf message, allowing FlyteFile to optionally specify whether to preserve backward compatibility for tasks that read from the extensionless path. Regenerated all protobuf bindings (Go, Python, JS, Rust, ES, Swagger). Signed-off-by: ddl-rliu <richard.liu@dominodatalab.com>
This was referenced Mar 26, 2026
Upstream flyteorg#7009
Tracking issue
Closes flyteorg#7024 [BUG] [copilot] File extensions are missing when copilot downloads Blob/FlyteFile inputs
Why are the changes needed?
We'll assume the above behavior is not a bug (it is long-standing, and a "bugfix" will likely break existing workflows). Instead, this PR proposes enhancements to FlyteFile/Blob to support writing workflow inputs with the file extension. This PR includes several changes across flyteidl and flytecopilot.
The enhancements allow workflows to be flexible when writing blobs. Specifically, the existing behavior where flytecopilot writes blobs during the download phase without the file extension (e.g. "inputs/data") can now be enhanced so that the file extension is included (e.g. "inputs/data.csv").
What changes were proposed in this pull request?
[flyteidl]
- Add a new `file_extension` string field to the BlobType protobuf message, allowing FlyteFile to optionally specify a file extension (e.g. "csv") that flytecopilot appends when writing blobs during the download phase (e.g. "data.csv"). When empty (the default), behavior is unchanged.
- Add a new `enable_legacy_filename` bool field to the BlobType protobuf message, allowing FlyteFile to optionally specify whether to preserve backward compatibility for tasks that read from the extensionless path.
- Regenerated all protobuf bindings (Go, Python, JS, Rust, ES, Swagger).
[flytecopilot]
The copilot download phase infers the desired download behavior(s) from the input interface.
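As a rough illustration of the behavior described above, the following Python sketch shows how the two new fields could determine the local path(s) a blob is written to. It is not the flytecopilot implementation (which is written in Go); the helper name `resolve_download_paths`, the `input_dir` parameter, the tolerance for a leading dot, and the assumption that legacy mode writes both the extensionless and the extended path are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def resolve_download_paths(
    input_dir: str,
    var_name: str,
    file_extension: str = "",
    enable_legacy_filename: bool = False,
) -> List[str]:
    """Return the local path(s) a blob input would be written to.

    Hypothetical helper: it mirrors the semantics described in this PR,
    not the actual flytecopilot code.
    """
    base = PurePosixPath(input_dir) / var_name
    # Assumption: a leading dot (".csv") is tolerated and stripped.
    ext = file_extension.lstrip(".")
    if not ext:
        # Default: behavior is unchanged, only the extensionless path is used.
        return [str(base)]
    paths = [f"{base}.{ext}"]
    if enable_legacy_filename:
        # Assumption: legacy mode keeps the extensionless path as well,
        # so tasks that still read "inputs/data" continue to work.
        paths.insert(0, str(base))
    return paths
```

Under these assumptions, an input named `data` with `file_extension: "csv"` resolves to `inputs/data.csv`, and additionally to `inputs/data` when `enable_legacy_filename` is set.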
[flytekit] PR: flyteorg/flytekit#3406
Alternatives considered
The PR's approach configures the file download behaviors at the BlobType level (e.g. per FlyteFile). This has several pros (granularity, explicitness).
But, one con is that unlike `BlobType.format` (which can be inferred from the output filename, e.g. "data.csv" -> `format: "csv"`), the new fields `BlobType.file_extension` and `BlobType.enable_legacy_filename` could not be inferred from the output filename (does "data.csv" match `file_extension: "csv"`?). Ultimately, this seems like it is introducing a minor inconsistency, but acceptable given the benefits. (It also seems like this is all hypothetical: copilot upload does not actually infer the format from the filename, so this concern may be somewhat moot?)

Here are the other approaches I considered:
1. New flyte-copilot CLI flags: `--file_extension_config ENUM=(disabled, enabled, legacy)`
   - `file_extension_config = disabled` (the default): same behavior as today (`data: FlyteFile[csv]` is written to `outputs/data`)
   - `file_extension_config = enabled`: new behavior (`data: FlyteFile[csv]` is written to `outputs/data.csv`)
   - `file_extension_config = legacy`: new behavior, but backwards compatible (`data: FlyteFile[csv]` is written to `outputs/data` and `outputs/data.csv`)

   Cons:
2. No changes; user code should be modified to read from the existing path (e.g. `data`) and add the extension itself.

   Cons:
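For comparison, the three modes of the CLI-flag alternative (option 1) can be expressed in terms of the two BlobType fields this PR adds. The sketch below is illustrative only: the `FileExtensionConfig` enum, the `BlobTypeSettings` tuple, and the `settings_for` helper are hypothetical names, and using the blob's format as its extension is an assumption.

```python
from enum import Enum
from typing import NamedTuple


class FileExtensionConfig(Enum):
    """Hypothetical enum for the --file_extension_config flag of option 1."""
    DISABLED = "disabled"
    ENABLED = "enabled"
    LEGACY = "legacy"


class BlobTypeSettings(NamedTuple):
    """The two per-blob fields proposed in this PR."""
    file_extension: str
    enable_legacy_filename: bool


def settings_for(mode: FileExtensionConfig, fmt: str) -> BlobTypeSettings:
    """Map a global mode to the equivalent per-blob field values.

    Assumption: the extension is taken from the blob's format (e.g. "csv").
    """
    if mode is FileExtensionConfig.DISABLED:
        # Same as today: no extension, extensionless path only.
        return BlobTypeSettings("", False)
    if mode is FileExtensionConfig.ENABLED:
        # Extension appended, no extensionless copy.
        return BlobTypeSettings(fmt, False)
    # LEGACY: extension appended and the extensionless path preserved.
    return BlobTypeSettings(fmt, True)
```

The difference in granularity is that a CLI flag would apply one mode to every blob a copilot instance handles, whereas the BlobType fields let each FlyteFile choose its own settings.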
How was this patch tested?
Labels
Please add one or more of the following labels to categorize your PR:
This is important to improve the readability of release notes.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link