Skip to content

[oneshot] SS-49 Add validation for NOT NULL columns to COPY FROM query planning#34862

Merged
patrickwwbutler merged 9 commits intoMaterializeInc:mainfrom
patrickwwbutler:patrick/oneshot-not-null
Feb 19, 2026
Merged

[oneshot] SS-49 Add validation for NOT NULL columns to COPY FROM query planning#34862
patrickwwbutler merged 9 commits intoMaterializeInc:mainfrom
patrickwwbutler:patrick/oneshot-not-null

Conversation

@patrickwwbutler
Copy link
Copy Markdown
Contributor

@patrickwwbutler patrickwwbutler commented Jan 29, 2026

Currently, if a NOT NULL column in the destination table does not have a default, and is not specified to be copied into by the COPY FROM statement, materialize will panic at the time of Row encoding, which is very undesirable behavior. By checking in the adapter sequencing logic, we can reject the COPY FROM statement earlier on with a descriptive error message.

Motivation

  • This PR fixes a recognized bug.

https://github.com/MaterializeInc/database-issues/issues/9886

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@patrickwwbutler patrickwwbutler requested review from a team and mtabebe January 29, 2026 17:15
@patrickwwbutler
Copy link
Copy Markdown
Contributor Author

this PR is "stacked" on top of #34849, so it technically includes the changes from there, as well.

@patrickwwbutler patrickwwbutler changed the title [oneshot] Add validation for NOT NULL columns to COPY FROM query planning [oneshot] SS-49 Add validation for NOT NULL columns to COPY FROM query planning Jan 29, 2026
@patrickwwbutler patrickwwbutler force-pushed the patrick/oneshot-not-null branch from bffa533 to c9096b1 Compare February 9, 2026 16:19
@patrickwwbutler patrickwwbutler marked this pull request as ready for review February 9, 2026 21:51
Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some comments.

Also, it seems that the data that we read from the file can also have nulls, in which case we still crash. For example, adding the following to copy-from-s3-minio.td crashes (hello,,world means 3 fields, of which the middle one is null):

> CREATE TABLE t5 (a text, b text not null, c text)

$ s3-file-upload bucket=copytos3 key=csv/with_null.csv
hello,,world

$ s3-set-presigned-url bucket=copytos3 key=csv/with_null.csv var-name=nulls_in_csv_url

> COPY INTO t5 (a,b,c) FROM '${nulls_in_csv_url}' (FORMAT CSV);

So, it seems we should do some validation when we have the actual data at hand. (In which case we might not even need a validation in planning or sequencing at all, since the later validation could cover also those nulls that come from default values.)

Comment thread src/adapter/src/coord/sequencer/inner/copy_from.rs
Comment thread src/adapter/src/coord/sequencer/inner/copy_from.rs Outdated
Comment thread src/adapter/src/coord/sequencer/inner/copy_from.rs Outdated
Copy link
Copy Markdown
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from the Adapter side, but maybe wait for a S&S team review as well.

Copy link
Copy Markdown
Contributor

@ublubu ublubu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks like two independent bugfixes.

I've looked at the "check the nullness in each row" and left one comment. That bugfix seems pretty straightforward. Idk if you want to pull that out and merge it separately.

The other bugfix will take some thought. I'll look through it next.

match maybe_row {
// Happy path, add the Row to our batch!
Ok(row) => {
// Happy path, add the Row to our batch !
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did that space get in there?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm it appears I accidentally pressed the space bar with my cursor there, will fix lol

Comment on lines +594 to +606
_ => {
let err =
StorageErrorXKind::invalid_record_batch("invalid row for collection");
// Clean up our in-progress batch so we don't leak data.
let batch = batch_builder
.finish(upper)
.await
.expect("failed to cleanup batch");
batch.delete().await;
proto_batch_handle
.give(&proto_batch_cap, Err(err).context("stage batches"));
return;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a lot like the Err(err) => ... branch.

What if, instead of just match maybe_row {, we did a:

maybe_row.and_then(|row| { /* validate(...) -> "invalid row for collection" Err */ })

@patrickwwbutler
Copy link
Copy Markdown
Contributor Author

This PR looks like two independent bugfixes.

I've looked at the "check the nullness in each row" and left one comment. That bugfix seems pretty straightforward. Idk if you want to pull that out and merge it separately.

The other bugfix will take some thought. I'll look through it next.

They are really more like 2 different tactics for the same bug, first to check at the query sequencing stage if the "schema" of the copy target is valid, and second (the S&S specific one) is checking each row for validity (not something we can see from schema in the case of csv) to ensure no nulls are going to be inserted to non null columns. @ggevay had a look at the sequencer changes, so no need to dig too deep there.

@patrickwwbutler patrickwwbutler merged commit 37878bb into MaterializeInc:main Feb 19, 2026
133 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants