Add 'include_column_list' parameter #58
lucagiovagnoli merged 5 commits into spark-redshift-community:master from eeshugerman:column-list
Conversation
Codecov Report
@@ Coverage Diff @@
## master #58 +/- ##
==========================================
+ Coverage 73.92% 74.09% +0.16%
==========================================
Files 15 15
Lines 767 772 +5
Branches 103 106 +3
==========================================
+ Hits 567 572 +5
Misses 200 200
lucagiovagnoli
left a comment
This looks great! Thanks for the pull request. Can you please:
- address comments
- Bump version and changelog (patch version should be ok as this is backwards compatible)
Why is the schema null here (in the `Map[Table, StructType]` object)? Should we use `TestUtils.testSchema`?
Not sure, but I'm guessing the original author copied it from the test above this one, which for some reason uses null. I've changed it to TestUtils.testSchema.
Could you please write a second small test near this one showing the difference when `include_column_list = false`?
Co-authored-by: kyrozetera <jason@koddi.com>
also some minor formatting tweaks for the same test
Thanks for reviewing @lucagiovagnoli! I've made the requested changes to the tests.
I bumped the minor version instead, since this PR adds functionality. From https://semver.org:
But I can switch to patch if you prefer.
lucagiovagnoli
left a comment
Minor works! I was trying to follow semver exactly but got confused.
I found this un-merged patch by @kyrozetera on the old databricks repo (link), and it solves an issue I'm facing.
My use case
Many of the Redshift tables that I'm working with have `created_dt` columns, like so:

`created_dt timestamp not null default current_time`

These columns are intended to be left unspecified on inserts/copies, so that the `default` is used. But for this to work, a column list must be included in the COPY statement, or else I get an error: `Missing data for not-null field`
Author's use case
It appears the author's motivation for this patch is related to, but different from, my own: databricks#340
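To illustrate the use case above, here is a minimal sketch of how a column list changes the generated COPY statement. This is a hypothetical helper for illustration only, not the connector's actual Scala code; the function name `build_copy_statement` and the S3 path are invented, while the option name `include_column_list` comes from this PR.

```python
def build_copy_statement(table, columns, include_column_list=False):
    """Illustrative sketch: with include_column_list enabled, the COPY
    statement names each target column, so omitted columns (e.g. ones
    with a DEFAULT) are filled in by Redshift instead of raising
    'Missing data for not-null field'. Hypothetical helper, not the
    actual spark-redshift implementation."""
    column_list = "(" + ", ".join(columns) + ") " if include_column_list else ""
    return f"COPY {table} {column_list}FROM 's3://bucket/manifest' MANIFEST"

# Without a column list, Redshift expects data for every column:
print(build_copy_statement("events", ["id", "name"]))
# With include_column_list, only the listed columns must be supplied:
print(build_copy_statement("events", ["id", "name"], include_column_list=True))
```

In Spark, the new behavior would presumably be enabled via the data source option, e.g. `.option("include_column_list", "true")` on the writer (exact usage per the connector's README).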