MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1#4586
MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1#4586
Conversation
| case log_phys_t::APPLIED_TO_FSP_HEADER: | ||
| s->flags = mach_read_from_4( | ||
| case log_phys_t::APPLIED_TO_FSP_HEADER: { | ||
| uint32_t flags = mach_read_from_4( |
There was a problem hiding this comment.
we don't need this temporary variable "flags".
| auto it = space_flags_lsn.find(space->id); | ||
| if (it == space_flags_lsn.end() || lsn >= it->second) | ||
| { | ||
| space->flags = flags; |
There was a problem hiding this comment.
There shouldn't be space before "="
| FSP_HEADER_OFFSET + FSP_FREE | ||
| + FLST_LEN + frame); | ||
| break; | ||
| } |
There was a problem hiding this comment.
why do we need this curly brace?
There was a problem hiding this comment.
This brace introduce a local scope, so flags is kept limited to the APPLIED_TO_FSP_HEADER case.
Thirunarayanan
left a comment
There was a problem hiding this comment.
Make sure to address the review comments.
5768c02 to
2393f5c
Compare
| FSP_HEADER_OFFSET | ||
| + FSP_SPACE_FLAGS + frame); | ||
| recv_sys.update_space_flags(s, | ||
| mach_read_from_4(FSP_HEADER_OFFSET |
There was a problem hiding this comment.
recv_sys.update_space_flags(
s, mach_read_from_4(FSP_HEADER_OFFSET +
FSP_SPACE_FLAGS + frame),
l->start_lsn);
There was a problem hiding this comment.
recv_sys.update_space_flags(
s, mach_read_from_4(FSP_HEADER_OFFSET +
FSP_SPACE_FLAGS + frame),
l->start_lsn);
Thirunarayanan
left a comment
There was a problem hiding this comment.
Approved with the code format changes
c0df4c5 to
df9d316
Compare
df9d316 to
b53d241
Compare
dr-m
left a comment
There was a problem hiding this comment.
This is a step to the right direction, but I think that it needs a little more work. When it comes to changing FSP_SIZE, we can exploit the fact that the size never shrinks, except in undo tablespace truncation, whose recovery is handled separately. I believe that the LSN tracking must be kept strictly related to changes of FSP_SPACE_FLAGS.
| /** Set the recovered size of a tablespace in pages. | ||
| @param id tablespace ID | ||
| @param size recovered size in pages | ||
| @param flags tablespace flags */ | ||
| @param flags tablespace flags | ||
| @param lsn lsn of the redo log */ | ||
| void fil_space_set_recv_size_and_flags(ulint id, uint32_t size, | ||
| uint32_t flags) noexcept; | ||
| uint32_t flags, lsn_t lsn) noexcept; |
There was a problem hiding this comment.
The @param lsn comment is not adding any value. Which specific LSN does it refer to? Also the size and flags could be described more accurately. Let us use spaces instead of TAB also in the comment:
/** Set the recovered size and flags of a tablespace.
@param id tablespace ID
@param size recovered FSP_SIZE, or a less-than-equal value if no change
@param flags recovered FSP_SPACE_FLAGS if applicable
@param flags_lsn start LSN of modifying flags, or 0 */This comment would also answer the question what would happen or how we would identify the situation when a mini-transaction is modifying only one of FSP_SIZE or FSP_SPACE_FLAGS.
There was a problem hiding this comment.
Made these changes. Thanks.
| /** Last applied LSN for space->flags updates (used to prevent stale updates) */ | ||
| std::map<uint32_t, lsn_t> space_flags_lsn; |
There was a problem hiding this comment.
This could significantly increase the memory footprint. Do we really need an ordered map? We already store similar information in file_name_t linked from recv_spaces.
Could we slightly repurpose fil_space_t::create_lsn for storing this information? I think that the name fil_space_t::format_lsn would also cover the meaning of create_lsn, since the format is usually determined at creation.
If the information is being read before a fil_space_t object is available, then it would seem possible to add file_name_t::format_lsn for storing this.
There was a problem hiding this comment.
Removed this map. And yes it was kind of unnecessary.
I don't think we need to refer to recv_spaces or file_name_t.
Whenever we are updating space->flags, we are guaranteed to have fil_space_t structure as well.
Instead added a member variable fil_space_t::flags_lsn to track it.
Yes, maybe we can repurpose to use fil_space_t::create_lsn..
Haven't done that in this patch.
| /** Update space->flags only if lsn >= last recorded space-flags LSN */ | ||
| void update_space_flags(fil_space_t *space, uint32_t flags, lsn_t lsn); |
There was a problem hiding this comment.
Which LSN? This comment should also somehow refer to fil_space_set_recv_size_and_flags().
There was a problem hiding this comment.
Updated the comments.
| auto it= space_flags_lsn.find(space->id); | ||
| if (it == space_flags_lsn.end() || lsn >= it->second) | ||
| { | ||
| space->flags= flags; | ||
| space_flags_lsn[space->id]= lsn; | ||
| } |
There was a problem hiding this comment.
Why not it->lsn= lsn;?
We are missing mysql_mutex_assert_owner(&recv_sys.mutex) here. The access to a shared data structure needs to be protected somehow.
There was a problem hiding this comment.
Now we don't have map in current patch.
Instead added a separate function fil_space_update_flags_recv()
It gets called from
fil_space_set_recv_size_and_flags(): which already hasmysql_mutex_assert_owner(&recv_sys.mutex);recv_recover_page(): The very first line it has is something oppositemysql_mutex_assert_not_owner(&recv_sys.mutex);
| if (f.size | ||
| || f.flags != f.initial_flags) { | ||
| fil_space_set_recv_size_and_flags( | ||
| space->id, f.size, f.flags); | ||
| space->id, f.size, f.flags, lsn); | ||
| } |
There was a problem hiding this comment.
This is passing the start LSN of the mini-transaction of a FILE_ record, instead of the start LSN of the mini-transaction that actually modified FSP_SPACE_FLAGS.
There was a problem hiding this comment.
Made changes as per the comment
@param flags_lsn start LSN of modifying flags, or 0 */
There was a problem hiding this comment.
This is still passing 0 or the start LSN of a FILE_ record, instead of passing the start LSN of the WRITE mini-transaction based on which we stored or updated f.flags.
| it->second.flags= flags; | ||
| } | ||
| fil_space_set_recv_size_and_flags(space_id, size, flags); | ||
| fil_space_set_recv_size_and_flags(space_id, size, flags, start_lsn); |
There was a problem hiding this comment.
We may have to pass has_flags ? start_lsn : 0 to distinguish the case where the flags were not changed. Maybe the file_name_t::initial_flags could be removed altogether, now that we would keep track of the start_lsn of the mini-transaction that modifed FSP_SPACE_FLAGS, in a new field it->second.flags_lsn or it->second.format_lsn.
| case log_phys_t::APPLIED_TO_FSP_HEADER: | ||
| s->flags = mach_read_from_4( | ||
| FSP_HEADER_OFFSET | ||
| + FSP_SPACE_FLAGS + frame); | ||
| recv_sys.update_space_flags( | ||
| s, mach_read_from_4(FSP_HEADER_OFFSET + | ||
| FSP_SPACE_FLAGS + frame), | ||
| l->start_lsn); |
There was a problem hiding this comment.
Could this code be moved to log_phys_t::apply() and the special return value APPLIED_TO_FSP_HEADER removed? That return value does not currently distinguish changes to FSP_SPACE_FLAGS and FSP_SIZE. That is, if a mini-transaction only modifies FSP_SIZE but not FSP_SPACE_FLAGS, we would execute some unnecessary code here.
There was a problem hiding this comment.
I think that the semantics of log_phys_t::apply() returning where it applied the log. And then we performing actions based on the returned value is simpler.
What can be done is we can add a special return value APPLIED_TO_FSP_FLAGS.
There was a problem hiding this comment.
Which value should be returned if both FSP_SPACE_FLAGS and FSP_SIZE are modified by the same record (a WRITE spanning both fields)? I would like to see how the logic would look like when pushed down to that function.
b53d241 to
9184196
Compare
| void fil_space_update_flags_recv(fil_space_t *space, uint32_t flags, | ||
| lsn_t flags_lsn) |
There was a problem hiding this comment.
The second line is indented incorrectly, and the function is missing noexcept.
Why is this function not a member function of fil_space_t?
| /** LSN of undo tablespace creation or 0; protected by latch */ | ||
| lsn_t create_lsn= 0; | ||
|
|
||
| /** Start LSN of mtr record that updates space->flags or 0 */ | ||
| lsn_t flags_lsn= 0; |
There was a problem hiding this comment.
In my comment 3 weeks ago, I asked you to rename create_lsn to format_lsn and to use that field for both purposes.
We must keep in mind that InnoDB will retain in memory a fil_space_t object for every tablespace that it knows of. There could be millions of tablespsaces. Adding a 64-bit field would increase the memory consumption of a million tables by at least 8 MiB, possibly much more, depending on alignment losses as well as internal and external fragmentation in the memory allocator.
| /** Set flags_lsn. */ | ||
| inline void set_flags_lsn(lsn_t lsn) noexcept | ||
| { | ||
| flags_lsn= lsn; | ||
| } |
There was a problem hiding this comment.
The keyword inline is redundant in an inline member function definition.
This function is at a too low level. The LSN should be recorded together with the flags. That is, the non-member function fil_space_update_flags_recv() should be replaced with a member function.
| /** Set the recovered size and flags of a tablespace. | ||
| @param id tablespace ID | ||
| @param size recovered FSP_SIZE, or a less-than-equal value if no change | ||
| @param flags recovered FSP_SPACE_FLAGS if applicable | ||
| @param flags_lsn start LSN of modifying flags, or 0 */ | ||
| void fil_space_set_recv_size_and_flags(ulint id, uint32_t size, | ||
| uint32_t flags) noexcept; | ||
| uint32_t flags, lsn_t flags_lsn) noexcept; |
There was a problem hiding this comment.
The "less-than-equal" in the comment of size is confusing. Can we just say "recovered FSP_SIZE, or any smaller value"?
The comment fails to say when the value 0 is appropriate for flags_lsn.
| if (f.size | ||
| || f.flags != f.initial_flags) { | ||
| fil_space_set_recv_size_and_flags( | ||
| space->id, f.size, f.flags); | ||
| space->id, f.size, f.flags, lsn); | ||
| } |
There was a problem hiding this comment.
This is still passing 0 or the start LSN of a FILE_ record, instead of passing the start LSN of the WRITE mini-transaction based on which we stored or updated f.flags.
| } | ||
| fil_space_set_recv_size_and_flags(space_id, size, flags); | ||
| fil_space_set_recv_size_and_flags(space_id, size, flags, | ||
| has_flags ? start_lsn : 0); |
There was a problem hiding this comment.
We are missing the following here, which I suggested in my comment 3 weeks ago:
it->second.format_lsn= start_lsn;Why do we invoke fil_space_set_recv_size_and_flags() in the case we do not know the tablespace metadata yet (!it->second.space)? Can that function be replaced with a member function of fil_space_t?
| case log_phys_t::APPLIED_TO_FSP_HEADER: | ||
| s->flags = mach_read_from_4( | ||
| FSP_HEADER_OFFSET | ||
| + FSP_SPACE_FLAGS + frame); | ||
| recv_sys.update_space_flags( | ||
| s, mach_read_from_4(FSP_HEADER_OFFSET + | ||
| FSP_SPACE_FLAGS + frame), | ||
| l->start_lsn); |
There was a problem hiding this comment.
Which value should be returned if both FSP_SPACE_FLAGS and FSP_SIZE are modified by the same record (a WRITE spanning both fields)? I would like to see how the logic would look like when pushed down to that function.
9184196 to
08504cf
Compare
dr-m
left a comment
There was a problem hiding this comment.
The commit message does not mention the FSP_SPACE_FLAGS field or the exact root cause of the problem. It should be along the following lines:
- There are pending writes to a table
t. ALTER TABLE t page_compressed=1is executed.- Some of the pending writes are executed in the
page_compressedformat. - The server is killed.
- Crash recovery may use incorrect
FSP_SPACE_FLAGSand fail because it expects the table to be inpage_compressed=0format.
We go for the simple fix of requiring the table to be rebuilt when the page_compressed attribute is changed, because:
- The
page_compressedoption is used very rarely. - The only motivation of
page_compressed=1is to save space, and space may only be saved when all pages of the table are rewritten (the table is rebuilt).
|
|
||
| --echo # | ||
| --echo # MDEV-37886 PAGE_COMPRESSED ALTER TABLE operations | ||
| --echo # inconsistent with innodb_file_per_table setting | ||
| --echo # | ||
| SET GLOBAL innodb_file_per_table=0; | ||
| CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB; | ||
| SET DEBUG_SYNC='before_lock_tables_takes_lock SIGNAL stuck WAIT_FOR go'; | ||
| send ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INSTANT; | ||
|
|
||
| --connect con1,localhost,root | ||
| SET DEBUG_SYNC='now WAIT_FOR stuck'; | ||
| SET GLOBAL innodb_file_per_table=1; | ||
| SET DEBUG_SYNC='now SIGNAL go'; | ||
| --disconnect con1 | ||
| --connection default | ||
| --error ER_ILLEGAL_HA_CREATE_OPTION | ||
| --reap | ||
| DROP TABLE t1; |
There was a problem hiding this comment.
The commit message fails to explain why this test was removed. It would be better to preserve the relevant part of the test, something like this:
SET GLOBAL innodb_file_per_table=0;
CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;
--error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INSTANT;
DROP TABLE t1;08504cf to
f7637ba
Compare
| CREATE TABLE t1 (c CHAR(1)) ENGINE=InnoDB; | ||
| SET GLOBAL INNODB_DEFAULT_ROW_FORMAT=compact; | ||
| SELECT TABLE_NAME, ROW_FORMAT, CREATE_OPTIONS | ||
| FROM information_schema.TABLES | ||
| WHERE TABLE_NAME = 't1'; | ||
| --error ER_ILLEGAL_HA_CREATE_OPTION | ||
| ALTER TABLE t1 PAGE_COMPRESSED=1; | ||
| SET GLOBAL INNODB_DEFAULT_ROW_FORMAT=compact; | ||
| # Updates the row-format during rebuild to compatible format | ||
| ALTER TABLE t1 PAGE_COMPRESSED=1; |
There was a problem hiding this comment.
The comment is somewhat misleading. There should be no logic that would change ROW_FORMAT to a compatible value. Instead, the table rebuild would use the current value of innodb_default_row_format. This is because no ROW_FORMAT was explicitly specified in either CREATE TABLE or ALTER TABLE.
If your intention is to demonstrate that ha_innobase::inplace_alter_table() will not be invoked, then I think that you should duplicate both ALTER TABLE statements for both code paths:
ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INPLACE;
ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=COPY;This would more explicitly demonstrate that the code that you removed from ha_innobase::prepare_inplace_alter_table() is not reachable, as well as show the error message that is being executed in each code path.
| ERROR 0A000: ALGORITHM=INSTANT is not supported. Reason: Changing table options requires the table to be rebuilt. Try ALGORITHM=INPLACE | ||
| DROP TABLE t1; | ||
| SET @@global.INNODB_FILE_PER_TABLE = @file_per_table; |
There was a problem hiding this comment.
The ALTER TABLE should fail in the same way for any value of innodb_file_per_table. Therefore, the changes of innodb_file_per_table should be removed. Due to 09aacdd, they will issue deprecation warnings in later major versions, and therefore they should be avoided.
Issue:
- Instant enabling of page_compressed may cause crash recovery failure.
- FSP_SPACE_FLAGS may not match the actual page format during recovery.
Root Cause:
- Table t has pending writes.
- ALTER TABLE t PAGE_COMPRESSED=1 is executed.
- Some pending pages are written in compressed format, but the server may
crash before all pages are rewritten.
- During recovery, FSP_SPACE_FLAGS may indicate non-compressed while some
pages are compressed which results in space ID mismatch.
Fix:
- We go for the simple fix of requiring the table to be rebuilt when the
page_compressed attribute is changed, because:
- The page_compressed option is used very rarely.
- The only motivation of page_compressed=1 is to save space, and space may
only be saved when all pages of the table are rewritten (table is rebuilt).
- alter_options_need_rebuild(): Enabling page_compressed will require a
full table rebuild.
f7637ba to
e520abd
Compare
Description
MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1
Issue:
Root Cause:
crash before all pages are rewritten.
pages are compressed which results in space ID mismatch.
Fix:
We go for the simple fix of requiring the table to be rebuilt when the
page_compressed attribute is changed, because:
only be saved when all pages of the table are rewritten (table is rebuilt).
alter_options_need_rebuild(): Enabling page_compressed will require a
full table rebuild.
How can this PR be tested?
./mtr innodb.alter_algorithm
./mtr innodb.alter_crash
./mtr innodb.table_flags
Basing the PR against the correct MariaDB version