Skip to content

MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1#4586

Merged
dr-m merged 1 commit into10.6from
10.6-MDEV-38079
Mar 4, 2026
Merged

MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1#4586
dr-m merged 1 commit into10.6from
10.6-MDEV-38079

Conversation

@mariadb-TafzeelShams
Copy link
Copy Markdown
Contributor

@mariadb-TafzeelShams mariadb-TafzeelShams commented Jan 23, 2026

  • The Jira issue number for this PR is: MDEV-38079

Description

MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1

Issue:

  • Instant enabling of page_compressed may cause crash recovery failure.
  • FSP_SPACE_FLAGS may not match the actual page format during recovery.

Root Cause:

  • Table t has pending writes.
  • ALTER TABLE t PAGE_COMPRESSED=1 is executed.
  • Some pending pages are written in compressed format, but the server may
    crash before all pages are rewritten.
  • During recovery, FSP_SPACE_FLAGS may indicate non-compressed while some
    pages are compressed which results in space ID mismatch.

Fix:

  • We go for the simple fix of requiring the table to be rebuilt when the
    page_compressed attribute is changed, because:

    • The page_compressed option is used very rarely.
    • The only motivation of page_compressed=1 is to save space, and space may
      only be saved when all pages of the table are rewritten (table is rebuilt).
  • alter_options_need_rebuild(): Enabling page_compressed will require a
    full table rebuild.

How can this PR be tested?

./mtr innodb.alter_algorithm
./mtr innodb.alter_crash
./mtr innodb.table_flags

Basing the PR against the correct MariaDB version

  • This is a new feature and the PR is based against the latest MariaDB development branch
  • This is a bug fix and the PR is based against the earliest branch in which the bug can be reproduced

Comment thread storage/innobase/log/log0recv.cc Outdated
case log_phys_t::APPLIED_TO_FSP_HEADER:
s->flags = mach_read_from_4(
case log_phys_t::APPLIED_TO_FSP_HEADER: {
uint32_t flags = mach_read_from_4(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need this temporary variable "flags".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread storage/innobase/log/log0recv.cc Outdated
auto it = space_flags_lsn.find(space->id);
if (it == space_flags_lsn.end() || lsn >= it->second)
{
space->flags = flags;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There shouldn't be space before "="

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread storage/innobase/log/log0recv.cc Outdated
FSP_HEADER_OFFSET + FSP_FREE
+ FLST_LEN + frame);
break;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need this curly brace?

Copy link
Copy Markdown
Contributor Author

@mariadb-TafzeelShams mariadb-TafzeelShams Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This brace introduce a local scope, so flags is kept limited to the APPLIED_TO_FSP_HEADER case.

Copy link
Copy Markdown
Member

@Thirunarayanan Thirunarayanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to address the review comments.

Comment thread storage/innobase/log/log0recv.cc Outdated
FSP_HEADER_OFFSET
+ FSP_SPACE_FLAGS + frame);
recv_sys.update_space_flags(s,
mach_read_from_4(FSP_HEADER_OFFSET
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recv_sys.update_space_flags(
   s, mach_read_from_4(FSP_HEADER_OFFSET +
                       FSP_SPACE_FLAGS + frame),
   l->start_lsn);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

recv_sys.update_space_flags(
  s, mach_read_from_4(FSP_HEADER_OFFSET +
                      FSP_SPACE_FLAGS + frame),
  l->start_lsn);

Copy link
Copy Markdown
Member

@Thirunarayanan Thirunarayanan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with the code format changes

Copy link
Copy Markdown
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a step to the right direction, but I think that it needs a little more work. When it comes to changing FSP_SIZE, we can exploit the fact that the size never shrinks, except in undo tablespace truncation, whose recovery is handled separately. I believe that the LSN tracking must be kept strictly related to changes of FSP_SPACE_FLAGS.

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment on lines +1672 to +1678
/** Set the recovered size of a tablespace in pages.
@param id tablespace ID
@param size recovered size in pages
@param flags tablespace flags */
@param flags tablespace flags
@param lsn lsn of the redo log */
void fil_space_set_recv_size_and_flags(ulint id, uint32_t size,
uint32_t flags) noexcept;
uint32_t flags, lsn_t lsn) noexcept;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The @param lsn comment is not adding any value. Which specific LSN does it refer to? Also the size and flags could be described more accurately. Let us use spaces instead of TAB also in the comment:

/** Set the recovered size and flags of a tablespace.
@param id         tablespace ID
@param size       recovered FSP_SIZE, or a less-than-equal value if no change
@param flags      recovered FSP_SPACE_FLAGS if applicable
@param flags_lsn  start LSN of modifying flags, or 0 */

This comment would also answer the question what would happen or how we would identify the situation when a mini-transaction is modifying only one of FSP_SIZE or FSP_SPACE_FLAGS.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made these changes. Thanks.

Comment thread storage/innobase/include/log0recv.h Outdated
Comment on lines +271 to +272
/** Last applied LSN for space->flags updates (used to prevent stale updates) */
std::map<uint32_t, lsn_t> space_flags_lsn;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could significantly increase the memory footprint. Do we really need an ordered map? We already store similar information in file_name_t linked from recv_spaces.

Could we slightly repurpose fil_space_t::create_lsn for storing this information? I think that the name fil_space_t::format_lsn would also cover the meaning of create_lsn, since the format is usually determined at creation.

If the information is being read before a fil_space_t object is available, then it would seem possible to add file_name_t::format_lsn for storing this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this map. And yes it was kind of unnecessary.

I don't think we need to refer to recv_spaces or file_name_t.
Whenever we are updating space->flags, we are guaranteed to have fil_space_t structure as well.
Instead added a member variable fil_space_t::flags_lsn to track it.

Yes, maybe we can repurpose to use fil_space_t::create_lsn..
Haven't done that in this patch.

Comment thread storage/innobase/include/log0recv.h Outdated
Comment on lines +356 to +357
/** Update space->flags only if lsn >= last recorded space-flags LSN */
void update_space_flags(fil_space_t *space, uint32_t flags, lsn_t lsn);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which LSN? This comment should also somehow refer to fil_space_set_recv_size_and_flags().

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comments.

Comment thread storage/innobase/log/log0recv.cc Outdated
Comment on lines +1099 to +1104
auto it= space_flags_lsn.find(space->id);
if (it == space_flags_lsn.end() || lsn >= it->second)
{
space->flags= flags;
space_flags_lsn[space->id]= lsn;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not it->lsn= lsn;?

We are missing mysql_mutex_assert_owner(&recv_sys.mutex) here. The access to a shared data structure needs to be protected somehow.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we don't have map in current patch.
Instead added a separate function fil_space_update_flags_recv()

It gets called from

  1. fil_space_set_recv_size_and_flags() : which already has mysql_mutex_assert_owner(&recv_sys.mutex);
  2. recv_recover_page() : The very first line it has is something opposite mysql_mutex_assert_not_owner(&recv_sys.mutex);

Comment on lines 1345 to 1349
if (f.size
|| f.flags != f.initial_flags) {
fil_space_set_recv_size_and_flags(
space->id, f.size, f.flags);
space->id, f.size, f.flags, lsn);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is passing the start LSN of the mini-transaction of a FILE_ record, instead of the start LSN of the mini-transaction that actually modified FSP_SPACE_FLAGS.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made changes as per the comment
@param flags_lsn start LSN of modifying flags, or 0 */

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still passing 0 or the start LSN of a FILE_ record, instead of passing the start LSN of the WRITE mini-transaction based on which we stored or updated f.flags.

Comment thread storage/innobase/log/log0recv.cc Outdated
Comment on lines +2730 to +2737
it->second.flags= flags;
}
fil_space_set_recv_size_and_flags(space_id, size, flags);
fil_space_set_recv_size_and_flags(space_id, size, flags, start_lsn);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may have to pass has_flags ? start_lsn : 0 to distinguish the case where the flags were not changed. Maybe the file_name_t::initial_flags could be removed altogether, now that we would keep track of the start_lsn of the mini-transaction that modifed FSP_SPACE_FLAGS, in a new field it->second.flags_lsn or it->second.format_lsn.

Comment thread storage/innobase/log/log0recv.cc Outdated
Comment on lines +3094 to +3098
case log_phys_t::APPLIED_TO_FSP_HEADER:
s->flags = mach_read_from_4(
FSP_HEADER_OFFSET
+ FSP_SPACE_FLAGS + frame);
recv_sys.update_space_flags(
s, mach_read_from_4(FSP_HEADER_OFFSET +
FSP_SPACE_FLAGS + frame),
l->start_lsn);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this code be moved to log_phys_t::apply() and the special return value APPLIED_TO_FSP_HEADER removed? That return value does not currently distinguish changes to FSP_SPACE_FLAGS and FSP_SIZE. That is, if a mini-transaction only modifies FSP_SIZE but not FSP_SPACE_FLAGS, we would execute some unnecessary code here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the semantics of log_phys_t::apply() returning where it applied the log. And then we performing actions based on the returned value is simpler.

What can be done is we can add a special return value APPLIED_TO_FSP_FLAGS.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which value should be returned if both FSP_SPACE_FLAGS and FSP_SIZE are modified by the same record (a WRITE spanning both fields)? I would like to see how the logic would look like when pushed down to that function.

Comment thread storage/innobase/fil/fil0fil.cc Outdated
Comment on lines +1069 to +1070
void fil_space_update_flags_recv(fil_space_t *space, uint32_t flags,
lsn_t flags_lsn)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second line is indented incorrectly, and the function is missing noexcept.

Why is this function not a member function of fil_space_t?

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment on lines +436 to +440
/** LSN of undo tablespace creation or 0; protected by latch */
lsn_t create_lsn= 0;

/** Start LSN of mtr record that updates space->flags or 0 */
lsn_t flags_lsn= 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my comment 3 weeks ago, I asked you to rename create_lsn to format_lsn and to use that field for both purposes.

We must keep in mind that InnoDB will retain in memory a fil_space_t object for every tablespace that it knows of. There could be millions of tablespsaces. Adding a 64-bit field would increase the memory consumption of a million tables by at least 8 MiB, possibly much more, depending on alignment losses as well as internal and external fragmentation in the memory allocator.

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment on lines +461 to +465
/** Set flags_lsn. */
inline void set_flags_lsn(lsn_t lsn) noexcept
{
flags_lsn= lsn;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The keyword inline is redundant in an inline member function definition.

This function is at a too low level. The LSN should be recorded together with the flags. That is, the non-member function fil_space_update_flags_recv() should be replaced with a member function.

Comment thread storage/innobase/include/fil0fil.h Outdated
Comment on lines +1692 to +1698
/** Set the recovered size and flags of a tablespace.
@param id tablespace ID
@param size recovered FSP_SIZE, or a less-than-equal value if no change
@param flags recovered FSP_SPACE_FLAGS if applicable
@param flags_lsn start LSN of modifying flags, or 0 */
void fil_space_set_recv_size_and_flags(ulint id, uint32_t size,
uint32_t flags) noexcept;
uint32_t flags, lsn_t flags_lsn) noexcept;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "less-than-equal" in the comment of size is confusing. Can we just say "recovered FSP_SIZE, or any smaller value"?

The comment fails to say when the value 0 is appropriate for flags_lsn.

Comment on lines 1345 to 1349
if (f.size
|| f.flags != f.initial_flags) {
fil_space_set_recv_size_and_flags(
space->id, f.size, f.flags);
space->id, f.size, f.flags, lsn);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still passing 0 or the start LSN of a FILE_ record, instead of passing the start LSN of the WRITE mini-transaction based on which we stored or updated f.flags.

Comment thread storage/innobase/log/log0recv.cc Outdated
Comment on lines +2721 to +2729
}
fil_space_set_recv_size_and_flags(space_id, size, flags);
fil_space_set_recv_size_and_flags(space_id, size, flags,
has_flags ? start_lsn : 0);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are missing the following here, which I suggested in my comment 3 weeks ago:

it->second.format_lsn= start_lsn;

Why do we invoke fil_space_set_recv_size_and_flags() in the case we do not know the tablespace metadata yet (!it->second.space)? Can that function be replaced with a member function of fil_space_t?

Comment thread storage/innobase/log/log0recv.cc Outdated
Comment on lines +3094 to +3098
case log_phys_t::APPLIED_TO_FSP_HEADER:
s->flags = mach_read_from_4(
FSP_HEADER_OFFSET
+ FSP_SPACE_FLAGS + frame);
recv_sys.update_space_flags(
s, mach_read_from_4(FSP_HEADER_OFFSET +
FSP_SPACE_FLAGS + frame),
l->start_lsn);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which value should be returned if both FSP_SPACE_FLAGS and FSP_SIZE are modified by the same record (a WRITE spanning both fields)? I would like to see how the logic would look like when pushed down to that function.

@mariadb-TafzeelShams mariadb-TafzeelShams changed the title MDEV-38079: Crash Recovery Fails After ALTER TABLE…PAGE_COMPRESSED=1 MDEV-38079: Do not allow INSTANT ALTER to enable page_compressed Feb 26, 2026
@mariadb-TafzeelShams mariadb-TafzeelShams marked this pull request as draft February 26, 2026 05:13
Copy link
Copy Markdown
Contributor

@dr-m dr-m left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message does not mention the FSP_SPACE_FLAGS field or the exact root cause of the problem. It should be along the following lines:

  1. There are pending writes to a table t.
  2. ALTER TABLE t page_compressed=1 is executed.
  3. Some of the pending writes are executed in the page_compressed format.
  4. The server is killed.
  5. Crash recovery may use incorrect FSP_SPACE_FLAGS and fail because it expects the table to be in page_compressed=0 format.

We go for the simple fix of requiring the table to be rebuilt when the page_compressed attribute is changed, because:

  1. The page_compressed option is used very rarely.
  2. The only motivation of page_compressed=1 is to save space, and space may only be saved when all pages of the table are rewritten (the table is rebuilt).

Comment on lines -244 to -262

--echo #
--echo # MDEV-37886 PAGE_COMPRESSED ALTER TABLE operations
--echo # inconsistent with innodb_file_per_table setting
--echo #
SET GLOBAL innodb_file_per_table=0;
CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;
SET DEBUG_SYNC='before_lock_tables_takes_lock SIGNAL stuck WAIT_FOR go';
send ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INSTANT;

--connect con1,localhost,root
SET DEBUG_SYNC='now WAIT_FOR stuck';
SET GLOBAL innodb_file_per_table=1;
SET DEBUG_SYNC='now SIGNAL go';
--disconnect con1
--connection default
--error ER_ILLEGAL_HA_CREATE_OPTION
--reap
DROP TABLE t1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commit message fails to explain why this test was removed. It would be better to preserve the relevant part of the test, something like this:

SET GLOBAL innodb_file_per_table=0;
CREATE TABLE t1(a INT PRIMARY KEY) ENGINE=InnoDB;
--error ER_ALTER_OPERATION_NOT_SUPPORTED_REASON
ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INSTANT;
DROP TABLE t1;

Comment thread mysql-test/suite/innodb/r/table_flags.result
Comment thread mysql-test/suite/innodb/t/alter_algorithm.test
Comment thread storage/innobase/handler/handler0alter.cc
Comment thread storage/innobase/handler/handler0alter.cc
@mariadb-TafzeelShams mariadb-TafzeelShams changed the title MDEV-38079: Do not allow INSTANT ALTER to enable page_compressed MDEV-38079: Crash recovery fails after ALTER TABLE…PAGE_COMPRESSED=1 Feb 26, 2026
@mariadb-TafzeelShams mariadb-TafzeelShams marked this pull request as ready for review February 26, 2026 08:08
Comment on lines +262 to +270
CREATE TABLE t1 (c CHAR(1)) ENGINE=InnoDB;
SET GLOBAL INNODB_DEFAULT_ROW_FORMAT=compact;
SELECT TABLE_NAME, ROW_FORMAT, CREATE_OPTIONS
FROM information_schema.TABLES
WHERE TABLE_NAME = 't1';
--error ER_ILLEGAL_HA_CREATE_OPTION
ALTER TABLE t1 PAGE_COMPRESSED=1;
SET GLOBAL INNODB_DEFAULT_ROW_FORMAT=compact;
# Updates the row-format during rebuild to compatible format
ALTER TABLE t1 PAGE_COMPRESSED=1;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is somewhat misleading. There should be no logic that would change ROW_FORMAT to a compatible value. Instead, the table rebuild would use the current value of innodb_default_row_format. This is because no ROW_FORMAT was explicitly specified in either CREATE TABLE or ALTER TABLE.

If your intention is to demonstrate that ha_innobase::inplace_alter_table() will not be invoked, then I think that you should duplicate both ALTER TABLE statements for both code paths:

ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=INPLACE;
ALTER TABLE t1 PAGE_COMPRESSED=1, ALGORITHM=COPY;

This would more explicitly demonstrate that the code that you removed from ha_innobase::prepare_inplace_alter_table() is not reachable, as well as show the error message that is being executed in each code path.

Comment on lines +253 to +255
ERROR 0A000: ALGORITHM=INSTANT is not supported. Reason: Changing table options requires the table to be rebuilt. Try ALGORITHM=INPLACE
DROP TABLE t1;
SET @@global.INNODB_FILE_PER_TABLE = @file_per_table;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ALTER TABLE should fail in the same way for any value of innodb_file_per_table. Therefore, the changes of innodb_file_per_table should be removed. Due to 09aacdd, they will issue deprecation warnings in later major versions, and therefore they should be avoided.

Issue:
- Instant enabling of page_compressed may cause crash recovery failure.
- FSP_SPACE_FLAGS may not match the actual page format during recovery.

Root Cause:
- Table t has pending writes.
- ALTER TABLE t PAGE_COMPRESSED=1 is executed.
- Some pending pages are written in compressed format, but the server may
  crash before all pages are rewritten.
- During recovery, FSP_SPACE_FLAGS may indicate non-compressed while some
  pages are compressed which results in space ID mismatch.

Fix:
- We go for the simple fix of requiring the table to be rebuilt when the
  page_compressed attribute is changed, because:
  - The page_compressed option is used very rarely.
  - The only motivation of page_compressed=1 is to save space, and space may
    only be saved when all pages of the table are rewritten (table is rebuilt).

- alter_options_need_rebuild(): Enabling page_compressed will require a
  full table rebuild.
@dr-m dr-m merged commit e520abd into 10.6 Mar 4, 2026
13 checks passed
@dr-m dr-m deleted the 10.6-MDEV-38079 branch March 4, 2026 08:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

3 participants