[DPE-9049] feat: implement automatic cleanup of lost+found storage directories #1612
marceloneppel wants to merge 1 commit into 16/edge
The `lost+found` directory, automatically generated by certain storage substrates (like LXD block-pool volumes), can interfere with PostgreSQL's initialization and operations by introducing unexpected files in the data roots.

Signed-off-by: Marcelo Henrique Neppel <marcelo.neppel@canonical.com>
    shutil.rmtree(lost_and_found_path)
except OSError:
    logger.exception(f"Failed to remove {lost_and_found_path}")
For the record: we cannot simply rename or move the directory. Even though PostgreSQL explicitly accounts for dot files and `lost+found` here: https://github.com/postgres/postgres/blob/66ad764c8d517f59577d41ac3dad786729c9e10e/src/port/pgcheckdir.c#L60
it still aborts the basebackup execution in https://github.com/postgres/postgres/blob/66ad764c8d517f59577d41ac3dad786729c9e10e/src/bin/pg_basebackup/pg_basebackup.c#L771-L772
This is a new Juju volume, which has an empty `lost+found` folder, so we remove the folder to fix the deployment. The folder will be re-created automatically.
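To illustrate why an otherwise empty volume still trips the check, here is a rough Python paraphrase of the directory classification in the linked `pgcheckdir.c`. This is a sketch written from a reading of that file, not PostgreSQL code: the function name `classify_dir` is hypothetical, and the return codes are assumed to match `pg_check_dir` (0 missing, 1 empty, 2 only dot files, 3 mount point with `lost+found`, 4 not empty, -1 error).

```python
import os


def classify_dir(path):
    """Classify a directory roughly the way pg_check_dir is understood to.

    Assumed codes: 0 = missing, 1 = empty, 2 = only dot files,
    3 = contains lost+found (mount point), 4 = not empty, -1 = error.
    """
    try:
        names = os.listdir(path)  # never includes "." or ".."
    except FileNotFoundError:
        return 0
    except OSError:
        return -1

    dot_found = False
    mount_found = False
    for name in names:
        if name.startswith("."):
            dot_found = True
        elif name == "lost+found":
            mount_found = True
        else:
            return 4  # any other entry means the directory is in use

    if mount_found:
        return 3
    if dot_found:
        return 2
    return 1
```

As far as I can tell, the `pg_basebackup` lines linked above treat codes 2, 3, and 4 alike as "exists but is not empty", which is why the directory has to be removed rather than tolerated.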
delgod left a comment
I think we can be smarter here and clean up the directory only when running initdb is actually necessary.
The problem right now is that if we have a healthy PostgreSQL, the kernel crashes, and the machine is restarted, the volume can end up with a valid lost+found folder whose contents are needed for recovery.
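One way this suggestion could look is sketched below. It is not the charm's code: the function name `cleanup_before_initdb` is hypothetical, and it assumes the path passed in is the PostgreSQL data directory itself, so that a `PG_VERSION` file marks an already initialised cluster.

```python
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def cleanup_before_initdb(data_dir):
    """Remove lost+found only when the data directory is not yet initialised.

    Returns True if the directory was removed, False otherwise.
    """
    data_dir = Path(data_dir)
    lost_and_found_path = data_dir / "lost+found"

    # An initialised cluster has a PG_VERSION file; leave such volumes alone.
    if (data_dir / "PG_VERSION").exists():
        return False
    # Nothing to do unless lost+found is a real directory (not a symlink).
    if lost_and_found_path.is_symlink() or not lost_and_found_path.is_dir():
        return False
    # A non-empty lost+found may hold recovered files, so keep it.
    if any(lost_and_found_path.iterdir()):
        logger.warning(f"{lost_and_found_path} is not empty, leaving it in place")
        return False
    try:
        # rmdir only succeeds on an empty directory, which is what we expect.
        lost_and_found_path.rmdir()
    except OSError:
        logger.exception(f"Failed to remove {lost_and_found_path}")
        return False
    return True
```

Using `rmdir` rather than `shutil.rmtree` here is a deliberate choice in this sketch: it cannot delete recovered files even if the emptiness check races with something else.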
Issue
When deploying Charmed PostgreSQL with a dedicated storage volume (e.g., NVMe drives formatted with ext4), the filesystem automatically creates a `lost+found` directory at the root of the mount point. `initdb` treats the storage directory as non-empty and refuses to initialise the database cluster, exiting with an error.

This prevents the PostgreSQL cluster from ever forming without manual intervention.
Solution
The charm now automatically removes any `lost+found` directory found at the root of each storage path (data, archive, logs, and temp) during the `install` and `start` events, eliminating the need for external workarounds.

Tested with ext4 volumes on a Testflinger machine (LXD VMs must be used instead of containers for this to work).
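A minimal sketch of the cleanup described above, for readers who want the shape of it without opening the diff. It is not the charm's actual implementation: the function name `remove_lost_and_found` and the way storage paths are passed in are hypothetical; only the `shutil.rmtree` call and the `OSError` handling mirror the fragment visible in this PR.

```python
import logging
import shutil
from pathlib import Path

logger = logging.getLogger(__name__)


def remove_lost_and_found(storage_paths):
    """Remove a lost+found directory from the root of each storage path.

    Returns the list of directories that were actually removed.
    """
    removed = []
    for storage_path in storage_paths:
        lost_and_found_path = Path(storage_path) / "lost+found"
        # Skip symlinks and regular files: only a real directory is expected.
        if lost_and_found_path.is_symlink() or not lost_and_found_path.is_dir():
            continue
        try:
            shutil.rmtree(lost_and_found_path)
        except OSError:
            # Log and carry on so one bad volume does not block the others.
            logger.exception(f"Failed to remove {lost_and_found_path}")
            continue
        removed.append(lost_and_found_path)
    return removed
```

The caller would pass the data, archive, logs, and temp mount points, for example from the `install` and `start` handlers.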
Checklist
Fixes #1336.