Skip to content

Conversation

@jeltz
Copy link
Collaborator

@jeltz jeltz commented Nov 13, 2025

As reported by Roman Leventov in percona/postgres#582 we had an issue where we easily could leak temporary files and directories when on failure, especially when the wrapped restore or archive command fails.

The solution in this commit is to move the cleanup of temporary files and directories to an atexit() callback plus add a signal handler for SIGINT and SIGTERM. The signal handler makes sure to kill itself after running cleanup so the parent process gets the right error.

We have a tiny race condition between creating the temporary directory and registering the signal handler but that does not seem worth fixing since it would just leak a directory, not the WAL file, the race conditions being tiny.

With the current hardcoded path of /dev/shm and the TAP test framework testing this seems a pain so no tests are included.

Question: Should I add TAP tests or is it ok that I skipped them?

@jeltz jeltz force-pushed the archive-restore-file-leak branch 2 times, most recently from 7b2fcfb to c6a89ac Compare November 13, 2025 05:27
@jeltz jeltz changed the title Make sure archive and restore comamnds do not leak files PG-1932 Make sure archive and restore comamnds do not leak files Nov 13, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 42.85714% with 12 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.66%. Comparing base (2f0315d) to head (78700ca).
⚠️ Report is 4 commits behind head on main.

❌ Your project status has failed because the head coverage (59.66%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #467      +/-   ##
==========================================
- Coverage   59.69%   59.66%   -0.03%     
==========================================
  Files          67       67              
  Lines       10471    10481      +10     
  Branches     1813     1809       -4     
==========================================
+ Hits         6251     6254       +3     
- Misses       3524     3532       +8     
+ Partials      696      695       -1     
Components Coverage Δ
access 84.89% <ø> (ø)
catalog 87.93% <ø> (ø)
common 77.77% <ø> (ø)
encryption 71.56% <ø> (ø)
keyring 73.54% <ø> (ø)
src 94.20% <ø> (ø)
smgr 94.06% <ø> (ø)
transam ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@dAdAbird dAdAbird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. The only thing, maybe it worth putting a comment about that race somewhere in the code? So it'd be less questions and investigation when someone looks at the code later on.

As reported by Roman Leventov in
percona/postgres#582 we had an issue where we
easily could leak temporary files and directories when on failure,
especially when the wrapped restore or archive command fails.

The solution in this commit is to move the cleanup of temporary files
and directories to and atexit() callback plus add a signal handler for
SIGINT and SIGTERM. The signal handler makes sure to kill itself after
running cleanup so the parent process gets the right error.

We have a tiny race condition between creating the temporary directory
and registering the signal handler but that does not seem worth fixing
since it would just leak a directory, not the WAL file, the race
conditions being tiny.

With the current hardcoded path of /dev/shm and the TAP test framework
testing this seems a pain so no tests are included.
@jeltz jeltz force-pushed the archive-restore-file-leak branch from ff8e70b to 78700ca Compare November 13, 2025 18:11
@jeltz
Copy link
Collaborator Author

jeltz commented Nov 13, 2025

Thanks, added comments.

@jeltz jeltz merged commit bf683f7 into percona:main Nov 13, 2025
17 checks passed
@jeltz jeltz deleted the archive-restore-file-leak branch November 14, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants