Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error removing file over SFTP but SSH permissions are OK #541

Closed
f18m opened this issue Feb 24, 2025 · 26 comments · Fixed by #548
Closed

Error removing file over SFTP but SSH permissions are OK #541

f18m opened this issue Feb 24, 2025 · 26 comments · Fixed by #548
Labels
bug Something isn't working support Support Requests

Comments

@f18m
Copy link

f18m commented Feb 24, 2025

Hi,
thanks for this great software!

I've set 7 days of retention for my backups over SSH/SFTP and the backups are correctly and regularly created. However docker-volume-backup is unable to erase old backups: here's a snippet of the log files (repeated many times):

time=2025-02-19T02:30:20.279Z level=ERROR msg="Unexpected error running schedule 30 2 * * *: sftp: \"Failure\" (SSH_FX_FAILURE)" error="main.runScript.func4: error running script: main.(*script).pruneBackups: error pruning backups: ssh.(*sshStorage).Prune.func1: error removing file: sftp: \"Failure\" (SSH_FX_FAILURE)"
time=2025-02-20T02:30:00.133Z level=INFO msg="Now running script on schedule 30 2 * * *"
time=2025-02-20T02:30:08.775Z level=INFO msg="Created backup of `/backup` at `/tmp/backup-2025-02-20T02-30-00.tar.gz`."
time=2025-02-20T02:30:15.762Z level=INFO msg="Uploaded a copy of backup `/tmp/backup-2025-02-20T02-30-00.tar.gz` to '192.168.1.5' at path '/mnt/mediadownloader-backup/'." storage=SSH
time=2025-02-20T02:30:16.610Z level=INFO msg="Removed tar file `/tmp/backup-2025-02-20T02-30-00.tar.gz`."
time=2025-02-20T02:30:16.620Z level=ERROR msg="Unexpected error running schedule 30 2 * * *: sftp: \"Failure\" (SSH_FX_FAILURE)" error="main.runScript.func4: error running script: main.(*script).pruneBackups: error pruning backups: ssh.(*sshStorage).Prune.func1: error removing file: sftp: \"Failure\" (SSH_FX_FAILURE)"

The thing is: if I log via SSH using exactly

  • same RSA private key
  • SSH host/user/pw
    provided to docker-volume-backup, I can remove old backups without any issue with "rm"...

Any hint/idea?

@m90
Copy link
Member

m90 commented Feb 25, 2025

This is very hard to tell. SSH_FX_FAILURE seems to be a rather generic error code that can occur on different conditions. Googling shows me a few people seeing this error on disks that are full, but I doubt this is your problem as you just uploaded a new archive successfully.

Your configuration is correct in any case.

@m90 m90 added the support Support Requests label Feb 25, 2025
@m90
Copy link
Member

m90 commented Feb 25, 2025

This could be related pkg/sftp#137

@f18m
Copy link
Author

f18m commented Feb 25, 2025

Yeah I'm not sure. Perhaps I can try to create a minimal go program loosely based on https://github.com/offen/docker-volume-backup/blob/main/internal/storage/ssh/ssh.go to reproduce the problem.
Now I tried to login to my remote repository with the sftp program and I could rm successfully an old backup...

@m90
Copy link
Member

m90 commented Feb 25, 2025

I'd be interested in seeing what happens when you use a different sftp client and try to delete a file.

@m90
Copy link
Member

m90 commented Feb 25, 2025

Digging through the source code of the sftp package I found the following comment above the code that is used for deleting files:

// Sync requires the server to support the [email protected] extension.

Can you check if your remote supports this?

@f18m
Copy link
Author

f18m commented Feb 26, 2025

@m90 I'm really not sure how to find out if that fsync extension is supported or not.
I guess the answer is no. The remote system is a NAS running a custom OS (TerraMaster TOS) which is a derivative of OpenWrt.
I see the SFTP server is provided by the following packages

openssh-client - 7.1p2-1
openssh-client-utils - 7.1p2-1
openssh-keygen - 7.1p2-1
openssh-server - 7.1p2-1
openssh-server-pam - 7.1p2-1
openssh-sftp-client - 7.1p2-1
openssh-sftp-server - 7.1p2-1
openssl-util - 1.0.2j-1

However my take on this is: I understand that the ability to force an fsync can be useful, but it's far from mandatory IMHO.
Do you think it would be possible to fallback to a simple "rm" (without fsync) in case SSH_FX_FAILURE is raised ?

@m90
Copy link
Member

m90 commented Feb 26, 2025

Do you think it would be possible to fallback to a simple "rm"

I'm not sure such a fallback would really qualify as "simple" as it would require setting up yet another way of connecting to storage servers. In any case, I think this should not be solved here, but upstream in package sftp. I'll open an issue over there and see If there is any feedback.

@puellanivis
Copy link

puellanivis commented Feb 26, 2025

I’m not really sure that fsync is at all involved here. pkg/sftp does not issue an fsync extended packet for sftp.Client.Remove() and the code in this repo isn’t calling sftp.File.Sync() either. 🤔

One thing that could be happening is that sftp.Remove first tries to delete a file, and then unconditionally returns the error from a sftp.Client.RemoveDirectory. So, the file remove could be returning a more informative error, which is being lost because the rmdir obviously isn’t working.

P.S. openssh-sftp-server - 7.1p2-1 this most certainly provides support for fsync has been in openssh-sftp-server since v_6_5_P1.

@puellanivis
Copy link

puellanivis commented Feb 26, 2025

Random question… you wouldn’t happen to be running this on Windows?

(Why I ask, in internal/storage/ssh/ssh.go, there is a lot of use of filepath.Join that should be path.Join. This could potentially cause issues.)

@m90
Copy link
Member

m90 commented Feb 26, 2025

Why I ask, in internal/storage/ssh/ssh.go, there is a lot of use of filepath.Join that should be path.Join

I'm not entirely sure I understand yet. So the code in internal/storage/ssh/ssh.go is expected to run in a Docker container based off alpine always, which makes me think it wouldn't really make a difference whether filepath.Join or path.Join is used as the separator will always be /, even if the Docker host is a Windows machine (then again my understanding of Docker on Windows might be way outdated).

Then again a line like this

destination, err := b.sftpClient.Create(filepath.Join(b.DestinationPath, name))

means that a path for use on the remote is being constructed, using the separator suitable for the client. Would it even be possible the remote is using \ separators when the client is using /?

Why do you think filepath.Join should be replaced with path.Join?


That being said, I don't think this problem is related to the issue itself as this would also make file creation fail, but it seems to work just fine. The SSH_FX_FAILURE is only being returned on calling Remove on the client.

@m90
Copy link
Member

m90 commented Feb 26, 2025

Thanks for your thorough feedback on this by the way @puellanivis - that's much appreciated.

@puellanivis
Copy link

Why do you think filepath.Join should be replaced with path.Join?

From the SFTP spec:

6.2 File Names

This protocol represents file names as strings. File names are assumed to use the slash ('/') character as a directory separator.

Then again a line like this … means that a path for use on the remote is being constructed, using the separator suitable for the client.

Yeah, this kind of precludes the issue. It was a long shot anyways.

Would it even be possible the remote is using \ separators when the client is using /?

No. Everyone has settled on using /, because it’s the assumed path separator for all clients and servers. I spotted it just because the pkg/sftp package has had to be very pedantic about the usage. But this is pretty much identical to the URL specification, where / is the only valid path separator, so path vs filepath is a fun thing to always have ticking in the back of your mind. 🤪

@f18m
Copy link
Author

f18m commented Feb 27, 2025

Not sure if it helps but none of the systems involved with this issue are running Windows. They're all Linux-based.
The system doing the backups using offen/docker-volume-backup is a Debian 12 Bookworm. The NAS exposing the SFTP server where backups are being stored is a custom OS (TerraMaster TOS) which is a derivative of OpenWrt.

@puellanivis do you know if there is any SFTP client out there already available built using the sftp package so that I could check if that SFTP client is also reporting problems when trying to delete a file from my SFTP server ?
thanks

m90 added a commit that referenced this issue Feb 27, 2025
Currently, filepath.Join is used to join path fragments that are supposed
to be created on remote storage backends. This could theoretically cause
problems when the separators used by the client and the remotes do not match.
It's unlikely this causes problems right now, but it's definitely better to
rectify it before it causes further confusion.

This was raised in #541
@puellanivis
Copy link

puellanivis commented Feb 27, 2025

Not off hand, no I don’t know any. 🤔 But you should be able to adapt any of the examples, like examples/streaming-read-benchmark pretty quick… just remove from the c.Open and replace it with a c.Remove("/path/to/file").

Another good idea would be to add the filename into the error text of

return errwrap.Wrap(err, "error removing file")
🤔 It might be something unusual is happening, and it’s building an unexpected filename. (The dev-v2 branch prototype for a v2 of the library actually switches to returning fs.PathErrors that include the filenames, not just the status packet.)

m90 added a commit that referenced this issue Feb 27, 2025
…547)

Currently, filepath.Join is used to join path fragments that are supposed
to be created on remote storage backends. This could theoretically cause
problems when the separators used by the client and the remotes do not match.
It's unlikely this causes problems right now, but it's definitely better to
rectify it before it causes further confusion.

This was raised in #541
@m90
Copy link
Member

m90 commented Feb 27, 2025

@f18m In case your setup allows you to use an image you build yourself, here's a branch https://github.com/offen/docker-volume-backup/tree/failure-filename that does:

  • use path.Join where appropriate
  • includes the path it's failing to delete in the error message

Maybe this gives us an additional hint here?

@puellanivis
Copy link

puellanivis commented Feb 27, 2025

I also opened up a PR for pkg/sftp so that we return OP_UNSUPPORTED if fsync support wasn’t announced, and as well, to have Remove return a PathError when an error occurs, so hopefully you won’t need to add the path in the future.

@m90
Copy link
Member

m90 commented Feb 27, 2025

I updated the code in https://github.com/offen/docker-volume-backup/tree/failure-filename to use the PR containing the backports. @f18m are you able to build the image from that branch and run it on your setup?

@f18m
Copy link
Author

f18m commented Feb 27, 2025

Hi @puellanivis , @m90 , thank you so much for all the support.
I'm testing now the failure-filename branch... will report ASAP

@f18m
Copy link
Author

f18m commented Feb 27, 2025

Thanks to the new logging I think the error is clear:

time=2025-02-27T21:20:40.953Z level=INFO msg="Created backup of `/backup` at `/tmp/backup-2025-02-27T21-20-33.tar.gz`."
time=2025-02-27T21:20:48.859Z level=INFO msg="Uploaded a copy of backup `/tmp/backup-2025-02-27T21-20-33.tar.gz` to '192.168.1.5' at path '/mnt/mediadownloader-backup/'." storage=SSH
time=2025-02-27T21:20:49.284Z level=INFO msg="Removed tar file `/tmp/backup-2025-02-27T21-20-33.tar.gz`."
time=2025-02-27T21:20:49.284Z level=ERROR msg="Fatal error running command: sftp: \"Failure\" (SSH_FX_FAILURE)" error="main.(*command).runAsCommand: error running script: main.runScript.func4: error running script: main.(*script).pruneBackups: error pruning backups: ssh.(*sshStorage).Prune.func1: error removing file /mnt/mediadownloader-backup/#recycle: remove /mnt/mediadownloader-backup/#recycle: sftp: \"Failure\" (SSH_FX_FAILURE)"

Indeed there is a "#recycle" folder inside the destination folder for the backups.
I'm not sure which OS created such folder (I access that Samba share both from Linux and MacOS). Or perhaps it's always been there as it's created by the TerraMaster TOS system (the OS of the NAS).

I guess a folder like that cannot be deleted via SFTP because of its name?
Anyway I think docker-volume-backup is stopping there, instead of logging the error and keep going (=deleting other files that represent old backups)

@f18m
Copy link
Author

f18m commented Feb 27, 2025

Indeed it seems I cannot delete the '#recycle' folder via SFTP:

sftp> ls -l
drwxrwxrwx    3 root     root         4096 Feb 12 01:13 #recycle
-rw-rw-rw-    1 f18m     allusers 407961408 Feb 14 03:30 backup-2025-02-14T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 410740865 Feb 18 03:30 backup-2025-02-18T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 410858781 Feb 19 03:30 backup-2025-02-19T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 412245131 Feb 20 03:30 backup-2025-02-20T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 412205153 Feb 21 03:30 backup-2025-02-21T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 412035890 Feb 22 03:30 backup-2025-02-22T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 415530365 Feb 23 03:30 backup-2025-02-23T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 419872964 Feb 24 03:30 backup-2025-02-24T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 425403968 Feb 25 03:30 backup-2025-02-25T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 425577567 Feb 26 03:30 backup-2025-02-26T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 425739881 Feb 27 03:30 backup-2025-02-27T02-30-00.tar.gz
-rw-rw-rw-    1 f18m     allusers 426489700 Feb 27 22:20 backup-2025-02-27T21-20-33.tar.gz
sftp> rmdir '#recycle'
remote rmdir "/mnt/md0/mediadownloader-backup/#recycle": Failure

(this is using the OpenSSH 'sftp' client)

@m90
Copy link
Member

m90 commented Feb 27, 2025

Interesting. I think I'd call the code trying to delete a directory a bug that needs to be fixed here. This behavior just wouldn't make sense in any scenario.

In the meantime you should be able to work around this by setting a BACKUP_PRUNING_PREFIX in your configuration that matches your backups, e.g. backup-.

Thanks for your help in debugging this.

@m90 m90 added bug Something isn't working and removed upstream labels Feb 27, 2025
@f18m
Copy link
Author

f18m commented Feb 27, 2025

It seems like that '#recycle' folder is part of the (automatic) Samba configuration on my NAS:

[mediadownloader-backup]
path = /mnt/md0/mediadownloader-backup
browsable = yes
writeable = yes
inherit permissions = yes
public = yes
vfs object = recycle
recycle:repository = ./#recycle
recycle:keeptree = Yes
recycle:directory_mode = 0777
recycle:versions = Yes
recycle:maxsixe = 0
recycle:exclude_dir = #recycle

(note that my NAS is enabling both SFTP and SAMBA access to the shares I define via some UI).

Anyway reading the docs I see there is the possibility to specify a "BACKUP_PRUNING_PREFIX" option... going to try this one.

@f18m
Copy link
Author

f18m commented Feb 27, 2025

Thanks @m90.
Indeed I was somehow assuming that docker-volume-backup was only looking at files, not folders.
But let's say that I create (by mistake perhaps) a file named '#whatever' in the backup folder. Is that enough to prevent the backup pruning logic to remove old backups?
I think it's fine to print out an error but the software should keep going and skip the "non-deletable" file...

Anyway using BACKUP_PRUNING_PREFIX=backup-, it now works fine!!

time=2025-02-27T21:36:43.503Z level=INFO msg="Created backup of `/backup` at `/tmp/backup-2025-02-27T21-36-37.tar.gz`."
time=2025-02-27T21:36:52.155Z level=INFO msg="Uploaded a copy of backup `/tmp/backup-2025-02-27T21-36-37.tar.gz` to '192.168.1.5' at path '/mnt/mediadownloader-backup/'." storage=SSH
time=2025-02-27T21:36:52.649Z level=INFO msg="Pruned 4 out of 14 backups as they were older than the given deadline of 2025-02-20T21:37:52.156169344Z." storage=SSH
time=2025-02-27T21:36:52.702Z level=INFO msg="Removed tar file `/tmp/backup-2025-02-27T21-36-37.tar.gz`."

I do have one more question though: by looking at https://offen.github.io/docker-volume-backup/reference/ I was expecting "BACKUP_PRUNING_PREFIX=backup-" to be the default value.
My impression was that BACKUP_PRUNING_PREFIX had to be explicitly given as option in the case the option BACKUP_FILENAME was used to change the default naming scheme (but I'm using the default naming scheme instead).
Is this a documentation bug?

Last, as suggestion, I think it would be possible to automatically set the BACKUP_PRUNING_PREFIX to be the portion of the BACKUP_FILENAME string up to the first % sign...

@puellanivis
Copy link

You should be able to add an if candidate.IsDir() { continue } to the loop here: https://github.com/offen/docker-volume-backup/blob/failure-filename/internal/storage/ssh/ssh.go#L166-L173 That will skip any sub-directories in the directory.

An empty directory shouldn’t stop the pruning, because sftp.Client.Remove will remove an empty directory. (Same as os.Remove.)

It would need to be either a protected directory, or a non-empty directory.

@m90
Copy link
Member

m90 commented Feb 28, 2025

Last, as suggestion, I think it would be possible to automatically set the BACKUP_PRUNING_PREFIX to be the portion of the BACKUP_FILENAME string up to the first % sign...

That would make sense, yes, but it's also a breaking change in regards to the current behavior, so I can't just do it whenever. I'll put it on the list of things to consider for a v3 though.

@m90
Copy link
Member

m90 commented Feb 28, 2025

This is now fixed in v2.43.3

Thanks again @f18m and @puellanivis for your help in debugging this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working support Support Requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants