[Bug]: Version 3.15.n seems to be stuck in an infinite loop during "Checking folder changes". HDD is running hot. #7613

Open
5 of 8 tasks
chris-blues opened this issue Dec 8, 2024 · 55 comments · May be fixed by #7745

Comments

@chris-blues

chris-blues commented Dec 8, 2024

⚠️ Before submitting, please verify the following: ⚠️

Bug description

Debian 12 Linux, Gnome DE, Nextcloud Desktop client 3.15.0

Version 3.15.0 seems to be stuck in an infinite loop during "Checking folder changes". HDD is running hot. I've let it run for 30 minutes and more, just to see if it eventually stops at some point. It doesn't.

General startup of the client seems fine. File sync runs through fine. After that it goes into "Checking folder changes" - that's where it gets stuck in the loop. Reaction times get very slow on every action I take in the client. Gnome offers generic "App not responding, should I kill it or wait" warning. With every older Client (e.g. 3.14.3) it works just fine. So there seems to be some regression introduced in 3.15.0...

Edit: this seems to be related to the fact that an NTFS partition is involved...

If there's any more info I can provide, please let me know.

Steps to reproduce

Startup Nextcloud-3.15.0-x86_64.AppImage
Wait for the app to start up...

Expected behavior

Normal startup without thrashing my HDD...

Which files are affected by this bug

folder

Operating system

Linux

Which version of the operating system you are running.

Debian 12

Package

Official Linux AppImage

Nextcloud Server version

30.0.3

Nextcloud Desktop Client version

3.15.0

Is this bug present after an update or on a fresh install?

Updated to a major version (3.14.3 to 3.15.0)

Are you using the Nextcloud Server Encryption module?

Encryption is Disabled

Are you using an external user-backend?

  • Default internal user-backend
  • LDAP/ Active Directory
  • SSO - SAML
  • Other

Nextcloud Server logs

Seems to be internal Desktop Client issue

Additional info

No response

@chris-blues
Author

@Thorsten42

I have the same issue on Arch Linux, also with GNOME. My Nextcloud folder is located on an NTFS drive (Windows dual boot) and mounted using ntfs-3g. I've reverted to 3.14.3 and everything works again.

@chris-blues
Author

chris-blues commented Dec 10, 2024 via email

@MadFatVlad

Same issue with NTFS partition on Arch Linux when using 3.15.0
3.14.3 works fine

@stefan-franz

Same Problem on Linux Mint 22 Cinnamon with 3.15 Flatpak

Hope this will be fixed fast.

@chris-blues
Author

Desktop Client version 3.15.1 still has same issue.

Nextcloud_debug_archive.zip

@chris-blues
Author

Desktop Client version 3.15.2 still has same issue.

Nextcloud_debug_archive.zip

@blwh

blwh commented Dec 19, 2024

Can confirm I experience the same issues running 3.15.2 on Arch using KDE, ext4 formatted drives.

I tried downgrading to 3.14.3 but the problem still persists. If I restart nextcloud it will sync fine until I change a file - then it hangs.

@alraban

alraban commented Dec 23, 2024

I am also experiencing the same hanging issue on Arch Linux with only ext4 formatted drives. Downgrading to 3.14.1 seems to resolve the issue, but every version after that one leads to UI hangs and seemingly indefinite CPU usage for me. It sounds like the issue only cropped up after 3.14.3 for some users, but I definitely see the same issue with that build, and it sounds like some others here also still saw issues with 3.14.3.

If others here could confirm whether 3.14.1 does or doesn't show the same issue, that might help bisect the issue.

@stefan-franz

The 3.13.4 AppImage is the last one that runs normally on my Linux Mint 22.
3.14 cannot be started as an AppImage: it wants passwords.
3.15.2 as a Flatpak (I normally use Flatpak) also has the bug that it syncs all the time, even though nothing has changed on the file system.
3.14.x as a Flatpak ran fine. 3.15.x introduced the bug.

@petrkr

petrkr commented Dec 23, 2024

Just updated on arch... Same result. CPU 100%, flooding logs.

I already deleted the package cache. How can I download the previous version on Arch Linux? The mirror already has the new version.

@alraban

alraban commented Dec 23, 2024

Check the arch linux archive for prior versions: https://archive.archlinux.org/

@petrkr

petrkr commented Dec 23, 2024

Check the arch linux archive for prior versions: https://archive.archlinux.org/

Thanks https://archive.archlinux.org/repos/2024/11/09/extra/os/x86_64/nextcloud-client-2%3A3.14.3-1-x86_64.pkg.tar.zst

I tried to compile it from source, but after an hour it segfaulted in some test.

@blwh

blwh commented Dec 24, 2024

I am also experiencing the same hanging issue on Arch Linux with only ext4 formatted drives. Downgrading to 3.14.1 seems to resolve the issue, but every version after that one leads to UI hangs and seemingly indefinite CPU usage for me. It sounds like the issue only cropped up after 3.14.3 for some users, but I definitely see the same issue with that build, and it sounds like some others here also still saw issues with 3.14.3.

If others here could confirm whether 3.14.1 does or doesn't show the same issue, that might help bisect the issue.

This did not resolve the issue for me.

I did some testing. I find that if I create an empty file (touch foo for example) in the Nextcloud root folder or in a folder with fewer files there is no issue. However, if I create a file in a specific folder (a 130 GB one with many files and git repos) Nextcloud hangs as described by this issue.

@mbiebl

mbiebl commented Dec 26, 2024

I can confirm the problem (on Debian sid using 3.15.0).
My nextcloud folder is also on an NTFS partition and after the upgrade I saw a high CPU load of the nextcloud binary and the ntfs-3g fuse driver.
I ran git bisect which showed 5b2af16 as the first faulty commit.
I suppose that trying to apply the file permissions does not work on NTFS so nextcloud tries it over and over again.
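
A toy model of the suspected failure mode, not the client's actual code: a sync pass that rewrites a file mode and then re-checks it converges only if the file system reports back the mode that was set. The function names, the 0o644 target and the 0o777 starting mode are assumptions made purely for illustration.

```python
def passes_until_stable(report_mode, wanted=0o644, max_passes=5):
    """Return the pass number on which the reported mode matches `wanted`,
    or None if it still differs after `max_passes` passes."""
    stored = 0o777  # mode the toy file starts with
    for pass_number in range(1, max_passes + 1):
        if report_mode(stored) == wanted:
            return pass_number  # nothing left to fix
        stored = wanted  # "fix" the mode; the next pass re-checks it
    return None


def unix_like(stored):
    """File system that reports exactly the mode that was stored."""
    return stored


def fixed_mode(stored):
    """File system that ignores the stored mode and always reports 0o777."""
    return 0o777


print(passes_until_stable(unix_like))
print(passes_until_stable(fixed_mode))
```

Without the `max_passes` cap, the second case would never terminate, which is consistent with the endless "Checking folder changes" described in this issue.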

@mbiebl

mbiebl commented Dec 26, 2024

@mgallien ^ can you please have a look

@mbiebl

mbiebl commented Dec 26, 2024

I've rebuilt 3.15.2 with 5b2af16 reverted and nextcloud-desktop is working properly again.

@chris-blues
Author

chris-blues commented Dec 26, 2024 via email

@mbiebl

mbiebl commented Dec 26, 2024

For completeness' sake, the NTFS partition is mounted like this:

$ grep data /etc/fstab
LABEL=Data      /mnt/data       ntfs-3g auto            0       0

$ findmnt /mnt/data 
TARGET    SOURCE          FSTYPE  OPTIONS
/mnt/data /dev/nvme0n1p11 fuseblk rw,relatime,user_id=0,group_id=0,allow_other,blksize=4096
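
A minimal sketch of how a program might look up the file system type for a path, assuming the `/proc/mounts` field layout (device, mount point, type, options, dump, pass). The helper names are hypothetical, the first sample line (an ext4 root) is invented for contrast, and the second mirrors the `findmnt` output above. Note that an NTFS volume mounted through ntfs-3g shows up as `fuseblk`.

```python
import os


def parse_mounts(text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point that contains `path`."""
    path = os.path.normpath(path)
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        point = os.path.normpath(mount_point)
        inside = path == point or path.startswith(point.rstrip("/") + "/")
        if inside and len(point) > len(best_point):
            best_point, best_type = point, fs_type
    return best_type


# Sample data: the root line is a made-up example, the second line is
# modelled on the findmnt output in this comment.
SAMPLE = """\
/dev/nvme0n1p2 / ext4 rw,relatime 0 0
/dev/nvme0n1p11 /mnt/data fuseblk rw,relatime,user_id=0,group_id=0,allow_other,blksize=4096 0 0
"""

mounts = parse_mounts(SAMPLE)
print(fs_type_for("/mnt/data/Nextcloud", mounts))
print(fs_type_for("/home/user", mounts))
```

On a real system the text would come from reading `/proc/mounts`; mount points containing spaces are escaped there (for example `\040`), which this sketch does not decode.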

@chris-blues changed the title from "[Bug]: Version 3.15.0 seems to be stuck in an infinite loop during "Checking folder changes". HDD is running hot." to "[Bug]: Version 3.15.n seems to be stuck in an infinite loop during "Checking folder changes". HDD is running hot." on Dec 28, 2024
@pati-ni

pati-ni commented Jan 2, 2025

I have the same issues using the same account on two different systems, both with the latest Nextcloud client. htop reports 100% CPU while the UI is unresponsive. I have seen this behavior on Debian and Arch Linux.

My account has around 150000 files

This causes disk I/O on my desktop running Debian; I can hear the drive head of my HDD working. On Debian, the file system is NTFS. On my laptop running Arch I am using ext4 storage and get the same behavior. This drains the battery of my laptop and causes it to run hot.

To make things worse, the service was auto spawned from the bus service, and I could not even kill the client without it restarting.

Can you please tell us what information we need to upload to get help with this issue?

@mbiebl

mbiebl commented Jan 6, 2025

I've filed a downstream bug report at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091614 and the Debian maintainer was kind enough to revert the problematic commit in the Debian package until a proper fix is found.

@thieneret

3.15.50 version from daily channel still has this bug. (OS: Linux Mint 22)

@petrkr

petrkr commented Jan 6, 2025

I've filed a downstream bug report at https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1091614 and the Debian maintainer was kind enough to revert the problematic commit in the Debian package until a proper fix is found.

Is it possible to do the same for other distros too?

Every time I update Arch I have to downgrade the Nextcloud client afterwards. I usually only remember once I see 100% CPU and the battery drained after 1 hour instead of 6.

@Rello moved this to 🏗️ In progress in 💻 Desktop Clients team on Jan 7, 2025
@chris-blues
Author

chris-blues commented Jan 10, 2025

No problem. Does it appear later on that page?

Edit:
Ok, got it. #7745 (comment)
It still seems to be stuck on "Checking folder changes". And I still hear the HDD working. But CPU usage has gone down. Both AppRun.wrapped and mount.ntfs run at ca. 20% CPU usage.
I'll let it do its thing and wait to see if it stops...

Edit 2:
Naa, CPU usage goes between 20 and 100%... UI grows very slow and laggy.

@chris-blues
Author

Finally it crashed...

Nextcloud_debug_archive.zip

@mbiebl

mbiebl commented Jan 10, 2025

I can confirm. The AppImage still shows the same buggy behaviour

@mbiebl

mbiebl commented Jan 10, 2025

I can confirm. The AppImage still shows the same buggy behaviour

My NTFS file system type is reported as fuseblk, not NTFS

@petrkr

petrkr commented Jan 10, 2025

I can confirm. The AppImage still shows the same buggy behaviour

My NTFS file system type is reported as fuseblk, not NTFS

Also, fuseblk can be anything based on FUSE.

Like sshfs, ftpfs, webdavfs, etc...

@mbiebl

mbiebl commented Jan 10, 2025

It's probably better to check for required features than explicit file system types.
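
One way such a feature check could look, as a sketch rather than anything taken from the client or from #7745: create a scratch file in the sync folder, change its mode, and see whether `stat()` reports the new mode. The function name and the probe file prefix are hypothetical, and whether a given file system rejects the `chmod` or silently ignores it is not assumed; both cases return False.

```python
import os
import stat
import tempfile


def permissions_stick(directory):
    """Return True if chmod on a scratch file in `directory` is reflected by stat()."""
    fd, probe = tempfile.mkstemp(dir=directory, prefix=".perm-probe-")
    os.close(fd)
    try:
        for mode in (0o600, 0o400, 0o600):
            try:
                os.chmod(probe, mode)
            except OSError:
                return False  # the file system rejected the change outright
            if stat.S_IMODE(os.stat(probe).st_mode) != mode:
                return False  # the change was accepted but not stored
        return True
    finally:
        os.remove(probe)  # always clean up the scratch file


print(permissions_stick(tempfile.gettempdir()))
```

A check like this tests the behavior that matters (do mode changes persist?) instead of relying on a type name such as `fuseblk`, which says little about the underlying file system.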

@petrkr

petrkr commented Jan 10, 2025

But someone also reported the bug on ext4 in the comments.

So I will not go only for ntfs-3g.

Later this week I can test it on a second laptop that I have not updated yet, where the Nextcloud storage is on btrfs with the same content as on the other laptop, where it is on ntfs-3g. Since that one was the first system I updated, I did not update the others.

@mbiebl

mbiebl commented Jan 10, 2025

But someone also reported the bug on ext4 in the comments.

Maybe different mount options or maybe it's a different issue which only looks like this one.

@chris-blues
Author

In my /etc/fstab it says ntfs, if that should be relevant.

@mbiebl

mbiebl commented Jan 10, 2025

So I will not go only for ntfs-3g.

I fear you might be right here. We have a host of FUSE-based file systems and file systems with non-UNIX semantics (FAT, NTFS etc.). So having a blacklist / whitelist as in #7745 (review) is probably not going to be maintainable.

@petrkr

petrkr commented Jan 10, 2025

In my /etc/fstab it says ntfs, if that should be relevant.

I have ntfs-3g in fstab

When I get home I'll test the update on that laptop with btrfs.

@alraban

alraban commented Jan 10, 2025

But someone also reported the bug on ext4 in the comments.

Maybe different mount options or maybe it's a different issue which only looks like this one.

There are several other issues reporting the desktop client freezing in a loop with high CPU usage on Linux desktop for several versions. I have been seeing the issue on every client version after 3.14.1 and only have ext4 file systems. Downgrading to 3.14.1 or below fixes it entirely for me. I think this issue is more than just a file-system specific issue (or if not, there are two issues that present almost identically).

See:
#7729
#7697
#7456

@chris-blues
Author

chris-blues commented Jan 10, 2025

For the record:
On Debian systems the mount option 'ntfs' also loads the ntfs-3g driver since Squeeze.

Edit:
Source: https://wiki.debian.org/NTFS#A.2Fetc.2Ffstab

@mbiebl

mbiebl commented Jan 10, 2025

There are several other issues reporting the desktop client freezing in a loop with high CPU usage on Linux desktop for several versions. I have been seeing the issue on every client version after 3.14.1 and only have ext4 file systems. Downgrading to 3.14.1 or below fixes it entirely for me. I think this issue is more than just a file-system specific issue (or if not, there are two issues that present almost identically).

If you can reliably reproduce those issues, you can do it like me and run a git bisect to pinpoint the specific change.
Your issue seems to have similar symptoms but is likely caused by something else. It might be better to follow-up on those other bug reports that were reported for earlier versions.

@alraban

alraban commented Jan 10, 2025

To be clear, the first of those bug reports was filed against the current version, not a previous version so the issue very much still exists. I'm unconvinced that the issues in those threads aren't directly related to the same root issue here. I don't have the technical skills to bisect the client, I just wanted to provide the devs with relevant contextual information.

@blwh

blwh commented Jan 10, 2025

But someone also reported the bug on ext4 in the comments.

Maybe different mount options or maybe it's a different issue which only looks like this one.

There are several other issues reporting the desktop client freezing in a loop with high CPU usage on Linux desktop for several versions. I have been seeing the issue on every client version after 3.14.1 and only have ext4 file systems. Downgrading to 3.14.1 or below fixes it entirely for me. I think this issue is more than just a file-system specific issue (or if not, there are two issues that present almost identically).

See: #7729 #7697 #7456

This is exactly my case as well. I am running 3.14.0 and it works just fine on my machines, but not 3.14.1 and above.

@chris-blues
Author

chris-blues commented Jan 10, 2025

I too believe we're dealing with multiple issues. This issue about NTFS drives clearly started with version 3.15.0. One of the other issues started earlier.

Then again, I'm not familiar with the code in question, so dismiss this as you see fit!

Edit:
Having some experience with programming myself, I know that high CPU usage often simply indicates a loop running amok. This was quite normal in the 90s, when programs ran at full speed in order to be as fast as possible on slow machines. On modern systems you write your programs to do the work as it arises, e.g. using asynchronous await functions, or loops with timed work management...

@mbiebl

mbiebl commented Jan 10, 2025

I too believe we're dealing with multiple issues. This issue about NTFS drives clearly started with version 3.15.0. One of the other issues started earlier.

The problematic commit I could identify is 5b2af16, which you can see is part of v3.15.0, which matches what you are seeing. This particular commit has not been backported to the stable-3.14 branch. So yes, I think those other bug reports that were mentioned above have a different root cause.

@pati-ni

pati-ni commented Jan 10, 2025

Still the same on ext4 with the latest AppImage on arch linux. After some syncing, CPU gets to 100% and the client blocks (also the UI). Is the UI blocked for everyone else?

EDIT: UI occasionally unblocks but is in practice unusable.

@chris-blues
Author

chris-blues commented Jan 10, 2025

Yeah, very laggy after a short while.

Edit: But quite normal for an infinite loop...

@petrkr

petrkr commented Jan 10, 2025

But someone also reported the bug on ext4 in the comments.

Maybe different mount options or maybe it's a different issue which only looks like this one.

OK, I updated to 3.15.3 on Arch with btrfs and it seems fine. But I tested it over RDP and Fluxbox, so I cannot see the system tray icon to open the client; I am only reading logs and watching top.

@mgallien
Collaborator

It's probably better to check for required features than explicit file system types.

That was my first idea, and I now see why I made the wrong choice (I was not expecting FUSE to be in use, for example).

I will for now white list a couple of file system types that are matching the most common use cases for which the change was introduced. I will try to update the patch with some feature tests later (not sure when I will be able to introduce that).

@mgallien
Collaborator

mgallien commented Jan 10, 2025

people please keep this on topic.
This issue is about some Linux file system types that have unusual semantics for UNIX-style file permissions, and the client trying in a loop to fix them.
Other unrelated topics should go into a different issue.

@mbiebl

mbiebl commented Jan 10, 2025

I will for now white list a couple of file system types that are matching the most common use cases for which the change was introduced. I will try to update the patch with some feature tests later (not sure when I will be able to introduce that).

That seems like a very sensible approach: get a fix out soon with a whitelist for known working file systems (I expect this includes ext* and maybe btrfs and xfs) and have a more elaborate solution for later. Thanks @mgallien !
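
The interim approach could be sketched roughly like this. It is an illustration only: the set below lists the file systems named in this comment (ext*, btrfs, xfs), not the contents of the actual patch, and both names in the code are hypothetical.

```python
# Hypothetical allowlist: ext*, btrfs and xfs are the examples named in the
# comment above, not the contents of the actual patch.
PERMISSION_AWARE_FS = frozenset({"ext2", "ext3", "ext4", "btrfs", "xfs"})


def should_fix_permissions(fs_type):
    """Only touch permissions on file systems known to store UNIX modes."""
    return fs_type.lower() in PERMISSION_AWARE_FS


for fs_type in ("ext4", "btrfs", "fuseblk", "vfat"):
    print(fs_type, should_fix_permissions(fs_type))
```

The trade-off is that anything not on the list, including every FUSE mount reported as `fuseblk`, is skipped even if it would handle permissions correctly, which is why a feature test was discussed as the longer-term solution.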

@petrkr

petrkr commented Jan 10, 2025

I will for now white list a couple of file system types that are matching the most common use cases for which the change was introduced. I will try to update the patch with some feature tests later (not sure when I will be able to introduce that).

That seems like a very sensible approach: get a fix out soon with a whitelist for known working file systems (I expect this includes ext* and maybe btrfs and xfs) and have a more elaborate solution for later. Thanks @mgallien !

As I tested, btrfs seems to be fine.

@chris-blues
Author

It seems a solution is well on the way! Nice work everyone! And thanks!

@pati-ni

pati-ni commented Jan 13, 2025

people please keep this on topic.
This issue is about some Linux file system types that have unusual semantics for UNIX-style file permissions, and the client trying in a loop to fix them.

How do we know which issue affects us? It seems we have similar symptoms across different filesystems. Is an NTFS partition enough to contaminate the account and affect other filesystems that use the same account?

@petrkr

petrkr commented Jan 13, 2025

people please keep this on topic.
This issue is about some Linux file system types that have unusual semantics for UNIX-style file permissions, and the client trying in a loop to fix them.

How do we know which issue affects us? It seems we have similar symptoms across different filesystems. Is an NTFS partition enough to contaminate the account and affect other filesystems that use the same account?

Probably not.

I have the same account on two laptops. Both run Arch Linux.
One uses btrfs and 3.15.3 is fine.
The other uses ntfs-3g and that does not work, but 3.14.3 is fine.

@chris-blues
Author

chris-blues commented Jan 13, 2025

Is an NTFS partition enough to contaminate the account and affect other filesystems that use the same account?

This particular bug didn't compromise anything. It just didn't stop running the harddisk pretty hard, which is bad for the hardware in the long run. So, no data contamination etc whatsoever. About any other bug, I just don't know.

Edit:
NTFS uses very different file permissions, compared to the unix-style file-systems.

@grossardt

grossardt commented Jan 15, 2025

I had the same(?) issue (non-responsive nextcloud-client using 100% cpu of one core, syncs fine after killing and restarting but goes back into hanging after the sync is complete) with 3.15.3 on arch. Nextcloud folder is on an ext4 VFS partition inside a dm-crypt, if that helps. No NTFS. Downgrade to 3.14.3 seems to have fixed the issue.
(May also be a case of #7729 if this is actually a different bug.)

Edit: 3.14.3 is having the same issue, it just lasts a bit longer until it hangs. Downgraded to 3.14.1 now.

Projects
Status: 🏗️ In progress