fix broken gzip file produced sometimes #82


Open · wants to merge 1 commit into main

Conversation

nit23uec

Thanks for contributing to Logstash! If you haven't already signed our CLA, here's a handy link: https://www.elastic.co/contributor-agreement/

@ghost

ghost commented Nov 10, 2019

Hi @nit23uec, we have found your signature in our records, but it seems you signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails to your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?

@nit23uec
Author

Hi @jsvd , @colinsurprenant - can you please review this PR?
Summary:
This PR addresses the issue highlighted in #79
The strategy is basically to write incoming events to a temp file and then move the contents of the temp file into the original gzip file at the flush interval. Because each flush requires reading the temp file, copying its contents into the gzip file, and then truncating the temp file, the operation is non-atomic; it nevertheless considerably reduces the probability of producing a broken gzip file.

Thanks.
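A minimal sketch of the strategy described above, in plain Ruby. The names (`write_event`, `flush_to_gzip`, the file paths) are illustrative, not the plugin's actual API; the point is the write-then-flush-then-truncate sequence and why it is non-atomic:

```ruby
require "zlib"

# Illustrative paths, not the plugin's configuration.
TEMP_PATH = "events.tmp"
GZIP_PATH = "events.log.gz"

# Events are appended to a plain temp file between flushes.
def write_event(line)
  File.open(TEMP_PATH, "a") { |f| f.puts(line) }
end

# At the flush interval, the temp file's contents are compressed into
# the target gzip file, then the temp file is truncated.
def flush_to_gzip
  data = File.read(TEMP_PATH)
  return if data.empty?
  gz = Zlib::GzipWriter.new(File.open(GZIP_PATH, "ab"))
  gz.write(data)
  gz.close  # closing writes the gzip footer for this member
  File.truncate(TEMP_PATH, 0)
  # Read, append, and truncate are three separate steps, so the whole
  # flush is non-atomic, as the PR description notes.
end
```

A crash between `gz.close` and `File.truncate` would duplicate (not lose) events on the next flush, which is the trade-off this design accepts.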

@nit23uec
Author

Hi, any word on this? CLA is signed btw.

@cswingler

Hey there!

I recently stumbled across this issue. We have a relatively busy Logstash server, and I'm currently going through and calculating the exploded size of all of the gzip files output by this plugin, so I'll have some good stats on how frequently we run into it. So far it seems to affect about 1% of files, and it appears especially likely to happen when Logstash is restarted.

Is there anything I can help with to get this merged, tested, and deployed?

@colinsurprenant
Contributor

colinsurprenant commented Oct 2, 2020

@nit23uec Thanks for your contribution and sorry for the delay in following up.
I looked at the PR and I have a few observations:

  • In my previous comment in broken gzip file produced sometimes. #79 I noted that re-opening a gzipped file to continue appending to it might be a problem. After further local testing, it seems to be supported pretty much seamlessly. I don't remember the exact tests I did at the time, or whether the underlying JRuby zlib library (now JZlib) has changed since, but in any case this seems to work correctly now.
  • Assuming the above works correctly (re-opening a gzipped file to continue appending), there is no reason the current file output strategy would not work for gzipped files, GIVEN that the Zlib::GZipFile object is always closed correctly so that the footer is written.

Given the above, I am not sure we actually need the temp-file writing strategy here; we may only need to make sure that close is always called on the Zlib::GZipFile object.
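The two claims above can be checked with a few lines of plain Ruby (file name is illustrative): each `Zlib::GzipWriter` must be closed so the gzip footer (CRC and length) is written, and re-opening the file in append mode simply adds a second gzip member, which standard readers such as `zcat` consume seamlessly:

```ruby
require "zlib"

path = "reopened.log.gz"
File.delete(path) if File.exist?(path)

# Two separate open/write/close cycles against the same file: the
# second one re-opens the gzipped file and appends a new gzip member.
["first batch\n", "second batch\n"].each do |chunk|
  gz = Zlib::GzipWriter.new(File.open(path, "ab"))
  gz.write(chunk)
  gz.close  # skipping close is exactly what leaves a "broken" gzip behind
end

# Zlib::GzipReader.zcat (Ruby 3.0+) reads all concatenated gzip members,
# just like the zcat command-line tool.
content = File.open(path, "rb") { |f| Zlib::GzipReader.zcat(f) }
```

If either `close` is skipped, that member has no footer and tools report the file as truncated or corrupt, which matches the symptom in #79.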

There is, however, a deeper problem at play, which I also discussed in #79: there is currently no way to safely consume a file produced by the file output while Logstash is running, because there is no way to know when a file has been written to for the last time and is done with for good. This is a larger issue and is not specific to gzipped files; currently the only way to safely consume a file from the file output is to shut down Logstash. Note that this may not matter for (text) files consumed in a tailing/streaming way, but that does not apply to gzipped files, which cannot be consumed while they are being written.

LMKWYT.

@makefu

makefu commented Feb 9, 2022

Hello! We've also stumbled upon corrupted gzip outputs. Is there any chance of getting this pull request merged?

@andsel andsel assigned andsel and unassigned colinsurprenant Feb 9, 2022
@andsel
Contributor

andsel commented Feb 10, 2022

Hi @makefu I'll take care of this

@andsel
Contributor

andsel commented Feb 14, 2022

I think this PR doesn't match the original fix suggested in #73.
This PR creates a plain file with a temporary extension and, when it's flushed, gzips the file into its final position. As I understand it, the original proposal was to work on a gzip temp file: when the file is about to be closed, move it (an atomic operation with respect to the filesystem) to its final name, removing the temporary extension.

@makefu

makefu commented Feb 14, 2022

Right now, however, living with the file corruption seems somewhat worse than using a workaround. It looks like for our setup we will have to switch to writing files in plain text and using logrotate to gzip them periodically.
Of course it would be much better to have a real solution for the issue.
Cheers!
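For reference, a minimal logrotate sketch of that workaround, assuming the plugin writes plain-text files under a path like `/var/log/logstash-out/` (path and schedule are illustrative):

```
# Logstash keeps its output files open, so copy and truncate in place
# rather than renaming the file out from under the plugin.
/var/log/logstash-out/*.log {
    daily
    rotate 7
    compress
    copytruncate
    missingok
    notifempty
}
```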

@andsel
Contributor

andsel commented Feb 16, 2022

I agree with you @makefu; however, this PR is more complicated than the suggestion originally proposed in #73. I would like to know whether @nit23uec is still engaged in moving this forward, or whether I can take it over, maybe in another PR.

@roaksoax

Closes #79
