
Conversation

@ArnyminerZ ArnyminerZ commented Sep 28, 2025

Context

Right now we are loading the whole iCalendar into memory when applying the preprocessors, which defeats the purpose of using Readers in the first place. This can lead to Out Of Memory exceptions on very large iCalendar documents or on devices with limited memory.

More info and reproduction in #90

Changes

  • Added a new test (ICalPreprocessorInstrumentedTest) that generates a very large iCalendar file with Int.MAX_VALUE events.
  • Changed the way ICalPreprocessor.preprocessStream works:
    • Instead of applying the StreamPreprocessors to the whole document at once, they are applied line-wise.
      Note: to avoid applying regex conditions (which can get expensive) hundreds of thousands of times, the lines are chunked in groups of 1000 (an arbitrary value that can be adjusted).
    • If the Reader given to ICalPreprocessor supports reset, the result of this function will be a SequenceReader. This is a new class that converts sequences into a Reader. Note that this Reader does not itself support reset, due to the way sequences work in Kotlin.
    • If the Reader doesn't support reset, it still has to be loaded fully into memory.
  • Since StreamPreprocessor is now simpler (most logic has been moved into ICalPreprocessor), it is now an interface.
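
The chunked, line-wise flow described above could be sketched roughly like this. This is a simplified illustration, not the actual PR code: applyPreprocessors is stubbed out, and the real SequenceReader differs in detail.

```kotlin
import java.io.Reader
import java.io.StringReader

// Minimal sketch of a Reader backed by a Sequence<String>. Chunks are pulled
// from the sequence one at a time, so only one chunk is in memory at once.
class SequenceReader(lines: Sequence<String>) : Reader() {
    private val iterator = lines.iterator()
    private var current: StringReader? = null

    override fun read(cbuf: CharArray, off: Int, len: Int): Int {
        while (true) {
            val reader = current
                ?: if (iterator.hasNext()) StringReader(iterator.next()).also { current = it }
                else return -1  // whole sequence exhausted
            val read = reader.read(cbuf, off, len)
            if (read != -1) return read
            current = null  // current chunk drained, advance to the next one
        }
    }

    override fun close() { current?.close() }
}

// Stub standing in for the real per-chunk preprocessors.
fun applyPreprocessors(lines: String): String = lines

fun preprocessStream(original: Reader, chunkSize: Int = 1000): Reader {
    // Lazy pipeline: only one chunk of `chunkSize` lines is materialized at a time.
    val chunkedFixedLines = original.buffered()
        .lineSequence()
        .chunked(chunkSize)
        .map { chunk -> applyPreprocessors(chunk.joinToString("\n")) + "\n" }
    return SequenceReader(chunkedFixedLines)
}

fun main() {
    val fixed = preprocessStream(StringReader("LINE1\nLINE2\nLINE3"), chunkSize = 2)
    check(fixed.readText() == "LINE1\nLINE2\nLINE3\n")
    println("chunked preprocessing round-trips the input")
}
```

The trailing "\n" appended per chunk is a simplification; a real implementation has to be careful to preserve the original line endings exactly.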

Note

Running the pre-processors on a per-line basis is only possible because each of them operates on individual lines. If at some point we require a preprocessor that needs to fix multiple lines at once (maybe description fixes, since descriptions allow multi-line values?), we might have to redo this.

@ArnyminerZ ArnyminerZ self-assigned this Sep 28, 2025
@ArnyminerZ ArnyminerZ added the refactoring Quality improvement of existing functions label Sep 28, 2025
@ArnyminerZ ArnyminerZ requested a review from a team as a code owner September 28, 2025 13:05
@ArnyminerZ ArnyminerZ linked an issue Sep 28, 2025 that may be closed by this pull request
@ArnyminerZ ArnyminerZ requested a review from Copilot September 28, 2025 13:07
Contributor

Copilot AI left a comment

Pull Request Overview

This PR addresses Out Of Memory (OOM) issues when processing super large iCalendar files by implementing chunked processing instead of loading entire files into memory. The changes move from a Reader-based preprocessing approach to a line-by-line chunked processing system that processes iCalendar data in groups of 1000 lines to maintain memory efficiency while preserving functionality.

Key changes:

  • Refactored stream preprocessing to process data in configurable chunks rather than loading entire files
  • Converted StreamPreprocessor from abstract class to interface with simplified API
  • Added SequenceReader utility class to convert processed line sequences back to Reader interface

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Summary per file:

  • SequenceReader.kt: New utility class that converts String sequences into a Reader
  • StreamPreprocessor.kt: Simplified from an abstract class to an interface, removing preprocessing logic
  • ICalPreprocessor.kt: Refactored to implement chunked processing with configurable chunk sizes
  • FixInvalidUtcOffsetPreprocessor.kt: Updated to implement the StreamPreprocessor interface
  • FixInvalidDayOffsetPreprocessor.kt: Updated to implement the StreamPreprocessor interface
  • ICalPreprocessorInstrumentedTest.kt: Added a test with a large iCalendar file generator to verify memory efficiency


@ArnyminerZ ArnyminerZ marked this pull request as draft September 28, 2025 13:08
@ArnyminerZ
Member Author

@bitfireAT/app-dev should be ready :)

A bit ugly, but I don't know how to simplify it, to be honest. Mainly, SequenceReader is a bit verbose.

@ArnyminerZ ArnyminerZ marked this pull request as ready for review September 28, 2025 13:46
Member

@sunkup sunkup left a comment

Pretty cool, but unfortunately not very necessary and might introduce new problems. The current focus for synctools is the refactoring for stability, which is already breaking things a bit ... So it's not really a good time to be adding this. I will do an actual review if @rfc2822 thinks we should add it to synctools now anyways.

Member

@rfc2822 rfc2822 left a comment

Because the large file / OOM was a real problem, I think it's a good idea to make the preprocessors more stable, too.

Some comments.

.lineSequence()
.chunked(chunkSize)
.map { chunk -> applyPreprocessors(chunk.joinToString("\n")) }
return SequenceReader(chunkedFixedLines)
Member

I think we can use Guava's CharSource like this:

BufferedReader(original).use { reader ->
    val chunkedFixedLines = reader.lineSequence()
        .chunked(chunkSize)
        .map { chunk ->
            val fixed = applyPreprocessors(chunk.joinToString("\n"))
            CharSource.wrap(fixed)  // String to CharSource
        }
        .asIterable()
    return CharSource.concat(chunkedFixedLines).openStream()
}

Then we could avoid creating a custom reader.

Because we need an Iterable for CharSource, it may be advantageous to use Stream/Iterable instead of Sequence, but I don't know whether there's a chunked() then. Maybe also in Guava.

But please double-check @ArnyminerZ, also that not all strings are kept in memory at once (but only one String at a time).
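
The "only one String at a time" concern can be sanity-checked with stdlib sequences alone (no Guava needed). The counter below is purely illustrative and not code from the PR; it only demonstrates that the chunked/map pipeline is evaluated lazily.

```kotlin
import java.io.StringReader

// Counts how many chunks have been transformed before and after the first
// chunk of the result is pulled. Because sequences are lazy, the count should
// only grow as the result is actually consumed.
fun chunksProcessedBeforeAndAfterFirstPull(): Pair<Int, Int> {
    var processed = 0
    val input = (1..10).joinToString("\n") { "LINE$it" }

    val chunks = StringReader(input).buffered()
        .lineSequence()
        .chunked(2)
        .map { chunk ->
            processed++  // side effect to observe lazy evaluation
            chunk.joinToString("\n")
        }

    val before = processed    // nothing consumed yet
    chunks.iterator().next()  // pull just the first chunk
    return before to processed
}

fun main() {
    check(chunksProcessedBeforeAndAfterFirstPull() == (0 to 1))
    println("sequence chunks are processed lazily, one at a time")
}
```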

Member Author

I'd say this works, because the test that failed with huge iCalendars is passing.

Member

See my comment in the test: it only tests whether the input is processed as a stream, but not whether it's actually processed (correctly).

Member

@rfc2822 rfc2822 left a comment

Some more comments :)

}

@Test
fun testParse_SuperLargeFiles() {
Member

What does this test (class) do?

As I understand it, this test basically tests whether the input is put into the memory as one chunk (which would fail with OOM) or whether it's processed in streaming mode so that it doesn't load too big files into the memory.

So I think there are two things that can be tested:

  1. Whether the preprocessor does anything when its resulting Reader is not consumed / being read. I think this is what testParse_SuperLargeFiles currently does: it verifies that preprocessStream doesn't actually load the input as long as the resulting Reader is not consumed.
     However, if the input is not read at all, why generate an iCalendar at all? We could just as well pass an infinite stream of the same character.

  2. Whether the resulting Reader returns a correct result when consumed. I think we can also use a fake input string and then check with mockk that applyPreprocessors is called. We can return another fake value and verify that it's what we want.
     This is currently not done by the tests. I noticed it because processing Int.MAX_VALUE (2147483647) events should take quite some time – however, the test returns immediately.


And everything for the two cases that

  1. the input stream supports reset(), and
  2. that it doesn't support reset().
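
The "infinite stream of the same character" idea above could be sketched with a plain Reader that never runs out of input. This fixture is illustrative, not code from the PR; the demo only shows that line-wise lazy consumption terminates on endless input, which is what a streaming preprocessStream would rely on.

```kotlin
import java.io.Reader

// A Reader that repeats the given pattern forever. Feeding this into a
// streaming preprocessor and taking only a few lines from the result should
// return immediately; an eager implementation would hang or run out of memory.
class InfiniteReader(private val pattern: String = "X\n") : Reader() {
    private var pos = 0

    override fun read(cbuf: CharArray, off: Int, len: Int): Int {
        for (i in off until off + len) {
            cbuf[i] = pattern[pos]
            pos = (pos + 1) % pattern.length
        }
        return len  // never signals end of stream
    }

    override fun close() {}
}

fun main() {
    // Lazily take three lines from an endless input: terminates immediately.
    val lines = InfiniteReader().buffered().lineSequence().take(3).toList()
    check(lines == listOf("X", "X", "X"))
    println("lazy line consumption terminates on infinite input")
}
```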

Member

As there's no Android dependency, it should be possible to make this test a unit test (faster and easier to run than an instrumentation test).

Comment on lines +17 to +18
@VisibleForTesting
val regexpForProblem = Regex(
Member

Would it be enough to use internal? I usually combine @VisibleForTesting with internal so that tests can access the field, but other packages can't.
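
The suggested combination might look like this. It is only a sketch: the annotation is declared locally so the snippet is self-contained (in the project it would come from androidx.annotation or Guava), and the regex is a placeholder, not the real pattern.

```kotlin
// Stand-in declaration; use the real androidx.annotation.VisibleForTesting
// (or Guava's) in the actual code.
@Target(AnnotationTarget.PROPERTY, AnnotationTarget.FUNCTION)
annotation class VisibleForTesting

class FixInvalidDayOffsetPreprocessor {
    // `internal` keeps the field inaccessible outside the module, while
    // @VisibleForTesting documents that it is non-private only for tests.
    @VisibleForTesting
    internal val regexpForProblem = Regex("""PT\d+D""")  // placeholder pattern
}
```

Tests in the same module can then reference regexpForProblem directly, while consumers of the library cannot.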

*
* @param original The complete iCalendar string
* @return The complete iCalendar string, but fixed
* @param lines The iCalendar lines to fix. Those may be the full iCalendar file, or just a part of it.
Member

We should explicitly mention that lines must only contain complete lines. When we have to add another StreamPreprocessor some day, we won't remember what we did now, but we will still need to know whether our input (lines) can contain partial lines or only complete lines.



Development

Successfully merging this pull request may close these issues.

OOM Exception with large icalendar files
