[WIP][improve][ci] Enable Netty leak detection in CI and make leaks fail the build #23956

Draft
wants to merge 32 commits into base: master

Commits (32)
841ca98
[improve][test] Enable Netty leak detection in tests
lhotari Feb 10, 2025
ebe95c2
Use unique names in int tests
lhotari Feb 11, 2025
40b5975
Don't depend on the leak detection level
lhotari Feb 10, 2025
f97cb1e
Fix HashedWheelTimer leak in PulsarService
lhotari Feb 10, 2025
ea462ca
Don't depend on the leak detection level
lhotari Feb 10, 2025
34ddda7
Fix a Timer leak and migrate to use Cleanup annotation for stopping H…
lhotari Feb 10, 2025
0a896b2
Ignore leaks in MessageImplTest
lhotari Feb 10, 2025
74ccd85
Ignore leaks in ProducerMemoryLimitTest
lhotari Feb 10, 2025
3899167
Fix leak in AbstractDeliveryTrackerTests
lhotari Feb 10, 2025
1fc84b9
Fix leak in ZKReconnectTest
lhotari Feb 10, 2025
2e308aa
Fix leaks in DelayedDeliveryTrackerFactoryTest
lhotari Feb 10, 2025
89d2b0a
Fix leaks in CommandUtilsTests
lhotari Feb 11, 2025
bf31943
Fix leaks in MarkersTest
lhotari Feb 11, 2025
4400d6b
Fix leak in CompressorCodecTest
lhotari Feb 11, 2025
fae09d8
Fix leaks in OffloadIndexV2Test
lhotari Feb 11, 2025
6c6dbac
Fix leaks in BlobStoreBackedInputStreamTest
lhotari Feb 12, 2025
833fc09
Fix more leaks in tiered-storage/jcloud tests
lhotari Feb 12, 2025
ff30589
Fix leak in PrometheusMetricsTest
lhotari Feb 12, 2025
df532a4
Fix leak in BucketDelayedDeliveryTrackerTest
lhotari Feb 12, 2025
48a40dd
Fix leaks in ManagedLedgerInterceptorImplTest
lhotari Feb 12, 2025
bcbb493
Fix leaks in AbstractBaseDispatcherTest
lhotari Feb 12, 2025
1c5c589
Fix leaks in ServerCnxTest
lhotari Feb 12, 2025
d2cf12e
Fix multiple leaks by making ByteBufPair.coalesce release the input B…
lhotari Feb 11, 2025
3f205e1
Fix leaks in CommandsTest
lhotari Feb 12, 2025
8248aa2
Fix bug in PersistentMessageFinderTest in reading ByteBuf to byte[]
lhotari Feb 12, 2025
4a9ef1e
Fix leaks in SharedConsumerAssignorTest
lhotari Feb 12, 2025
6e02044
Fix leaks in TopicPublishRateThrottleTest
lhotari Feb 12, 2025
b84a0a9
Fix leaks in MessageDuplicationTest
lhotari Feb 12, 2025
aab895c
Fix leaks in PersistentStickyKeyDispatcherMultipleConsumersClassicTest
lhotari Feb 12, 2025
783a50a
Fix leaks in PersistentStickyKeyDispatcherMultipleConsumersTest
lhotari Feb 12, 2025
fe0bf96
Fix leak in ReplicatedSubscriptionsSnapshotBuilderTest
lhotari Feb 12, 2025
e29941f
Fix leak in NamespaceStatsAggregatorTest
lhotari Feb 12, 2025
10 changes: 10 additions & 0 deletions .github/workflows/pulsar-ci-flaky.yaml
@@ -154,6 +154,7 @@ jobs:
TRACE_TEST_RESOURCE_CLEANUP_DIR: ${{ github.workspace }}/target/trace-test-resource-cleanup
THREAD_LEAK_DETECTOR_WAIT_MILLIS: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.thread_leak_detector_wait_millis || 10000 }}
THREAD_LEAK_DETECTOR_DIR: ${{ github.workspace }}/target/thread-leak-dumps
NETTY_LEAK_DUMP_DIR: ${{ github.workspace }}/target/netty-leak-dumps
runs-on: ubuntu-22.04
timeout-minutes: 100
if: ${{ needs.preconditions.outputs.docs_only != 'true' }}
@@ -224,6 +225,14 @@ jobs:
cat threadleak*.txt | awk '/^Summary:/ {print "::warning::" $0 "\n"; next} {print}'
fi

- name: Report detected Netty leaks
if: ${{ always() }}
run: |
if [ -d "$NETTY_LEAK_DUMP_DIR" ]; then
cd "$NETTY_LEAK_DUMP_DIR"
cat netty_leak_*.txt
fi

- name: Create Jacoco reports
if: ${{ needs.preconditions.outputs.collect_coverage == 'true' }}
continue-on-error: true
@@ -266,6 +275,7 @@ jobs:
/tmp/*.hprof
**/hs_err_*.log
**/core.*
${{ env.NETTY_LEAK_DUMP_DIR }}/*
${{ env.TRACE_TEST_RESOURCE_CLEANUP_DIR }}/*
${{ env.THREAD_LEAK_DETECTOR_DIR }}/*
retention-days: 7
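
The inline report step in this workflow runs `cat netty_leak_*.txt` after changing into the dump directory. If that directory exists but contains no matching files, `cat` exits non-zero, which may fail the step. A minimal sketch of a tolerant variant follows, reusing the `find ... -exec cat` form that `ci_report_netty_leaks` uses. The function name `print_netty_leak_dumps` is hypothetical and not part of this PR.

```shell
# Hypothetical helper (not in the PR): print every Netty leak dump in a directory.
# Succeeds even when the directory is missing or holds no matching files.
print_netty_leak_dumps() {
  local dump_dir="$1"
  if [ -d "$dump_dir" ]; then
    # find exits 0 when nothing matches, unlike `cat` on an unmatched glob
    find "$dump_dir" -maxdepth 1 -type f -name "netty_leak_*.txt" -exec cat {} \;
  fi
}
```

In the workflow this could be called as `print_netty_leak_dumps "$NETTY_LEAK_DUMP_DIR"`, assuming the function is made available to the step.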
68 changes: 64 additions & 4 deletions .github/workflows/pulsar-ci.yaml
@@ -230,6 +230,7 @@ jobs:
TRACE_TEST_RESOURCE_CLEANUP_DIR: ${{ github.workspace }}/target/trace-test-resource-cleanup
THREAD_LEAK_DETECTOR_WAIT_MILLIS: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.thread_leak_detector_wait_millis || 10000 }}
THREAD_LEAK_DETECTOR_DIR: ${{ github.workspace }}/target/thread-leak-dumps
NETTY_LEAK_DUMP_DIR: ${{ github.workspace }}/target/netty-leak-dumps
runs-on: ubuntu-22.04
timeout-minutes: ${{ matrix.timeout || 60 }}
needs: ['preconditions', 'build-and-license-check']
@@ -347,6 +348,10 @@ jobs:
cat threadleak*.txt | awk '/^Summary:/ {print "::warning::" $0 "\n"; next} {print}'
fi

- name: Report detected Netty leaks
if: ${{ always() }}
run: $GITHUB_WORKSPACE/build/pulsar_ci_tool.sh report_netty_leaks

- name: Upload Surefire reports
uses: actions/upload-artifact@v4
if: ${{ !success() || env.TRACE_TEST_RESOURCE_CLEANUP != 'off' }}
@@ -364,6 +369,7 @@ jobs:
/tmp/*.hprof
**/hs_err_*.log
**/core.*
${{ env.NETTY_LEAK_DUMP_DIR }}/*
${{ env.TRACE_TEST_RESOURCE_CLEANUP_DIR }}/*
${{ env.THREAD_LEAK_DETECTOR_DIR }}/*
retention-days: 7
@@ -552,6 +558,7 @@ jobs:
PULSAR_TEST_IMAGE_NAME: apachepulsar/java-test-image:latest
DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
CI_JDK_MAJOR_VERSION: ${{ needs.preconditions.outputs.jdk_major_version }}
NETTY_LEAK_DUMP_DIR: ${{ github.workspace }}/target/netty-leak-dumps
strategy:
fail-fast: false
matrix:
@@ -689,20 +696,37 @@ jobs:
report_paths: 'test-reports/TEST-*.xml'
annotate_only: 'true'

- name: Report detected Netty leaks
if: ${{ always() }}
run: $GITHUB_WORKSPACE/build/pulsar_ci_tool.sh report_netty_leaks

- name: Upload Surefire reports
uses: actions/upload-artifact@v4
if: ${{ !success() }}
with:
-          name: Integration-${{ matrix.group }}-surefire-reports
+          name: Integration-${{ matrix.name }}-surefire-reports
path: surefire-reports
retention-days: 7

- name: Upload possible heap dump, core dump or crash files
uses: actions/upload-artifact@v4
if: ${{ always() }}
with:
name: Integration-${{ matrix.name }}-dumps
path: |
/tmp/*.hprof
**/hs_err_*.log
**/core.*
${{ env.NETTY_LEAK_DUMP_DIR }}/*
retention-days: 7
if-no-files-found: ignore

- name: Upload container logs
uses: actions/upload-artifact@v4
if: ${{ !success() }}
continue-on-error: true
with:
-          name: Integration-${{ matrix.group }}-container-logs
+          name: Integration-${{ matrix.name }}-container-logs
path: tests/integration/target/container-logs
retention-days: 7

@@ -959,6 +983,7 @@ jobs:
PULSAR_TEST_IMAGE_NAME: apachepulsar/pulsar-test-latest-version:latest
DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
CI_JDK_MAJOR_VERSION: ${{ needs.preconditions.outputs.jdk_major_version }}
NETTY_LEAK_DUMP_DIR: ${{ github.workspace }}/target/netty-leak-dumps
strategy:
fail-fast: false
matrix:
@@ -1066,6 +1091,10 @@ jobs:
report_paths: 'test-reports/TEST-*.xml'
annotate_only: 'true'

- name: Report detected Netty leaks
if: ${{ always() }}
run: $GITHUB_WORKSPACE/build/pulsar_ci_tool.sh report_netty_leaks

- name: Upload container logs
uses: actions/upload-artifact@v4
if: ${{ !success() }}
@@ -1083,6 +1112,19 @@ jobs:
path: surefire-reports
retention-days: 7

- name: Upload possible heap dump, core dump or crash files
uses: actions/upload-artifact@v4
if: ${{ always() }}
with:
name: System-${{ matrix.group }}-dumps
path: |
/tmp/*.hprof
**/hs_err_*.log
**/core.*
${{ env.NETTY_LEAK_DUMP_DIR }}/*
retention-days: 7
if-no-files-found: ignore

- name: Wait for ssh connection when build fails
# ssh access is enabled for builds in own forks
uses: ./.github/actions/ssh-access
@@ -1189,6 +1231,7 @@ jobs:
PULSAR_TEST_IMAGE_NAME: apachepulsar/pulsar-test-latest-version:latest
DEVELOCITY_ACCESS_KEY: ${{ secrets.DEVELOCITY_ACCESS_KEY }}
CI_JDK_MAJOR_VERSION: ${{ needs.preconditions.outputs.jdk_major_version }}
NETTY_LEAK_DUMP_DIR: ${{ github.workspace }}/target/netty-leak-dumps
strategy:
fail-fast: false
matrix:
@@ -1273,23 +1316,40 @@ jobs:
report_paths: 'test-reports/TEST-*.xml'
annotate_only: 'true'

- name: Report detected Netty leaks
if: ${{ always() }}
run: $GITHUB_WORKSPACE/build/pulsar_ci_tool.sh report_netty_leaks

- name: Upload container logs
uses: actions/upload-artifact@v4
if: ${{ !success() }}
continue-on-error: true
with:
-          name: System-${{ matrix.group }}-container-logs
+          name: Flaky-System-${{ matrix.group }}-container-logs
path: tests/integration/target/container-logs
retention-days: 7

- name: Upload Surefire reports
uses: actions/upload-artifact@v4
if: ${{ !success() }}
with:
-          name: System-${{ matrix.name }}-surefire-reports
+          name: Flaky-System-${{ matrix.name }}-surefire-reports
path: surefire-reports
retention-days: 7

- name: Upload possible heap dump, core dump or crash files
uses: actions/upload-artifact@v4
if: ${{ always() }}
with:
name: Flaky-System-${{ matrix.group }}-dumps
path: |
/tmp/*.hprof
**/hs_err_*.log
**/core.*
${{ env.NETTY_LEAK_DUMP_DIR }}/*
retention-days: 7
if-no-files-found: ignore

- name: Wait for ssh connection when build fails
# ssh access is enabled for builds in own forks
uses: ./.github/actions/ssh-access
31 changes: 31 additions & 0 deletions build/pulsar_ci_tool.sh
@@ -579,6 +579,37 @@ ci_create_inttest_coverage_report() {
echo "::endgroup::"
}

ci_report_netty_leaks() {
  if [ -z "$NETTY_LEAK_DUMP_DIR" ]; then
    echo "NETTY_LEAK_DUMP_DIR isn't set"
    return 0
  fi
  local temp_file=$(mktemp -t netty_leak.XXXX)
  {
    if [ -d "$NETTY_LEAK_DUMP_DIR" ]; then
      find "$NETTY_LEAK_DUMP_DIR" -maxdepth 1 -type f -name "netty_leak_*.txt" -exec cat {} \;
    fi
    if [ -d tests/integration/target/container-logs ]; then
      find tests/integration/target/container-logs -type f -name "*.tar.gz" -exec tar -Ozxvf {} --wildcards --wildcards-match-slash '*/netty_leak_*.txt' \;
    fi
  } > $temp_file
  if [ -s $temp_file ]; then
    {
      echo "::warning::Netty leaks found"
      echo "Test file locations in stack traces:"
      grep -h -i test $temp_file | grep org.apache | sed 's/^[[:space:]]*//;s/[[:space:]]*$//;s/^Hint: //' | sort -u
      echo
      echo "Details:"
      cat $temp_file
    } | tee $NETTY_LEAK_DUMP_DIR/leak_report.txt
    touch target/netty_leaks_found
  else
    echo "No netty leaks found."
    touch target/netty_leaks_not_found
  fi
  rm $temp_file
}

if [ -z "$1" ]; then
echo "usage: $0 [ci_tool_function_name]"
echo "Available ci tool functions:"
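
`ci_report_netty_leaks` records its result as a marker file, either `target/netty_leaks_found` or `target/netty_leaks_not_found`, and always returns normally. The PR title says leaks should fail the build, but the part of the diff shown here has no step that consumes these markers. A minimal sketch of one way a later step might do so follows. The function name `fail_if_netty_leaks_found` and its optional directory argument are assumptions made for illustration and are not part of this PR.

```shell
# Hypothetical helper (not in the PR): turn the marker file written by
# ci_report_netty_leaks into a non-zero exit status.
fail_if_netty_leaks_found() {
  # Directory holding the marker files; defaults to "target" as in the PR
  local marker_dir="${1:-target}"
  if [ -f "$marker_dir/netty_leaks_found" ]; then
    # "::error::" is the GitHub Actions workflow command for an error annotation
    echo "::error::Netty leaks found, failing the build"
    return 1
  fi
  echo "No Netty leak marker found in $marker_dir"
  return 0
}
```

Keeping reporting and failing in separate steps would let the report and the artifact upload run under `if: ${{ always() }}` before the job is marked as failed.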
9 changes: 7 additions & 2 deletions buildtools/pom.xml
@@ -141,12 +141,17 @@
</exclusion>
</exclusions>
</dependency>
-    <!-- for testing FastThreadLocalStateCleaner -->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-common</artifactId>
<version>${netty.version}</version>
-      <scope>test</scope>
+      <scope>provided</scope>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-buffer</artifactId>
<version>${netty.version}</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.mockito</groupId>